From Simple Wiki

Template Control Screen

The Template Control screen lets you create and test pattern matching templates used to extract data from OCR zones. Templates are also used in other fields to indicate pre-defined field values.

A list of valid templates for each field type is shown at the top. Select a template value and click Add to add it to the template.

You can type or paste the Sample Text used to test pattern matching templates. Click the Test button to compare the template to the Sample Text. The first matching value is displayed in the Result text box.

Regular Expressions

Check the Use Regular Expressions option to enable Regular Expressions (RegEx), which let you define much more complex pattern matching templates using a standardized pattern language. Regular Expressions are a widely used standard, familiar to UNIX users from tools such as “grep”.

You can mix template formats, with some templates using Regular Expressions and others using the SimpleIndex template format. Precede any template with ^^^ to indicate that it is a regular expression.

This prefix will be added to the template automatically when the Use Regular Expressions box is checked.
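
As an illustration of what the ^^^ prefix enables, the sketch below strips the prefix and runs the remainder as a regular expression. It uses Python's `re` module as a stand-in; SimpleIndex's own RegEx engine may differ in details, and the function name `run_regex_template` is hypothetical. The example pattern covers a date with a 1- or 2-digit month and day in a single expression, which takes four `|`-separated alternatives in the SimpleIndex template format.

```python
import re
from typing import Optional

REGEX_PREFIX = "^^^"  # marks a template as a regular expression


def run_regex_template(template: str, sample_text: str) -> Optional[str]:
    """Strip the ^^^ prefix and return the first regex match in sample_text."""
    if not template.startswith(REGEX_PREFIX):
        raise ValueError("template is not marked with ^^^")
    match = re.search(template[len(REGEX_PREFIX):], sample_text)
    return match.group(0) if match else None


# One pattern for a date with a 1- or 2-digit month and day and a 4-digit year
DATE_TEMPLATE = r"^^^\b\d{1,2}/\d{1,2}/\d{4}\b"

print(run_regex_template(DATE_TEMPLATE, "Due 3/14/2024, net 30"))  # 3/14/2024
```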

More on Regular Expressions here.

SimpleIndex Template Format

SimpleIndex Templates represent a series of specific letter and number combinations that the field value must match.

The possible values for the template are:

  • *: any character
  • #: numbers only
  • A: letters only
  • X: any letter or number
  • ?: optional characters. When several ? characters are placed at the end of a template, SimpleIndex will accept any letters, numbers, or the characters ()-&%@ until a non-matching character is reached.
  • Other: character must match exactly
  • \: enter a backslash before *, #, A, X, ? or \ to match that character exactly instead of treating it as a wildcard
  • |: use the pipe character to separate multiple search templates to allow searching of many variations on the field format

Some example templates are:

  • Invoice \#: #######: the phrase “Invoice #:” followed by a 7-digit invoice number
  • ###-##-####: social security number
  • ##/##/####|#/#/####|##/#/####|#/##/####: date with 4-digit year and 1 or 2-digit month and day
  • ABC**##??: any letter, followed by the letters B and C exactly, any 2 characters, 2 numbers, and up to 2 optional characters
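
To make the rules above concrete, here is a minimal sketch that translates a SimpleIndex-style template into a Python regular expression and returns the first match, roughly what the Test button reports. It is an approximation written from the rules on this page, not SimpleIndex's actual implementation: the function names are hypothetical, `*` is assumed not to match line breaks, and "first match" is assumed to mean the leftmost match in the text.

```python
import re
from typing import Optional

# Characters accepted at a "?" position: letters, numbers, or ()-&%@
_OPTIONAL = r"[A-Za-z0-9()\-&%@]?"

# Wildcard characters of the SimpleIndex template format, as regex fragments
_WILDCARDS = {
    "*": ".",            # any character
    "#": r"\d",          # numbers only
    "A": "[A-Za-z]",     # letters only
    "X": "[A-Za-z0-9]",  # any letter or number
    "?": _OPTIONAL,      # optional character
}


def template_to_regex(template: str) -> str:
    """Translate one SimpleIndex-style template (no '|') into a regex."""
    parts = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "\\" and i + 1 < len(template):
            # A backslash makes the next character an exact match
            parts.append(re.escape(template[i + 1]))
            i += 2
        else:
            # Wildcards become character classes; anything else must match exactly
            parts.append(_WILDCARDS.get(ch, re.escape(ch)))
            i += 1
    return "".join(parts)


def first_match(template: str, sample_text: str) -> Optional[str]:
    """Return the leftmost text matched by any of the '|'-separated templates."""
    alternatives = (template_to_regex(t) for t in template.split("|"))
    pattern = "|".join(f"(?:{alt})" for alt in alternatives)
    match = re.search(pattern, sample_text)
    return match.group(0) if match else None


print(first_match(r"Invoice \#: #######", "Invoice #: 1234567 due 3/14/2024"))
print(first_match("##/##/####|#/#/####|##/#/####|#/##/####", "due 3/14/2024"))
```

Because uppercase A and X are wildcards, a literal marker word that contains them (for example "TOTAL") would need those letters escaped, as in "TOT\AL", for an exact match.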

Enter the template and click the Test button to see which text it matches in the Sample Text. Generic sample text is provided with examples of many common data elements such as names, dates and numbers. To test the template against your own documents, copy and paste their OCR text into the Sample Text box. The new sample text will be used the next time you open the Template Editor.

There are also several built-in templates that make it easy to find common data elements:

  • %DATE% - find a date in any valid date format, including forms where the month is spelled out or abbreviated, with 2- or 4-digit years.
  • %DATE2% - find a date with a 2-digit year.
  • %DATE4% - find a date with a 4-digit year.
  • %MONEY% - find an amount of money.
  • %PHONEUS% - find a US phone number in many common formats.
  • %SSN% - find a US social security number in ###-##-#### format.
  • %FIELD#% - get the template from another field value, usually an Autofill (5.9.15) field that lets you associate different templates with different documents, such as an invoice number template that is associated with a specific vendor name.
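
The %FIELD#% idea can be sketched as a simple substitution: the token is replaced by the current value of another field, which an Autofill lookup has filled with a vendor-specific template. Everything here is a hypothetical illustration: the vendor table, the assumption that fields are referenced by number, and the choice to resolve an unknown field to an empty string are not taken from SimpleIndex.

```python
import re

# Hypothetical Autofill table: vendor name -> invoice number template
VENDOR_TEMPLATES = {
    "Acme Supply": "INV-######",
    "Globex": "##-####",
}


def resolve_field_refs(template: str, field_values: dict) -> str:
    """Replace each %FIELDn% token with the current value of field n.

    Assumes fields are numbered and that an unknown field resolves to "".
    """
    def lookup(match: re.Match) -> str:
        return field_values.get(int(match.group(1)), "")

    return re.sub(r"%FIELD(\d+)%", lookup, template)


# Field 1 holds the vendor name; field 2 is an Autofill field holding its template
fields = {1: "Globex", 2: VENDOR_TEMPLATES["Globex"]}
print(resolve_field_refs("%FIELD2%", fields))  # ##-####
```

The resolved string is then used as an ordinary template, so each vendor's invoice number format is matched without maintaining a separate field per vendor.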

Strip Fixed Characters in Front of OCR Template

This option lets you use text “markers” to locate a field when no unique template or dictionary lookup is possible. For instance, an invoice number may always follow the word “INVOICE” on certain documents. With this option checked, you can enter “INVOICE ####” as the OCR template, and only the invoice number, not the word “INVOICE”, will appear in the field.

Strip Fixed Characters at End of OCR Template

Same as the previous option, but strips fixed characters from the end of the template instead of the beginning, in case the marker appears after the text you are trying to recognize.
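
A minimal sketch of both strip options, assuming that "fixed characters" means the literal (non-wildcard) characters at the start or end of the template, including any spaces. Since literal characters always match exactly one character each, their count tells how many characters to drop from the matched text. The function names are hypothetical.

```python
# Wildcard characters of the SimpleIndex template format
WILDCARDS = set("*#AX?")


def wildcard_flags(template: str) -> list:
    """One flag per template position: True for a wildcard, False for a fixed character."""
    flags = []
    i = 0
    while i < len(template):
        if template[i] == "\\" and i + 1 < len(template):
            flags.append(False)  # an escaped character is matched exactly
            i += 2
        else:
            flags.append(template[i] in WILDCARDS)
            i += 1
    return flags


def strip_fixed(template: str, matched: str, front: bool = True, end: bool = False) -> str:
    """Remove the text matched by leading and/or trailing fixed characters."""
    flags = wildcard_flags(template)
    # Count fixed characters before the first wildcard and after the last one
    lead = next((i for i, wild in enumerate(flags) if wild), len(flags))
    trail = next((i for i, wild in enumerate(reversed(flags)) if wild), len(flags))
    start = lead if front else 0
    stop = len(matched) - trail if end else len(matched)
    return matched[start:stop]


print(strip_fixed("INVOICE ####", "INVOICE 1234"))                     # 1234
print(strip_fixed("#### DUE", "5678 DUE", front=False, end=True))      # 5678
```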

Normalizing OCR Results for Template Matching

Several options help standardize OCR output to make template matching more effective. OCR often misreads characters or varies the number of spaces between words on a form. To account for this, use:

  • Spaces to Strip normalizes spacing. Removing all spaces, or converting whitespace to a single space, is the best way to avoid mismatches due to extra spaces and line breaks.
  • Strip Characters from Result is designed to remove "noise" characters often read from stray marks. Any invalid characters can be entered here and removed or replaced with another character automatically.
  • Case Fixing standardizes text to UPPERCASE, lowercase or Title Case to avoid template mismatches due to case-sensitivity.
  • Find and Replace lets you make specific replacements in the text to account for common OCR errors, or to normalize output to desired values.
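
The four options can be pictured as a small text pipeline applied before matching. This sketch applies them in the order listed above, which is an assumption about SimpleIndex's internal order; the parameter names are hypothetical, and Strip Characters is shown only removing characters, not replacing them.

```python
import re
from typing import Dict, Optional


def normalize(
    text: str,
    spaces: str = "collapse",                       # "remove", "collapse", or "keep"
    strip_chars: str = "",                          # noise characters to delete
    case: Optional[str] = None,                     # "upper", "lower", "title", or None
    replacements: Optional[Dict[str, str]] = None,  # find -> replace pairs
) -> str:
    # Spaces to Strip: drop all whitespace, or turn each run into a single space
    if spaces == "remove":
        text = re.sub(r"\s+", "", text)
    elif spaces == "collapse":
        text = re.sub(r"\s+", " ", text).strip()
    # Strip Characters from Result: delete stray "noise" characters
    if strip_chars:
        text = text.translate(str.maketrans("", "", strip_chars))
    # Case Fixing
    if case == "upper":
        text = text.upper()
    elif case == "lower":
        text = text.lower()
    elif case == "title":
        text = text.title()
    # Find and Replace: fix known OCR misreads
    for find, replace in (replacements or {}).items():
        text = text.replace(find, replace)
    return text


print(normalize("Acct  No:\n 4l7-22~", strip_chars="~", replacements={"l": "1"}))
# Acct No: 417-22
```

Note that a replacement applies to the whole text, so a rule such as "l" to "1" would also change words like "Total"; keeping Find and Replace rules specific avoids such side effects.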

Templates by Field Type

The template setting modifies the behavior of the field differently based on the Field Type. This section describes how the template is used for each.

Barcode, OCR and Template Fields

For Barcode, OCR, and Template field types, use either a SimpleIndex template or Regular Expressions.

Fixed field templates can also be used to build the template from other values, for example if you are using the Template Autofill feature.

Text, Numbers and Date Fields

For Text, Numbers, and Date field types, the Template represents a default value that will appear automatically as the field value, but may be changed by the user if necessary.

For Date field types, you may also enter a Template for automatic date formatting. Enter %MM/DD/YYYY% to format dates as Month/Day/Year. Use %YYYY-MM-DD% to format dates so that they sort correctly in filenames. Any of the date format masks used in Microsoft Office applications such as Excel and Access may be used.
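
The sketch below shows why %YYYY-MM-DD% is recommended for filenames: with the year first and zero-padded month and day, sorting the text gives chronological order, while Month/Day/Year does not. It maps only a few mask tokens to Python `strftime` codes; Office masks support many more tokens, and the helper name `format_date` is hypothetical.

```python
from datetime import date

# A few Office-style mask tokens mapped to strftime codes (longest first).
# This subset is an assumption; real Office masks support many more tokens.
_MASK_TOKENS = [("YYYY", "%Y"), ("YY", "%y"), ("MM", "%m"), ("DD", "%d")]


def format_date(value: date, template: str) -> str:
    """Apply a date template such as %MM/DD/YYYY% to a date."""
    fmt = template.strip("%")  # drop the surrounding % markers
    for token, code in _MASK_TOKENS:
        fmt = fmt.replace(token, code)
    return value.strftime(fmt)


dates = [date(2023, 12, 1), date(2024, 1, 15), date(2024, 2, 1)]
print(sorted(format_date(d, "%MM/DD/YYYY%") for d in dates))
# ['01/15/2024', '02/01/2024', '12/01/2023']  (not chronological)
print(sorted(format_date(d, "%YYYY-MM-DD%") for d in dates))
# ['2023-12-01', '2024-01-15', '2024-02-01']  (chronological)
```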

There is also a global date format option in the Advanced Settings screen that will reformat all date values in any field type, including Barcode and OCR.

These field types also accept the same constant values that Fixed fields use, such as %TODAY% for today's date. You should use these instead of a Fixed field if you want to allow the user to edit the calculated value. View the complete list.

For Retrieval configurations, you can enter <, >, <=, or >= in the Template for Date or Numbers fields to enable date or number range searches. To create a minimum and maximum search field, create 2 fields that are linked to the same database field and enter >= for the minimum value and <= for the maximum.
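
Conceptually, two retrieval fields linked to the same database field combine into one range condition. The following sketch builds a parameterized SQL WHERE clause from (operator, value) pairs and runs it against an in-memory SQLite table. How SimpleIndex actually builds its queries is not documented here; the column name `InvoiceDate`, the table, and the rule that a blank field adds no condition are assumptions for illustration.

```python
import sqlite3
from typing import List, Optional, Tuple

RANGE_OPERATORS = {"<", ">", "<=", ">="}


def build_range_clause(
    db_field: str, criteria: List[Tuple[str, Optional[str]]]
) -> Tuple[str, list]:
    """Combine (operator, value) pairs for one database field into a WHERE clause.

    Each pair stands for one retrieval field: the operator comes from its
    Template, the value from what the user typed. Blank values are skipped.
    """
    clauses, params = [], []
    for op, value in criteria:
        if op not in RANGE_OPERATORS:
            raise ValueError(f"unsupported operator: {op}")
        if value in (None, ""):
            continue
        clauses.append(f"{db_field} {op} ?")  # db_field comes from configuration, not user input
        params.append(value)
    return " AND ".join(clauses), params


# Minimum (>=) and maximum (<=) fields linked to the same hypothetical column
where, params = build_range_clause("InvoiceDate", [(">=", "2024-01-01"), ("<=", "2024-03-31")])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (InvoiceDate TEXT)")
conn.executemany("INSERT INTO docs VALUES (?)", [("2023-12-31",), ("2024-02-10",), ("2024-04-01",)])
rows = conn.execute(f"SELECT InvoiceDate FROM docs WHERE {where}", params).fetchall()
print(where)  # InvoiceDate >= ? AND InvoiceDate <= ?
print(rows)   # [('2024-02-10',)]
```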

List Fields

For List field types, the database field or path to the text file containing the list values is entered in the List File/Field setting.

The Template field should be left blank.

Autofill Fields

For Autofill fields, enter the name of the database field to use for the lookup.

How to Use Autofill

Autonumber Fields

For Autonumber fields, you may enter any combination of letters and numbers, as long as the last character is a number. The trailing digits are used as the numeric value to increment, and the other characters remain constant. Pad the numeric value with enough leading zeros to keep all numbers the same length and preserve their sort order.
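
A minimal sketch of the increment rule, assuming the numeric part is the run of digits at the end of the value and that its width (the zero padding) is kept. What SimpleIndex does when the number outgrows its padding is not stated here; this sketch simply lets the value grow longer. The function name is hypothetical.

```python
import re


def next_autonumber(current: str) -> str:
    """Increment the trailing digits of an Autonumber value, keeping its zero padding."""
    match = re.search(r"\d+$", current)
    if match is None:
        raise ValueError("the last character must be numeric")
    digits = match.group(0)
    incremented = str(int(digits) + 1).zfill(len(digits))
    return current[: match.start()] + incremented  # prefix stays constant


value = "INV-00098"
for _ in range(3):
    value = next_autonumber(value)
    print(value)
# INV-00099
# INV-00100
# INV-00101
```

Without padding, text sorting puts "DOC-10" before "DOC-9"; with padding, "DOC-009" correctly sorts before "DOC-010".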

How to Use Autonumbers

Fixed fields

For Fixed fields, the template represents a pre-set value that cannot be changed. There are several variables that may also be used to substitute a calculated value based on system settings, input file path, file properties, and other field values.

Detailed List of Fixed Field Options

OMR Fields

For OMR fields, enter the minimum number of black pixels in the zone for it to be considered "checked". Keep in mind that a typical 300 dpi image has 300 x 300 = 90,000 pixels per square inch.

Enter a negative number in the template to have the OMR region extracted to a separate image file whenever the threshold is met. This feature is useful for verifying and capturing signatures on documents. Use the Saved Region Filename setting to set the name for the extracted region files.
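
To help choose a threshold, the sketch below computes the pixel count of a zone at 300 dpi and evaluates a zone given as rows of 0/1 pixels (1 = black). It assumes that "meeting" the threshold means a black-pixel count greater than or equal to its absolute value, and that a negative template means the same threshold plus region extraction; the function names are hypothetical.

```python
from typing import List, Tuple


def pixels_in_zone(width_in: float, height_in: float, dpi: int = 300) -> int:
    """Total pixels in a zone of the given size in inches."""
    return round(width_in * dpi) * round(height_in * dpi)


def evaluate_omr(zone: List[List[int]], template_value: int) -> Tuple[bool, bool]:
    """Return (checked, extract_region) for a zone of 0/1 pixels (1 = black)."""
    threshold = abs(template_value)
    black = sum(sum(row) for row in zone)
    checked = black >= threshold
    # A negative template also asks for the region to be saved as an image
    extract_region = checked and template_value < 0
    return checked, extract_region


print(pixels_in_zone(1, 1))        # 90000
print(pixels_in_zone(0.25, 0.25))  # 5625

# A 0.25-inch box (75 x 75 pixels) with 8 fully inked rows = 600 black pixels
zone = [[1] * 75 for _ in range(8)] + [[0] * 75 for _ in range(67)]
print(evaluate_omr(zone, 500))   # (True, False)
print(evaluate_omr(zone, -500))  # (True, True)
print(evaluate_omr(zone, 1000))  # (False, False)
```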

How to Use OMR