Index Field Wizard

From Simple Wiki
Revision as of 01:47, 13 January 2022 by Aaron

SimpleIndex Simple Setup Wizard Configuration Process Indexing & File Naming

Use the Add button to create a new index field, or select an existing field and click Edit to modify its settings.

Field Type

SimpleIndex Simple Setup Wizard Configuration Index Step Field Types

The field type determines which advanced-settings screens are displayed in the following steps. It also determines which data the field accepts and which automation is used to read the index value from documents.

Autofill

All fields of this type are automatically populated with values from your database once the Key Field has been matched.

The Template setting for this field must be set to the name of the corresponding field in your database.

For more information see the Autofill page.
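As a rough illustration of the lookup described above, the Python sketch below treats the database as a list of records and copies the column named in each Autofill field's Template setting once the Key Field value matches. The function name `autofill` and the `CustID`, `Name` and `City` columns are hypothetical; SimpleIndex performs this lookup internally against the configured data source.

```python
def autofill(key_value, records, key_column, field_templates):
    """Return Autofill values from the first record whose key column matches.

    field_templates maps each Autofill index field to the database column
    named in its Template setting (hypothetical structure for this sketch).
    """
    for record in records:
        if record.get(key_column) == key_value:
            return {field: record.get(column, "") for field, column in field_templates.items()}
    return {}  # no matching record: the Autofill fields stay empty

records = [
    {"CustID": "C100", "Name": "Acme Corp", "City": "Springfield"},
    {"CustID": "C200", "Name": "Globex", "City": "Shelbyville"},
]
print(autofill("C200", records, "CustID", {"Customer": "Name", "Location": "City"}))
# {'Customer': 'Globex', 'Location': 'Shelbyville'}
```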

Autonumber

Allows you to have a field with a numeric value that increments automatically under certain conditions. The Template value for this field sets the seed number, which can be any combination of letters and numbers as long as the last character is a digit. The Autonumber Increment setting determines when the value increments: on every page, every blank page, every barcode, or at the end of each batch.

View the complete guide to using Autonumber fields.
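The Python sketch below shows one way a seed such as INV0098 could be incremented: the trailing run of digits is increased by one and any letters before it are kept. It assumes the zero padding of the numeric suffix is preserved and only grows on overflow (999 to 1000); the exact padding behavior in SimpleIndex is not stated here, and `next_autonumber` is a hypothetical name.

```python
import re

def next_autonumber(value: str) -> str:
    """Increment the trailing digits of an Autonumber value.

    Assumption: the numeric suffix keeps its zero padding and only
    grows wider when it overflows (e.g. 999 -> 1000).
    """
    match = re.search(r"\d+$", value)
    if match is None:
        raise ValueError("the seed value must end with a digit")
    digits = match.group(0)
    return value[: match.start()] + str(int(digits) + 1).zfill(len(digits))

# Incrementing once per page, starting from the seed "INV0098":
value = "INV0098"
for page in range(3):
    print(value)  # INV0098, then INV0099, then INV0100
    value = next_autonumber(value)
```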

Barcode

If a barcode is recognized, the value is inserted into this field. Use the Template setting to force the field to accept only barcodes that match the specified pattern. This also allows you to match multiple barcodes to their appropriate fields, and ignore barcodes that are not meant to be used as indexes. Use the Barcode tab to configure other barcode settings.
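To show how templates route several barcodes on one page to the right fields, here is a Python sketch that gives each field the first barcode that fully matches its pattern and ignores the rest. Regular expressions are used only as a stand-in for SimpleIndex's own Template syntax, and the field names and barcode values are made up for the example.

```python
import re

def assign_barcodes(barcodes, field_patterns):
    """Give each field the first barcode that fully matches its pattern.

    Regex is a stand-in for the Template syntax; barcodes that match
    no pattern are ignored.
    """
    values = {}
    for code in barcodes:
        for field, pattern in field_patterns.items():
            if field not in values and re.fullmatch(pattern, code):
                values[field] = code
                break
    return values

# A page with a stray shipping barcode, a PO number and an invoice number:
found = assign_barcodes(
    ["1Z999AA10123456784", "PO-7781", "100234"],
    {"Invoice": r"\d{6}", "PO Number": r"PO-\d+"},
)
print(found)  # {'PO Number': 'PO-7781', 'Invoice': '100234'}
```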

Date

The field is formatted as a date, in YYYY-MM-DD format by default. This format allows dates to be used in folder and file names and ensures they sort correctly. For more information see the Template and Date Formatting options. Valid dates from Fixed field templates can also be used.
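The short Python example below illustrates why YYYY-MM-DD is a good default: sorted as plain text (as file and folder names are), MM/DD/YYYY dates come out in the wrong order, while the same dates in YYYY-MM-DD sort chronologically. The format also avoids slashes, which are not valid in Windows file names. The input format `%m/%d/%Y` is just an example.

```python
from datetime import datetime

def to_iso_date(text: str, input_format: str = "%m/%d/%Y") -> str:
    """Reformat a date string as YYYY-MM-DD."""
    return datetime.strptime(text, input_format).strftime("%Y-%m-%d")

scanned = ["12/05/2021", "01/13/2022", "03/01/2021"]
print(sorted(scanned))                          # ['01/13/2022', '03/01/2021', '12/05/2021']
print(sorted(to_iso_date(d) for d in scanned))  # ['2021-03-01', '2021-12-05', '2022-01-13']
```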

Filename

Field is automatically populated with the original filename of the image from the Input folder. Does not include the input file path.

Fixed

The field's value is calculated from the Template setting. The template can include many variables that are inserted automatically, such as file properties, all or part of the file and folder names, combinations of other field values, and system settings like the user ID and computer name.

With a Fixed field the user cannot change the calculated value. Use a Text, Numbers or Date field to allow the user to modify a calculated value.

List

SimpleIndex Simple Setup Wizard Configuration Jobs Index Field List

Possible index values are displayed in a drop-down list, allowing the user to select one or automatically fill in the field with matching records as they type. The list may be populated using either a text file or database.

To populate the list with a text file, create a file in Notepad that has a single entry on each line and enter the path to this text file in the List File/Field setting. If no text file is specified and you have a database configured, the list for this field is populated automatically with the values from the corresponding database field.
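A minimal Python sketch of a list file and type-ahead matching follows. It assumes one entry per line with blank lines ignored, and that suggestions are entries starting with what the user has typed (case-insensitive); SimpleIndex's actual matching rule may differ. The `utf-8-sig` encoding is an assumption that also accepts files Notepad saved with a byte-order mark, and all function names are hypothetical.

```python
def parse_list(text: str) -> list[str]:
    """One entry per line; blank lines and surrounding spaces are dropped."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def load_list(path: str) -> list[str]:
    # utf-8-sig also reads files saved with a byte-order mark.
    with open(path, encoding="utf-8-sig") as f:
        return parse_list(f.read())

def suggest(entries: list[str], typed: str) -> list[str]:
    """Entries that start with what the user has typed so far (case-insensitive)."""
    prefix = typed.casefold()
    return [e for e in entries if e.casefold().startswith(prefix)]

departments = parse_list("Accounting\nAccounts Payable\n\nHuman Resources\nShipping\n")
print(suggest(departments, "acc"))  # ['Accounting', 'Accounts Payable']
```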

Numbers

Only numeric values are accepted. Valid numbers from Fixed field templates can also be used.

OCR

If an OCR value is recognized, it is inserted into this field. Use the Template setting with this field type to search the OCR region for the first string that matches the pattern. Use the List File/Field option to match OCR text against a list of possible values. Use the Zone OCR Settings tab to configure other OCR settings.

OMR

Use this type for check-box fields. Field is considered “checked” if the number of black pixels in the region is greater than the number entered in the Template setting.

OMR fields can also be used to extract a region from an image and save it to a separate file. Enter a negative number in the Template setting to save the region to a separate file if the number of black pixels is greater than the absolute value of this number.
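The Python sketch below restates the OMR rule: count black pixels in the zone, compare against the Template number, and treat a negative Template as "save the region if the count exceeds its absolute value". The image is modeled as rows of 0/1 values (1 = black), which is a simplification for illustration, and the function names are hypothetical.

```python
def count_black_pixels(image, zone):
    """image: rows of 0/1 values (1 = black); zone: (x, y, width, height)."""
    x, y, width, height = zone
    return sum(image[row][col] for row in range(y, y + height) for col in range(x, x + width))

def omr_decision(black_pixels: int, template: int) -> str:
    if template >= 0:
        return "checked" if black_pixels > template else "unchecked"
    # Negative Template: save the region when the count exceeds |template|.
    return "save region" if black_pixels > abs(template) else "do not save"

box = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
black = count_black_pixels(box, (0, 0, 4, 4))  # 4 black pixels
print(omr_decision(black, 3))   # checked
print(omr_decision(black, 10))  # unchecked
print(omr_decision(black, -2))  # save region
```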

Template

Forces the user to enter an index value that matches the pattern specified in the Template setting for this field. See Template for the formatting instructions.

Text

The user may enter any text into the index field. The Template setting is used as the default value, and Fixed field templates can be used to supply a calculated default.

Index Field

SimpleIndex Simple Setup Configuration Index Field Steps

Enter the name or label to use to identify the field. File naming options can be selected here, but these options are more easily configured from the Index & File Naming screen so you can see how they interact with the other index fields.

For OCR and Barcode fields, the Text Matching Type option will be displayed. Select the desired option to display the corresponding Index Field Wizard page in the following step.

When you select Both, the template is matched first, and the dictionary list is then matched against the template match result. This can prevent false positives when dictionary terms also appear elsewhere on the document.
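The Python sketch below illustrates the Both option. On the sample page, a plain dictionary scan would pick up "Sales" from the heading or body text, but matching the template first narrows the search to the value after "Dept:", and only then is that value checked against the dictionary. A regular expression with one capture group stands in for the Template setting, and `match_both` is a hypothetical name.

```python
import re

def match_both(ocr_text, pattern, dictionary):
    """Template first, then dictionary.

    Returns the dictionary entry equal to the template match, or "" if
    either step fails. Assumes `pattern` has one capture group.
    """
    match = re.search(pattern, ocr_text)
    if match is None:
        return ""
    candidate = match.group(1).strip()
    entries = {entry.casefold(): entry for entry in dictionary}
    return entries.get(candidate.casefold(), "")

page = "SALES SUMMARY\nDept: HR\nPlease forward a copy to Sales."
departments = ["Accounting", "HR", "Sales"]
print(match_both(page, r"Dept:\s*(\w+)", departments))  # HR
```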

If a data source has been configured, the Database Mapping options will be displayed. Select the corresponding field in the database to use for data export.

Required

When this option is selected, the user will not be able to finalize a batch unless all images have been saved with a value for this field.

Folder

Adds this field to the File Naming Schema as a subfolder.

This option uses the index values to create subfolders in the Output folder.

If multiple folder fields are selected, nested subfolders are created in order from top to bottom.

Filename

Adds this field to the File Naming Schema as a filename part.

When this option is selected, this index field value is used in the file names of the exported images.

If multiple fields have this option checked, the filename will contain all the values in top to bottom order, separated by the Field Separator character.
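To illustrate how the Folder and Filename options combine, here is a Python sketch that builds an output path from index values listed top to bottom: Folder fields become nested subfolders and Filename fields are joined by the separator. The `IndexField` type, the `_` separator and the `.pdf` extension are assumptions for the example; in SimpleIndex the separator comes from the Field Separator setting.

```python
from pathlib import PureWindowsPath
from typing import NamedTuple

class IndexField(NamedTuple):
    value: str
    folder: bool    # the Folder option
    filename: bool  # the Filename option

def output_path(output_folder, fields, separator="_", extension=".pdf"):
    """Fields are listed top to bottom, as in the index field list."""
    folders = [f.value for f in fields if f.folder]
    name = separator.join(f.value for f in fields if f.filename)
    return PureWindowsPath(output_folder).joinpath(*folders, name + extension)

fields = [
    IndexField("ACME", folder=True, filename=False),        # Customer
    IndexField("2022", folder=True, filename=False),        # Year
    IndexField("INV-1001", folder=False, filename=True),    # Invoice number
    IndexField("2022-01-13", folder=False, filename=True),  # Date
]
print(output_path(r"C:\Output", fields))
# C:\Output\ACME\2022\INV-1001_2022-01-13.pdf
```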

Forward

This option “carries forward” the field value to subsequent images until a new saved value is encountered. Use this to index multi-page documents without having to re-type the index data for each page.

When unchecked, each page must be indexed individually.

When using coversheets created with SimpleCoversheet or another barcode application, the forward option will automatically apply the barcode values to all the pages between the coversheets.
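The carry-forward behavior can be sketched in Python as a single pass over the pages: each page takes the most recent saved value until a new one appears. Pages with no saved value are modeled as `None`, which is an assumption of this illustration; `carry_forward` is a hypothetical name.

```python
def carry_forward(saved_values):
    """saved_values: the value saved on each page, or None if none was saved."""
    filled, current = [], None
    for value in saved_values:
        if value is not None:  # a new saved value replaces the carried one
            current = value
        filled.append(current)
    return filled

# Barcode values read on pages 1 and 4 of a six-page batch:
print(carry_forward(["1001", None, None, "1002", None, None]))
# ['1001', '1001', '1001', '1002', '1002', '1002']
```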

Database Mapping

Use these settings to map the index field to a field in your database. Depending on the selected Database Mode, records will be added, modified or searched, and List fields will be populated with unique records from this field.

Database Field Name

Select the database fields that correspond to the fields you define under the Index tab. If there is no corresponding database field, then leave this value blank.

Editable

This option is only used in Update mode. For each field, select this option if you want to allow the user to edit the values in this field. Leave it unchecked if you want to use the existing values for reference or file naming only and not allow the user to modify its value.

Filter

This option allows you to define default search criteria for Retrieval and Update modes. Whenever the search screen is displayed, any values entered here appear in the search criteria for that field. This makes it possible to add default filters that automatically search a certain subset of documents, or to make searches easier by partially filling in search fields.

Zones & Advanced OCR Settings

SimpleIndex Simple Setup Configuration Index Field Zones & Advanced OCR Settings

Zones can be used for OCR, OMR and Barcode fields to define a region on the image that contains the field data. Zones can also be used to automatically zoom in on a region of the image when the field is selected.

Setting Zone Coordinates

Click the Set Zone button to set the zone coordinates for this field. This will show the Draw Field Zone window.

SimpleIndex Simple Setup Configuration Index Field Zones & Advanced OCR Settings Documentation

To set the zone, click the Open or Scan button to obtain a sample image. Click and drag the mouse to draw a box around the region you want to use for this field.

For multi-page files, use the Page buttons to change to the page you want or enter the page number in the box.

It is also possible to perform zone OCR on the last page of each document by entering a negative number for the Page on the wizard screen. Set to -1 to OCR the last page, or -2 for the next to last page, etc.

When finished, click Save to keep the new zone coordinates or Cancel to discard.
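The negative Page setting works like counting from the end of the document. The Python sketch below maps a Page value to a 1-based page number, so -1 is the last page and -2 the next to last. Treating 0 and out-of-range values as errors is an assumption of this sketch, not documented SimpleIndex behavior, and `resolve_page` is a hypothetical name.

```python
def resolve_page(page: int, page_count: int) -> int:
    """Map the Page setting to a 1-based page number in a document.

    Positive values count from the front; -1 is the last page,
    -2 the next to last, and so on.
    """
    if page == 0 or abs(page) > page_count:
        raise ValueError(f"page {page} is outside a {page_count}-page document")
    return page if page > 0 else page_count + 1 + page

print(resolve_page(-1, 5))  # 5
print(resolve_page(-2, 5))  # 4
```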

Text Source

Zone coordinates can indicate pixel coordinates in an image, or row and column numbers in a text file. Set the Text Source to Use Full Page Text to use existing text from PDF files, MS Office documents and full page OCR as the source text for this field.

You can also pick another field from the list to use that field's value as the source text for this field. This lets you capture a large block of text, like an address block, with Zone OCR, then set up fields for Name, Address, City, State, etc. that use the address block field as the Text Source.

Use the X, Y coordinates to indicate a column and row within the source text. Use Width and Height to indicate the number of columns and rows to capture. Entering 0 for all four values searches the entire file.
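The Python sketch below shows the idea of a text zone: cutting a block of rows and columns out of source text, such as an address block captured by another field. It assumes X (column) and Y (row) are zero-based offsets, which may not match SimpleIndex's convention, and `text_zone` is a hypothetical name.

```python
def text_zone(text: str, x: int, y: int, width: int, height: int) -> str:
    """Cut a block of `height` rows and `width` columns out of text.

    Assumption: X (column) and Y (row) are zero-based offsets.
    All zeros returns the whole text.
    """
    if x == y == width == height == 0:
        return text
    rows = text.splitlines()[y : y + height]
    return "\n".join(row[x : x + width] for row in rows)

address_block = "ACME CORP\n123 MAIN ST\nSPRINGFIELD IL 62701"
print(text_zone(address_block, 0, 2, 11, 1))  # SPRINGFIELD (City)
print(text_zone(address_block, 12, 2, 2, 1))  # IL (State)
```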

Advanced OCR Field Settings

These settings let you format the OCR results prior to dictionary and template matching. This allows you to perform various text replacements, remove invalid characters, and standardize spacing and letter case.

Pages to Process

Use this option to limit OCR to certain pages within the batch. If you know which pages in the batch contain the index information you need, restricting OCR to those pages can greatly speed up processing. The options are:

  • Every Page – all pages are processed.
  • First Page Only – only the first page in the batch is processed.
  • Pages with Barcodes – only pages where a barcode is detected are processed. Use the Template and zone features to prevent detection of stray barcodes.
  • Pages After Barcode – use this option with separator sheets, like the ones created by SimpleCoversheet, where the first page of the document comes after a barcode separator sheet.
  • Pages After Blank – use this option with blank page separators to indicate the start of a new document on the following page.
  • Odd Pages – OCR only odd-numbered pages (1, 3, 5, etc.)
  • Even Pages – OCR only even-numbered pages (2, 4, 6, etc.)
  • Pages without Barcodes – only pages where a barcode is not detected are processed. Useful for capturing the same field value with OCR when a barcode is not present or unreadable.
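The options above amount to a per-page rule. The Python sketch below applies each rule to a sample batch in which pages 1 and 5 are barcode coversheets and page 3 is a blank separator. The `Page` record and `select_pages` function are hypothetical, and "after" is interpreted as "immediately after", which is an assumption of this sketch.

```python
from typing import NamedTuple

class Page(NamedTuple):
    number: int  # 1-based position in the batch
    has_barcode: bool
    is_blank: bool

def select_pages(pages, option):
    """Return the page numbers that the given Pages to Process option would OCR."""
    selected = []
    for i, page in enumerate(pages):
        previous = pages[i - 1] if i > 0 else None
        rules = {
            "Every Page": True,
            "First Page Only": i == 0,
            "Pages with Barcodes": page.has_barcode,
            "Pages After Barcode": previous is not None and previous.has_barcode,
            "Pages After Blank": previous is not None and previous.is_blank,
            "Odd Pages": page.number % 2 == 1,
            "Even Pages": page.number % 2 == 0,
            "Pages without Barcodes": not page.has_barcode,
        }
        if rules[option]:
            selected.append(page.number)
    return selected

batch = [
    Page(1, has_barcode=True, is_blank=False),   # coversheet
    Page(2, has_barcode=False, is_blank=False),
    Page(3, has_barcode=False, is_blank=True),   # blank separator
    Page(4, has_barcode=False, is_blank=False),
    Page(5, has_barcode=True, is_blank=False),   # coversheet
    Page(6, has_barcode=False, is_blank=False),
]
print(select_pages(batch, "Pages After Barcode"))  # [2, 6]
```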

Case Fixing

This option allows you to automatically case fix the OCR results, forcing the results to be all UPPER CASE, lower case, or Title Case (first letter of each word). If a Dictionary File is specified, the case used in that file will override this setting.
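As an illustration, the Python sketch below applies the three case modes and lets a matching dictionary entry's own capitalization win, as described above. Title Case uses `string.capwords`, which capitalizes the first letter of each whitespace-separated word and lowercases the rest; whether SimpleIndex treats punctuation or extra spaces the same way is not stated, and `fix_case` is a hypothetical name.

```python
import string

def fix_case(text, mode, dictionary=()):
    """Apply UPPER, lower or Title case; a matching dictionary entry wins."""
    for entry in dictionary:
        if entry.casefold() == text.casefold():
            return entry  # the dictionary's own capitalization overrides mode
    if mode == "UPPER":
        return text.upper()
    if mode == "lower":
        return text.lower()
    if mode == "Title":
        return string.capwords(text)  # first letter of each word
    return text

print(fix_case("aCME corp", "Title"))                # Acme Corp
print(fix_case("acme corp", "Title", ["ACME Corp"]))  # ACME Corp
```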

Strip spaces from result

This option strips any spaces from the OCR result. This is very useful when using template matching or dictionary lookups, because spaces are sometimes recognized by mistake, causing the match to not be found. The Spaces to Strip (5.12.11) option can be used to modify the behavior of this function to strip other classes of characters.

Strip Characters from Result

Enter a list of characters to remove from OCR results before template and dictionary matching. You can also use this setting in place of a template: strip every unwanted character from the OCR zone and keep whatever remains. When recognition mistakes occur, this technique still yields a partial result, whereas a template or dictionary match would leave the field blank.

This setting can also be used with non-OCR fields to remove unwanted characters from barcodes, database fields, dates, etc.

Here are several helpful hints for using this setting:

  • Enter the values %LF% and %TAB% to remove line breaks and tab characters, since these cannot be typed.
  • There are several examples available in the drop-down menu with common lists of characters that can be selected automatically.
  • You can manually type or copy/paste values into this field.
  • A good technique to use is to copy and paste any extra characters that appear in that field during OCR until only valid characters remain.
  • Use Notepad to edit a long list of special characters or to save lists for later use.
  • Use the Character Map (in Start Menu/Accessories) to find special characters.
  • Enter %##% to specify a character by its ASCII code ##. For example, %13% removes carriage returns and %10% removes line feeds.
  • Set the Replace Character option to replace stripped characters with another.
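The Python sketch below shows how a strip list with %LF%, %TAB% and %##% tokens could be expanded and applied, with an optional replacement character. It assumes %LF% means the line feed character (ASCII 10) and %TAB% the tab character (ASCII 9); the function names are hypothetical and this is not SimpleIndex's own parser.

```python
import re

# Named tokens assumed for this sketch: %LF% = line feed (10), %TAB% = tab (9).
NAMED_TOKENS = {"LF": "\n", "TAB": "\t"}

def decode_char_list(spec: str) -> str:
    """Expand %LF%, %TAB% and %##% (ASCII code) tokens into real characters."""
    def expand(match):
        token = match.group(1)
        if token.isdigit():
            return chr(int(token))
        # Unknown names are kept literally.
        return NAMED_TOKENS.get(token.upper(), match.group(0))
    return re.sub(r"%([A-Za-z]+|\d+)%", expand, spec)

def strip_characters(text: str, spec: str, replacement: str = "") -> str:
    """Remove (or replace) every character listed in spec."""
    to_strip = set(decode_char_list(spec))
    return "".join(replacement if ch in to_strip else ch for ch in text)

print(strip_characters("INV#12-34\n", "#-%LF%"))  # INV1234
print(strip_characters("12\t34", "%9%", "-"))     # 12-34
```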

Replace Character

Enter a character or characters here that will replace those stripped using the Strip Characters From Result option. This allows you to replace common mistakes, such as I and 1 or O and 0, or substitute a space or dash for line feeds and other unwanted characters.

ASCII character codes may be entered in this field to allow special characters to be used for replacement. For example, the single space character can be entered as %32%, Line Feeds are %10% and Tabs are %9%. A full list of ASCII character codes can be found if you search the web for “ASCII Table”.

Character Substitution

This option allows you to define specific 'find and replace' operations on the OCR text that take place before template and dictionary matching. This is useful for automatically correcting common OCR errors, such as a "1" being recognized as "I". Substitutions can be single characters or whole words and phrases. It is also possible to replace unprintable characters such as tabs and line feeds by entering their ASCII character code (e.g. %10% for line feeds).

In previous versions, replacements were set globally and applied to all OCR fields. In version 8.1 the replacements are set on the field level so you can do different replacements for each field. For example, replacing all I’s with 1 is useful for a numeric field but not text.
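A minimal Python sketch of field-level substitutions follows: each field has its own ordered list of find/replace pairs, so digit fixes can be applied to a numeric field without corrupting a text field. The field names and pairs are made up, pairs are assumed to run in order, and the Python `"\n"` stands in for what would be entered as %10% in SimpleIndex.

```python
def apply_substitutions(text, substitutions):
    """Run find/replace pairs in order; later pairs see earlier results."""
    for find, replace in substitutions:
        text = text.replace(find, replace)
    return text

# Per-field lists: digit fixes for a numeric field, word fixes for a text field.
field_substitutions = {
    "Account Number": [("I", "1"), ("O", "0"), ("S", "5")],
    "Department": [("Accts", "Accounts"), ("\n", " ")],
}
print(apply_substitutions("4O2I-S5", field_substitutions["Account Number"]))   # 4021-55
print(apply_substitutions("Accts\nPayable", field_substitutions["Department"]))  # Accounts Payable
```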


Template Control

See Template.

Dictionary Matching

See Dictionary.