Zones & OCR Settings


Zones can be used for OCR, OMR and Barcode fields to define a region on the image that contains the field data. You can also select the Use Full Page Text option to capture zones as row and column regions within existing text.

Zoom Locking lets you automatically zoom in on a region of the image when the field is selected and can be used with any Field Type.

Setting Zone Coordinates

Click the Set Zone button to set the zone coordinates for this field. This will show the Draw Field Zone window.

You can also set or update field coordinates during batch processing using the Mouse Action command. This is helpful when you need to adjust coordinates based on live samples.


To set the zone, click the Open or Scan button to obtain a sample image. Click and drag the mouse to draw a box around the region you want to use for this field.

For multi-page files, use the Page buttons to change to the page you want or enter the page number in the box.

It is also possible to perform zone OCR on the last page of each document by entering a negative number for the Page on the wizard screen. Set it to -1 to OCR the last page, -2 for the next-to-last page, and so on.

When finished, click Save to keep the new zone coordinates or Cancel to discard.
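
The negative Page values described above can be pictured with a short Python sketch. The helper name `resolve_page`, the 1-based numbering, and the decision to reject 0 are assumptions made for illustration; this is not SimpleIndex's own code.

```python
def resolve_page(page_setting: int, page_count: int) -> int:
    """Return the 1-based page number that a Page setting refers to.

    Hypothetical helper: positive values count from the first page,
    negative values count back from the last page (-1 is the last page).
    """
    # Assumption: 0 and out-of-range values are treated as invalid here.
    if page_setting == 0 or abs(page_setting) > page_count:
        raise ValueError("page setting is outside the document")
    if page_setting > 0:
        return page_setting
    return page_count + page_setting + 1


print(resolve_page(-1, 10))  # last page of a 10-page document
print(resolve_page(-2, 10))  # next-to-last page
```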

Text Source

Zone coordinates can indicate pixel coordinates in an image, or row and column numbers in a text file.

Set the Text Source to Use Full Page Text to use existing text from PDF files, MS Office documents, or full page OCR as the source text for this field. Images that have text files with the same filename when imported will also be treated as source text. More info and related settings can be found under Use Full Page Text.

You can also pick another field from the list to use that field's value as the source text for this field. This lets you capture a large block of text, such as an address block, with Zone OCR, then set up fields for Name, Address, City, State, etc. that use the address block field as the Text Source. The source field must be positioned above the current field in the list, because fields are read from top to bottom.

Use the X, Y coordinates to indicate a row and column within the source text. Use Width and Height to indicate the number of columns and rows to capture. Entering 0 for all four values searches the entire file.
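
As a rough illustration of a row and column zone, the sketch below cuts a rectangular block out of plain text. It assumes that X is the starting column and Y the starting row, that both are 1-based, and that all zeros means the whole text; the function name `text_zone` is hypothetical, and the product may count positions differently.

```python
def text_zone(text: str, x: int, y: int, width: int, height: int) -> str:
    """Cut a block of rows and columns out of plain text.

    Assumptions for this sketch: x is the starting column, y is the
    starting row, both 1-based; all zeros returns the whole text.
    """
    if x == y == width == height == 0:
        return text
    col = max(x - 1, 0)  # convert to 0-based offsets
    row = max(y - 1, 0)
    lines = text.splitlines()[row : row + height]
    return "\n".join(line[col : col + width] for line in lines)


sample = "Invoice No: 12345\nDate: 2024-01-31\nTotal: 99.50"
print(text_zone(sample, x=13, y=1, width=5, height=1))
```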

Advanced OCR Field Settings

These settings let you format the OCR results prior to dictionary and template matching. This allows you to perform various text replacements, remove invalid characters, and standardize spacing and letter case.

Pages to Process

This option limits OCR to certain pages within the batch. It greatly speeds up the OCR process when you know which pages in the batch contain the index information you need. The options are:

  • Every Page – all pages are processed.
  • First Page Only – only the first page in the batch is processed.
  • Pages with Barcodes – only pages where a barcode is detected are processed. Use the Template and zone features to prevent detection of stray barcodes.
  • Pages After Barcode – use this option with separator sheets, like the ones created by SimpleCoversheet, where the first page of the document comes after a barcode separator sheet.
  • Pages After Blank – use this option with blank page separators to indicate the start of a new document on the following page.
  • Odd Pages – OCR only odd-numbered pages (1, 3, 5, etc.)
  • Even Pages – OCR only even-numbered pages (2, 4, 6, etc.)
  • Pages without Barcodes – only pages where a barcode is not detected are processed. Useful for capturing the same field value with OCR when a barcode is not present or unreadable.
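
The options above can be read as simple filters over the pages of a batch. The following Python sketch is one interpretation, not the product's implementation: the `Page` class and `pages_to_process` function are hypothetical, and "Pages After Barcode" and "Pages After Blank" are assumed to select only the page immediately following the separator.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    number: int                # 1-based position in the batch
    has_barcode: bool = False
    is_blank: bool = False


def pages_to_process(pages: List[Page], option: str) -> List[int]:
    """Return the page numbers a given option would select (illustrative only)."""
    selected = []
    for i, page in enumerate(pages):
        previous = pages[i - 1] if i > 0 else None
        keep = {
            "Every Page": True,
            "First Page Only": i == 0,
            "Pages with Barcodes": page.has_barcode,
            "Pages After Barcode": previous is not None and previous.has_barcode,
            "Pages After Blank": previous is not None and previous.is_blank,
            "Odd Pages": page.number % 2 == 1,
            "Even Pages": page.number % 2 == 0,
            "Pages without Barcodes": not page.has_barcode,
        }[option]
        if keep:
            selected.append(page.number)
    return selected


batch = [Page(1, has_barcode=True), Page(2), Page(3, is_blank=True), Page(4)]
print(pages_to_process(batch, "Pages After Barcode"))
print(pages_to_process(batch, "Pages without Barcodes"))
```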

Case Fixing

Automatically case fix the OCR results, forcing the value to be all UPPER CASE, lower case, or Title Case (first letter of each word). If a Dictionary File is specified, the case used in that file will override this setting.
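
A minimal Python sketch of the three case modes and the dictionary override follows. The function name `fix_case`, the mode names, and the case-insensitive dictionary comparison are assumptions for illustration; `string.capwords` is used as a stand-in for Title Case and also collapses repeated spaces.

```python
import string
from typing import Optional, Sequence


def fix_case(value: str, mode: str, dictionary: Optional[Sequence[str]] = None) -> str:
    """Apply a case mode; a matching dictionary entry keeps its own casing."""
    # Assumption: a dictionary match is found by ignoring case.
    if dictionary:
        for entry in dictionary:
            if entry.lower() == value.lower():
                return entry
    if mode == "upper":
        return value.upper()
    if mode == "lower":
        return value.lower()
    if mode == "title":
        return string.capwords(value)
    return value


print(fix_case("aCME supply co", "title"))
print(fix_case("acme supply co", "upper", dictionary=["ACME Supply Co"]))
```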

Strip spaces from result

This option strips any spaces from the OCR result. This is very useful when using template matching or dictionary lookups, because spaces are sometimes recognized by mistake, which causes the match to fail.

The Spaces to Strip option in the OCR Options can be used to modify the behavior of this function to strip other classes of characters.
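
To see why stray spaces break a match, consider the Python sketch below. The regular expression is only a stand-in for a template (two letters followed by six digits); SimpleIndex templates use their own syntax, and `strip_spaces` is a hypothetical helper.

```python
import re

# Hypothetical template stand-in: two capital letters, then six digits.
TEMPLATE = re.compile(r"[A-Z]{2}\d{6}")


def strip_spaces(value: str) -> str:
    """Remove every space character from the OCR result."""
    return value.replace(" ", "")


raw = "AB 123 456"  # OCR inserted spaces that are not in the real value
print(bool(TEMPLATE.fullmatch(raw)))                # match fails with spaces
print(bool(TEMPLATE.fullmatch(strip_spaces(raw))))  # match succeeds without them
```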

Strip Characters from Result

Enter a list of characters that you want to remove from OCR results prior to template and dictionary matching. You can also use this in place of templates by removing all unwanted characters from your OCR zone and keeping whatever remains. This technique gives you a partial result when recognition mistakes occur, whereas templates or dictionaries would leave the field blank.

This setting can also be used with non-OCR fields to remove unwanted characters from barcodes, autofill fields, dates, etc.

Here are several helpful hints for using this setting:

  • Enter the values %LF% and %TAB% to remove line breaks and tab characters, since these cannot be typed.
  • There are several examples available in the drop-down menu with common lists of characters that can be selected automatically.
  • You can manually type or copy/paste values into this field.
  • A good technique is to copy any extra characters that appear in the field during OCR and paste them into this list, repeating until only valid characters remain.
  • Use Notepad to edit a long list of special characters or to save lists for later use.
  • Use the Character Map (in Start Menu/Accessories) to find special characters.
  • Enter %##% to remove a specific ASCII character with a numeric value of ##. For example, %10% will remove line feeds and %13% will remove carriage returns.
  • Set the Replace Character option to replace stripped characters with another.
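
The token conventions in the hints above can be sketched in Python as follows. The helper names `expand_tokens` and `strip_characters` are hypothetical, and mapping %LF% to a single line feed character is an assumption; the actual parsing rules in SimpleIndex may differ.

```python
import re

# Assumption: %LF% stands for a line feed and %TAB% for a tab character.
NAMED_TOKENS = {"LF": "\n", "TAB": "\t"}
TOKEN = re.compile(r"%(LF|TAB|\d{1,3})%")


def expand_tokens(setting: str) -> str:
    """Turn %LF%, %TAB% and %##% tokens into the characters they stand for."""
    def decode(match: "re.Match[str]") -> str:
        name = match.group(1)
        return NAMED_TOKENS[name] if name in NAMED_TOKENS else chr(int(name))
    return TOKEN.sub(decode, setting)


def strip_characters(value: str, setting: str, replacement: str = "") -> str:
    """Remove every character listed in the setting, or swap it for a replacement."""
    unwanted = set(expand_tokens(setting))
    return "".join(replacement if ch in unwanted else ch for ch in value)


ocr_result = "INV#: 00-123\t(copy)\n"
print(strip_characters(ocr_result, "#:()%LF%%TAB%"))
```

Passing a `replacement` such as `"-"` mirrors the Replace Character idea: each stripped character is swapped for the replacement instead of being dropped.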

Replace Multiple Characters

Enter a character or characters here that will replace those stripped using the Strip Characters From Result option. This allows you to replace common mistakes, such as I and 1 or O and 0, or substitute a space or dash for line feeds and other unwanted characters.

ASCII character codes may be entered in this field to allow special characters to be used for replacement. For example, the single space character can be entered as %32%, Line Feeds are %10% and Tabs are %9%. A full list of ASCII character codes can be found if you search the web for “ASCII Table”.

Common OCR Errors

Add common character replacements to the list. Select the type of data and click Add.

Find and Replace Text

This option allows you to define several specific 'find and replace' operations on recognized text that take place before template and dictionary matching.

This is useful for correcting common OCR errors automatically, such as a "1" being recognized as "I". Substitutions can be single characters or whole words and phrases.

It is also possible to replace non-printable characters such as tabs and line feeds by entering their ASCII character code (e.g. %10% for line feeds).

Double-click an entry to edit the values.
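
A list of find and replace entries can be thought of as an ordered set of substitutions. The Python sketch below assumes the entries are applied one after another in list order, which the source does not confirm; `find_and_replace` and the sample rules are hypothetical.

```python
from typing import List, Tuple


def find_and_replace(text: str, rules: List[Tuple[str, str]]) -> str:
    """Apply each (find, replace) pair in order to the recognized text."""
    for find, replace in rules:
        text = text.replace(find, replace)
    return text


rules = [
    ("lnvoice", "Invoice"),  # lowercase L misread in place of capital I
    ("20I4", "2014"),        # capital I misread in place of the digit 1
    ("\n", " "),             # line feed, entered in the wizard as %10%
]
print(find_and_replace("lnvoice\n20I4", rules))
```

Replacing whole words or short phrases, rather than every "I" with "1", avoids corrupting text where the letter is correct.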

Related Knowledge Base Articles