Advanced OCR Guide
Improve Accuracy & Automation with OCR Integration  
SimpleIndex can be used with your preferred OCR application, such as Nuance OmniPage or ABBYY FineReader Corporate, to provide better OCR accuracy, Image+Hidden Text PDF output and a greater level of automation.  In this configuration, SimpleIndex extracts and verifies index data, renames files and performs database interactions while the third-party OCR application performs OCR in the background.

The OCR application is configured to monitor a "hot folder", performing OCR on any images scanned or copied to that location.  The processing steps for this configuration are:

  1. Images are scanned to a network folder using SimpleIndex or a digital copier
  2. Scanned images are OCR'd automatically by the OCR application using its "hot folder" feature
  3. Images and OCR text are output as searchable PDF files to the SimpleIndex Input folder
  4. SimpleIndex reads images and extracts index data from text
  5. User checks index data and makes any necessary corrections
  6. Images are exported as searchable PDF, TIFF or JPEG using index data for folder & filenames

There are several advantages to this approach versus performing all of these steps as part of a single SimpleIndex batch:

  • Greatly improved OCR accuracy and features
  • Searchable Image+Hidden Text PDF format with text lined up with corresponding image
    By default, SimpleIndex outputs hidden text without formatting
  • OCR processing may be performed on another computer
    Improve performance and limit user interruption
  • Interactive correction and training may be performed during OCR
    Correct only low-confidence characters and remember corrections to improve accuracy
  • Index extraction in SimpleIndex is instantaneous
    Operators don't need to wait for OCR
 
Performing Zone OCR with Third Party Applications  
When parsing existing text, OCR zone settings for X, Y, Width and Height represent the Starting Column, Starting Row, Column Width and Row Height respectively.  This defines a unique search area within the text.

Zone OCR functionality can be accomplished by defining zone templates within your OCR application and extracting only text areas required for indexing.  When this is done, each zone appears as a consecutive line in the OCR text.  The line numbers can then be used in the OCR zone coordinates to retrieve the corresponding value. 

For example, if you draw 3 zones for Account Number, Customer Name and Amount (each 1 line and appearing in this order on the page), the Account Number can be extracted with Y=1, Customer Name with Y=2 and Amount with Y=3.  In all cases the X and Width are set to 0 to capture the whole line, and Height is set to 1 to indicate a single line.
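
The three-zone example above can be sketched in Python as follows. This is only an illustration of the row/column idea, not SimpleIndex's own implementation: the function name extract_zone is hypothetical, the sample text is made up, and the sketch assumes rows and columns are counted from 1 and that X=0 with Width=0 means "take the whole line".

```python
def extract_zone(text, x=0, y=1, width=0, height=1):
    """Return the text inside a zone given as starting column/row and size.

    Assumptions for this sketch: rows (Y) and columns (X) are 1-based,
    and X=0 with Width=0 means "capture the whole line".
    """
    lines = text.splitlines()
    rows = lines[y - 1 : y - 1 + height]
    if x > 0 or width > 0:
        start = max(x - 1, 0)
        end = start + width if width > 0 else None
        rows = [row[start:end] for row in rows]
    return "\n".join(row.strip() for row in rows)


# Made-up OCR output: one line per zone, in the order the zones were drawn.
ocr_text = "ACCT-100245\nJane Q. Customer\n$1,250.00\n"

account_number = extract_zone(ocr_text, y=1)
customer_name = extract_zone(ocr_text, y=2)
amount = extract_zone(ocr_text, y=3)
print(account_number, customer_name, amount, sep=" | ")
```

Because each zone becomes its own line, only Y changes between fields; X, Width and Height stay the same for all three.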

When using full page OCR data to extract specific fixed zones, it is best to allow for a few lines or columns of shifting due to skew and other variations between scans.  We recommend defining a region with a small but comfortable margin of error (2-4 lines/columns) and using the Template and Dictionary Matching features to capture the specific information you need.  Using Regular Expressions it is possible to define a template for virtually any type of data.
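
As a rough illustration of combining a search region with a pattern, the sketch below scans a few lines above and below the expected row for a regular expression. It uses Python's standard re module rather than the SimpleIndex Template feature, and the function name find_in_region, the invoice number format and the sample page text are all assumptions made for the example.

```python
import re


def find_in_region(text, pattern, y, height=1, margin=2):
    """Search rows (y - margin) through (y + height - 1 + margin) for a regex.

    Rows are 1-based in this sketch. Returns the first match found,
    or None if no line in the region matches.
    """
    lines = text.splitlines()
    first = max(y - margin, 1)
    last = min(y + height - 1 + margin, len(lines))
    for row in range(first, last + 1):
        match = re.search(pattern, lines[row - 1])
        if match:
            return match.group(0)
    return None


# Made-up full-page OCR text. The invoice number is expected on row 3,
# but in this scan it has shifted down by one line.
page_text = "\n".join([
    "ACME Supply Co.",
    "Statement",
    "",
    "Invoice No: INV-004512",
    "Date: 2020-01-15",
])

invoice = find_in_region(page_text, r"INV-\d{6}", y=3, margin=2)
print(invoice)
```

The margin keeps the zone tolerant of shifted lines, while the pattern keeps it from picking up unrelated text that falls inside the wider region.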

 
Other Tips for Third Party OCR  
Here are some helpful hints that will assist you in configuring third party OCR applications for use with SimpleIndex:
  • Be sure to use separate folders for your Input folder and OCR hot folder
  • When possible, insert document breaks and create multi-page files during the scanning step
  • PDF OCR output is recommended, but TIFF images plus text files with identical names will work just as well
  • Uncheck Split multi-page input files to keep PDFs intact while processing
  • Check Split multi-page input files to extract images from PDFs and save as TIFF or JPEG
  • In the latter case, be sure the PDFXRES and PDFYRES settings in the SimpleIndex.ini file are set to the resolution used by your images to prevent distortion and incorrect page size/resolution properties
  • Scanning and indexing/correction will take place several minutes apart.  Your workflow should account for this to use operator time most efficiently.
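
Two of the tips above (separate folders, and TIFF images paired with identically named text files) are easy to check with a small script before starting a batch. The sketch below is a hypothetical helper, not part of SimpleIndex; the function names and the sample file names are assumptions, and it treats "identical names" as the same file name apart from the extension.

```python
import tempfile
from pathlib import Path


def same_folder(input_folder, hot_folder):
    """Return True if both paths resolve to the same directory."""
    return Path(input_folder).resolve() == Path(hot_folder).resolve()


def tiffs_missing_text(folder):
    """List TIFF files in a folder that have no text file with the same name."""
    entries = list(Path(folder).iterdir())
    text_stems = {p.stem.lower() for p in entries if p.suffix.lower() == ".txt"}
    return sorted(
        p.name
        for p in entries
        if p.suffix.lower() in {".tif", ".tiff"}
        and p.stem.lower() not in text_stems
    )


# Demonstration with made-up file names in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("0001.tif", "0001.txt", "0002.tif"):
        (Path(tmp) / name).write_text("")
    missing = tiffs_missing_text(tmp)

print(missing)
```

Running a check like this against the Input folder shows which images arrived without OCR text, which usually means the OCR application has not finished processing them yet.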
 