Advanced OCR Guide
Improve Accuracy & Automation with OCR Integration  
SimpleIndex can be used with your favorite OCR application, such as Nuance OmniPage or ABBYY FineReader Corporate, to provide superior OCR accuracy, Image+Hidden Text PDF output and a greater level of automation.  In this configuration, SimpleIndex is used to extract and verify index data, rename files and perform database interactions while a superior third party OCR application performs OCR in the background.

In this configuration, the OCR application monitors a "hot folder" and performs OCR on any images scanned or copied to that location.  The processing steps are:

  1. Images are scanned to a network folder using SimpleIndex or a digital copier
  2. Scanned images are OCR'd automatically by the OCR application using its "hot folder" feature
  3. Images and OCR text are output as searchable PDF files to the SimpleIndex Input folder
  4. SimpleIndex reads images and extracts index data from text
  5. User checks index data and makes any necessary corrections
  6. Images are exported as searchable PDF, TIFF or JPEG using index data for folder & filenames

There are several advantages to this approach versus performing all of these steps as part of a single SimpleIndex batch:

  • Greatly improved OCR accuracy and features
  • Searchable Image+Hidden Text PDF format with text lined up with corresponding image
    By default, SimpleIndex outputs hidden text without formatting
  • OCR processing may be performed on another computer
    Improve performance and limit user interruption
  • Interactive correction and training may be performed during OCR
    Correct only low-confidence characters and remember corrections to improve accuracy
  • Index extraction in SimpleIndex is instantaneous
    Operators don't need to wait for OCR
 
Performing Zone OCR with Third Party Applications  
When parsing existing text, OCR zone settings for X, Y, Width and Height represent the Starting Column, Starting Row, Column Width and Row Height respectively.  This defines a unique search area within the text.

Zone OCR functionality can be accomplished by defining zone templates within your OCR application and extracting only text areas required for indexing.  When this is done, each zone appears as a consecutive line in the OCR text.  The line numbers can then be used in the OCR zone coordinates to retrieve the corresponding value. 

For example, if you draw 3 zones for Account Number, Customer Name and Amount (each 1 line and appearing in this order on the page), the Account Number can be extracted with Y=1, Customer Name with Y=2 and Amount with Y=3.  In all cases the X and Width are set to 0 to capture the whole line, and Height is set to 1 to indicate a single line.
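The row and column addressing described above can be illustrated with a short Python sketch. This is not SimpleIndex code; extract_zone and the sample values are hypothetical, and the sketch assumes rows and columns are counted from 1 and that X=0 with Width=0 means "take the whole line".

```python
def extract_zone(text, x=0, y=1, width=0, height=1):
    """Return the text inside a character-grid zone.

    Assumptions (not confirmed by the SimpleIndex documentation):
    rows and columns are 1-based, and x=0 with width=0 selects
    the whole line.
    """
    lines = text.splitlines()
    first = max(y - 1, 0)                  # convert 1-based row to list index
    rows = lines[first:first + height]
    if x == 0 and width == 0:
        return "\n".join(row.strip() for row in rows)
    start = max(x - 1, 0)                  # convert 1-based column to string index
    end = start + width if width > 0 else None
    return "\n".join(row[start:end].strip() for row in rows)


# Hypothetical OCR output: one line per zone, in the order the zones were drawn.
ocr_text = "ACC-10482\nJane Smith\n$1,250.00\n"

account_number = extract_zone(ocr_text, y=1)
customer_name = extract_zone(ocr_text, y=2)
amount = extract_zone(ocr_text, y=3)
```

Because each zone occupies exactly one line, only Y changes between the three fields.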

When using full page OCR data to extract specific fixed zones, it is best to allow for a few lines or columns of shifting due to skew and other variations between scans.  We recommend defining a region with a small but comfortable margin of error (2-4 lines/columns) and using the Template and Dictionary Matching features to capture the specific information you need.  Using Regular Expressions it is possible to define a template for virtually any type of data.
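As a rough illustration of combining a margin of error with a regular expression, the Python sketch below searches a few rows on either side of the expected position. It is a stand-in for the idea only, not the Template or Dictionary Matching features themselves; find_in_region, the sample page and the INV- pattern are all hypothetical.

```python
import re


def find_in_region(text, pattern, y, height=1, margin=3):
    """Return the first regex match found in rows y..y+height-1 (1-based),
    widened by `margin` rows above and below, or None if nothing matches."""
    lines = text.splitlines()
    first = max(y - 1 - margin, 0)
    last = min(y - 1 + height + margin, len(lines))
    for row in lines[first:last]:
        match = re.search(pattern, row)
        if match:
            return match.group(0)
    return None


# Hypothetical full-page OCR text; the invoice number has shifted down two lines
# from where it was expected (row 2).
page = "\n".join([
    "ACME SUPPLY CO.",
    "",
    "Invoice Date: 03/14/2024",
    "Invoice No: INV-004512",
    "Bill To: Jane Smith",
])

invoice_number = find_in_region(page, r"INV-\d{6}", y=2, margin=2)
strict_result = find_in_region(page, r"INV-\d{6}", y=2, margin=0)
```

With a margin of 2 rows the value is still found; with no margin the search looks only at the expected row and returns None.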

 
Other Tips for Third Party OCR  
Here are some helpful hints that will assist you in configuring third party OCR applications for use with SimpleIndex:
  • Be sure to use separate folders for your Input folder and OCR hot folder
  • When possible, insert document breaks and create multi-page files during the scanning step
  • PDF OCR output is recommended, but TIFF images plus text files with identical names will work just as well
  • Uncheck "Split multi-page input files" to keep PDFs intact while processing
  • Check "Split multi-page input files" to extract images from PDFs and save them as TIFF or JPEG
  • When extracting images from PDFs this way, be sure the PDFXRES and PDFYRES settings in the SimpleIndex.ini file match the resolution of your images to prevent distortion and incorrect page size/resolution properties
  • Scanning and indexing/correction will take place several minutes apart.  Your workflow should account for this to use operator time most efficiently.
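
If you use TIFF images with separate text files rather than PDF output, a quick check that every image has a text file of the same name can catch OCR jobs that have not finished. The Python sketch below is one possible way to do this; unpaired_images is a hypothetical helper, and the .txt extension for the text files is an assumption.

```python
from pathlib import PurePath


def unpaired_images(filenames, image_exts=(".tif", ".tiff"), text_ext=".txt"):
    """Return image filenames that have no text file with the same base name.

    Extensions and base names are compared case-insensitively.
    """
    text_stems = {
        PurePath(name).stem.lower()
        for name in filenames
        if PurePath(name).suffix.lower() == text_ext
    }
    return sorted(
        name
        for name in filenames
        if PurePath(name).suffix.lower() in image_exts
        and PurePath(name).stem.lower() not in text_stems
    )


# Hypothetical folder listing: 0002.tif has no matching text file yet.
names = ["0001.tif", "0001.txt", "0002.tif", "0003.tiff", "0003.TXT"]
missing = unpaired_images(names)
```

In practice the list of names could come from a directory listing of the Input folder.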
 