Improve Accuracy & Automation with OCR Integration

SimpleIndex can be used with your favorite OCR application, such as Nuance OmniPage or ABBYY FineReader Corporate, to provide superior OCR accuracy, Image+Hidden Text PDF output and a greater level of automation. In this configuration, SimpleIndex extracts and verifies index data, renames files and performs database interactions, while the third party OCR application performs OCR in the background.

The OCR application is set up to monitor a "hot folder" and performs OCR on any images scanned or copied to that location. The processing steps for this configuration are:
- Images are scanned to a network folder using SimpleIndex or a digital copier
- The OCR application automatically processes the scanned images using its "hot folder" feature
- Images and OCR text are output as searchable PDF files to the SimpleIndex Input folder
- SimpleIndex reads the images and extracts index data from the text
- The user checks the index data and makes any necessary corrections
- Images are exported as searchable PDF, TIFF or JPEG files, using the index data for folder and file names
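The last step, naming exported files from index data, is configured inside SimpleIndex. To illustrate the idea only, here is a minimal Python sketch that turns index values into a folder path and file name. The function names, the "_" separator and the rule of replacing characters Windows forbids in file names are assumptions for this example, not SimpleIndex settings.

```python
import re
from pathlib import PurePosixPath

# Characters that are not allowed in Windows file and folder names.
INVALID_CHARS = re.compile(r'[<>:"/\\|?*]')


def clean(value: str) -> str:
    """Replace illegal characters with '_', then trim spaces and trailing dots."""
    return INVALID_CHARS.sub("_", value).strip().rstrip(" .")


def export_path(root: str, folder_fields: list[str], file_fields: list[str], ext: str) -> PurePosixPath:
    """Build root/<one folder per folder field>/<file fields joined by '_'>.<ext>."""
    folders = [clean(v) for v in folder_fields]
    file_name = "_".join(clean(v) for v in file_fields) + "." + ext.lstrip(".")
    return PurePosixPath(root, *folders, file_name)


# Hypothetical index values for one document.
path = export_path("Archive", ["Smith, John", "2024"], ["INV/1001", "Smith, John"], "pdf")
print(path)  # Archive/Smith, John/2024/INV_1001_Smith, John.pdf
```

PurePosixPath is used only so the output is the same on every platform; a real export on Windows would use backslash separators.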
There are several advantages to this approach over performing all of these steps in a single SimpleIndex batch:
- Greatly improved OCR accuracy and features
- Searchable Image+Hidden Text PDF output, with the text lined up with the corresponding image (by default, SimpleIndex outputs hidden text without formatting)
- OCR processing can be performed on another computer, improving performance and limiting interruptions to the user
- Interactive correction and training can be performed during OCR, so only low-confidence characters need to be corrected, and corrections are remembered to improve accuracy
- Index extraction in SimpleIndex is instantaneous, so operators don't need to wait for OCR

Performing Zone OCR with Third Party Applications

When parsing existing text, the OCR zone settings X, Y, Width and Height represent the starting column, starting row, column width and row height, respectively. Together they define a unique search area within the text.

Zone OCR can be accomplished by defining zone templates within your OCR application so that only the text areas required for indexing are extracted. Each zone then appears as a consecutive line in the OCR text, and these line numbers can be used in the OCR zone coordinates to retrieve the corresponding values.
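To make the column/row interpretation concrete, here is a minimal Python sketch of cutting a zone out of existing text. It models the description above and is not SimpleIndex code: 1-based rows, X=0 meaning "start of line" and Width=0 meaning "to the end of the line" are assumptions chosen to match the examples in this section.

```python
def extract_zone(text: str, x: int, y: int, width: int, height: int) -> str:
    """Cut a zone out of plain OCR text.

    Assumed semantics (a model of the description above, not SimpleIndex code):
      y      - 1-based starting row (line number)
      height - number of rows to include
      x      - 1-based starting column; 0 also means "start of line"
      width  - number of columns; 0 means "to the end of the line"
    """
    if y < 1 or height < 1:
        raise ValueError("y and height must be at least 1")
    start = max(x - 1, 0)
    rows = text.splitlines()[y - 1 : y - 1 + height]
    parts = []
    for line in rows:
        segment = line[start:] if width == 0 else line[start : start + width]
        parts.append(segment.rstrip())
    return "\n".join(parts)


page = "INVOICE  No. 1001\nDate: 2024-01-15\nTotal: 42.50"
print(extract_zone(page, 14, 1, 4, 1))  # 1001
print(extract_zone(page, 0, 2, 0, 2))   # whole lines 2 and 3
```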
For example, if you draw 3 zones for Account Number, Customer Name and Amount (each one line high and appearing in this order on the page), Account Number can be extracted with Y=1, Customer Name with Y=2 and Amount with Y=3. In all cases, X and Width are set to 0 to capture the whole line, and Height is set to 1 to indicate a single line.
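The three-zone example can be sketched in Python as a simple mapping from field name to line number. The field names come from the example above; the sample text and its values are made up, and the function name is hypothetical.

```python
# Hypothetical field names; each Y value is the zone's position in the OCR template.
ZONE_LINES = {"Account Number": 1, "Customer Name": 2, "Amount": 3}


def read_zone_fields(ocr_text: str, zone_lines: dict) -> dict:
    """Return {field: value} using X=0, Width=0, Height=1 (one whole line per zone)."""
    lines = ocr_text.splitlines()
    fields = {}
    for name, y in zone_lines.items():
        # Y is a 1-based line number; a missing line yields an empty value.
        fields[name] = lines[y - 1].strip() if 1 <= y <= len(lines) else ""
    return fields


# Sample zone-OCR output: one line per zone, in template order (made-up values).
sample = "100-2345\nAcme Supply Co.\n$1,250.00\n"
print(read_zone_fields(sample, ZONE_LINES))
# {'Account Number': '100-2345', 'Customer Name': 'Acme Supply Co.', 'Amount': '$1,250.00'}
```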
When using full-page OCR text to extract specific fixed zones, it is best to allow for a few lines or columns of shifting caused by skew and other variations between scans. We recommend defining a region with a small but comfortable margin of error (2-4 lines or columns) and using the Template and Dictionary Matching features to capture the specific information you need. Using Regular Expressions, it is possible to define a template for virtually any type of data.
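SimpleIndex's Template and Dictionary Matching features handle this inside the application. The Python sketch below only illustrates the underlying idea: search a few lines above and below the expected row and keep the first match of a regular-expression template. The expected row, the margin and the account-number pattern are assumptions for this example.

```python
import re
from typing import Optional


def find_in_region(text: str, row: int, margin: int, pattern: str) -> Optional[str]:
    """Search lines row-margin .. row+margin (1-based) for the first regex match."""
    lines = text.splitlines()
    first = max(row - margin, 1)
    last = min(row + margin, len(lines))
    template = re.compile(pattern)
    for line in lines[first - 1 : last]:
        match = template.search(line)
        if match:
            return match.group(0)
    return None


# Made-up page where the account number drifted from row 5 to row 7.
page = "\n".join([
    "ACME SUPPLY CO.",
    "Statement",
    "",
    "Customer: Jane Doe",
    "Date: 03/01/2024",
    "",
    "Account: 100-2345",
    "Balance: $1,250.00",
])
ACCOUNT_TEMPLATE = r"\b\d{3}-\d{4}\b"  # hypothetical account number format
print(find_in_region(page, row=5, margin=3, pattern=ACCOUNT_TEMPLATE))  # 100-2345
print(find_in_region(page, row=5, margin=1, pattern=ACCOUNT_TEMPLATE))  # None
```

A margin of 3 tolerates the two-line drift in this sample, while a margin of 1 misses it, which is why a small but comfortable margin is recommended.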

Other Tips for Third Party OCR

The following tips will help you configure third party OCR applications for use with SimpleIndex:
- Use separate folders for the SimpleIndex Input folder and the OCR hot folder
- When possible, insert document breaks and create multi-page files during the scanning step
- PDF OCR output is recommended, but TIFF images plus text files with identical names will work just as well
- Uncheck "Split multi-page input files" to keep PDFs intact while processing
- Check "Split multi-page input files" to extract images from PDFs and save them as TIFF or JPEG
- When splitting PDFs this way, make sure the PDFXRES and PDFYRES settings in the SimpleIndex.ini file are set to the resolution of your images to prevent distortion and incorrect page size/resolution properties
- Scanning and indexing/correction will take place several minutes apart, so plan your workflow to use operator time as efficiently as possible
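The resolution tip comes down to simple arithmetic: a page's physical size is its pixel dimensions divided by its resolution. The Python sketch below shows the effect, assuming the PDFXRES and PDFYRES values act as the horizontal and vertical DPI of the extracted images; it does not read SimpleIndex.ini, and the 2550 x 3300 pixel page is an illustrative letter-size scan at 300 DPI.

```python
def page_size_inches(width_px: int, height_px: int, xres: float, yres: float) -> tuple:
    """Page size in inches implied by pixel dimensions and X/Y resolution (DPI)."""
    return width_px / xres, height_px / yres


# A letter-size page scanned at 300 DPI is 2550 x 3300 pixels.
print(page_size_inches(2550, 3300, 300, 300))  # (8.5, 11.0)   correct size
print(page_size_inches(2550, 3300, 200, 200))  # (12.75, 16.5) page too large
print(page_size_inches(2550, 3300, 300, 200))  # (8.5, 16.5)   stretched vertically
```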