Traditional Zone OCR
Zone OCR is used to read document indexes or tags from text on the page.
Zone OCR is a great way to automate the data entry associated with
scanning documents.
However, there are several limitations to TRADITIONAL zone OCR that must be overcome:
- Index information must be in the exact same place on every page
- Documents shift and skew during scanning, causing the zones to not line up
- If surrounding lines or text on the document are too close, they can encroach on the zone
SimpleIndex® Dynamic OCR
SimpleIndex overcomes these limitations by using Dynamic OCR technology to locate the
desired text even when it moves around on the page. Our simplified version of
Dynamic OCR works great for many types of documents at a fraction of the cost of other solutions.
- Index information can appear anywhere on any page
- Unwanted characters are automatically ignored
- Find unique patterns of letters and numbers using Template Matching (Social Security #, Date, etc.)
- Use Dictionary Matching to find a value from a list of possible values (Vendor Name, Document Type, etc.)
Dynamic OCR Examples
In the video we see how SimpleIndex approaches a typical Zone OCR
example. With SimpleIndex you can use large zones that give a
wide margin for error. Template and Dictionary matching are then
used to extract the 7-digit Account Number, 6-digit Order Number and
Company Name. SimpleIndex discards the surrounding text and keeps
the correct value.
Another common example is finding a unique identifier, for example a social security number,
that could appear anywhere on the page. Simply enter the template ###-##-#### and SimpleIndex
will search the full OCR text until it finds a match. Since only one social
security number is likely to appear on the page, a match on this pattern
is almost certainly the required value.
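As a rough illustration of how this kind of template search can work, here is a minimal Python sketch. It is not SimpleIndex's actual implementation. It assumes that `#` stands for a single digit and that every other character in the template is matched literally; the function names `template_to_regex` and `find_first` and the sample text are hypothetical.

```python
import re
from typing import Optional


def template_to_regex(template: str) -> "re.Pattern[str]":
    """Convert a '#'-style template into a compiled regular expression.

    Assumption: '#' means one digit; any other character is a literal.
    """
    body = "".join(r"\d" if ch == "#" else re.escape(ch) for ch in template)
    # The lookarounds keep a match from starting or ending inside a longer
    # run of digits, so a 6-digit template will not match part of a
    # 7-digit number.
    return re.compile(r"(?<!\d)" + body + r"(?!\d)")


def find_first(template: str, ocr_text: str) -> Optional[str]:
    """Return the first substring of ocr_text that fits the template."""
    match = template_to_regex(template).search(ocr_text)
    return match.group(0) if match else None


# Made-up OCR text with unrelated numbers around the value we want.
ocr_text = "Employee: J. Smith  Ref 20481  SSN 123-45-6789  Dept 44"
ssn = find_first("###-##-####", ocr_text)
print(ssn)
```

The same helper can tell a 6-digit value apart from a 7-digit one in a large zone, because the digit boundaries prevent a partial match inside the longer number.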
With dictionary matching, you can give SimpleIndex a list of possible
values and it will automatically search the zone or page for each possible value
until it finds a match.
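The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the way SimpleIndex does it internally: the comparison here ignores letter case and collapses whitespace so that a name split across OCR lines still matches, and the function name `dictionary_match` and the vendor list are made up for the example.

```python
import re
from typing import Iterable, Optional


def dictionary_match(ocr_text: str, candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate value that appears in the OCR text."""
    # Collapse whitespace runs (including line breaks) and ignore case.
    # Both choices are assumptions made for this sketch.
    normalized = re.sub(r"\s+", " ", ocr_text).casefold()
    for candidate in candidates:
        needle = re.sub(r"\s+", " ", candidate).strip().casefold()
        if needle and needle in normalized:
            return candidate
    return None


# Hypothetical list of possible values and a made-up page of OCR text.
vendors = ["Acme Supply", "Globex Corporation", "Initech"]
page_text = "INVOICE\nGlobex\nCorporation\n123 Main St."
vendor = dictionary_match(page_text, vendors)
print(vendor)
```

Candidates are tried in list order, so when one value is a substring of another (for example "Acme" and "Acme Supply"), listing the longer value first avoids a premature match.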
Many dynamic forms processing applications can be implemented using
these simple algorithms. This makes SimpleIndex far more versatile than
other zone OCR solutions that require the index value to be in the exact
same location on every page. Yet SimpleIndex costs only a fraction of
the price!
SimpleIndex's dynamic forms processing can greatly speed up data entry
by eliminating much of the manual indexing work. For many organizations,
this brings the labor cost of scanning within reach.
Dynamic OCR can also be applied to MS Office and PDF files, creating a
fully automated process for intelligently indexing and reorganizing
electronic documents.
Support for Regular Expressions

SimpleIndex OCR has a simple built-in template format, as well as support
for Regular Expressions. Regular Expressions (RegEx
for short) let you define complex search patterns to extract matching
values from the text. This greatly enhances the functionality of
the dynamic OCR in SimpleIndex, making it capable of finding
variable-length fields with no distinct pattern.
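For example, a value with no fixed length or shape can often be located by the label printed next to it. The Python sketch below shows the general technique with the standard `re` module. The pattern, the label wording and the sample strings are hypothetical, and the regular expression dialect SimpleIndex accepts may differ in details from Python's.

```python
import re
from typing import Optional

# Hypothetical pattern: the word "Invoice", a label such as "No.", "Number"
# or "#", an optional colon, then a variable-length value made of letters,
# digits and hyphens. The \b after "No" and "Number" stops the label from
# matching the start of an ordinary word such as "not".
INVOICE_PATTERN = re.compile(
    r"Invoice\s*(?:No\b\.?|Number\b|#)\s*:?\s*([A-Z0-9][A-Z0-9-]*)",
    re.IGNORECASE,
)


def extract_invoice_number(ocr_text: str) -> Optional[str]:
    """Return the value that follows the invoice label, if one is found."""
    match = INVOICE_PATTERN.search(ocr_text)
    return match.group(1) if match else None


# Made-up OCR lines with values of different lengths and formats.
samples = [
    "Invoice No: INV-2024-00017  Date 03/15/2024",
    "INVOICE # 88213 Terms Net 30",
]
for line in samples:
    print(extract_invoice_number(line))
```

Because the capture group is anchored to the label rather than to a digit count, the same expression handles both the long hyphenated value and the short numeric one.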
Regular Expressions
are commonly used in text parsing applications. The Perl
programming language makes extensive use of RegEx, as do UNIX utilities
like "grep". Many programmers and IT personnel are already
familiar with RegEx and can create complex expressions without specific
training.
Version 7 Builds on SimpleIndex's Powerful Dynamic OCR
Version 7 includes the industry-leading ABBYY FineReader® OCR engine
for dramatically improved OCR accuracy and speed. Other OCR
enhancements in version 7 include:
- Native support for PDF files without conversion
- Searchable PDF Image + Hidden Text output
- Interactive template builder and tester
- Improved OCR language support
- OCR Zones can be created on any page of a multi-page file
- Pre-defined templates for data types such as dates, dollar amounts, etc.