|
Traditionally, zone OCR solutions require you to specify a region on the
page where index information will be found. This region is recognized
and the result is inserted into an index field. The problem with
traditional zone OCR is that if the region is moved slightly due to
variations in scanning, the result could contain extra neighboring
characters or cut off desired characters. This limits the usefulness of
traditional zone OCR to documents where the index value is in the exact
same place every time and has plenty of white space around it.
SimpleIndex's OCR overcomes these inherent limitations of zone OCR by
providing template and dictionary matching for OCR fields. These
features search the OCR
results for a certain pattern or list of possible values and return only
the matching data. This allows you to draw your OCR zones much larger
than normal, ensuring that no matter how much the data shifts around it
will always be contained within that region.
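
The template matching idea can be illustrated with a short Python sketch. This is not SimpleIndex's implementation or API; it only shows the general technique of searching the text of an oversized zone for a pattern and returning just the match. The invoice number format ("INV-" plus six digits), the function name extract_by_template, and the sample text are all assumptions made for illustration.

```python
import re

# Hypothetical template: an invoice number such as "INV-" followed by six digits.
INVOICE_PATTERN = re.compile(r"\bINV-\d{6}\b")


def extract_by_template(zone_text, pattern=INVOICE_PATTERN):
    """Return the first substring of the OCR text that matches the pattern, or None."""
    match = pattern.search(zone_text)
    return match.group(0) if match else None


# Simulated OCR output from a zone drawn much larger than the field itself.
zone_text = "ACME Supply Co.\nInvoice No: INV-004217   Date: 03/14\nTotal due 118.40"
index_value = extract_by_template(zone_text)
```

Because only the matching substring is returned, the neighboring text picked up by the larger zone never reaches the index field.
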
It is even possible to draw your zone around the entire page and find
key information that is not printed in any fixed location. For example,
a doctor’s office may receive lab reports from many different labs. Each
report is formatted differently, but each contains the patient’s name
somewhere on it. Using the dictionary matching feature with a patient
name list, SimpleIndex can identify the correct patient for each lab
report automatically.
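
A rough Python sketch of dictionary matching over full-page text follows. It is an illustration of the general approach under simple assumptions (exact, case-insensitive, whole-word matching), not a description of how SimpleIndex works internally. The function name match_from_dictionary, the patient list, and the sample page text are hypothetical.

```python
import re


def match_from_dictionary(page_text, known_values):
    """Return the first known value found as a whole phrase in the OCR text, or None."""
    # Collapse line breaks and repeated spaces so the OCR layout does not matter.
    normalized = " ".join(page_text.split()).lower()
    for value in known_values:
        needle = " ".join(value.split()).lower()
        # Require non-word characters (or the text boundary) on both sides,
        # so "Ann Lee" is not found inside "Joann Leeds".
        if re.search(r"(?<!\w)" + re.escape(needle) + r"(?!\w)", normalized):
            return value
    return None


# Hypothetical patient list and simulated full-page OCR text.
patients = ["Maria Lopez", "John Smith", "Ann Lee"]
page_text = "Northside Labs\nPatient:  John   Smith\nResult: within normal range"
patient = match_from_dictionary(page_text, patients)
```

Real OCR output often contains misread characters, so a practical version would likely need approximate matching rather than the exact comparison used here.
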
When implementing OCR for document automation, carefully consider the
data you are trying to recognize. Is the text legible? Does it appear in
a fixed location? Does it conform to a unique pattern that won’t be
found anywhere else on the page? Is there a list available with all the
possible values for this field? Answer these questions and you will know
which OCR approach is best for your application. |