OCR

From Simple Wiki

* [[Zone OCR]] read data in a specific location
* [[Handprint Recognition]] using [[ICR]] technology
* [[Cloud OCR]] with Amazon AWS Textract
* [[Template]] matching to match unique patterns
* [[Dictionary]] matching to match a list of possible values
* [[OCR Options]] OCR job settings that apply to all fields
* [[File_Formats#Full_Page_OCR_Formats|File Formats]] that can be output by OCR
* [[Languages]] supported by OCR
* [[FineReader]] versus [[Tesseract]] OCR engines
* [[Searchable PDF]] with [[MRC]] compression
* [[OCR to Field]] for [[point and click OCR]] during [[verification]]


== Licensing ==

The [[Tesseract]] OCR engine is included with all versions of SimpleIndex.

The [[FineReader]] OCR and [[ICR]] [[Handprint Recognition]] engine is included with the OCR add-on or Professional license.

[[Unattended Processing]] with OCR requires a Server license based on annual processing volume, in increments of 1 million pages per year.

[[Cloud OCR]] requires an add-on license and an Amazon AWS account. While the SimpleIndex [[Cloud OCR]] license has no page limit, standard AWS Textract processing charges will apply.

== Creating OCR Configurations Training Video ==
This video takes a look under the hood of the Zone OCR sample job to see how it is configured. Learn to draw OCR zones and create basic templates.
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><youtube>edpKxcMipOI</youtube></div>
== Related Knowledge Base Articles ==
* [https://www.simpleindex.com/knowledge-base/how-can-i-improve-recognition-rates-for-my-ocr-fields/ How can I improve recognition rates for my OCR fields?]
* [https://www.simpleindex.com/knowledge-base/can-simpleindex-create-searchable-pdf-imagetext-files-with-hidden-text/ Can SimpleIndex create searchable PDF Image+Text files with hidden text?]

Latest revision as of 15:30, 5 April 2024

OCR is a key function of SimpleIndex, with a number of features and configuration options to consider.

== OCR Features & Settings ==

== OCR Overview ==

Zone OCR solutions traditionally require you to specify a region on the page where index information is found. This region is recognized and the result is inserted into an index field. The problem with traditional zone OCR is that if the region is moved slightly due to variations in scanning, the result could contain extra neighboring characters or cut off desired characters. This limits the usefulness of traditional zone OCR to documents where the index value is in the exact same place every time and has plenty of white space around it.

SimpleIndex’s OCR includes advanced features that overcome the inherent limitations of zone OCR, namely template and dictionary matching for OCR fields. These features search the OCR results for a certain pattern or list of possible values and return only the matching data. This allows you to draw your OCR zones much larger than normal, ensuring that no matter how much the data shifts around, it will always be contained within that region.
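
The idea behind template matching can be sketched in a few lines of Python. This is only an illustration of the general technique using a regular expression, not SimpleIndex's own template syntax or implementation. The sample zone text, the `INV-` number format, and the `match_template` helper are all assumptions made up for this example.

```python
import re

# Hypothetical OCR output from a deliberately oversized zone: the invoice
# number is in there, but so is neighboring text caught by the large region.
zone_text = "ACME Supply Co.  Invoice No: INV-204817  Date 03/14/2024  Page 1"

# Assumed template for this example: "INV-" followed by exactly six digits.
INVOICE_TEMPLATE = re.compile(r"\bINV-\d{6}\b")


def match_template(ocr_text, template):
    """Return the first substring that fits the template, or None."""
    found = template.search(ocr_text)
    return found.group(0) if found else None


# Only the part of the zone text that fits the pattern is kept.
print(match_template(zone_text, INVOICE_TEMPLATE))
```

Because only the text that fits the pattern is returned, the extra characters picked up by the larger zone do not end up in the index field.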

It is even possible to search the entire page and find key information that is not printed in any fixed location. For example, a doctor’s office may receive lab reports from many different labs. Each report is formatted differently, but each contains the patient’s name somewhere on it. Using the dictionary matching feature with a patient name list, SimpleIndex can identify the correct patient for each lab report automatically.
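
The lab report scenario can be illustrated with a minimal Python sketch of dictionary matching over full-page text. It is a simplification under stated assumptions rather than a description of how SimpleIndex performs the match: the patient list, the page text, and the `match_dictionary` helper are hypothetical, and the comparison is a plain case-insensitive substring search with no tolerance for OCR misreads.

```python
import re


def normalize(text):
    """Lowercase the text and collapse runs of whitespace to single spaces."""
    return re.sub(r"\s+", " ", text).strip().lower()


def match_dictionary(ocr_text, values):
    """Return the dictionary entries that appear anywhere in the OCR text."""
    page = normalize(ocr_text)
    return [value for value in values if normalize(value) in page]


# Hypothetical list of possible values for the patient name field.
patients = ["Maria Gonzalez", "John Smith", "Priya Patel"]

# Hypothetical full-page OCR text with irregular spacing and line breaks.
page_text = """NORTHSIDE LABS   Lipid Panel
Patient:  JOHN   SMITH      DOB 1970-01-01
Ordering physician: Dr. A. Lee"""

print(match_dictionary(page_text, patients))
```

A plain substring test like this would also accept a name embedded in a longer word, so a practical list match would need stricter word-boundary handling.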

For data that has no predictable location or format, point and click OCR can be used to capture the information by clicking or drawing a box around the text on the image.

When implementing OCR for document automation, carefully consider the data you are trying to recognize. Is the text legible? Does it appear in a fixed location? Does it conform to a unique pattern that won’t be found anywhere else on the page? Is there a list available with all the possible values for this field? Answer these questions, and you will know which OCR approach is best for your application.
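
As a rough illustration, the questions above can be turned into a small decision helper. The function name, its parameters, and especially the order in which the checks are applied are assumptions made for this sketch, not a rule stated by SimpleIndex. Legibility is left out because it affects recognition quality under every approach.

```python
def suggest_ocr_approach(fixed_location, unique_pattern, value_list_available):
    """Suggest an OCR approach from yes/no answers about a field.

    The priority order used here is an assumption for illustration only.
    """
    if unique_pattern:
        # A pattern found nowhere else on the page can be searched for
        # inside a large zone or across the whole page.
        return "template matching"
    if value_list_available:
        # A known list of possible values allows matching against that list.
        return "dictionary matching"
    if fixed_location:
        # No pattern or list, but the data is always in the same place.
        return "zone OCR"
    # No predictable location or format.
    return "point and click OCR"


print(suggest_ocr_approach(fixed_location=False,
                           unique_pattern=False,
                           value_list_available=True))
```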
