OCR Form Processing
Capture data from scanned forms or PDFs with OCR and save it to CSV, XML or any SQL database. Automate PDF forms by capturing data from filled-in forms or filling in blank PDF forms from any data source.
Change the OCR Font or Type
I’m using full-page OCR. All of the information appears in the text file, but it loses its formatting about halfway through: data on the right side of the page ends up at the end of the text file. Can this be fixed?
SimpleIndex version 7 solves this problem with the incorporation of the FineReader OCR engine. Full text in PDFs will now flow with the formatting of the PDF.
Legacy Versions: SimpleIndex can also be used with other OCR applications and servers to improve accuracy, formatting and performance. Use the OCR applications to convert the scanned images to text or searchable PDF, and SimpleIndex can extract index values from the text and automatically sort and organize the files.
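As a rough illustration of the legacy workflow above, where a separate OCR application produces the text and index values are then pulled from it, the following Python sketch extracts two fields with regular expressions and builds a file name from them. The field names, patterns and sample text are hypothetical and are not SimpleIndex settings; SimpleIndex performs this step through its own configuration rather than through code like this.

```python
import re

# Hypothetical patterns for two index fields; real forms will differ.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s*(?:No\.?|#)\s*:?\s*(\w+)", re.IGNORECASE),
    "invoice_date": re.compile(r"Date\s*:?\s*(\d{1,2}/\d{1,2}/\d{4})", re.IGNORECASE),
}


def extract_index_values(ocr_text):
    """Return a dict of field name -> first match in the OCR text (or None)."""
    values = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        values[name] = match.group(1) if match else None
    return values


def build_file_name(values, extension="pdf"):
    """Join the values that were found into a file name, skipping missing fields."""
    parts = [v.replace("/", "-") for v in values.values() if v]
    return "_".join(parts) + "." + extension


# Made-up OCR output standing in for a text file produced by an OCR application.
sample = "ACME Corp\nInvoice No: 10452\nDate: 03/15/2021\nTotal: $120.00"
values = extract_index_values(sample)
print(values)
print(build_file_name(values))
```

Fields that are not found come back as None, so a caller can decide whether to route such a document to manual indexing.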
- Published in OCR
If I have a form which is filled manually by hand, can SimpleIndex read the data from it?
No, SimpleIndex cannot read handwriting. You would have to type this information in manually.
Find out more about ICR (Handprint Recognition) software on the SimpleOCR ICR Guide.
- Published in OCR
How do you train the OCR engine for better accuracy?
Training was removed in version 7 with the addition of the ABBYY FineReader OCR engine.
- Published in OCR
How do you configure full text searching in Retrieval mode?
On the Database tab, there is a dropdown in the lower portion of the panel for Full Text OCR Field. Enter the name of the field that will store the full-text data there. This must be set in both the Insert and Retrieval mode configurations. The database field must be long enough to store the entire text of your document.
The Insert mode configuration must have “Enable Full Page OCR” checked to generate full-text data from images. Text from MS Office documents, PDF files and existing OCR text files can be used without setting this option.
When designing your Retrieval mode configuration, create a Text field to use for full-text search queries. On the Database tab, set the corresponding “Database Field Name” to the full-text database field. When you search on the full-text field, SimpleIndex finds the text you enter no matter where it appears in the document. It can match partial words. It does not perform Boolean or natural language search.
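The matching behavior described above (text found anywhere in the document, partial words matched, no Boolean operators) resembles a plain substring match. The Python sketch below reproduces that behavior against an in-memory SQLite table using LIKE. The table and column names (documents, file_name, full_text) are hypothetical, and this is an assumption about the general technique, not necessarily the query SimpleIndex issues.

```python
import sqlite3


def search_full_text(conn, term):
    """Return file names whose full_text column contains term as a substring."""
    # Escape LIKE wildcards so the user's text is matched literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT file_name FROM documents "
        "WHERE full_text LIKE ? ESCAPE '\\' ORDER BY file_name",
        ("%" + escaped + "%",),
    )
    return [row[0] for row in rows]


# Hypothetical table standing in for the database that holds the full-text field.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (file_name TEXT, full_text TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [
        ("a.pdf", "Purchase order for Northwind Traders"),
        ("b.pdf", "Invoice from Contoso, payment due in 30 days"),
    ],
)

print(search_full_text(conn, "north"))                # partial word, any position
print(search_full_text(conn, "invoice AND payment"))  # treated as literal text
```

Because the whole query string is matched literally, typing "invoice AND payment" only finds documents containing that exact phrase, which mirrors the lack of Boolean search noted above.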
- Published in Database & Retrieval, OCR
How can I improve recognition rates for my OCR fields?
There are several things you can do to improve OCR accuracy:
- Scan at 300 dpi in black & white for best results.
- Adjust the scan settings to remove background noise and improve the definition of characters.
- For Zone OCR, field recognition can vary with the surrounding white space and text in the zone. Try varying the size of the zone to achieve optimal results.
- For template matching, make sure all variations of the field format are included in the template list.
- For dictionary matching, add common variations and OCR mistakes to the “thesaurus” list.
- On the Zones & OCR tab (accessed from the Job Options), adjust the Max Errors setting to allow for more mistakes in the dictionary matching process.
- Use the Strip Spaces, Strip Characters, Replace Characters and Case Fixing options to standardize the field format prior to matching.
Please refer to the manual for details on how to configure these options.
Find out more about Optical Character Recognition on the SimpleOCR Guide.
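To make the clean-up and dictionary matching tips above more concrete, here is a minimal Python sketch of the general idea: standardize the OCR value first, then accept the closest dictionary entry if it is within a maximum number of character errors. It assumes that an error budget like Max Errors behaves roughly like an edit-distance limit, which the text above does not confirm. All function and parameter names are hypothetical and do not correspond to SimpleIndex's implementation.

```python
def normalize(value, strip_chars="", replacements=None, strip_spaces=True):
    """Standardize an OCR value before matching (illustrative options only)."""
    for old, new in (replacements or {}).items():
        value = value.replace(old, new)
    value = "".join(ch for ch in value if ch not in strip_chars)
    if strip_spaces:
        value = value.replace(" ", "")
    return value.upper()  # simple stand-in for a case-fixing step


def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def match_dictionary(value, dictionary, max_errors=1):
    """Return the closest dictionary entry within max_errors edits, else None."""
    best_entry, best_distance = None, max_errors + 1
    for entry in dictionary:
        distance = edit_distance(value, entry)
        if distance < best_distance:
            best_entry, best_distance = entry, distance
    return best_entry


dictionary = ["ACME", "CONTOSO", "NORTHWIND"]
raw = "c0nt 0so."  # made-up OCR result with zeros read in place of the letter O

cleaned = normalize(raw, strip_chars=".")
print(cleaned, match_dictionary(cleaned, dictionary, max_errors=1))
print(cleaned, match_dictionary(cleaned, dictionary, max_errors=2))

# Replacing a known OCR confusion before matching removes the errors entirely.
fixed = normalize(raw, strip_chars=".", replacements={"0": "O"})
print(fixed, match_dictionary(fixed, dictionary, max_errors=0))
```

The example shows the trade-off: a larger error allowance accepts more damaged values but also raises the chance of a wrong match, while fixing predictable confusions up front lets the allowance stay small.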
- Published in OCR
Can OCR text be saved to MS Word or HTML formats?
Yes. On the Zones & OCR tab of the Job Options, there is a dropdown list for “Full-page OCR file type”. By default it is set to TEXT, but can be changed to WORD, HTML or PDF.
If the output file type is set to PDF, OCR text will be embedded as hidden text in the PDF file.
Find out more about Optical Character Recognition on the SimpleOCR Guide.
- Published in Licensing & Installation, OCR
Can SimpleIndex create searchable PDF Image+Text files with hidden text?
If you enable full-page OCR and output to PDF, the full-page OCR text will be inserted as invisible text on each page.
With the addition of the FineReader Engine in version 7, SimpleIndex now creates PDF files with fully searchable text formatted to flow with the image of the document.
Find out more about Optical Character Recognition on the SimpleOCR Guide.
- Published in Export, OCR, Office PDF Text Processing