Create a full text index of your scanned documents and electronic files with SimpleIndex. Use full page OCR for scanned images or extract existing text from PDF files, MS Office documents, HTML and other text-based file formats. Save the extracted text to any SQL database to make it searchable in your custom applications, or use the built-in search function to find and view documents.
I’m using full page OCR. The information all appears in the txt file, but it loses its formatting about halfway through. Data on the right side of the page ends up at the end of the txt file. Can this be fixed?
SimpleIndex version 7 solves this problem with the incorporation of the FineReader OCR engine. Full text in PDFs will now flow with the formatting of the PDF.
Legacy Versions: SimpleIndex can also be used with other OCR applications and servers to improve accuracy, formatting and performance. Use the OCR applications to convert the scanned images to text or searchable PDF, and SimpleIndex can extract index values from the text and automatically sort and organize the files.
- Published in OCR
How do you configure full text searching in Retrieval mode?
On the Database tab, there is a dropdown in the lower portion of the panel for Full Text OCR Field. Set it to the name of the field that will store the full-text data. This must be configured in both the Insert and Retrieval mode configurations. The database field must be long enough to store the entire text of your document.
The Insert Mode configuration must also have “Enable Full Page OCR” checked to generate full-text data from images. Text from MS Office documents, PDF files and existing OCR text files can be used without setting this option.
When designing your Retrieval Mode configuration, create a Text field to use for full-text search queries. On the Database tab, set the corresponding “Database Field Name” to the full-text database field.
When you search on the full-text field, SimpleIndex finds the text you enter no matter where it appears in the document, and it can match partial words. It does not perform boolean or natural language searches.
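If you query the full-text field from a custom application, the behavior described above (match anywhere in the document, partial words allowed, no boolean operators) is roughly what a SQL substring match gives you. The following is a minimal sketch under stated assumptions, not SimpleIndex's actual query: the table name `Documents`, the columns `FilePath` and `FullText`, and the helper `search_full_text` are all hypothetical, and an in-memory SQLite database stands in for whatever SQL database you use.

```python
import sqlite3


def search_full_text(conn, term):
    """Return file paths whose stored text contains `term` anywhere.

    Hypothetical helper: the Documents table and its FilePath / FullText
    columns are assumptions made for this illustration.
    """
    # Escape LIKE wildcards so the user's text is matched literally.
    escaped = (
        term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    rows = conn.execute(
        "SELECT FilePath FROM Documents "
        "WHERE FullText LIKE ? ESCAPE '\\' "
        "ORDER BY FilePath",
        (f"%{escaped}%",),
    )
    return [row[0] for row in rows]


# Small in-memory database standing in for the real index database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Documents (FilePath TEXT, FullText TEXT)")
conn.executemany(
    "INSERT INTO Documents VALUES (?, ?)",
    [
        ("invoice_001.pdf", "Invoice for scanning services, net 30 days"),
        ("memo_002.pdf", "Meeting moved to Thursday"),
    ],
)

# A partial word matches: "scan" is found inside "scanning".
print(search_full_text(conn, "scan"))

# Boolean operators are not interpreted; the whole string is treated
# as literal text to find.
print(search_full_text(conn, "invoice AND memo"))
```

Because the whole search string is treated as literal text, a query such as `invoice AND memo` only matches documents that contain that exact phrase. Combining terms would require separate queries or a database-specific full-text feature.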
- Published in Database & Retrieval, OCR
Can OCR text be saved to MS Word or HTML formats?
Yes. On the Zones & OCR tab of the Job Options, there is a dropdown list for “Full-page OCR file type”. By default it is set to TEXT, but can be changed to WORD, HTML or PDF.
If the output file type is set to PDF, OCR text will be embedded as hidden text in the PDF file.
Find out more about Optical Character Recognition on the SimpleOCR Guide.
- Published in Licensing & Installation, OCR
Can SimpleIndex create searchable PDF Image+Text files with hidden text?
If you enable full-page OCR and output to PDF, the full-page OCR text will be inserted as invisible text on each page.
With the addition of the FineReader Engine in version 7, SimpleIndex now creates PDF files with fully searchable text formatted to flow with the image of the document.
- Published in Export, OCR, Office PDF Text Processing