This demonstrates the PDF OCR text processing capabilities of SimpleIndex by extracting the Document Number, Date, Document Type, Customer and Total from a number of Estimates and Invoices.
All of this information is read automatically from the existing text layer of a computer-generated PDF, such as one created with a PDF printer driver. Template-matching and dictionary-matching algorithms are used to locate and extract the correct data values from the text.
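To illustrate the general idea of template and dictionary matching, here is a minimal Python sketch. It is not SimpleIndex's actual implementation or configuration. The sample text, the patterns, the value lists, and all function names are assumptions made up for this example. A template describes what a value looks like, and a dictionary lists the known values a field may take.

```python
import re

# Hypothetical sample text, standing in for the text layer of a PDF invoice.
SAMPLE_TEXT = """\
INVOICE
Invoice No: INV-1042
Date: 03/15/2021
Bill To: Acme Supply Co.
Total: $1,234.50
"""

# Dictionary matching: each field is found by looking for a known value.
# These lists are illustrative assumptions.
DOCUMENT_TYPES = ["Estimate", "Invoice"]
CUSTOMERS = ["Acme Supply Co.", "Globex Corporation"]

# Template matching: each field is found by a pattern describing its shape.
# These patterns are illustrative assumptions.
TEMPLATES = {
    "Document Number": re.compile(r"\b[A-Z]{3}-\d{4}\b"),
    "Date": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),
    "Total": re.compile(r"Total:\s*\$([\d,]+\.\d{2})"),
}


def match_dictionary(text, entries):
    """Return the first dictionary entry found in the text, ignoring case."""
    lowered = text.lower()
    for entry in entries:
        if entry.lower() in lowered:
            return entry
    return None


def match_template(text, pattern):
    """Return the first match; use the capture group if the pattern has one."""
    match = pattern.search(text)
    if match is None:
        return None
    return match.group(1) if match.groups() else match.group(0)


def extract_fields(text):
    """Combine template and dictionary matching into one field mapping."""
    fields = {name: match_template(text, p) for name, p in TEMPLATES.items()}
    fields["Document Type"] = match_dictionary(text, DOCUMENT_TYPES)
    fields["Customer"] = match_dictionary(text, CUSTOMERS)
    return fields


if __name__ == "__main__":
    for name, value in extract_fields(SAMPLE_TEXT).items():
        print(f"{name}: {value}")
```

A field that cannot be found comes back as `None`, so the caller can decide whether to flag the document for manual review.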
Because the existing text layer is used, OCR is not performed. This makes processing much faster and, because no recognition step is involved, free of OCR errors. For scanned PDF files that have no existing text, OCR can be used to obtain the text instead.
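The choice between the existing text layer and OCR can be sketched as a simple fallback. This is only an illustration of the logic, not how SimpleIndex is implemented. The function `get_page_text` and the `run_ocr` callable are hypothetical names, and the OCR engine itself is left to the caller.

```python
def get_page_text(text_layer, run_ocr):
    """Return (text, source) for one page.

    text_layer: text already stored in the PDF, or None/empty for a scan.
    run_ocr: hypothetical zero-argument callable that performs OCR and
             returns the recognized text. It is only called when needed.
    """
    # Prefer the existing text layer: no recognition step, so it is fast
    # and returns the text exactly as stored in the file.
    if text_layer and text_layer.strip():
        return text_layer, "text layer"
    # No usable text layer (for example, a scanned page): fall back to OCR.
    return run_ocr(), "ocr"
```

Passing the OCR step in as a callable keeps the slow path lazy, so pages that already contain text never trigger recognition.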
|How do you configure OCR to read index information from MS Office or PDF documents?|
|Can SimpleIndex read barcodes off of PDF files in a folder?|
|Can SimpleIndex create searchable PDF Image+Text files with hidden text?|
|How do you configure full text searching in Retrieval mode?|
|Can OCR text be saved to MS Word or HTML formats?|
|I'm using full page OCR. The information is all appearing in the txt file but it is losing format about half way through. Data to the right is ending up at the end of the txt doc. Can this be fixed?|
|Is it possible to search for and retrieve documents with Google desktop search?|
|How are Simple Software products licensed?|
|How can I improve recognition rates for my OCR fields?|
|What is the point of SimpleQC?|