Some documents are difficult or impossible to automate with OCR, such as those with non-standard layouts, unconstrained handwriting or very poor scan quality. In applications like invoice processing, fully automating data entry can require expensive software and weeks of consulting. Even after all that expense, many users miss the interface and data validations that their accounting software's entry screens provide.
In cases like these, SimpleIndex can help improve data entry efficiency while archiving your scanned originals at the same time. Here’s how it works:
- Scan a batch of documents for data entry
- Place the SimpleIndex window side-by-side with your data entry window
- Enter the data normally, reading from the scanned image in SimpleIndex
- Press the hotkey combo to transfer the data to SimpleIndex
- Save the image and repeat with the next one
In this configuration, SimpleIndex captures an image of the data entry window, then uses OCR to read the data and index the image. Since the data entry screen has a consistent layout and clear, readable fonts, it can be reliably recognized with OCR.
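The idea that a consistent screen layout makes recognition reliable can be illustrated with a small sketch. It is not SimpleIndex's implementation: the `Word` and `Zone` types, the field names, and the sample coordinates are all hypothetical, and the word list stands in for whatever an OCR engine would return for a captured data entry window. Because each value always appears in the same place on the screen, a fixed rectangle per field is enough to pick it out.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word and the pixel position of its top-left corner (hypothetical OCR output)."""
    text: str
    x: int
    y: int


@dataclass
class Zone:
    """A fixed rectangle on the captured screen that holds one field's value."""
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, word):
        return self.left <= word.x <= self.right and self.top <= word.y <= self.bottom


def read_zones(words, zones):
    """Collect the words that fall inside each zone, in reading order."""
    index = {}
    for field, zone in zones.items():
        inside = [w for w in words if zone.contains(w)]
        inside.sort(key=lambda w: (w.y, w.x))  # top to bottom, then left to right
        index[field] = " ".join(w.text for w in inside)
    return index


# Made-up OCR result for a data entry screen: labels on the left, values on the right.
words = [
    Word("Invoice", 20, 40), Word("No:", 80, 40), Word("10482", 160, 40),
    Word("Vendor:", 20, 80), Word("Acme", 160, 80), Word("Supply", 210, 80),
]

# Made-up zones covering only the value column of each row.
zones = {
    "invoice_number": Zone(150, 30, 400, 50),
    "vendor": Zone(150, 70, 400, 90),
}

print(read_zones(words, zones))
# {'invoice_number': '10482', 'vendor': 'Acme Supply'}
```

The resulting dictionary plays the role of the index values that would be saved with the scanned image.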
There are several advantages to this approach:
- Configuration and training take hours, not weeks
- Scanned images are indexed with no extra work
- All the advantages of digital documents: security, searching, sharing, etc.
- Use all the data validation features of your software
- No flipping through paper documents
- Operator keeps eyes on the screen and hands on the keyboard
- Data entry can be done remotely
- Data entry performance improves and files are archived at the same time
SimpleIndex uses the .NET regular expression (RegEx) syntax.
Is there a way to just use part of a bar code or OCR value? For example, extract “50” from the value “124450”
To do this, create a barcode field (Field 1, for example) and a second field of type “Fixed”. In the template for the second field, enter %FIELD1[5,2]% to get “50” from “124450”.
%FIELD1% on its own returns the entire value of Field 1, the barcode field. Adding [5,2] tells SimpleIndex to start at the 5th character and take 2 characters from the value, which yields “50”.
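The start-and-length rule can be checked with a short Python sketch. The `expand_template` helper is hypothetical and only mimics the %FIELDn[start,length]% notation described above; it is not how SimpleIndex itself evaluates templates. The point it illustrates is that the start position counts from 1, not 0.

```python
import re

# Matches %FIELD1% or %FIELD1[5,2]% (field number, optional start and length).
_TEMPLATE = re.compile(r"%FIELD(\d+)(?:\[(\d+),(\d+)\])?%")


def expand_template(template, fields):
    """Replace each %FIELDn% or %FIELDn[start,length]% with the field's value or a slice of it."""
    def substitute(match):
        value = fields[int(match.group(1))]
        if match.group(2) is None:
            return value  # no [start,length]: use the whole value
        start = int(match.group(2))   # 1-based position of the first character to keep
        length = int(match.group(3))  # number of characters to keep
        return value[start - 1:start - 1 + length]

    return _TEMPLATE.sub(substitute, template)


fields = {1: "124450"}
print(expand_template("%FIELD1%", fields))       # 124450
print(expand_template("%FIELD1[5,2]%", fields))  # 50
```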
Training was removed in version 7 due to the addition of the ABBYY FineReader OCR engine.
There are several things you can do to improve accuracy for OCR:
- Scan at 300dpi, black & white for best results.
- Adjust the scan settings to remove background noise and improve the definition of characters.
- For Zone OCR, field recognition can often vary based on the surrounding white space and text in the zone. Try varying the size of the zone to achieve optimal results.
- For template matching, make sure all variations of the field format are included in the template list.
- For dictionary matching, add common variations and OCR mistakes to the “thesaurus” list.
- On the Zones & OCR tab (accessed from the Job Options) you can adjust the Max Errors setting to allow for more mistakes in the dictionary matching process.
- Use the Strip Spaces, Strip Characters, Replace Characters and Case Fixing options to standardize the field format prior to matching.
Please refer to the manual for details on how to configure these options.
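To show why standardizing a value and allowing a few errors both help dictionary matching, here is a minimal Python sketch. It is an illustration under stated assumptions, not SimpleIndex's actual matching algorithm: the function names, the use of edit distance as the error count, and the sample dictionary and thesaurus are all hypothetical.

```python
def levenshtein(a, b):
    """Edit distance: single-character inserts, deletes, or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def normalize(text, strip_chars="", replacements=None):
    """Strip spaces, strip unwanted characters, replace characters, then fix case to upper."""
    text = "".join(text.split())
    for ch in strip_chars:
        text = text.replace(ch, "")
    for old, new in (replacements or {}).items():
        text = text.replace(old, new)
    return text.upper()


def match_dictionary(ocr_value, dictionary, thesaurus=None, max_errors=1):
    """Return the closest dictionary entry within max_errors edits, or None if nothing is close enough."""
    cleaned = normalize(ocr_value)
    thesaurus = thesaurus or {}
    if cleaned in thesaurus:  # known variation or common OCR mistake
        return thesaurus[cleaned]
    best, best_distance = None, max_errors + 1
    for entry in dictionary:
        distance = levenshtein(cleaned, normalize(entry))
        if distance < best_distance:
            best, best_distance = entry, distance
    return best


dictionary = ["INVOICE", "RECEIPT", "STATEMENT"]  # hypothetical document types
thesaurus = {"INV": "INVOICE"}                    # hypothetical known variation

print(match_dictionary("1NVOICE", dictionary))                # INVOICE (one misread character)
print(match_dictionary(" inv ", dictionary, thesaurus))       # INVOICE (thesaurus entry)
print(match_dictionary("1NV0ICE", dictionary))                # None (two errors, limit is one)
print(match_dictionary("1NV0ICE", dictionary, max_errors=2))  # INVOICE
```

Raising the error limit accepts more OCR mistakes, but it also raises the chance of matching the wrong entry when dictionary values are similar to each other.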