Use the clipboard to OCR data captured from application images and index documents automatically.
The OCR text type setting (<OCR_TEXT_TYPE>) changes the OCR recognition font type from its default value, “To Be Detected”. Use it to look for a specific type of OCR font. It is especially useful for recognizing fonts such as Dotmatrix, OCR A and OCR B.
Instructions for setting the OCR font:
1. Right-click the .sic file, select Open With and choose a text editor (Notepad, WordPad, etc.).
2. Find <OCR_TEXT_TYPE>. If the file does not contain <OCR_TEXT_TYPE>, add <OCR_TEXT_TYPE>#</OCR_TEXT_TYPE> as the last line of the file.
3. Replace the # between the tags in <OCR_TEXT_TYPE>#</OCR_TEXT_TYPE> with the number of the desired font.
4. Font numbers:
- 0 Normal
- 1 Typewriter
- 2 Dotmatrix
- 3 Index
- 5 OCR A
- 6 OCR B
- 7 MICR E13B
- 8 MICR CMC7
- 9 Gothic
- 10 To Be Detected
5. Save and close the file.
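If you need to apply the same change to many .sic files, the steps above can be scripted. The Python sketch below is an illustration only, not a SimpleIndex tool: it assumes the .sic file is plain text readable in the system's default encoding (as the instruction to open it in Notepad suggests), and the names OCR_FONT_TYPES, set_ocr_text_type and update_sic_file are hypothetical. It replaces the number if the tag already exists and otherwise appends the tag as the last line, mirroring steps 2 and 3.

```python
import re
from pathlib import Path

# Font numbers from the list above (4 is not listed, so it is not offered).
OCR_FONT_TYPES = {
    "Normal": 0,
    "Typewriter": 1,
    "Dotmatrix": 2,
    "Index": 3,
    "OCR A": 5,
    "OCR B": 6,
    "MICR E13B": 7,
    "MICR CMC7": 8,
    "Gothic": 9,
    "To Be Detected": 10,
}

# Matches an existing <OCR_TEXT_TYPE>n</OCR_TEXT_TYPE> element.
TAG_RE = re.compile(r"<OCR_TEXT_TYPE>\s*\d*\s*</OCR_TEXT_TYPE>")


def set_ocr_text_type(text: str, font: str) -> str:
    """Return the .sic text with OCR_TEXT_TYPE set to the given font."""
    element = f"<OCR_TEXT_TYPE>{OCR_FONT_TYPES[font]}</OCR_TEXT_TYPE>"
    if TAG_RE.search(text):
        # Step 3: the tag exists, so replace the number between the tags.
        return TAG_RE.sub(element, text, count=1)
    # Step 2: the tag is missing, so append it as the last line.
    if text and not text.endswith("\n"):
        text += "\n"
    return text + element + "\n"


def update_sic_file(path: str, font: str) -> None:
    """Rewrite a .sic file in place (assumes default-encoding plain text)."""
    p = Path(path)
    p.write_text(set_ocr_text_type(p.read_text(), font))
```

For example, update_sic_file(r"C:\Jobs\Example.sic", "OCR A") would set the value to 5 (the path is hypothetical). Keep a backup copy of the original file before running any script against it.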
SimpleIndex uses the .NET regular expressions library.
For more information, see the Regular Expressions Wiki page.
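SimpleIndex itself uses .NET regular expressions, but simple patterns built from \b, \d and {n} quantifiers behave the same way in most engines. The sketch below uses Python's re module only to show how such a pattern pulls a value out of OCR text; the INV-###### invoice format and the find_invoice_number function are hypothetical. More advanced syntax can differ between engines (for example, named groups are written (?<name>...) in .NET but (?P<name>...) in Python's re), so test patterns in SimpleIndex before relying on them in a job.

```python
import re
from typing import Optional

# Hypothetical format: "INV-" followed by exactly six digits.
# \b, \d and {6} have the same meaning in .NET and Python regex.
INVOICE_RE = re.compile(r"\bINV-(\d{6})\b")


def find_invoice_number(ocr_text: str) -> Optional[str]:
    """Return the digits of the first INV-###### match, or None if absent."""
    match = INVOICE_RE.search(ocr_text)
    return match.group(1) if match else None
```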
Is there a way to use only part of a barcode or OCR value? For example, to extract “50” from the value “124450”?
Yes. For this example, create a barcode field (for example, Field 1) and a second field of type “Fixed”. In the template for the second field, enter %FIELD1[5,2]% to get “50” from “124450”.
%FIELD1% returns the entire value of Field 1, the barcode field. Adding [5,2] tells SimpleIndex to start at the 5th character (“5”) and take 2 characters from the value, which yields “50”.
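The [start,length] syntax can be modeled as a 1-based substring. The Python sketch below is a hypothetical model of the substitution, not SimpleIndex's implementation: the expand_template function and the fields dictionary are illustrative names, and what SimpleIndex does when start or length runs past the end of the value is not described here (Python slicing simply returns a shorter string).

```python
import re
from typing import Dict

# %FIELDn% or %FIELDn[start,length]%, where start is 1-based.
TOKEN_RE = re.compile(r"%FIELD(\d+)(?:\[(\d+),(\d+)\])?%")


def expand_template(template: str, fields: Dict[int, str]) -> str:
    """Substitute field tokens in a Fixed-field template (hypothetical model)."""
    def substitute(m: "re.Match") -> str:
        value = fields[int(m.group(1))]
        if m.group(2) is None:
            return value  # %FIELD1%: the entire value
        start, length = int(m.group(2)), int(m.group(3))
        # Convert the 1-based start position to a 0-based slice.
        return value[start - 1:start - 1 + length]
    return TOKEN_RE.sub(substitute, template)
```

With fields = {1: "124450"}, expand_template("%FIELD1[5,2]%", fields) returns "50" and expand_template("%FIELD1%", fields) returns "124450".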
There are several things you can do to improve OCR accuracy:
- For best results, scan at 300 dpi in black and white.
- Adjust the scan settings to remove background noise and sharpen the definition of characters.
- For Zone OCR, recognition results can vary depending on the white space and text surrounding the field in the zone. Try different zone sizes to find the one that gives the best results.
- For template matching, make sure all variations of the field format are included in the template list.
- For dictionary matching, add common variations and OCR mistakes to the “thesaurus” list.
- On the Zones & OCR tab (accessed from the Job Options), you can raise the Max Errors setting to allow for more mistakes in the dictionary matching process.
- Use the Strip Spaces, Strip Characters, Replace Characters and Case Fixing options to standardize the field format prior to matching.
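To illustrate how normalization, a thesaurus and an error allowance work together, the Python sketch below standardizes an OCR value and then looks it up in a dictionary. It is a hypothetical model, not SimpleIndex's algorithm: it assumes Max Errors behaves like a character edit (Levenshtein) distance limit, that Case Fixing means converting to upper case, and that dictionary entries are already normalized the same way. The functions normalize, edit_distance and match_dictionary are illustrative names.

```python
from typing import Dict, Iterable, Optional


def normalize(value: str, strip_chars: str = "",
              replacements: Optional[Dict[str, str]] = None) -> str:
    """Standardize an OCR value before matching (hypothetical model)."""
    value = value.replace(" ", "")                              # Strip Spaces
    value = "".join(c for c in value if c not in strip_chars)   # Strip Characters
    for old, new in (replacements or {}).items():               # Replace Characters
        value = value.replace(old, new)
    return value.upper()                                        # Case Fixing (assumed upper case)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def match_dictionary(value: str, dictionary: Iterable[str],
                     thesaurus: Dict[str, str], max_errors: int) -> Optional[str]:
    """Return the closest dictionary entry within max_errors edits, or None."""
    if value in thesaurus:  # known variant or common OCR mistake
        return thesaurus[value]
    best, best_dist = None, max_errors + 1
    for entry in dictionary:
        dist = edit_distance(value, entry)
        if dist < best_dist:
            best, best_dist = entry, dist
    return best
```

For example, normalize("acme c0.", strip_chars=".") gives "ACMEC0", which is one substitution away from "ACMECO", so match_dictionary("ACMEC0", ["ACMECO", "GLOBEX"], {}, max_errors=1) returns "ACMECO". With max_errors=0 it returns None unless the thesaurus maps the value directly.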
Please refer to the SimpleIndex Wiki for details on how to configure these options.
- SimpleIndex.com – Zone OCR
- SimpleIndex.com – Dynamic OCR
- SimpleOCR.com – OCR Guide
- SimpleIndex Wiki – OCR
- SimpleIndex Wiki – OCR Options
- SimpleIndex Wiki – Zone OCR
- SimpleIndex Wiki – Full Page OCR
- SimpleIndex Wiki – Zones & OCR Settings
- SimpleIndex Wiki – OCR to Field
- SimpleIndex Wiki – OCR Text View
- SimpleIndex Wiki – Template & Dictionary Matching OCR
- SimpleIndex Wiki – OMR and OCR Document Separation