High speed scanning and OCR software designed to automate document capture and indexing for businesses.
Change the Dictionary Separator Value
This setting changes the dictionary separator used for thesaurus matching from the default pipe character (|) to any character(s) that you want. This is useful when the values in your list or dictionary might themselves contain the pipe character (“|”, typed as Shift+Backslash).
This setting is also used as the delimiter when parsing multiple index field values from bar codes (e.g. field1|field2|field3).
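As a rough illustration of the bar code parsing described above, the following Python sketch splits one bar code value into index field values using the separator. The function name `split_barcode_fields` is hypothetical and is not part of SimpleIndex; it only shows the idea of delimiter-based splitting.

```python
def split_barcode_fields(barcode_value, separator="|"):
    """Split one bar code value into a list of index field values.

    Hypothetical helper for illustration only; SimpleIndex performs
    this step internally using the configured separator.
    """
    if not separator:
        raise ValueError("separator must not be empty")
    return barcode_value.split(separator)


# With the default separator:
print(split_barcode_fields("field1|field2|field3"))
# ['field1', 'field2', 'field3']

# With a custom separator, pipe characters stay inside the field values:
print(split_barcode_fields("A|B;C", separator=";"))
# ['A|B', 'C']
```

The second call shows why a different separator helps when the field values themselves can contain the pipe character.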
Instructions for changing the dictionary separator value:
- Right click on the Job Configuration file that you would like to edit and select Open With>Notepad
- Search the XML settings text open in Notepad for this term: <OCR_DICT_SEPARATOR>
- Change the value in between the tags from “|” to any other single character that you want.
- For TAB separation use %TAB%
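If you need to make this change in several configuration files, the manual steps above could be scripted. The following Python sketch works on the text of a configuration file and assumes the setting appears as a `<OCR_DICT_SEPARATOR>` element with a matching closing tag. The function name `set_dict_separator` is hypothetical, and the sketch edits the text with a regular expression so that the rest of the file is left untouched.

```python
import re
from xml.sax.saxutils import escape

# Assumption: the setting is stored as <OCR_DICT_SEPARATOR>value</OCR_DICT_SEPARATOR>.
SEPARATOR_PATTERN = re.compile(
    r"(<OCR_DICT_SEPARATOR>)(.*?)(</OCR_DICT_SEPARATOR>)", re.DOTALL
)


def set_dict_separator(config_text, new_separator):
    """Return config_text with the dictionary separator value replaced."""

    def replace(match):
        # escape() keeps the XML valid if the separator is &, < or >.
        return match.group(1) + escape(new_separator) + match.group(3)

    new_text, count = SEPARATOR_PATTERN.subn(replace, config_text)
    if count == 0:
        raise ValueError("<OCR_DICT_SEPARATOR> not found in configuration text")
    return new_text


sample = "<OCR_DICT_SEPARATOR>|</OCR_DICT_SEPARATOR>"
print(set_dict_separator(sample, ";"))
# <OCR_DICT_SEPARATOR>;</OCR_DICT_SEPARATOR>
print(set_dict_separator(sample, "%TAB%"))
# <OCR_DICT_SEPARATOR>%TAB%</OCR_DICT_SEPARATOR>
```

To apply it, read the configuration file into a string, pass it through the function and write the result back. Keep a backup copy of the original file, since the file's encoding and exact layout are not described here.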

Change the OCR Font or Type
This setting changes the OCR recognition font or type from the default, which is “To Be Detected”. It can be used to look for a specific type of OCR font and is especially useful for recognizing things like Dotmatrix, OCR A and OCR B.
Instructions for setting OCR Font:
1. Right click on the .sic file, select Open With and choose a text editor (Notepad, Wordpad, etc.)
2. Find <OCR_TEXT_TYPE>. If you can’t find <OCR_TEXT_TYPE> then add the following as the last row in the text file:
<OCR_TEXT_TYPE>#</OCR_TEXT_TYPE>
3. Change the number in between the tags: <OCR_TEXT_TYPE>#</OCR_TEXT_TYPE>
4. Use the number of the desired font:
- 0 Normal
- 1 Typewriter
- 2 Dotmatrix
- 3 Index
- 5 OCR A
- 6 OCR B
- 7 MICR E13B
- 8 MICR CMC7
- 9 Gothic
- 10 To Be Detected
5. Save and close the file
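The steps above can also be sketched in Python. This is a minimal sketch that works on the text of a .sic file and assumes the setting appears exactly as `<OCR_TEXT_TYPE>#</OCR_TEXT_TYPE>`. The function name `set_ocr_text_type` and the `OCR_TEXT_TYPES` table are hypothetical; the numbers are copied from the list above. When the tag is missing, the sketch follows the instruction literally and adds it as the last row of the text.

```python
import re

# Font numbers copied from the list above.
OCR_TEXT_TYPES = {
    "Normal": 0,
    "Typewriter": 1,
    "Dotmatrix": 2,
    "Index": 3,
    "OCR A": 5,
    "OCR B": 6,
    "MICR E13B": 7,
    "MICR CMC7": 8,
    "Gothic": 9,
    "To Be Detected": 10,
}

TEXT_TYPE_PATTERN = re.compile(r"<OCR_TEXT_TYPE>.*?</OCR_TEXT_TYPE>", re.DOTALL)


def set_ocr_text_type(config_text, font_name):
    """Return config_text with <OCR_TEXT_TYPE> set to the given font."""
    number = OCR_TEXT_TYPES[font_name]  # raises KeyError for unknown names
    element = f"<OCR_TEXT_TYPE>{number}</OCR_TEXT_TYPE>"
    if TEXT_TYPE_PATTERN.search(config_text):
        # Step 3: change the number in between the existing tags.
        return TEXT_TYPE_PATTERN.sub(element, config_text, count=1)
    # Step 2: tag not found, so add it as the last row of the text.
    if config_text and not config_text.endswith("\n"):
        config_text += "\n"
    return config_text + element + "\n"


print(set_ocr_text_type("<OCR_TEXT_TYPE>10</OCR_TEXT_TYPE>", "OCR A"))
# <OCR_TEXT_TYPE>5</OCR_TEXT_TYPE>
```

As with any manual edit, keep a backup copy of the .sic file before writing the changed text back.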
I’m using full page OCR. The information is all appearing in the txt file but it is losing format about half way through. Data to the right is ending up at the end of the txt doc. Can this be fixed?
SimpleIndex version 7 solves this problem with the incorporation of the FineReader OCR engine. Full text in PDFs will now flow with the formatting of the PDF.
Legacy Versions: SimpleIndex can also be used with other OCR applications and servers to improve accuracy, formatting and performance. Use the OCR applications to convert the scanned images to text or searchable PDF, and SimpleIndex can extract index values from the text and automatically sort and organize the files.
How do you train the OCR engine for better accuracy?
Training was removed in version 7 with the addition of the ABBYY FineReader OCR engine.
How do you configure full text searching in Retrieval mode?
On the Database tab there is a dropdown in the lower portion of the panel for Full Text OCR Field. Put the name of the field that will store the full-text data there. This must be configured for both the Insert and Retrieval mode configurations. The database field needs to be of sufficient length to store the entire text of your document.
Of course, the Insert Mode configuration must have “Enable Full Page OCR” checked to generate full text data from images. Text from MS Office documents, PDF files and existing OCR text files can be used without setting this option.
When designing your Retrieval Mode configuration, create a Text field to use for full text search queries. On the Database tab, set the corresponding “Database Field Name” to the full text database field.
When searching on your full text field, SimpleIndex finds the text you enter no matter where it appears in the document. It is able to match partial words. It does not perform boolean or natural language searches. The text entered must match the document text exactly.
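The search behavior described above resembles a plain substring test. The following Python sketch illustrates that idea only; it is not SimpleIndex's implementation, and the function name `full_text_match` is hypothetical. Whether SimpleIndex distinguishes upper and lower case is not stated here, so the sketch assumes an exact, case-sensitive comparison.

```python
def full_text_match(query, document_text):
    """Return True if the query text appears anywhere in the document text.

    Assumption: an exact, case-sensitive substring comparison.
    """
    return query in document_text


text = "Invoice 10452 for Acme Supply, due 2020-03-01"

print(full_text_match("Acme", text))              # True: found mid-document
print(full_text_match("Invoi", text))             # True: partial word
print(full_text_match("Invoice AND Acme", text))  # False: no boolean operators
print(full_text_match("Acme Invoice", text))      # False: word order must match
```

The last two calls show the practical consequence: a query is treated as one literal string, so operators and reordered words do not match.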
How can I improve recognition rates for my OCR fields?
There are several things you can do to improve accuracy for OCR.
- Scan at 300 dpi, black & white for best results.
- Adjust the scan settings to remove background noise and improve the definition of characters.
- For Zone OCR, field recognition can often vary based on the surrounding white space and text in the zone. Try varying the size of the zone to achieve optimal results.
- For template matching, make sure all variations of the field format are included in the template list.
- For dictionary matching, add common variations and OCR mistakes to the “thesaurus” list.
- On the Zones & OCR tab (accessed from the Job Options) you can adjust the Max Errors setting to allow for more mistakes in the dictionary matching process.
- Use the Strip Spaces, Strip Characters, Replace Characters and Case Fixing options to standardize the field format prior to matching.
Please refer to the manual for details on how to configure these options.
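To show how standardizing a field and allowing a number of errors can work together, here is a minimal Python sketch. It is not SimpleIndex's algorithm: how SimpleIndex counts errors is not described here, so the sketch assumes Levenshtein edit distance, and the names `normalize`, `edit_distance` and `match_dictionary` are hypothetical.

```python
def normalize(value, strip_chars=""):
    """Standardize a value: strip spaces, strip given characters, fix case."""
    value = value.replace(" ", "")
    for ch in strip_chars:
        value = value.replace(ch, "")
    return value.upper()


def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def match_dictionary(ocr_value, dictionary, max_errors=0, strip_chars=""):
    """Return the closest dictionary entry within max_errors, else None."""
    cleaned = normalize(ocr_value, strip_chars)
    best_entry, best_distance = None, None
    for entry in dictionary:
        distance = edit_distance(cleaned, normalize(entry, strip_chars))
        if distance <= max_errors and (best_distance is None
                                       or distance < best_distance):
            best_entry, best_distance = entry, distance
    return best_entry


entries = ["INVOICE", "RECEIPT"]
print(match_dictionary("inv0ice", entries, max_errors=1))  # INVOICE
print(match_dictionary("inv0ice", entries, max_errors=0))  # None
print(match_dictionary("in voice", entries))               # INVOICE
```

In this sketch, raising `max_errors` accepts more OCR mistakes but also raises the chance of matching the wrong entry, which is why adding known variations to the list is often the safer first step.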
Find out more about Optical Character Recognition on the SimpleOCR Guide. You may also check out our Advanced OCR Guide to find out how to use third-party OCR applications with SimpleIndex.
Can OCR text be saved to MS Word or HTML formats?
Yes. On the Zones & OCR tab of the Job Options, there is a dropdown list for “Full-page OCR file type”. By default it is set to TEXT, but can be changed to WORD, HTML or PDF.
If the output file type is set to PDF, OCR text will be embedded as hidden text in the PDF file.
Can SimpleIndex create searchable PDF Image+Text files with hidden text?
If you enable full-page OCR and output to PDF, the full-page OCR text will be inserted as invisible text on each page.
With the addition of the FineReader Engine in version 7, SimpleIndex now creates PDF files with fully searchable text formatted to flow with the image of the document.