Optical Character Recognition
Please refer to the Wiki Documentation for the complete Languages reference.
SimpleSoftware OCR engines use two different systems for language support. The languages your OCR supports depend on the version of SimpleIndex you have installed; add-ons (SimpleIndex Server, SimpleCoversheet, and so on) do not add any language support.
All SimpleSoftware products include Tesseract 3.02 OCR language support. You can learn more about it and download additional language libraries HERE. You can check which Tesseract language libraries are installed on your station, and add more, in this folder:
C:\Program Files (x86)\SimpleIndex\Tesseract\v3.02\tessdata
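As a rough illustration, the folder above can also be inspected with a short script. The sketch below assumes that each installed Tesseract language is represented by a file named `<code>.traineddata` (the usual Tesseract convention) and that SimpleIndex is installed in the default location shown above. The function name `installed_tesseract_languages` is made up for this example and is not part of SimpleIndex.

```python
from pathlib import Path

# Default folder quoted in the text above; adjust if SimpleIndex is installed elsewhere.
TESSDATA_DIR = Path(r"C:\Program Files (x86)\SimpleIndex\Tesseract\v3.02\tessdata")


def installed_tesseract_languages(tessdata_dir):
    """Return the sorted language codes that have a .traineddata file.

    Hypothetical helper for illustration only. Assumes the usual Tesseract
    naming convention of one <code>.traineddata file per language.
    """
    folder = Path(tessdata_dir)
    if not folder.is_dir():
        # Folder missing: nothing installed at this location.
        return []
    return sorted(p.stem for p in folder.glob("*.traineddata") if p.is_file())


if __name__ == "__main__":
    for code in installed_tesseract_languages(TESSDATA_DIR):
        print(code)
```

If the script prints nothing, the folder was either not found at that path or contains no `.traineddata` files.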
SimpleIndex Pro and SimpleIndex OCR use the FineReader engine, which has one of the largest libraries of supported OCR languages. You can check the OCR languages supported by FineReader on your station here:
C:\Program Files (x86)\SimpleIndex\OCRLanguages.txt
English New Zealand
English South Africa
English United Kingdom
English United States
German New Spelling
German New Spelling Law
German New Spelling Medical
Malay Brunei Darussalam
Russian Old Spelling
Spanish Costa Rica
Spanish Dominican Republic
Spanish El Salvador
Spanish Modern Sort
Spanish Puerto Rico
Spanish Traditional Sort
Please refer to the Wiki Documentation for the complete SimpleView reference.
SimpleQC is now SimpleView, with many enhancements. In a nutshell, it is designed to let you quickly browse folders containing multi-page TIFF or PDF documents. The two main uses for this are:
1 Review scanned documents for Quality Control
Occasionally a scanned document will be too light or too dark to be read. This can happen quite often with some types of paper. Use SimpleView to find these pages quickly and rescan them. You can also correct page order, rotation, skew, etc.
2 Use as a document viewer
SimpleIndex and other scanning applications create folders and files on your hard drive or network to store documents. Use SimpleView to quickly browse image thumbnails by folder and filename. Auto-rotate, enhance and OCR images as needed.
SimpleView is different from other thumbnail viewers because:
-It loads multi-page TIFF files very quickly
-It displays thumbnails for files as well as pages within multi-page files on the same screen
-It has many functions for document QC, such as auto-selecting even and odd pages or files for rotation and rescanning pages
-It displays thumbnails for PDF files and opens them in the Acrobat viewer
-With Acrobat Standard or Pro you can enable editing & signing of PDF files
-Viewing of office documents and electronic formats is also available
Yes. On the OCR step of the Job Settings Wizard you can select the text output format you need in the “Full-page OCR file type” drop-down. By default it is set to PDF, but it can be changed to Text (txt), Word (docx), Rich Text (rtf), Open Office (odt), Excel (xlsx), PowerPoint (pptx), ePub Zip (epub), FictionBook (fb2), HTML (htm), XML (xml) or Alto XML (alto.xml).
If the output file type is set to PDF, OCR text will be embedded as hidden text in the PDF file.
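For anyone scripting around the OCR output, the file types listed above can be kept in a small lookup table. This is only a sketch: the extensions come from the list above (with `pdf` assumed for the default PDF type), while the helper `expected_output_name` and the idea that the output file keeps the base name of the scanned image are assumptions made for illustration, not documented SimpleIndex behavior.

```python
from pathlib import Path

# Output types and extensions as listed in the answer above.
# "pdf" for the default PDF type is assumed; the others are quoted from the list.
OCR_OUTPUT_EXTENSIONS = {
    "PDF": "pdf",
    "Text": "txt",
    "Word": "docx",
    "Rich Text": "rtf",
    "Open Office": "odt",
    "Excel": "xlsx",
    "PowerPoint": "pptx",
    "ePub Zip": "epub",
    "FictionBook": "fb2",
    "HTML": "htm",
    "XML": "xml",
    "Alto XML": "alto.xml",
}


def expected_output_name(image_name, output_type):
    """Guess the OCR output file name for a scanned image.

    Hypothetical helper: assumes the output keeps the image's base name
    and only the extension changes.
    """
    extension = OCR_OUTPUT_EXTENSIONS[output_type]
    return f"{Path(image_name).stem}.{extension}"
```

A call such as `expected_output_name("scan001.tif", "Word")` looks up the extension for the chosen type and appends it to the base name; an unknown output type raises a `KeyError`.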
Yes, it can. You can configure this setting in the Job Settings Wizard by going to the OCR step and checking “Enable full-page OCR”. There are many settings in the OCR step that you can use to customize the output and recognition of images.
SimpleIndex has two different OCR engines (Standard and Professional) that can be used to produce PDF Image + Text files or Searchable PDFs.