
Zone OCR and Dynamic OCR

Monday, 07 November 2022 by Simple Software

Many document scanning solutions use Zone OCR to obtain index data from the page.

SimpleIndex improves upon this time-tested but ultimately limited model with its Dynamic OCR feature.

Let’s look at the difference between the two methods:

Zone OCR

Zone OCR is used to read document indexes or tags from text on the page. It is a great way to automate the data entry associated with scanning documents.

However, there are several limitations to zone OCR that must be overcome:

  • Index information must be in the exact same place on every page
  • Documents shift and skew during scanning, causing the zones to not line up
  • If surrounding lines or text on the document are too close, they can encroach on the zone

Dynamic OCR

SimpleIndex overcomes these limitations by using Dynamic OCR technology to locate the desired text even when it moves around on the page. Our simplified version of Dynamic OCR works great for many types of documents at a fraction of the cost of other solutions.

  • Index information can appear anywhere on any page
  • Unwanted characters are automatically ignored
  • Find unique patterns of letters and numbers using Template Matching
    (Social Security #, Date, etc.)
  • Use Dictionary Matching to find a value from a list of possible values
    (Vendor Name, Document Type, etc.)

Download document scanning and OCR software.

Dynamic OCR Examples

In the video we see how SimpleIndex approaches a typical Zone OCR example. With SimpleIndex you can use large zones that give a wide margin for error. Template and Dictionary matching are then used to extract the 7-digit Account Number, 6-digit Order Number and Company Name. SimpleIndex discards the surrounding text and keeps the correct value.

Another common example is finding a unique identifier, for example a social security number, that could appear anywhere on the page. Simply enter the template ###-##-#### and SimpleIndex will search the full OCR text until it finds a match. Since only one social security number is likely to appear on the page, a match on this pattern is almost certainly the required value.
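As a rough illustration of the template search described above, the sketch below converts a template such as ###-##-#### into a regular expression and scans the OCR text for the first match. It assumes that `#` stands for a single digit and that every other character is literal; the function names are hypothetical, and SimpleIndex's own template syntax may support more than this.

```python
import re
from typing import Optional

def template_to_regex(template: str) -> "re.Pattern[str]":
    """Turn a '#'-style template into a compiled regex.

    Assumption: '#' means one digit; all other characters match literally.
    """
    parts = [r"\d" if ch == "#" else re.escape(ch) for ch in template]
    # The lookarounds stop a match from starting or ending inside a longer digit run.
    return re.compile(r"(?<!\d)" + "".join(parts) + r"(?!\d)")

def find_template(template: str, ocr_text: str) -> Optional[str]:
    """Return the first text matching the template, or None if nothing matches."""
    match = template_to_regex(template).search(ocr_text)
    return match.group(0) if match else None

page_text = "Employee: J. Smith  Phone 555-0100  SSN: 123-45-6789  Dept 42"
print(find_template("###-##-####", page_text))  # 123-45-6789
```

The phone number in the sample text is skipped because it does not fit the digit grouping of the template, which is what lets a full-page search return the intended value.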

With dictionary matching, you can give SimpleIndex a list of possible values and it will automatically search the zone or page for each possible value until it finds a match.
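Dictionary matching can be pictured as a simple loop over the list of possible values. The sketch below is only an illustration: the function name, the case-insensitive comparison and the "first value in list order wins" rule are assumptions, not a description of how SimpleIndex implements the feature.

```python
from typing import Iterable, Optional

def dictionary_match(values: Iterable[str], ocr_text: str) -> Optional[str]:
    """Return the first listed value that occurs in the text (case-insensitive)."""
    haystack = ocr_text.casefold()
    for value in values:
        if value.casefold() in haystack:
            return value
    return None

vendors = ["Acme Supply", "Globex", "Initech"]
text = "INVOICE\nGLOBEX Corporation\nAccount 1234567"
print(dictionary_match(vendors, text))  # Globex
```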

Many dynamic forms processing applications can be implemented using these simple algorithms. This makes SimpleIndex far more versatile than other zone OCR solutions that require the index value to be in the exact same location on every page. Yet SimpleIndex costs only a fraction of the price!

SimpleIndex’s dynamic forms processing can greatly speed up data entry by eliminating a good percentage of indexing work. For many organizations, this can bring the labor cost of scanning within reach.

MS Office Document OCR Text Parsing Video

Dynamic OCR can also be applied to MS Office and PDF files, creating a fully automated process for intelligently indexing and reorganizing electronic documents.

Amazon AWS Textract Cloud OCR Batch Processing

Amazon AWS Textract Cloud OCR

With Textract you can capture data from almost any type of form, including handwritten ones! Textract identifies labeled text anywhere on the document and returns the label text along with the corresponding value. Map the labels to index fields in SimpleIndex and you are ready to capture that data no matter where it appears on the page.
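To make the label-to-field mapping concrete, the sketch below walks a form-analysis response and pairs each label with its value. The `sample` dictionary is a hand-made stand-in that mimics the documented shape of a Textract response (KEY_VALUE_SET blocks linked to WORD blocks); it is not real output. The `field_map` name and the "AccountNo" index field are hypothetical, and this is not a description of how SimpleIndex performs the mapping internally.

```python
def _text_of(block, blocks_by_id):
    """Join the WORD children of a block into one string."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = blocks_by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)

def key_value_pairs(response):
    """Map each detected label (KEY) to the text of its linked VALUE block."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    pairs = {}
    for block in response["Blocks"]:
        if block["BlockType"] != "KEY_VALUE_SET":
            continue
        if "KEY" not in block.get("EntityTypes", []):
            continue
        value = ""
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                value = " ".join(
                    _text_of(blocks_by_id[i], blocks_by_id) for i in rel["Ids"]
                )
        pairs[_text_of(block, blocks_by_id)] = value
    return pairs

# Hand-made sample in the shape of a forms response (not real Textract output).
sample = {"Blocks": [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Account"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Number:"},
    {"Id": "w3", "BlockType": "WORD", "Text": "1234567"},
]}

field_map = {"Account Number:": "AccountNo"}  # hypothetical label-to-field mapping
pairs = key_value_pairs(sample)
index = {field_map[label]: value for label, value in pairs.items() if label in field_map}
print(index)  # {'AccountNo': '1234567'}
```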

Textract uses machine learning with a huge model based on the billions of pages processed using Textract to provide the most accurate OCR and form field extraction solution available.

By default, Textract is only available as an API and requires custom coding to integrate it into your document workflows. SimpleIndex turns it into a fully featured batch document and data processing app that is ready to use out of the box.

Since there are no templates to configure or train, setup can be done in hours instead of the days, weeks or months required by other enterprise data capture solutions.

Pay-as-you-go pricing makes SimpleIndex with Textract the most affordable way to batch process forms for projects with fewer than 50,000 pages per year, especially if you need to read handwriting or have forms with many layout variations.

Wiki: How to configure AWS Textract OCR in SimpleIndex

Support for Regular Expressions

Use Regular Expressions to extract index data from OCR text, PDF and Office documents.

SimpleIndex OCR has a simple built-in template format, as well as support for Regular Expressions. Regular Expressions (RegEx for short) let you define complex search patterns to extract matching values from the text.  This greatly enhances the functionality of the dynamic OCR in SimpleIndex, making it capable of finding variable-length fields with no distinct pattern.

Regular Expressions are commonly used in text parsing applications. The Perl programming language makes extensive use of RegEx, as do UNIX utilities like “grep”. Many programmers and IT personnel are already familiar with RegEx and can create complex expressions without specific training.
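As an example of a variable-length field that a fixed template cannot describe, the sketch below pulls an invoice number of any length that follows a label. The pattern, the label wording and the sample text are all hypothetical, and Python's `re` module is used purely for illustration; expressions written for SimpleIndex would follow the regular expression syntax that SimpleIndex itself supports.

```python
import re
from typing import Optional

# Hypothetical pattern: a label such as "Invoice No:" or "Invoice #"
# followed by a run of letters, digits and hyphens of any length.
INVOICE_RE = re.compile(r"Invoice\s*(?:No\.?|#)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE)

def extract_invoice_number(ocr_text: str) -> Optional[str]:
    """Return the captured invoice number, or None when the label is absent."""
    match = INVOICE_RE.search(ocr_text)
    return match.group(1) if match else None

text = "ACME SUPPLY\nInvoice No: INV-2022-00417\nDate 11/07/2022"
print(extract_invoice_number(text))  # INV-2022-00417
```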

Click here for a reference guide to Regular Expressions

New OCR Features in Version 10

SimpleIndex 10 includes major upgrades to the OCR and Bar Code engines:

  • Amazon Textract Cloud OCR option added, with settings for Text, Forms and Invoice & Receipt extraction.
  • FineReader Engine has been upgraded to version 11. Offers improved accuracy and speed when processing large documents.
  • Full-page OCR to Word (docx), Rich Text (rtf), Open Office (odt), Excel (xlsx), PowerPoint (pptx), ePub Zip (epub), FictionBook (fb2), HTML (htm), XML (xml), Alto XML (alto.xml).
  • MRC Compression for PDF files (Mixed Raster Content).
  • OCR language pack includes all available Tesseract languages including Hindi, Tamil, Arabic, Chinese, Thai, Vietnamese, Japanese, Korean, Indonesian, Hebrew and many more.

How to Configure SimpleIndex OCR

Our Wiki help has extensive information on how to configure OCR for various document and data capture scenarios.

  • Zone OCR: read data in a specific location
  • Template matching: match unique patterns
  • Dictionary matching: match a list of possible values
  • OCR Options: OCR job settings that apply to all fields
  • File Formats that can be output by OCR
  • Languages supported by OCR
  • FineReader versus Tesseract OCR engines
  • Searchable PDF with MRC compression
  • OCR to Field: point-and-click OCR during verification
  • Cloud OCR using Textract

Watch this Simple Software University training video to see how to configure and run an OCR job with SimpleIndex.


KB Articles for Optical Character Recognition (OCR)

  • Language Pack for Standard/Tesseract OCR
  • Languages Supported in SimpleSoftware OCR Engines
  • What is Document Imaging?
  • Change the Dictionary Separator Value
  • Change the OCR Font or Type
  • Regular Expression (RegEx) - Syntax or Type
  • Autonumber Increment Value
  • I'm using full page OCR. The information is all appearing in the txt file but it is losing format about half way through. Data to the right is ending up at the end of the txt doc. Can this be fixed?
  • Is there a way to just use part of a bar code or OCR value? For example, extract "50" from the value "124450"
  • If I have a form which is filled manually by hand, can SimpleIndex read the data from it?
Automatic Data Capture, Batch Scanning, Document Classification, Document Imaging, File Indexing, Invoice OCR, OCR, Office PDF Text Processing, Optical Character Recognition, RegEx, Screenshot OCR, Search, Text Processing, Watermark PDF Files, Workflow Software, Zone OCR

Language Pack for Standard/Tesseract OCR

Monday, 01 November 2021 by Alex Stewart

All versions of the SimpleIndex software include OCR with the Standard/Tesseract OCR engine. The SimpleIndex download only includes a limited set of languages with the installation. If the language you would like to OCR with SimpleIndex isn’t one of the languages included then you can download your required language(s). Once you do this you will be able to pick the language that you want to read with the Standard/Tesseract OCR engine.

  1. Go to the Tesseract Language Download Site
  2. Select the language you want and download it, or download all the languages
  3. Copy the language files (unzip if downloading more than one language) to this folder: C:\Program Files (x86)\SimpleIndex\Tesseract\v3.04\tessdata
  4. Close and Reopen SimpleIndex and the downloaded languages will now be selectable

Languages Supported in SimpleSoftware OCR Engines

Monday, 02 December 2019 by Simple Software

SimpleSoftware OCR engines use two different systems for language support. The languages supported by your OCR depend on the version of SimpleIndex installed; add-ons (SimpleIndex Server, SimpleCoversheet, and so on) do not add any language support.

All SimpleSoftware products support the Tesseract 3.02 OCR languages. You can learn more about them and download additional language libraries HERE. You can check which Tesseract language libraries are installed on your workstation, and add more, in this folder:

C:\Program Files (x86)\SimpleIndex\Tesseract\v3.02\tessdata

SimpleIndex Pro and SimpleIndex OCR use the FineReader engine, which has one of the largest libraries of supported OCR languages. You can check the OCR languages supported by FineReader on your workstation in this file:

C:\Program Files (x86)\SimpleIndex\OCRLanguages.txt

Abkhaz
Adyghe
Afrikaans
Agul
Albanian
Altaic
Armenian Eastern
Armenian Grabar
Armenian Western
Awar
Aymara
Azeri Cyrillic
Azeri Latin
Bashkir
Basque
Belarusian
Bemba
Blackfoot
Breton
Bugotu
Bulgarian
Buryat
Catalan
Chamorro
Chechen
Chukcha
Chuvash
Corsican
Crimean Tatar
Croatian
Crow
Czech
Danish
Dargwa
Dungan
Dutch Belgian
Dutch Standard
English
English Australian
English Belize
English Canadian
English Caribbean
English Ireland
English Jamaica
English Law
English Medical
English New Zealand
English Philippines
English South Africa
English Trinidad
English United Kingdom
English United States
English Zimbabwe
Eskimo Cyrillic
Eskimo Latin
Esperanto
Estonian
Even
Evenki
Faeroese
Fijian
Finnish
French
French Belgian
French Canadian
French Luxembourg
French Monaco
French Standard
French Swiss
Frisian
Friulian
Gaelic Scottish
Gagauz
Galician
Ganda
German
German Austrian
German Law
German Liechtenstein
German Luxembourg
German Medical
German New Spelling
German New Spelling Law
German New Spelling Medical
German Standard
German Swiss
Greek
Guarani
Hani
Hausa
Hawaiian
Hungarian
Icelandic
Ido
Indonesian
Ingush
Interlingua
Irish
Italian
Italian Standard
Italian Swiss
Kabardian
Kalmyk
Karachay Balkar
Karakalpak
Kasub
Kawa
Kazakh
Khakas
Khanty
Kikuyu
Kirgiz
Kongo
Koryak
Kpelle
Kumyk
Kurdish

Lak
Lappish
Latin
Latvian
Latvian Gothic
Lezgin
Lithuanian
Lithuanian Classic
Luba
Macedonian
Malagasy
Malay Brunei Darussalam
Malay Malaysian
Malinke
Maltese
Mansi
Maori
Mari
Maya
Miao
Minankabaw
Mohawk
Mongol
Mordvin
Nahuatl
Nenets
Nivkh
Nogay
Norwegian Bokmal
Norwegian Nynorsk
Null
Nyanja
Occidental
Ojibway
Old English
Old French
Old German
Old Italian
Old Spanish
Ossetic
Papiamento
Pidgin English
Polish
Portuguese Brazilian
Portuguese Standard
Provencal
Quechua
Rhaeto Romanic
Romanian
Romanian Moldavia
Romany
Ruanda
Rundi
Russian
Russian Moldavia
Russian Old Spelling
Samoan
Selkup
Serbian Cyrillic
Serbian Latin
Shona
Sioux
Slovak
Slovenian
Somali
Sorbian
Sotho
Spanish
Spanish Argentina
Spanish Bolivia
Spanish Chile
Spanish Colombia
Spanish Costa Rica
Spanish Dominican Republic
Spanish Ecuador
Spanish El Salvador
Spanish Guatemala
Spanish Honduras
Spanish Mexican
Spanish Modern Sort
Spanish Nicaragua
Spanish Panama
Spanish Paraguay
Spanish Peru
Spanish Puerto Rico
Spanish Traditional Sort
Spanish Uruguay
Spanish Venezuela
Sunda
Swahili
Swazi
Swedish
Swedish Finland
Tabassaran
Tagalog
Tahitian
Tajik
Tatar
Tinpo
Tongan
Tswana
Tun
Turkish
Turkmen
Tuvin
Udmurt
Uighur Cyrillic
Uighur Latin
Ukrainian
Uzbek Cyrillic
Uzbek Latin
Visayan
Welsh
Wolof
Xhosa
Yakut
Yiddish
Zapotec
Zulu


Change the Dictionary Separator Value

Monday, 29 July 2019 by Simple Software

This setting changes the dictionary separator used in thesaurus matching from the default character | to any character(s) that you want. This is useful when the values in your list or dictionary might themselves include the pipe character (“|”, typed as Shift+Backslash).

This setting is also used as the delimiter when parsing multiple index field values from bar codes (e.g. field1|field2|field3).

Instructions for changing the dictionary separator value:

  1. Right-click the Job Configuration file that you would like to change and select Open With > Notepad
  2. Search the XML settings text open in Notepad for this term:
    <OCR_DICT_SEPARATOR>
  3. Change the value between the tags from “|” to any other single character that you want.
  4. For TAB separation use %TAB%
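The effect of the separator on a combined value can be sketched as follows. The function name is hypothetical, and the handling of %TAB% simply mirrors the setting described above; it is an illustration of the idea rather than SimpleIndex's own parsing code.

```python
from typing import List

def split_fields(value: str, separator: str = "|") -> List[str]:
    """Split a combined value (for example a bar code) into index fields.

    '%TAB%' is treated as a tab character, mirroring the setting above.
    """
    if separator == "%TAB%":
        separator = "\t"
    return value.split(separator)

print(split_fields("field1|field2|field3"))      # ['field1', 'field2', 'field3']
# With ';' as the separator, a value that contains '|' stays intact.
print(split_fields("A|B Corp;INV-1;2022", ";"))  # ['A|B Corp', 'INV-1', '2022']
```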

Change the OCR Font or Type

Monday, 29 July 2019 by Simple Software

This setting changes the OCR recognition font or type from the default, which is “To Be Detected”. It can be used to look for a specific type of OCR font and is especially useful for recognizing things like Dotmatrix, OCR A and OCR B.

Instructions for setting OCR Font:

  1. Right-click the .sic file, select Open With and choose a text editor (Notepad, WordPad, etc.)
  2. Find <OCR_TEXT_TYPE>. If you can’t find <OCR_TEXT_TYPE>, add the following as the last row in the text file:

<OCR_TEXT_TYPE>#</OCR_TEXT_TYPE>

  3. Change the number between the tags to the number of the desired font:

  • 0  Normal
  • 1  Typewriter
  • 2  Dotmatrix
  • 3  Index
  • 5  OCR A
  • 6  OCR B
  • 7  MICR E13B
  • 8  MICR CMC7
  • 9  Gothic
  • 10  To Be Detected

  4. Save and close the file
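For anyone who needs to make this change across many job files, the manual edit above could be scripted along the following lines. This is a minimal sketch that treats the configuration as plain text, exactly as the Notepad steps do; the function name and the sample fragment (including the `<JOB_NAME>` tag) are hypothetical, and the rest of a real .sic file is not shown here. Back up the file before trying anything like this.

```python
import re

TAG_RE = re.compile(r"<OCR_TEXT_TYPE>.*?</OCR_TEXT_TYPE>")

def set_ocr_text_type(config_text: str, font_code: int) -> str:
    """Return the config text with OCR_TEXT_TYPE set to font_code.

    If the tag is missing, it is added as the last row, as the steps describe.
    """
    element = f"<OCR_TEXT_TYPE>{font_code}</OCR_TEXT_TYPE>"
    if TAG_RE.search(config_text):
        return TAG_RE.sub(element, config_text, count=1)
    return config_text.rstrip("\n") + "\n" + element + "\n"

# Hypothetical fragment standing in for the contents of a .sic file.
fragment = "<JOB_NAME>Invoices</JOB_NAME>\n<OCR_TEXT_TYPE>10</OCR_TEXT_TYPE>\n"
print(set_ocr_text_type(fragment, 5))  # OCR_TEXT_TYPE is now 5 (OCR A)
```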


Regular Expression (RegEx) – Syntax or Type

Monday, 29 July 2019 by Simple Software

SimpleIndex uses the .NET regular expressions library.

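.NET regular expression syntax is largely compatible with the JavaScript/ECMAScript format for common patterns, and .NET also offers an ECMAScript-compliant matching mode (RegexOptions.ECMAScript).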

For more information see the Regular Expressions Wiki Page.


I’m using full page OCR. The information is all appearing in the txt file but it is losing format about half way through. Data to the right is ending up at the end of the txt doc. Can this be fixed?

Wednesday, 28 February 2018 by dwilder

SimpleIndex version 7 solves this problem with the incorporation of the FineReader OCR engine. Full text in PDFs will now flow with the formatting of the PDF.

Legacy Versions: SimpleIndex can also be used with other OCR applications and servers to improve accuracy, formatting and performance. Use the OCR applications to convert the scanned images to text or searchable PDF, and SimpleIndex can extract index values from the text and automatically sort and organize the files.

  • Published in OCR

Is there a way to just use part of a bar code or OCR value? For example, extract “50” from the value “124450”

Wednesday, 28 February 2018 by dwilder

To do this example, create a barcode field (Field 1 for example) and a 2nd field with type “Fixed”. In the template for the 2nd field, enter %FIELD1[5,2]% to get “50” from “124450”.

%FIELD1% would get the entire value for Field #1, the barcode field. By adding the [5,2] you tell SimpleIndex to start at the 5th character (5) and take 2 characters from the value (50).
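The [start,length] notation can be checked with a few lines of code. The sketch below emulates the substitution for illustration only: the function name and the dictionary of field values are hypothetical, and it covers just the two forms shown above (%FIELD1% and %FIELD1[5,2]%), counting characters from 1 as the explanation does.

```python
import re

FIELD_RE = re.compile(r"%FIELD(\d+)(?:\[(\d+),(\d+)\])?%")

def expand_template(template: str, fields: dict) -> str:
    """Replace %FIELDn% and %FIELDn[start,length]% with values from fields."""
    def substitute(match):
        value = fields[int(match.group(1))]
        if match.group(2) is None:
            return value  # no [start,length]: use the whole value
        start, length = int(match.group(2)), int(match.group(3))
        # Positions are 1-based in the template, 0-based in Python slices.
        return value[start - 1:start - 1 + length]
    return FIELD_RE.sub(substitute, template)

fields = {1: "124450"}
print(expand_template("%FIELD1[5,2]%", fields))  # 50
print(expand_template("%FIELD1%", fields))       # 124450
```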

Find out more about barcode scanning on our Barcode Scanning Guide and read up on Optical Character Recognition on the SimpleOCR scanning solutions guide.

  • Published in Bar Codes, OCR, Office PDF Text Processing

How do you train the OCR engine for better accuracy?

Wednesday, 28 February 2018 by dwilder

Training has been removed with version 7 due to the addition of the ABBYY FineReader OCR engine.

  • Published in OCR

How do you configure full text searching in Retrieval mode?

Wednesday, 28 February 2018 by dwilder

On the Database tab there is a dropdown in the lower portion of the panel for Full Text OCR Field. Put the name of the field that will store the full-text data there. This must be configured for both the Insert and Retrieval mode configurations. The database field needs to be of sufficient length to store the entire text of your document.

Of course, the Insert Mode configuration must have “Enable Full Page OCR” checked to generate full text data from images. Text from MS Office documents, PDF files and existing OCR text files can be used without setting this option.

When designing your Retrieval Mode configuration, create a Text field to use for full text search queries. On the Database tab, set the corresponding “Database Field Name” to the full text database field.

When searching on your full text field, SimpleIndex finds the text you enter no matter where it appears in the document. It is able to match partial words. It does not perform boolean or natural language searches. The text entered must match the document text exactly.
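The search behaviour described above corresponds to a literal substring test against the stored text. The sketch below shows that idea with an in-memory SQLite database; the table and column names are hypothetical, the case-sensitive `instr` comparison is an assumption, and the query SimpleIndex actually sends to your database may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (file_name TEXT, full_text TEXT)")
conn.executemany("INSERT INTO documents VALUES (?, ?)", [
    ("inv001.pdf", "Invoice for account 1234567, payment due in 30 days"),
    ("memo17.pdf", "Quarterly safety memorandum"),
])

def full_text_search(connection, phrase):
    """Return file names whose stored text contains the phrase as typed."""
    rows = connection.execute(
        "SELECT file_name FROM documents "
        "WHERE instr(full_text, ?) > 0 ORDER BY file_name",
        (phrase,),
    )
    return [row[0] for row in rows]

print(full_text_search(conn, "memo"))     # ['memo17.pdf'] (partial word matches)
print(full_text_search(conn, "pay due"))  # [] (words are not matched separately)
```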

  • Published in Database & Retrieval, OCR

How can I improve recognition rates for my OCR fields?

Wednesday, 28 February 2018 by dwilder

There are several things you can do to improve accuracy for OCR.

  • Scan at 300dpi, black & white for best results.
  • Adjust the scan settings to remove background noise and improve the definition of characters.
  • For Zone OCR, field recognition can often vary based on the surrounding white space and text in the zone. Try varying the size of the zone to achieve optimal results.
  • For template matching, make sure all variations of the field format are included in the template list.
  • For dictionary matching, add common variations and OCR mistakes to the “thesaurus” list.
  • On the Zones & OCR tab (accessed from the Job Options) you can adjust the Max Errors setting to allow for more mistakes in the dictionary matching process.
  • Use the Strip Spaces, Strip Characters, Replace Characters and Case Fixing options to standardize the field format prior to matching.

Please refer to the SimpleIndex Wiki for details on how to configure these options.
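To see why standardizing the text and allowing a few errors helps dictionary matching, consider the sketch below. It strips spaces, fixes case and then accepts the closest dictionary entry within a maximum number of character errors. Treating "Max Errors" as an edit distance is an assumption made for this illustration, and the function names are hypothetical; the wiki describes the actual settings.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete from a
                            curr[j - 1] + 1,            # insert into a
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]

def normalize(text: str) -> str:
    """Strip all whitespace and upper-case the text before matching."""
    return "".join(text.split()).upper()

def fuzzy_match(candidate, dictionary, max_errors=1):
    """Return the closest entry within max_errors, or None if none qualifies."""
    cleaned = normalize(candidate)
    best, best_distance = None, max_errors + 1
    for entry in dictionary:
        distance = edit_distance(cleaned, normalize(entry))
        if distance < best_distance:
            best, best_distance = entry, distance
    return best

vendors = ["Acme Supply", "Globex", "Initech"]
print(fuzzy_match("G1obex", vendors))       # Globex (OCR read 'l' as '1')
print(fuzzy_match("ACME  SUPPLY", vendors)) # Acme Supply
```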

Related Links

  • SimpleIndex.com – Zone OCR
  • SimpleIndex.com – Dynamic OCR
  • SimpleOCR.com – OCR Guide
  • SimpleIndex Wiki – OCR
  • SimpleIndex Wiki – OCR Options
  • SimpleIndex Wiki – Zone OCR
  • SimpleIndex Wiki – Full Page OCR
  • SimpleIndex Wiki – Zones & OCR Settings
  • SimpleIndex Wiki – OCR to Field
  • SimpleIndex Wiki – OCR Text View
  • SimpleIndex Wiki – Template & Dictionary Matching OCR
  • SimpleIndex Wiki – OMR and OCR Document Separation

  • Published in OCR

Can OCR text be saved to Office, Text, HTML or other formats?

Wednesday, 28 February 2018 by dwilder

Yes. On the OCR step of the Job Settings Wizard you can select the text output format you need in the “Full-page OCR file type” drop-down. By default it is set to PDF, but it can be changed to Text (txt), Word (docx), Rich Text (rtf), Open Office (odt), Excel (xlsx), PowerPoint (pptx), ePub Zip (epub), FictionBook (fb2), HTML (htm), XML (xml) or Alto XML (alto.xml).

If the output file type is set to PDF, OCR text will be embedded as hidden text in the PDF file.

Related Links

  • SimpleIndex.com – Zone OCR and Dynamic OCR
  • SimpleIndex Wiki – Full Page OCR Formats
  • Published in Licensing & Installation, OCR

Can SimpleIndex create searchable PDF Image+Text files with hidden text?

Wednesday, 28 February 2018 by dwilder

Yes, it can. You can configure this setting in the Job Settings Wizard by going to the OCR step and checking “Enable full-page OCR”. There are many settings in the OCR step that you can use to customize the output and recognition of images.

SimpleIndex has two different OCR engines (Standard and Professional) that can be used to produce PDF Image + Text files or Searchable PDFs.

Related Links

  • SimpleIndex.com – OCR Languages
  • SimpleOCR.com – OCR Guide
  • SimpleIndex Wiki – OCR
  • SimpleIndex Wiki – Searchable PDF
  • SimpleIndex Wiki – OCR Options
  • SimpleIndex Wiki – FineReader
  • SimpleIndex Wiki – MRC
  • SimpleIndex Wiki – Tesseract
  • SimpleIndex Wiki – Languages

  • Published in Export, OCR, Office PDF Text Processing

Indexing from Applications with Screen OCR

Monday, 29 January 2018 by Simple Software

Some documents are difficult or impossible to automate with OCR. For example, documents with non-standard layouts, unconstrained handwriting or very poor scan quality. In applications like invoice processing, fully automating the data entry can require expensive software and weeks of consulting. Even after all that expense, many users miss the interface and data validations that their accounting software entry screens provide.

In cases like this, SimpleIndex can help improve data entry efficiency while archiving your scanned originals at the same time. Here’s how it works:

  • Scan a batch of documents for data entry
  • Place the SimpleIndex window side-by-side with your data entry window
  • Enter the data normally, reading from the scanned image in SimpleIndex
  • Press the hotkey combo to transfer the data to SimpleIndex
  • Save the image and repeat with the next one

In this configuration, SimpleIndex captures an image of the data entry window, then uses OCR to read the data and index the image. Since the data entry screen has a consistent layout and clear, readable fonts, it can be reliably recognized with OCR.

There are several advantages to this approach:

  • Configuration and training take hours, not weeks
  • Scanned images are indexed with no extra work
  • All the advantages of digital docs–security, searching, sharing, etc.
  • Use all the data validation features of your software
  • No flipping through paper documents
  • Operator keeps eyes on the screen and hands on the keyboard
  • Data entry can be done remotely
  • Data entry performance improves and files are archived at the same time

FAQ Related to Screenshot OCR

  • What are SimpleIndex Specifications?
  • Regular Expression (RegEx) - Syntax or Type
  • Is there a way to just use part of a bar code or OCR value? For example, extract "50" from the value "124450"
  • How do you train the OCR engine for better accuracy?
  • How can I improve recognition rates for my OCR fields?
Database Autofill, Document Automation, OCR, Optical Character Recognition, RPA, Screen Scraping OCR, Screenshot OCR, Zone OCR

Full-Page OCR Indexing Demo

Saturday, 13 January 2018 by Simple Software

This sample job demonstrates the ability of SimpleIndex to convert scanned documents to searchable PDF files and extract index data from the OCR text. It also demonstrates the multi-user workflow capabilities.

Step 1 uses a full-page OCR process on each image.

Field data is extracted from the full-page text using template and dictionary matching algorithms.

This is done in Pre-Index mode to allow unattended processing.

Data is saved to a database so it can be reviewed and corrected in Step 2.

Step 2 uses Database Update mode to find images with missing index values and allow the user to manually enter the correct data.

Step 3 uses a SimpleSearch configuration to search and view the indexed images, including full text searches.

Find Out More

  • Download or get an Online Demo
  • Dynamic OCR Features in SimpleIndex
  • Full-Page OCR Wiki Pages
  • OCR Features and Settings Wiki Pages
  • OCR Software Guide on SimpleOCR

FAQ Related to Full-Page OCR

  • Zone OCR and Dynamic OCR
  • SimpleIndex 10.1 with Textract!
  • Accounts Payable Automation with RPA
  • Language Pack for Standard/Tesseract OCR
  • Languages Supported in SimpleSoftware OCR Engines
  • How to activate SimpleExport?
  • Regular Expression (RegEx) - Syntax or Type
  • SimpleQB - QuickBooks Company File Warning
1-Click Processing, Database, Document Retrieval, File Indexing, Invoice OCR, OCR, OCR Scanning, Search, Unattended, Unattended Processing, Zone OCR

Zone OCR with Template Matching

Friday, 12 January 2018 by Simple Software

This video shows the Zone OCR Invoice Processing sample job. Zone OCR is the traditional method for extracting index data from printed text appearing in fixed locations on every page.

The video also shows how Zone OCR is enhanced with SimpleIndex’s Template Matching and Dictionary Matching features, giving you much more margin for error than other solutions.

Find Out More

  • Download or get an Online Demo
  • Dynamic OCR Features in SimpleIndex
  • OCR Features and Settings Wiki Pages
  • OCR Software Guide on SimpleOCR

FAQ Related to Zone OCR

  • Zone OCR and Dynamic OCR
  • Language Pack for Standard/Tesseract OCR
  • Languages Supported in SimpleSoftware OCR Engines
  • Change the Dictionary Separator Value
  • Change the OCR Font or Type
  • Regular Expression (RegEx) - Syntax or Type
  • I'm using full page OCR. The information is all appearing in the txt file but it is losing format about half way through. Data to the right is ending up at the end of the txt doc. Can this be fixed?
  • Is there a way to just use part of a bar code or OCR value? For example, extract "50" from the value "124450"
File Indexing, Invoice OCR, OCR, Zone OCR

Compare Leading Solutions

Tuesday, 07 November 2017 by dwilder

The best way to see how the SimpleIndex processing workflow compares to other leading desktop scanning solutions is to see the same process performed side-by-side in each program. Below are videos we recorded of the same batch of documents being scanned and indexed in Kofax Express™, Kodak Capture Pro™, PaperVision™ Capture Express and Office Gemini DiamondVision™. In each one we configured the software to perform the same tasks:

  • Scan a batch of 10 pages
  • Capture a 7-digit account number using Zone OCR
  • Correct any fields that fail to recognize
  • Use a database lookup to populate additional index fields
  • Export the batch to PDF files

Using our standard benchmark batch* we recorded the following processing times:

  • SimpleIndex: 0:45
  • Kodak Capture Pro: 1:50
  • Kofax Express: 2:20
  • PaperVision Capture Desktop: 3:00
  • DiamondVision: 3:20
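
To put these times in proportion, the short Python sketch below converts each recorded time to seconds and expresses it as a multiple of the SimpleIndex time. The times are the ones listed above; the helper name `to_seconds` is only illustrative.

```python
# Benchmark times from the list above, written as minutes:seconds.
times = {
    "SimpleIndex": "0:45",
    "Kodak Capture Pro": "1:50",
    "Kofax Express": "2:20",
    "PaperVision Capture Desktop": "3:00",
    "DiamondVision": "3:20",
}


def to_seconds(mmss: str) -> int:
    """Convert an 'm:ss' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


baseline = to_seconds(times["SimpleIndex"])

# Ratio of each product's time to the SimpleIndex time.
ratios = {name: to_seconds(t) / baseline for name, t in times.items()}

for name, ratio in ratios.items():
    print(f"{name}: {to_seconds(times[name])} s ({ratio:.1f}x)")
```

On these figures, the other applications took roughly 2.4 to 4.4 times as long as SimpleIndex to process the same batch.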

As you will see in the videos below, SimpleIndex provides the most efficient scanning and indexing workflow of any major document capture application.

SimpleIndex™

Kodak Capture Pro™

Kofax Express™

PaperVision™ Capture Desktop

Note: This video depicts PaperVision Capture Desktop, a now discontinued software that has since been replaced by the similarly functioning updated version of PaperFlow.

Office Gemini DiamondVision™

Testing Methods

The benchmark times were recorded using all available software shortcuts, and by performing data entry and user interactions as fast as possible. The same scanner and computer hardware were used for each test. Much care was taken to ensure that each application yielded the most accurate OCR results possible given the sample documents.

Unfortunately, none of our competitors could accurately capture the account number on all 10 pages. The extra time to correct these errors accounts for 15-30% of the difference in processing times. The difference in accuracy is due in large part to SimpleIndex’s pattern matching OCR feature, which the other programs lack.
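
As a rough illustration of why pattern matching helps, the Python sketch below checks raw OCR text against the expected pattern, a 7-digit account number, and flags anything else for manual correction. This is not SimpleIndex's implementation; the function name `extract_account_number` and the sample strings are made up for the example.

```python
import re

# Expected field format from the benchmark: a standalone run of exactly seven digits.
ACCOUNT_PATTERN = re.compile(r"(?<!\d)\d{7}(?!\d)")


def extract_account_number(ocr_text: str):
    """Return the first standalone 7-digit run in the OCR text, or None."""
    match = ACCOUNT_PATTERN.search(ocr_text)
    return match.group(0) if match else None


# Hypothetical OCR results: one clean read, one misread character, one extra digit.
samples = ["Acct: 1234567", "Acct: 12345G7", "Ref 12345678"]
for text in samples:
    value = extract_account_number(text)
    status = value if value else "needs manual correction"
    print(f"{text!r} -> {status}")
```

A value that fails the pattern is caught immediately instead of being exported with a bad index value, which is the kind of check that reduces correction time.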

Keep in mind that these videos were recorded using the latest version of each program available at the time of the test. Results may vary with later versions.

Batch Scanning, Database, Database Autofill, Document Imaging, Document Scanning, Fast Scanning, Front End Scanning, Image Scanning, Keyword Indexing, PDF, Scanned Document Indexing, Workflow, Zone OCR

Video Demos

Tuesday, 07 November 2017 by dwilder

These videos demonstrate several ways SimpleIndex® can automatically index different types of documents. If you are new to SimpleIndex, watching these videos is the easiest way to see what it can do. You can follow along using the sample files included in the SimpleIndex Trial.

  • Zone OCR with template matching
  • Document barcode recognition
  • PDF OCR text parsing
  • Sort and index MS Office documents
  • Indexing with full-text OCR
  • Running jobs from an icon

The sample files are copied to your Configuration Folder when you run the SimpleIndex Trial for the first time. If you can’t find the samples, copy them with the Global Settings Wizard in the File menu.

Compare Major Scanning Solutions

Compare the SimpleIndex scanning and indexing workflow to four leading desktop document imaging applications: Kofax Express™, Kodak Capture Pro™, PaperVision™ Capture Desktop and Office Gemini DiamondVision™.

Compare SimpleIndex to the competition

University of SimpleSoftware

Extensive online training videos for the SimpleSoftware product line are available at the University of SimpleSoftware. Live versions of each class can also be scheduled with our support staff.

Visit the Simple Software University

Integrated Solutions Built with SimpleIndex

SimpleInvoice

Uses the OCR and dictionary matching functionality of the SimpleIndex scanning and indexing software to automatically scan, name, and organize incoming invoices into your chosen folder structure of searchable PDF files.

SimpleQB

Scan invoices, OCR the key data and automatically receive bills in QuickBooks accounting software. SimpleQB can transfer transaction data from SimpleIndex to QuickBooks, automating your scanning and data entry tasks simultaneously.

LoanStacker for Mortgages

Use OCR with a preconfigured dictionary file to recognize over 300 mortgage origination and closing documents. Automate scanning to popular mortgage applications like Calyx Point and EllieMae Encompass.

Find out more by going to LoanStacker.com.

SimpleIndex with Contentverse Document Management

SimpleIndex is the perfect front-end scanning tool for your document management system. These videos show several ways that SimpleIndex can be configured to automate document capture with the CompuThink Contentverse document management solution.

SharePoint Scanning

Automatically organize files and set custom column metadata in SharePoint 2010 using SimpleIndex index fields.

Screenshot OCR

Use screen captures to get index data from any application.

Patent ID and Title Extraction

Out-of-the-box configuration extracts the Patent ID Number and Title from any US patent application.

Zone OCR with Template Matching

This video shows the Zone OCR Invoice Processing sample job. Zone OCR is the traditional method for extracting index data from printed text that appears in a fixed location on every page.

The video also shows how Zone OCR is enhanced with SimpleIndex’s Template Matching and Dictionary Matching features, giving you much more margin for error than other solutions.

Watch the Zone OCR Video

Document Barcode Recognition

This video shows how barcode recognition can be used with our 1-click processing feature to index files quickly, easily and accurately.

With a single click a batch of documents is imported, barcodes are recognized and files are exported to organized folders and filenames as well as a SimpleSearch document database.

In the second part of the video, a SimpleSearch configuration is used to search and view the files processed in the first part.

Watch the Barcode Recognition Video

PDF OCR Text Parsing

This video demonstrates the PDF OCR text processing capabilities of SimpleIndex by extracting the Document Number, Date, Document Type, Customer and Total from a number of Estimates and Invoices.

All of this information is read automatically using the existing text layer of a computer generated PDF, such as those created using PDF printer drivers. Template and dictionary matching algorithms are used to locate and extract the correct data values from the text.

Because the existing text layer is used, OCR is not performed. This makes processing much faster and avoids OCR recognition errors, since the text is read exactly as it is stored in the PDF. OCR can still be used to get text from scanned PDF files that have no existing text.
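
The general idea of template and dictionary matching can be sketched in a few lines of Python. This is only an illustration and assumes the PDF text has already been extracted into a string; the dictionary of document types, the regular expressions and the function name `parse_fields` are hypothetical and are not taken from SimpleIndex.

```python
import re

# Hypothetical dictionary of known document types.
DOCUMENT_TYPES = ["Invoice", "Estimate"]

# Hypothetical templates: a label followed by the value to capture.
TEMPLATES = {
    "Document Number": re.compile(
        r"(?:Invoice|Estimate)\s+(?:No\.?|#)\s*(\w+)", re.IGNORECASE
    ),
    "Total": re.compile(r"Total:?\s*\$?([\d,]+\.\d{2})", re.IGNORECASE),
}


def parse_fields(text: str) -> dict:
    """Extract index fields from already-extracted document text."""
    fields = {}
    # Dictionary matching: take the first known term that appears in the text.
    lowered = text.lower()
    fields["Document Type"] = next(
        (term for term in DOCUMENT_TYPES if term.lower() in lowered), None
    )
    # Template matching: capture the value that follows each label.
    for name, pattern in TEMPLATES.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


sample_text = "ACME Corp\nInvoice No. 10452\nDate: 01/12/2018\nTotal: $1,250.00\n"
print(parse_fields(sample_text))
```

Fields that cannot be located come back as `None`, so they can be routed to a person for review rather than guessed.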

Watch the PDF OCR Text Parsing Video

Sort and Index MS Office Documents

This video shows the Read My Documents sample configuration.

Word documents, Excel spreadsheets and PowerPoint presentations are automatically sorted using the SimpleIndex template and dictionary matching algorithms.

The files are reorganized using the Sales Rep, Customer, Document Type and Date extracted from the text.

SimpleSearch is then used to search and view the sorted files.
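
Reorganizing files by extracted values amounts to building a folder path from the index fields. The Python sketch below shows one way this could look; the folder order, the `sanitize` helper and the sample values are assumptions for illustration, not the layout used by the sample configuration.

```python
import re
from pathlib import PurePosixPath


def sanitize(value: str) -> str:
    """Replace characters that are not allowed in file or folder names."""
    return re.sub(r'[\\/:*?"<>|]', "_", value).strip()


def build_output_path(root: str, fields: dict, extension: str) -> PurePosixPath:
    """Build root/Sales Rep/Customer/Document Type/Date.ext from index values."""
    folder_order = ["Sales Rep", "Customer", "Document Type"]
    folders = [sanitize(fields[name]) for name in folder_order]
    filename = sanitize(fields["Date"]) + extension
    return PurePosixPath(root, *folders, filename)


# Hypothetical index values extracted from one document.
fields = {
    "Sales Rep": "Jane Smith",
    "Customer": "ACME Corp",
    "Document Type": "Proposal",
    "Date": "2017-11-07",
}
print(build_output_path("Sorted", fields, ".docx"))
```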

Watch the MS Office OCR Text Parsing Video

Full Page OCR Invoice Processing

This job configuration uses a 3-step process to automate the OCR processing. First, full-page OCR is performed on each image. Field data is extracted from the full-page OCR using template and dictionary matching algorithms. This is done in Pre-Index mode to allow unattended processing. Data is saved to a database so it can be reviewed and corrected in Step 2.

Step 2 uses Database Update mode to find images with missing index values and allows the user to manually enter the correct data.

Step 3 uses a SimpleSearch configuration to search and view the indexed images, including full text searches.
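
The hand-off between the unattended step and the review step can be pictured with a small database example. The Python sketch below uses an in-memory SQLite table as a stand-in; the table layout, column names and sample records are hypothetical and do not reflect how SimpleIndex stores its data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (image TEXT PRIMARY KEY, invoice_number TEXT, total TEXT)"
)

# Step 1 (unattended): store whatever was extracted; None marks a missed field.
extracted = [
    ("page001.tif", "10452", "1250.00"),
    ("page002.tif", None, "310.75"),
    ("page003.tif", "10454", None),
]
conn.executemany("INSERT INTO documents VALUES (?, ?, ?)", extracted)


def find_incomplete(connection):
    """Step 2 (review): list images with at least one missing index value."""
    rows = connection.execute(
        "SELECT image FROM documents "
        "WHERE invoice_number IS NULL OR total IS NULL ORDER BY image"
    )
    return [row[0] for row in rows]


print(find_incomplete(conn))

# A reviewer supplies the missing value, after which that record is complete.
conn.execute(
    "UPDATE documents SET invoice_number = ? WHERE image = ?",
    ("10453", "page002.tif"),
)
print(find_incomplete(conn))
```

Separating extraction from review in this way means the slow OCR pass can run without anyone present, and a person only touches the records that actually need attention.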

Watch the Full Page OCR Video

Running Jobs from an Icon

One of the most powerful features of SimpleIndex is its ability to be launched from a command line. This allows you to save job configurations to an icon that can be launched by double-clicking it. Processing can be fully automated so that it runs minimized in the taskbar and requires no user interaction whatsoever.

This video shows what happens when you run the various sample jobs in this way.

Watch the 1-Click Processing Video

KB Articles for Optical Character Recognition

  • Language Pack for Standard/Tesseract OCR
  • Languages Supported in SimpleSoftware OCR Engines
  • What is Document Imaging?
  • Change the Dictionary Separator Value
  • Change the OCR Font or Type
  • Regular Expression (RegEx) - Syntax or Type
  • Autonumber Increment Value
  • I'm using full page OCR. The information is all appearing in the txt file but it is losing format about half way through. Data to the right is ending up at the end of the txt doc. Can this be fixed?
  • Is there a way to just use part of a bar code or OCR value? For example, extract "50" from the value "124450"
  • If I have a form which is filled manually by hand, can SimpleIndex read the data from it?
1-Click Processing, Barcode Recognition Software, Command-Line, Contentverse, File Indexing, Invoice OCR, Mortgage, OCR, Office PDF Text Processing, QuickBooks Document Management, Scanning Software, Screen Scraping OCR, Screenshot OCR, SharePoint Scanning, TWAIN Scanning Software, Zone OCR

Online Support Options

Check our Wiki Help, Knowledge Base and Training Videos, or Contact Support if you still need help.

How to Buy

Solutions start at just $500! Buy SimpleIndex online or from an Authorized Dealer in your area.

Authorized Dealers

SimpleIndex is a great addition to any system integrator's product line. Become an Authorized Dealer.

Get a Web Demo

Get a free online demo with a scanning specialist who can configure SimpleIndex on your computer remotely.
Sign up for a demo now!

Download a Trial

30-day trial downloads are available for all Simple Software applications.
Download Now!

SimpleIndex Applications

Packaged apps built with SimpleIndex.
SimpleInvoice for AP
Sales Tax Manager
Mortgage LoanStacker
MSDS and Patents
SimpleIndex

© 2022 Meta Enterprises, LLC | Knoxville, Tennessee | A Family Owned Company
© 2022 SimpleSoftware | Consulting Services in the Field of Software as a Service
