Screen Scraping OCR Pages

Change the OCR Font or Type

Monday, 29 July 2019 by Simple Software

Please refer to the Wiki Documentation for the complete OCR Options reference.

This option changes the OCR recognition font or type from the default, which is “To Be Detected”. Use it to look for a specific type of OCR font. It is especially useful for recognizing fonts such as Dotmatrix, OCR A and OCR B.

Instructions for setting OCR Font:

1. Right-click the .sic file, select Open With and choose a text editor (Notepad, WordPad, etc.)

2. Find <OCR_TEXT_TYPE>. If you can’t find <OCR_TEXT_TYPE> then add the following as the last row in the text file:

<OCR_TEXT_TYPE>#</OCR_TEXT_TYPE>

3. Change the number between the tags: <OCR_TEXT_TYPE>#</OCR_TEXT_TYPE>

4. Use the number of the desired font from this list:

0 Normal
1 Typewriter
2 Dotmatrix
3 Index
5 OCR A
6 OCR B
7 MICR E13B
8 MICR CMC7
9 Gothic
10 To Be Detected

5. Save and close the file
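
The manual steps above can also be scripted. The following is a minimal Python sketch, assuming the .sic file is plain text with one tag per row as described in the steps. The function name set_ocr_text_type is hypothetical and not part of SimpleIndex. Reading and writing the file is left out because the file encoding is not stated here, and it is worth keeping a backup copy of the .sic file before changing it.

```python
import re

# Font numbers exactly as listed in the steps above (4 is not listed there).
OCR_FONTS = {
    0: "Normal", 1: "Typewriter", 2: "Dotmatrix", 3: "Index",
    5: "OCR A", 6: "OCR B", 7: "MICR E13B", 8: "MICR CMC7",
    9: "Gothic", 10: "To Be Detected",
}

TAG_PATTERN = re.compile(r"<OCR_TEXT_TYPE>.*?</OCR_TEXT_TYPE>")


def set_ocr_text_type(sic_text: str, font_number: int) -> str:
    """Return the .sic text with <OCR_TEXT_TYPE> set to font_number."""
    if font_number not in OCR_FONTS:
        raise ValueError(f"unknown OCR font number: {font_number}")
    new_tag = f"<OCR_TEXT_TYPE>{font_number}</OCR_TEXT_TYPE>"
    if TAG_PATTERN.search(sic_text):
        # Step 3: change the number between the existing tags.
        return TAG_PATTERN.sub(new_tag, sic_text, count=1)
    # Step 2: the tag is missing, so add it as the last row of the file.
    separator = "" if sic_text.endswith("\n") or not sic_text else "\n"
    return sic_text + separator + new_tag + "\n"
```

Calling set_ocr_text_type(text, 5) on the contents of a job file would select OCR A, either by updating the existing tag or by appending a new last row.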


Regular Expression (RegEx) – Syntax or Type

Monday, 29 July 2019 by Simple Software

Please refer to the Wiki Documentation for the complete Regular Expressions reference.

SimpleIndex uses the .NET regular expressions library.

The .NET syntax is largely compatible with Perl 5 regular expressions and is similar, but not identical, to the JavaScript/ECMAScript syntax.

For more information see the Regular Expressions Wiki Page.
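
As a small illustration of the syntax, here is a sketch of a pattern that uses only constructs written the same way in .NET, ECMAScript and Python (literal text, a character class, a quantifier and a capture group). It is shown in Python rather than .NET, and the "INV-" field format is a made-up example, not something SimpleIndex defines.

```python
import re

# Hypothetical field format: "INV-" followed by six digits.
# [0-9] is used instead of \d so the pattern only ever matches ASCII digits.
INVOICE_NUMBER = re.compile(r"INV-([0-9]{6})")


def find_invoice_number(text):
    """Return the six digits of the first invoice number found, or None."""
    match = INVOICE_NUMBER.search(text)
    return match.group(1) if match else None
```

The same pattern text, INV-([0-9]{6}), should be usable unchanged wherever a .NET regular expression is expected.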


Is there a way to just use part of a bar code or OCR value? For example, extract “50” from the value “124450”

Wednesday, 28 February 2018 by dwilder

Please refer to the Wiki Documentation for the complete Bar Code Recognition reference.

To do this, create a barcode field (Field 1, for example) and a second field with type “Fixed”. In the template for the second field, enter %FIELD1[5,2]% to get “50” from “124450”.

%FIELD1% on its own returns the entire value of Field 1, the barcode field. Adding [5,2] tells SimpleIndex to start at the 5th character (“5”) and take 2 characters from the value, which gives “50”.
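
The [start,length] rule can be checked with a short Python sketch. The helper take_part is hypothetical and only mirrors the behavior described above: the start position is counted from 1, not 0. What SimpleIndex does when the value is shorter than the requested range is not covered here, so the sketch simply returns whatever characters exist.

```python
def take_part(value: str, start: int, length: int) -> str:
    """Mimic the [start,length] suffix: start is 1-based, length is a count."""
    if start < 1 or length < 0:
        raise ValueError("start must be >= 1 and length must be >= 0")
    begin = start - 1  # convert the 1-based position to Python's 0-based index
    return value[begin:begin + length]


print(take_part("124450", 5, 2))  # 50
```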

Find out more about barcode scanning on our Barcode Scanning Guide and read up on Optical Character Recognition on the SimpleOCR scanning solutions guide.

Published in Bar Codes, OCR, Office PDF Text Processing


How do you train the OCR engine for better accuracy?

Wednesday, 28 February 2018 by dwilder

Training was removed in version 7 because of the addition of the ABBYY FineReader OCR engine.


Published in OCR


How can I improve recognition rates for my OCR fields?

Wednesday, 28 February 2018 by dwilder

There are several things you can do to improve accuracy for OCR.

Scan at 300 dpi in black and white for best results.
Adjust the scan settings to remove background noise and improve the definition of characters.
For Zone OCR, field recognition can often vary based on the surrounding white space and text in the zone. Try varying the size of the zone to achieve optimal results.
For template matching, make sure all variations of the field format are included in the template list.
For dictionary matching, add common variations and OCR mistakes to the “thesaurus” list.
On the Zones & OCR tab (accessed from the Job Options) you can adjust the Max Errors setting to allow for more mistakes in the dictionary matching process.
Use the Strip Spaces, Strip Characters, Replace Characters and Case Fixing options to standardize the field format prior to matching.

Please refer to the SimpleIndex Wiki for details on how to configure these options.
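
To show why standardizing the field and allowing a few errors helps dictionary matching, here is a minimal Python sketch. It is not SimpleIndex's actual algorithm: it assumes that "errors" can be approximated by Levenshtein edit distance, and the function names and the sample vendor names are made up for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: single-character inserts, deletes and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def normalize(text: str) -> str:
    """Strip all whitespace and fix case so formatting differences do not count."""
    return "".join(text.split()).upper()


def match_dictionary(ocr_value, dictionary, max_errors):
    """Return the closest dictionary entry within max_errors, or None."""
    cleaned = normalize(ocr_value)
    best_entry, best_errors = None, max_errors + 1
    for entry in dictionary:
        errors = edit_distance(cleaned, normalize(entry))
        if errors < best_errors:
            best_entry, best_errors = entry, errors
    return best_entry


vendors = ["ACME SUPPLY", "APEX TOOLS"]          # hypothetical dictionary
print(match_dictionary("ACNE 5UPPLY", vendors, 2))  # ACME SUPPLY
print(match_dictionary("ACNE 5UPPLY", vendors, 1))  # None
```

The misread value has two wrong characters, so it matches only when two errors are allowed. Raising the limit recovers more misreads but also increases the chance of matching the wrong entry.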

Indexing from Applications with Screen OCR

Monday, 29 January 2018 by Simple Software

Some documents are difficult or impossible to automate with OCR, for example documents with non-standard layouts, unconstrained handwriting or very poor scan quality. In applications like invoice processing, fully automating the data entry can require expensive software and weeks of consulting. Even after all that expense, many users miss the interface and data validations that their accounting software entry screens provide.

In cases like this, SimpleIndex can help improve data entry efficiency while archiving your scanned originals at the same time. Here’s how it works:

Scan a batch of documents for data entry
Place the SimpleIndex window side-by-side with your data entry window
Enter the data normally, reading from the scanned image in SimpleIndex
Press the hotkey combo to transfer the data to SimpleIndex
Save the image and repeat with the next one

In this configuration, SimpleIndex captures an image of the data entry window, then uses OCR to read the data and index the image. Since the data entry screen has a consistent layout and clear, readable fonts, it can be reliably recognized with OCR.

There are several advantages to this approach:

Configuration and training take hours, not weeks
Scanned images are indexed with no extra work
All the advantages of digital documents: security, searching, sharing, etc.
Use all the data validation features of your software
No flipping through paper documents
Operator keeps eyes on the screen and hands on the keyboard
Data entry can be done remotely
Data entry performance improves and files are archived at the same time

Learn More:

Scan, file, and process document data quickly and efficiently with Simple Software's tailored OCR automation and one-click processing that fits your unique business needs

Use SimpleIndex OCR to convert scanned and digital images to searchable PDF files for automated sorting, filing, and export to applications such as Word, Excel, PowerPoint, etc.

FAQ Related to Screenshot OCR


Video Demos

Tuesday, 07 November 2017 by dwilder

These videos demonstrate several ways SimpleIndex® can automatically index different types of documents. If you are new to SimpleIndex, watching these videos is the easiest way to see what it can do. You can follow along using the sample files included in the SimpleIndex Trial.

The sample files are copied to your Configuration Folder when you run the SimpleIndex Trial for the first time. If you can’t find the samples, copy them with the Global Settings Wizard in the File menu.

Compare Major Scanning Solutions

Compare the SimpleIndex scanning and indexing workflow to four leading desktop document imaging applications: Kofax Express™, Kodak Capture Pro™, PaperVision™ Capture Express and Office Gemini DiamondVision™.

Compare SimpleIndex to the competition

University of SimpleSoftware

Extensive online training videos for the SimpleSoftware product line are available at the University of SimpleSoftware. Live versions of each class can also be scheduled with our support staff.

Visit the Simple Software University

Integrated Solutions Built with SimpleIndex

SimpleInvoice

Uses the OCR and dictionary matching functionality of the SimpleIndex scanning and indexing software to automatically scan, name, and organize incoming invoices into your chosen folder structure of searchable PDF files.

SimpleQB

Scan invoices, OCR the key data and automatically receive bills in QuickBooks accounting software. SimpleQB can transfer transaction data from SimpleIndex to QuickBooks, automating your scanning and data entry tasks simultaneously.

LoanStacker for Mortgages

Use OCR with a preconfigured dictionary file to recognize over 300 mortgage origination and closing documents. Automate scanning to popular mortgage applications like Calyx Point and EllieMae Encompass.

Find out more by going to LoanStacker.com.

SimpleIndex with Contentverse Document Management

SimpleIndex is the perfect front-end scanning tool for your document management system. These videos show several ways that SimpleIndex can be configured to automate document capture with the CompuThink Contentverse document management solution.

SharePoint Scanning

Automatically organize files and set custom column metadata in SharePoint 2010 using SimpleIndex index fields.

Screenshot OCR

Use screen captures to get index data from any application.

Patent ID and Title Extraction

Out-of-the-box configuration extracts the Patent ID Number and Title from any US patent application.

Zone OCR with Template Matching

This video shows the Zone OCR Invoice Processing sample job. Zone OCR is the traditional method for extracting index data from printed text that appears in a fixed location on every page.

The video also shows how Zone OCR is enhanced with SimpleIndex’s Template Matching and Dictionary Matching features, giving you much more margin for error than other solutions.

Watch the Zone OCR Video

Document Barcode Recognition

This video shows how barcode recognition can be used with our 1-click processing feature to index files quickly, easily and accurately.

With a single click, a batch of documents is imported, barcodes are recognized, and files are exported to organized folders and filenames as well as to a SimpleSearch document database.

In the second part of the video, a SimpleSearch configuration is used to search and view the files processed in the first part.

Watch the Barcode Recognition Video

PDF OCR Text Parsing

This video demonstrates the PDF OCR text processing capabilities of SimpleIndex by extracting the Document Number, Date, Document Type, Customer and Total from a number of Estimates and Invoices.

All of this information is read automatically using the existing text layer of a computer-generated PDF, such as those created using PDF printer drivers. Template and dictionary matching algorithms are used to locate and extract the correct data values from the text.

Since the existing text is being used, OCR is not performed. This makes processing much faster and avoids OCR recognition errors, because the text is read exactly as it is stored in the file. OCR can be used to get text from scanned PDF files with no existing text.
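
The idea of locating field values in an existing text layer can be sketched in Python. This is only an illustration: plain regular expressions stand in for SimpleIndex's template and dictionary matching, the sample text and its labels are invented, and reading the text layer out of the PDF is not shown.

```python
import re

# Invented example of what a PDF text layer might contain.
SAMPLE_TEXT = """\
INVOICE
Document Number: 10452
Date: 03/15/2019
Customer: Example Co.
Total: $1,284.50
"""

# One pattern per index field named in the video description.
FIELD_PATTERNS = {
    "Document Type": re.compile(r"^(INVOICE|ESTIMATE)$", re.MULTILINE),
    "Document Number": re.compile(r"Document Number:\s*([0-9]+)"),
    "Date": re.compile(r"Date:\s*([0-9]{2}/[0-9]{2}/[0-9]{4})"),
    "Customer": re.compile(r"Customer:\s*(.+)"),
    "Total": re.compile(r"Total:\s*\$?([0-9,]+\.[0-9]{2})"),
}


def extract_fields(text):
    """Return a dict of field name -> extracted value (None when not found)."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1).strip() if match else None
    return fields


for name, value in extract_fields(SAMPLE_TEXT).items():
    print(f"{name}: {value}")
```

Because the values come straight from stored text, any field that is missing shows up as None rather than as a misread value.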

Watch the PDF OCR Text Parsing Video

Sort and Index MS Office Documents

This video shows the Read My Documents sample configuration.

Word documents, Excel spreadsheets and PowerPoint presentations are automatically sorted using the SimpleIndex template and dictionary matching algorithms.

The files are reorganized using the Sales Rep, Customer, Document Type and Date extracted from the text.

SimpleSearch is then used to search and view the sorted files.

Watch the MS Office OCR Text Parsing Video

Full Page OCR Invoice Processing

This job configuration uses a 3-step process to automate the OCR processing. In Step 1, full-page OCR is performed on each image, and field data is extracted from the OCR text using template and dictionary matching algorithms. This is done in Pre-Index mode to allow unattended processing. Data is saved to a database so it can be reviewed and corrected in Step 2.

Step 2 uses Database Update mode to find images with missing index values and allows the user to manually enter the correct data.

Step 3 uses a SimpleSearch configuration to search and view the indexed images, including full text searches.
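
The flow of data through the three steps can be sketched with a small Python example. It assumes an in-memory SQLite database as a stand-in for whatever database the job is configured to use. The table, column and file names are hypothetical, and full text search is not shown.

```python
import sqlite3


def create_index_db():
    """Create the stand-in index database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE documents ("
        "image TEXT PRIMARY KEY, invoice_number TEXT, total TEXT)"
    )
    return db


def pre_index(db, image, fields):
    """Step 1: save whatever the unattended pass extracted (None if missing)."""
    db.execute(
        "INSERT INTO documents VALUES (?, ?, ?)",
        (image, fields.get("invoice_number"), fields.get("total")),
    )


def needs_review(db):
    """Step 2: list images that still have a missing index value."""
    rows = db.execute(
        "SELECT image FROM documents "
        "WHERE invoice_number IS NULL OR total IS NULL ORDER BY image"
    )
    return [row[0] for row in rows]


def save_correction(db, image, invoice_number, total):
    """Step 2: store the values the operator typed in."""
    db.execute(
        "UPDATE documents SET invoice_number = ?, total = ? WHERE image = ?",
        (invoice_number, total, image),
    )


def search(db, invoice_number):
    """Step 3: find indexed images by invoice number."""
    rows = db.execute(
        "SELECT image FROM documents WHERE invoice_number = ?",
        (invoice_number,),
    )
    return [row[0] for row in rows]


db = create_index_db()
pre_index(db, "scan_001.tif", {"invoice_number": "10452", "total": "1,284.50"})
pre_index(db, "scan_002.tif", {"total": "90.00"})  # number was not recognized
print(needs_review(db))                            # ['scan_002.tif']
save_correction(db, "scan_002.tif", "10453", "90.00")
print(search(db, "10453"))                         # ['scan_002.tif']
```

Keeping the unattended pass and the manual review as separate steps means only the images with missing values need an operator's attention.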

Watch the Full Page OCR Video

Running Jobs from an Icon

One of the most powerful features of SimpleIndex is its ability to be launched from a command line. This allows you to save job configurations to an icon that can be launched by double-clicking it. Processing can be fully automated so that it runs minimized in the taskbar and requires no user interaction whatsoever.

This video shows what happens when you run the various sample jobs in this way.

Watch the 1-Click Processing Video

Learn More:

KB Articles for Optical Character Recognition

