
SimpleIndex


An essential first step to processing mixed batches with many types of documents is classification. Document Classification methods quickly sort documents by type using key content and layout attributes to identify them.

The most popular document classification systems are advanced AI-based machine learning algorithms that automatically learn how to classify documents based on samples and user feedback. These systems are very powerful but also very expensive. Only large organizations processing millions of pages each year can afford these enterprise solutions.

SimpleIndex naturally has a simpler way to do classification based on keyword patterns in the document text. Simply create a list of document types and assign one or more unique keywords or phrases that will only appear in that document type to each. Logical operators for AND, OR and NOT prevent false matches by requiring multiple keywords for matching or excluding documents that contain certain phrases.
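The keyword rules described above can be pictured with a small sketch. This is not SimpleIndex's implementation or configuration format; the `Rule` class, its field names and the sample keywords are hypothetical, and case-insensitive matching is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Rule:
    """Hypothetical rule: a document type plus its keyword conditions."""
    doc_type: str
    all_of: List[str] = field(default_factory=list)   # AND: every phrase must appear
    any_of: List[str] = field(default_factory=list)   # OR: at least one phrase must appear
    none_of: List[str] = field(default_factory=list)  # NOT: none of these may appear


def classify(text: str, rules: List[Rule]) -> Optional[str]:
    """Return the first document type whose rule matches the text, else None."""
    haystack = text.lower()  # assumption: matching ignores case
    for rule in rules:
        has_all = all(k.lower() in haystack for k in rule.all_of)
        has_any = not rule.any_of or any(k.lower() in haystack for k in rule.any_of)
        has_none = not any(k.lower() in haystack for k in rule.none_of)
        if has_all and has_any and has_none:
            return rule.doc_type
    return None


# Sample rules; the keywords are made up for illustration.
RULES = [
    Rule("Invoice", all_of=["invoice"], any_of=["amount due", "total due"],
         none_of=["packing slip"]),
    Rule("Packing Slip", all_of=["packing slip"]),
]

print(classify("INVOICE #1001  Amount Due: $50.00", RULES))  # Invoice
print(classify("Packing Slip for invoice #1001", RULES))     # Packing Slip
```

The NOT list is what keeps a packing slip that mentions an invoice number from being filed as an invoice.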

Keyword-based classification works for the vast majority of applications at a fraction of the cost of AI classification.

After classification, SimpleIndex can automatically launch separate document indexing workflows for each document type found in the classified batch. This is especially useful when documents have different metadata requirements or business workflows associated with them.

Zone OCR and Dynamic OCR

Monday, 07 November 2022 by Simple Software

Many document scanning solutions use Zone OCR to obtain index data from the page.

SimpleIndex improves upon this time-tested but ultimately limited model with its Dynamic OCR feature.

Let’s look at the difference between the two methods:

Zone OCR

Zone OCR is used to read document indexes or tags from text on the page. It is a great way to automate the data entry associated with scanning documents.

However, there are several limitations to zone OCR that must be overcome:

  • Index information must be in the exact same place on every page
  • Documents shift and skew during scanning, causing the zones to not line up
  • If surrounding lines or text on the document are too close, they can encroach on the zone

Dynamic OCR

SimpleIndex overcomes these limitations by using Dynamic OCR technology to locate the desired text even when it moves around on the page. Our simplified version of Dynamic OCR works great for many types of documents at a fraction of the cost of other solutions.

  • Index information can appear anywhere on any page
  • Unwanted characters are automatically ignored
  • Find unique patterns of letters and numbers using Template Matching
    (Social Security #, Date, etc.)
  • Use Dictionary Matching to find a value from a list of possible values
    (Vendor Name, Document Type, etc.)

Download document scanning and OCR software.

Dynamic OCR Examples

In the video we see how SimpleIndex approaches a typical Zone OCR example. With SimpleIndex you can use large zones that give a wide margin for error. Template and Dictionary matching are then used to extract the 7-digit Account Number, 6-digit Order Number and Company Name. SimpleIndex discards the surrounding text and keeps the correct value.

Another common example is finding a unique identifier, for example a social security number, that could appear anywhere on the page. Simply enter the template ###-##-#### and SimpleIndex will search the full OCR text until it finds a match. Since only one social security number is likely to appear on the page, a match on this pattern is almost certainly the required value.
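To illustrate the idea, here is a minimal Python sketch of template matching. It assumes that `#` stands for exactly one digit and that every other character matches literally; the function names are hypothetical and this is not SimpleIndex's own code.

```python
import re
from typing import Optional


def template_to_regex(template: str) -> re.Pattern:
    """Turn a template such as ###-##-#### into a compiled regular expression.

    Assumption: '#' means one digit; all other characters are literal.
    """
    parts = [r"\d" if ch == "#" else re.escape(ch) for ch in template]
    # The lookarounds stop a match from starting or ending inside a longer run of digits.
    return re.compile(r"(?<!\d)" + "".join(parts) + r"(?!\d)")


def find_first(template: str, ocr_text: str) -> Optional[str]:
    """Return the first piece of text that fits the template, or None."""
    match = template_to_regex(template).search(ocr_text)
    return match.group(0) if match else None


page = "Employee: Jane Doe   Ref 20221107-44   SSN: 123-45-6789   Dept 4410"
print(find_first("###-##-####", page))  # 123-45-6789
```

Because the whole page text is searched, the value is found wherever it appears, and the surrounding text is simply not returned.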

With dictionary matching, you can give SimpleIndex a list of possible values and it will automatically search the zone or page for each possible value until it finds a match.
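Dictionary matching can be sketched the same way. The function below is a hypothetical illustration, not the product's algorithm; ignoring case and extra whitespace is an assumption made so that OCR line breaks do not prevent a match.

```python
from typing import Iterable, Optional


def dictionary_match(ocr_text: str, values: Iterable[str]) -> Optional[str]:
    """Return the first listed value that occurs in the text, or None."""
    # Collapse whitespace and lower-case both sides before comparing.
    normalized = " ".join(ocr_text.lower().split())
    for value in values:
        if " ".join(value.lower().split()) in normalized:
            return value
    return None


vendors = ["Acme Supply Co", "Globex Corporation", "Initech"]  # sample list
zone_text = "REMIT TO:\nGlobex   Corporation\nPO Box 12"
print(dictionary_match(zone_text, vendors))  # Globex Corporation
```

The value returned is the spelling from the list, so the index field stays consistent even when the OCR text has odd spacing or capitalization.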

Many dynamic forms processing applications can be implemented using these simple algorithms. This makes SimpleIndex far more versatile than other zone OCR solutions that require the index value to be in the exact same location on every page. Yet SimpleIndex costs only a fraction of the price!

SimpleIndex's dynamic forms processing can greatly speed up data entry by eliminating a good percentage of indexing work. For many organizations, this brings the labor cost of scanning within reach.

MS Office Document OCR Text Parsing Video

Dynamic OCR can also be applied to MS Office and PDF files, creating a fully automated process for intelligently indexing and reorganizing electronic documents.

Amazon AWS Textract Cloud OCR

With Textract you can capture data from almost any type of form, including handwritten ones! Textract identifies labeled text anywhere on the document and returns the label text along with the corresponding value. Map the labels to index fields in SimpleIndex and you are ready to capture that data no matter where it appears on the page.
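The label-to-field mapping can be pictured with the sketch below. It works on simplified (label, value) pairs and does not call Textract or reproduce its actual response format; the function name, the label map and the sample data are all hypothetical.

```python
from typing import Dict, Iterable, Tuple


def map_labels_to_fields(pairs: Iterable[Tuple[str, str]],
                         label_map: Dict[str, str]) -> Dict[str, str]:
    """Copy the values of known form labels into named index fields.

    pairs: (label, value) tuples, a simplified stand-in for form extraction output.
    label_map: form label -> index field name (hypothetical configuration).
    """
    def norm(label: str) -> str:
        # Ignore case, surrounding spaces and a trailing colon.
        return label.strip().rstrip(":").strip().lower()

    wanted = {norm(label): field_name for label, field_name in label_map.items()}
    index: Dict[str, str] = {}
    for label, value in pairs:
        field_name = wanted.get(norm(label))
        if field_name and field_name not in index:  # keep the first occurrence
            index[field_name] = value.strip()
    return index


extracted = [("Patient Name:", " Jane Doe "), ("Date of Birth", "01/02/1980"),
             ("Phone", "555-0100")]
print(map_labels_to_fields(extracted, {"Patient Name": "Name", "Date of Birth": "DOB"}))
# {'Name': 'Jane Doe', 'DOB': '01/02/1980'}
```

Labels that are not in the map, such as "Phone" here, are ignored, which is why the position of a label on the page does not matter.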

Textract uses machine learning with a huge model based on the billions of pages processed using Textract to provide the most accurate OCR and form field extraction solution available.

By default, Textract is only available as an API and requires custom coding to integrate it into your document workflows. SimpleIndex turns it into a fully featured batch document and data processing app that is ready to use out of the box.

Since there are no templates to configure or train, setup can be done in hours instead of the days, weeks or months required by other enterprise data capture solutions.

Pay-as-you-go pricing makes SimpleIndex with Textract the most affordable way to batch process forms for projects with fewer than 50,000 pages per year, especially if you need to read handwriting or have forms with many layout variations.

Wiki: How to configure AWS Textract OCR in SimpleIndex

Support for Regular Expressions

Use Regular Expressions to extract index data from OCR text, PDF and Office documents.

SimpleIndex OCR has a simple built-in template format, as well as support for Regular Expressions. Regular Expressions (RegEx for short) let you define complex search patterns to extract matching values from the text.  This greatly enhances the functionality of the dynamic OCR in SimpleIndex, making it capable of finding variable-length fields with no distinct pattern.
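As a rough illustration of a variable-length field, the sketch below uses Python's `re` module to pull an invoice number that follows a label. The label, the pattern and the sample text are hypothetical, and the exact RegEx syntax accepted by SimpleIndex may differ from Python's.

```python
import re
from typing import Optional

# Hypothetical pattern: the label "Invoice No", optional punctuation, then a value
# made of letters, digits and hyphens of any length.
INVOICE_NO = re.compile(r"Invoice\s+No\b\.?\s*[:#]?\s*([A-Z0-9][A-Z0-9-]*)",
                        re.IGNORECASE)


def extract_invoice_number(text: str) -> Optional[str]:
    """Return the value that follows the label, or None if the label is absent."""
    match = INVOICE_NO.search(text)
    return match.group(1) if match else None


sample = "ACME Corp\nInvoice No: INV-2022-00417\nDate 11/07/2022"
print(extract_invoice_number(sample))  # INV-2022-00417
```

A fixed template such as ###-##-#### could not describe this value, because its length and mix of letters and digits vary from document to document.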

Regular Expressions are commonly used in text parsing applications. The Perl programming language makes extensive use of RegEx, as do UNIX utilities like “grep”. Many programmers and IT personnel are already familiar with RegEx and can create complex expressions without specific training.

Click here for a reference guide to Regular Expressions

New OCR Features in Version 10

SimpleIndex 10 includes major upgrades to the OCR and Bar Code engines:

  • Amazon Textract Cloud OCR option added, with settings for Text, Forms and Invoice & Receipt extraction.
  • FineReader Engine has been upgraded to version 11. Offers improved accuracy and speed when processing large documents.
  • Full-page OCR to Word (docx), Rich Text (rtf), Open Office (odt), Excel (xlsx), PowerPoint (pptx), ePub Zip (epub), FictionBook (fb2), HTML (htm), XML (xml), Alto XML (alto.xml).
  • MRC Compression for PDF files (Mixed Raster Content).
  • OCR language pack includes all available Tesseract languages including Hindi, Tamil, Arabic, Chinese, Thai, Vietnamese, Japanese, Korean, Indonesian, Hebrew and many more.

How to Configure SimpleIndex OCR

Our Wiki help has extensive information on how to configure OCR for various document and data capture scenarios.

  • Zone OCR to read data in a specific location
  • Template matching to match unique patterns
  • Dictionary matching to match a list of possible values
  • OCR Options: OCR job settings that apply to all fields
  • File Formats that can be output by OCR
  • Languages supported by OCR
  • FineReader versus Tesseract OCR engines
  • Searchable PDF with MRC compression
  • OCR to Field for point and click OCR during verification
  • Cloud OCR using Textract

Watch this Simple Software University training video to see how to configure and run an OCR job with SimpleIndex.

KB Articles for Optical Character Recognition (OCR)

  • Language Pack for Standard/Tesseract OCR
  • Languages Supported in SimpleSoftware OCR Engines
  • What is Document Imaging?
  • Change the Dictionary Separator Value
  • Change the OCR Font or Type
  • Regular Expression (RegEx) - Syntax or Type
  • Autonumber Increment Value
  • I'm using full page OCR. The information is all appearing in the txt file but it is losing format about half way through. Data to the right is ending up at the end of the txt doc. Can this be fixed?
  • Is there a way to just use part of a bar code or OCR value? For example, extract "50" from the value "124450"
  • If I have a form which is filled manually by hand, can SimpleIndex read the data from it?
Automatic Data Capture, Batch Scanning, Document Classification, Document Imaging, File Indexing, Invoice OCR, OCR, Office PDF Text Processing, Optical Character Recognition, RegEx, Screenshot OCR, Search, Text Processing, Watermark PDF Files, Workflow Software, Zone OCR

Large documents (>500 pg) Slow to Process – Workaround

Thursday, 06 February 2020 by Cary Wiedman

When working with PDF image files containing a high number of pages (typically more than 500, though this varies by file and by the PC running the job), SimpleIndex may run into performance issues as it attempts to hold all of those pages in memory and perform the requested operations. Full-text OCR in particular can tax a system in these circumstances.

A workaround in this scenario is to convert the large PDF into a folder of smaller PDF files that can be managed more easily. To minimize the impact on production and avoid taxing users with extra steps, you can use a third-party splitting tool that can be called from the command line. One option that has worked well is PDFSplitter from CoolUtils.

One way to automate this process is to use PDFSplitter’s command line ability in conjunction with SimpleIndex’s Pre-processing function. For simplicity let’s consider a 600 page PDF with a filename generated at the time of scanning using indexes provided on a coversheet or keyed by an operator. The goal now is to take that large file and perform a full-text conversion on it.

Our SimpleIndex job (let’s say Full Page OCR.sic) launches and, before getting to work, calls PDFSplitter from the Pre-processing step with a command such as the following. The input path is quoted because it contains spaces:

PDFSplitter.exe "C:\Images\Smith – John – Medical History.pdf" C:\Images\Pages\ -cp 100

PDFSplitter will run and break that document every 100 pages creating 6 PDFs in the folder C:\Images\Pages. It maintains the original filename, simply adding “001-100” and so on to the name. After PDFSplitter is complete the Full Page OCR job begins its process and, given that the original filename is still part of the split files’ naming schema, it can produce one full-text PDF in the final output folder.
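The arithmetic behind the split can be checked with a short sketch. It only computes page ranges and the “001-100” style suffix described above; it does not call PDFSplitter, and the function names and the three-digit padding are assumptions.

```python
from typing import List, Tuple


def split_ranges(total_pages: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Return (first_page, last_page) for each chunk, using 1-based page numbers."""
    ranges = []
    for start in range(1, total_pages + 1, chunk_size):
        end = min(start + chunk_size - 1, total_pages)  # last chunk may be shorter
        ranges.append((start, end))
    return ranges


def chunk_suffix(start: int, end: int, width: int = 3) -> str:
    """Format a range as a zero-padded suffix, e.g. 001-100 (padding width assumed)."""
    return f"{start:0{width}d}-{end:0{width}d}"


for first, last in split_ranges(600, 100):
    print(chunk_suffix(first, last))
# 001-100, 101-200, 201-300, 301-400, 401-500, 501-600 (one per line)
```

A page count that is not a multiple of the chunk size simply produces a shorter final file, for example 201-250 for a 250-page document.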


How to activate SimpleView?

Wednesday, 04 September 2019 by Simple Software

Activation Instructions

SimpleView Option A – New SimpleIndex Installation:

If you are installing SimpleView on the Windows computer for the first time, first download SimpleIndex from the SimpleIndex Demo Installation Link.

Once the SimpleIndex software has been downloaded, install the software from the downloaded installation file.

During the installation process you will be asked to enter your Serial Code or Serial Codes.

Single Serial Code:

Multiple Serial Codes (separate with a comma):

After you have entered your Serial Code(s) click Next to move through the installation process.

Once the installation is complete you will receive the following Window:

SimpleView Option B – SimpleView Already Installed:

If you have already installed the SimpleView software then all you need to do is Activate the demo.

Click the SimpleView icon from the SimpleIndex software or from your Windows Start menu.

Enter your Serial Number into the “Enter Serial Number to Activate” field in the Activation Window.

Click the Activate button to activate the license.

You will receive a confirmation that the license was properly activated and your license type will be displayed next to the “License Type:” section of the Activation Window.

SimpleView Option C – SimpleView Installed on Computer Not Connected to the Internet:

If you would like to install SimpleView on a computer that doesn’t have an internet connection, an Offline Activation will need to be done.

First fully install the SimpleView software without activation.

Click the SimpleView icon from the SimpleIndex software or from your Windows Start menu.

Enter your Serial Number into the “Enter Serial Number to Activate” field in the Activation Window.

Click the “Offline Activation” button.

Click OK in the “SimpleView Offline Activation” window, which asks you to call or email for an Offline Activation.

Select the license version that you ordered in the “SimpleView Version” drop down.

Then either call (865) 637-8986 option 2 or email support@simpleindex.com with the Authorization Request Code. We will then provide you with the Activation Key.

Enter the Activation Key and then click the Offline Activation button.

Maintenance is optional, but covers tech support and upgrades for the software. Please consider purchasing maintenance if you haven’t already. Please refer to Simple Software Maintenance Agreement for more information.


How to activate any Add-on or Upgrade to SimpleIndex?

Friday, 30 August 2019 by Simple Software

SimpleIndex Add-on Option A – New SimpleIndex Installation:

If you are installing SimpleIndex on the Windows computer for the first time, first download SimpleIndex from the SimpleIndex Demo Installation Link.

Once the SimpleIndex software has been downloaded, install the software from the downloaded installation file.

During the installation process you will be asked to enter your Serial Code or Serial Codes.

Single Serial Code:

Multiple Serial Codes (separate with a comma):

After you have entered your Serial Code(s) click Next to move through the installation process.

Once the installation is complete you will receive the following Window:

When you click Finish you will receive the Global Settings Wizard window to configure the general settings for SimpleIndex on the installed computer.

Move through the prompts to configure the Global Settings Wizard.  Once complete you will receive a confirmation that the License was properly activated before the software opens.

SimpleIndex Add-on Option B – SimpleIndex Already Installed:

If you have already installed the SimpleIndex software then all you need to do is Activate the demo.

Click the SimpleIndex icon on your desktop or from your Windows Start menu.

Once SimpleIndex is open go to the Help menu and Select Activate/Transfer License.

Enter your Serial Number into the “Enter Serial Number to Activate” field in the Activation Window.

Click the Activate button to activate the license.

You will receive a confirmation that the license was properly activated and your license type will be displayed next to the “License Type:” section of the Activation Window.

SimpleIndex Add-on Option C – SimpleIndex Installed on Computer Not Connected to the Internet:

If you have installed SimpleIndex on a computer that doesn’t have an internet connection, an Offline Activation will need to be done.

First fully install the SimpleIndex software without activation.

Once it has been fully installed click the SimpleIndex icon on your desktop or from your Windows Start menu.

Once SimpleIndex is open go to the Help menu and select Activate/Transfer License.

Enter your Serial Number into the “Enter Serial Number to Activate” field in the Activation Window.

Click the “Offline Activation” button.

Click OK in the “SimpleIndex Offline Activation” window, which asks you to call or email for an Offline Activation.

Select the license version that you ordered in the “SimpleIndex Version” drop down.

Then either call (865) 637-8986 option 2 or email support@simpleindex.com with the Authorization Request Code. We will then provide you with the Activation Key.

Enter the Activation Key and then click the Offline Activation button.

Maintenance is optional, but covers tech support and upgrades for the software. Please consider purchasing maintenance if you haven’t already. Please refer to Simple Software Maintenance Agreement for more information.


Check and Repair All PDF Files

Monday, 29 July 2019 by Simple Software

You can set SimpleIndex to assume that it needs to check every PDF file and fix it.

Go to this location in the Windows Registry:

Computer\HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\SimpleIndex\Misc

Create a new String Value called “FixAllPDF” and set its value to 1.
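If you need to apply this setting to several workstations, one option is to generate a .reg file and import it on each machine. The sketch below only builds the file text, using the key and value named above; the function name is hypothetical, it does not escape special characters in values, and importing into HKEY_LOCAL_MACHINE normally requires administrator rights.

```python
KEY_PATH = r"HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\SimpleIndex\Misc"


def build_reg_file(key_path: str, name: str, value: str) -> str:
    """Return the text of a .reg file that sets one string (REG_SZ) value."""
    lines = [
        "Windows Registry Editor Version 5.00",  # standard .reg header
        "",
        f"[{key_path}]",                          # key to create or open
        f'"{name}"="{value}"',                    # quoted data means a string value
        "",
    ]
    return "\r\n".join(lines)


print(build_reg_file(KEY_PATH, "FixAllPDF", "1"))
```

Saving the printed text as a file with a .reg extension and double-clicking it has the same effect as creating the value by hand in the Registry Editor.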


Keep Pages in Original Order when Bookmarking

Monday, 29 July 2019 by Simple Software

To keep all pages in the order in which they were imported, even when they belong to different bookmarks, do the following:

1.  Open the configuration in Notepad.
2.  Search for <BOOKMARK_PAGE_ORDER>
3.  Change this line from “false” to “true”:  <BOOKMARK_PAGE_ORDER>true</BOOKMARK_PAGE_ORDER>
4.  Save and close.
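For many configuration files, the same edit can be scripted. The sketch below changes the tag in a string of configuration text; reading and writing the actual file is left out because its encoding is not stated here, and the function name and the surrounding `<JOB>` element in the sample are hypothetical.

```python
import re

# Matches the tag only when its current content is "false" (any letter case).
TAG = re.compile(r"(<BOOKMARK_PAGE_ORDER>)\s*false\s*(</BOOKMARK_PAGE_ORDER>)",
                 re.IGNORECASE)


def enable_bookmark_page_order(config_text: str) -> str:
    """Return the configuration text with BOOKMARK_PAGE_ORDER set to true."""
    return TAG.sub(r"\1true\2", config_text)


sample = "<JOB>\n<BOOKMARK_PAGE_ORDER>false</BOOKMARK_PAGE_ORDER>\n</JOB>"
print(enable_bookmark_page_order(sample))
```

Text that already contains “true” is returned unchanged, so the function is safe to run more than once. Keep a backup copy of each configuration before changing it.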


Is SimpleIndex for Windows only? I’m a Mac user.

Wednesday, 28 February 2018 by dwilder

Unfortunately SimpleIndex is for Windows only. This is true of most high speed document scanning applications, due to the fact that most document scanners only have Windows drivers.

However, SimpleIndex can output to databases and file shares on a Mac server. The fact that it does not have its own proprietary file system and database makes it a very good choice for Mac networks, since only the scanning workstation needs to be a PC.

Many users have also reported great success running Windows on their Mac through Parallels or Boot Camp.


Is it possible to search for and retrieve documents with Windows desktop search?

Wednesday, 28 February 2018 by dwilder

Windows Search works great with SimpleIndex because all index data can be saved to the folder and file names as well as the file properties, and OCR text can be saved to hidden layers in PDF files. Windows Search will read all of these elements when building its index and will return any matching files when you search.

Using Windows Search on a file server allows for instantaneous searching across terabytes of documents and text for all of the users on your network.

IFilters allow Windows Search to search within file contents.

Here are three popular PDF IFilters that will enable text searching for PDF files:

  • Foxit PDF IFilter (commercial)
  • TET PDF IFilter (free/commercial)
  • Adobe PDF IFilter (32-bit / 64-bit) (free)

If you have issues with PDF text searching in Windows 10, this article has detailed instructions for resolving PDF IFilter issues:

https://fixedit.itxpress.biz/2018/07/05/searching-pdfs-in-windows-10/


How much do Simple Software products cost?

Wednesday, 28 February 2018 by dwilder

Click here for the latest pricing and online ordering information. You can also purchase full service solutions from one of our Authorized Dealers.

Click here for a PDF version of the price list and a feature matrix that shows which features are included in each version.

All applications are activated online by entering a serial number in the demo. The serial is emailed to you once your order is processed.


On what versions of Windows does SimpleIndex run?

Wednesday, 28 February 2018 by dwilder

SimpleIndex runs on Windows 10, 8, 7, Vista and Server 2008 editions.

It does not run on Windows ME or NT.

SimpleIndex 8.3 and below are compatible with Windows XP, Server 2003 and Windows 2000.


I’m using full page OCR. The information is all appearing in the txt file but it is losing format about half way through. Data to the right is ending up at the end of the txt doc. Can this be fixed?

Wednesday, 28 February 2018 by dwilder

SimpleIndex version 7 solves this problem with the incorporation of the FineReader OCR engine. Full text in PDFs will now flow with the formatting of the PDF.

Legacy Versions: SimpleIndex can also be used with other OCR applications and servers to improve accuracy, formatting and performance. Use the OCR applications to convert the scanned images to text or searchable PDF, and SimpleIndex can extract index values from the text and automatically sort and organize the files.


How do you configure full text searching in Retrieval mode?

Wednesday, 28 February 2018 by dwilder

On the Database tab there is a dropdown in the lower portion of the panel for Full Text OCR Field. Enter the name of the field that will store the full-text data there. This must be configured for both the Insert and Retrieval mode configurations. The database field must be long enough to store the entire text of your document.

Of course, the Insert Mode configuration must have “Enable Full Page OCR” checked to generate full text data from images. Text from MS Office documents, PDF files and existing OCR text files can be used without setting this option.

When designing your Retrieval Mode configuration, create a Text field to use for full text search queries. On the Database tab, set the corresponding “Database Field Name” to the full text database field.

When searching on your full text field, SimpleIndex finds the text you enter no matter where it appears in the document. It is able to match partial words. It does not perform boolean or natural language searches. The text entered must match the document text exactly.
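The search behavior described above is that of a literal substring match, which can be pictured as follows. This is an illustration only, not how SimpleIndex queries your database; the function name is hypothetical, and whether upper and lower case are treated as equal depends on your database, which this sketch does not model.

```python
def full_text_match(query: str, document_text: str) -> bool:
    """Literal substring test: the whole query must appear exactly as typed."""
    return query in document_text


text = "Quarterly maintenance agreement for scanner model X200"
print(full_text_match("maint", text))                    # True: partial word
print(full_text_match("maintenance agreement", text))    # True: exact phrase
print(full_text_match("maintenance AND scanner", text))  # False: operators are not interpreted
print(full_text_match("agreement maintenance", text))    # False: word order matters
```

In practice this means you should search for a short, distinctive phrase copied exactly as it appears in the document.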


How do you configure OCR to read index information from MS Office or PDF documents?

Wednesday, 28 February 2018 by dwilder

MS Office and PDF files generated by software or PDF printer drivers already contain the text you need to recognize. Scanned documents need OCR to read text from an image of the page. With Office and PDF files, SimpleIndex can simply read the text, which is much faster and more accurate than image OCR.

To recognize index fields from the document text, first create OCR fields on the Index tab as you would normally. Next, on the Zones & OCR options tab, check the “Use Full Page OCR for this Field” option for each OCR field. This tells SimpleIndex to process the existing file text.

If the index value is a unique pattern of digits or list of possible values, use Template or Dictionary matching to locate the value within the text. Please see the manual for details on Template and Dictionary matching.

If the value appears in a specific location in each file, coordinates can be used to locate it. When processing text, the X, Y, Width and Height settings correspond to line and column numbers within the file text. This is explained in greater depth in the manual.
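A text zone of this kind can be pictured as a rectangle of characters. The sketch below assumes that X is the starting column, Y is the starting line, and both count from zero; the manual defines the actual convention, so treat these as placeholders. The function name is hypothetical.

```python
def read_text_zone(text: str, x: int, y: int, width: int, height: int) -> str:
    """Return the characters inside a rectangle of lines and columns.

    Assumptions: x = first column, y = first line, both zero-based;
    width = number of columns, height = number of lines.
    """
    rows = text.splitlines()[y:y + height]
    return "\n".join(row[x:x + width] for row in rows).strip()


doc = "ACME CORP            Invoice\nAccount: 1234567     Date: 11/07/2022\n"
print(read_text_zone(doc, x=9, y=1, width=7, height=1))  # 1234567
```

As with image zones, a fixed rectangle only works when the value sits in the same place in every file; otherwise Template or Dictionary matching is the better fit.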

SimpleIndex will assume that any TXT file with the same name as a file being processed is the OCR text for that file, so this method can work with any type of file.
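The companion-file convention can be sketched as below. It assumes “same name” means the same base name with a .txt extension in the same folder, and that the text is UTF-8; both are assumptions, and the function and file names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Optional


def sidecar_text(path: Path) -> Optional[str]:
    """Return the text of the companion TXT file for path, or None if there is none."""
    txt = path.with_suffix(".txt")  # assumption: same base name, .txt extension
    if txt != path and txt.is_file():
        return txt.read_text(encoding="utf-8", errors="replace")
    return None


# Demonstration in a temporary folder: a drawing file with a companion text file.
with tempfile.TemporaryDirectory() as tmp:
    drawing = Path(tmp) / "site-plan.dwg"
    drawing.write_bytes(b"")
    drawing.with_suffix(".txt").write_text("Project No: 4471", encoding="utf-8")
    found = sidecar_text(drawing)
    missing = sidecar_text(Path(tmp) / "other.dwg")

print(found)    # Project No: 4471
print(missing)  # None
```

Any tool that can write text next to a file, in whatever format the file is, can therefore supply the text that the index fields are read from.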

Find out more about Optical Character Recognition on the SimpleOCR Guide.


Can OCR text be saved to Office, Text, HTML or other formats?

Wednesday, 28 February 2018 by dwilder

Yes. On the OCR step of the Job Settings Wizard you can select the text output format you need in the “Full-page OCR file type” drop down. By default it is set to PDF, but it can be changed to Text (txt), Word (docx), Rich Text (rtf), Open Office (odt), Excel (xlsx), PowerPoint (pptx), ePub Zip (epub), FictionBook (fb2), HTML (htm), XML (xml) or Alto XML (alto.xml).

If the output file type is set to PDF, OCR text will be embedded as hidden text in the PDF file.

Related Links

  • SimpleIndex.com – Zone OCR and Dynamic OCR
  • SimpleIndex Wiki – Full Page OCR Formats

Can SimpleIndex create searchable PDF Image+Text files with hidden text?

Wednesday, 28 February 2018 by dwilder

Yes, it can. You can configure this setting in the Job Settings Wizard by going to the OCR step and checking “Enable full-page OCR”. There are many settings in the OCR step that you can use to customize the output and recognition of images.

SimpleIndex has two different OCR engines (Standard and Professional) that can be used to produce PDF Image + Text files or Searchable PDFs.

Related Links

  • SimpleIndex.com – OCR Languages
  • SimpleOCR.com – OCR Guide
  • SimpleIndex Wiki – OCR
  • SimpleIndex Wiki – Searchable PDF
  • SimpleIndex Wiki – OCR Options
  • SimpleIndex Wiki – FineReader
  • SimpleIndex Wiki – MRC
  • SimpleIndex Wiki – Tesseract
  • SimpleIndex Wiki – Languages


The All-In-One Scanning & Sorting Tool

Tuesday, 30 January 2018 by Simple Software

SimpleIndex® has the ability to perform a wide variety of scanning and document organization tasks quickly and easily.

This makes it a must-have tool for IT departments and consultants who often need to:

  • Scan and store various documents on network shares or cloud storage
  • Organize existing MS Office, PDF and other files on your network
  • Attach files to records in a custom database
  • Integrate scanning into custom business applications
  • Add document capture to Robotic Process Automation bots
  • Automate data entry from paper or electronic documents
  • Reduce click charges for centralized scanning departments

The missing piece of the IT document management puzzle

With a cost and setup time that is negligible compared to other enterprise capture platforms, SimpleIndex makes sense even when your company already has one. For example:

  • When SimpleIndex has unique features your other software doesn’t, like Office and PDF text parsing, PDF Bookmarking or Electronic Imprinting.
  • When setting up a new workflow requires extensive setup time and management approvals to use the central system.
  • When click charges make a project prohibitively expensive.

The wide variety of scanning and indexing tasks you can do with SimpleIndex make it an incredibly useful tool to have in your arsenal.

Find Out More

  • Download or get an Online Demo
  • SimpleIndex Feature Guide
  • Wiki Manual Pages
  • Compare Versions & Licensing

KB Articles for Document Management

  • Oracle database is slow to respond
  • What is Document Imaging?
  • Using alternate database schemas
  • Multiple Sort Fields on Search
  • Access Database Connection String
  • How do I delete an image and its database entry?
  • Is it possible to search for and retrieve documents with Windows desktop search?
  • Will your SimpleQB allow me to scan in old invoices or bank statements directly into QuickBooks?
  • How do I use the Media Wizard to create searchable DVDs or thumb drives?
  • How do I export index data to a database?
1-Click Processing, Bar Code Scanning, Command-Line, Database, Document Automation, Document Capture Solution, Document Imaging, Document Management Software, Imprinting, Imprinting & Watermarking, MS Office, OCR, Office PDF Text Processing, PDF Bookmarking, RPA, Scanning Software, TWAIN, TWAIN & ISIS Scanning

Organize Office Documents with Text Parsing

Tuesday, 23 January 2018 by Simple Software

This video shows the Sort My Documents sample job included with the SimpleIndex trial download. It shows how you can organize office documents automatically by parsing the file’s text for relevant metadata and keywords. You can then use those keywords to tag documents with metadata and create standardized folders and filenames.

Organize Office Documents Automatically with Text Parsing

First we sort Word documents, Excel spreadsheets and PowerPoint presentations automatically using the SimpleIndex template and dictionary matching algorithms that match patterns and keywords in the parsed text.

Then the files are organized into folders and filenames using the Sales Rep, Customer, Document Type and Date values extracted from the text.
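The folder and filename step can be pictured with a short Python sketch. This is an illustration only, not SimpleIndex's own code: the `build_path` and `safe` helpers, the folder layout (Sales Rep, then Customer) and the filename pattern are all assumptions chosen for the example.

```python
import re
from pathlib import PurePosixPath


def safe(part: str) -> str:
    """Replace characters that are not allowed in file or folder names."""
    return re.sub(r'[\\/:*?"<>|]+', "_", part).strip()


def build_path(fields: dict, extension: str) -> PurePosixPath:
    """Build a standardized folder/filename from extracted index values.

    Hypothetical layout: <Sales Rep>/<Customer>/<Document Type>_<Date><ext>
    """
    folder = PurePosixPath(safe(fields["Sales Rep"])) / safe(fields["Customer"])
    filename = f'{safe(fields["Document Type"])}_{safe(fields["Date"])}{extension}'
    return folder / filename


example = build_path(
    {
        "Sales Rep": "J. Smith",
        "Customer": "Acme/West",
        "Document Type": "Quote",
        "Date": "2018-01-23",
    },
    ".docx",
)
print(example)
```

Because every file goes through the same naming function, two users filing the same kind of document end up with the same folder structure.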

Organize Office Documents for Cloud Storage

You can also upload organized files to SharePoint or Cloud Storage platforms without the chaos and disorganization you inevitably get when users create their own folders and filenames.

Organize Office Documents for Document Management

In the video, we use SimpleSearch to search and view the sorted files. But you can just as easily use any third party document management system or custom database to perform keyword or full-text searching.

You can use the SimpleView embedded viewer to view Office documents, PDF files and images in a common interface. In the video we use the full version of Word, Excel, and PowerPoint to edit Office documents right from the search screen.

Find Out More

  • Download or get an Online Demo
  • MS Office Text Processing Features in SimpleIndex
  • MS Office Features and Settings Wiki Pages
  • OCR Features and Settings Wiki Pages
  • OCR Software Guide on SimpleOCR

FAQ Related to Organizing Office Documents

  • Features
  • Take control of Sales Tax exemption forms
  • Instant Integration With Any Application
  • Document Classification
  • Zone OCR and Dynamic OCR
  • Exclude Index Field from Index Log
  • Change the Font Size of Index Fields
  • Large documents (>500 pg) Slow to Process - Workaround
Document Classification, Full Text Indexing, MS Office, Office PDF Document Indexing, Office PDF Text Processing, Office to PDF, Paperless Office, Search, SharePoint Migration, SharePoint Scanning, Text Processing


Streamlined Interface

Tuesday, 23 January 2018 by Simple Software

Maximum Data, Minimum Clicks

As with any repetitive task, a few seconds saved scanning and filing a single document quickly adds up to dozens or hundreds of hours over the course of a long project or daily routine. The most important part of planning your document capture project is finding the most efficient way to file your documents correctly. Creating an efficient workflow will save you countless hours of labor over the life of your project.

SimpleIndex is faster and easier because it is designed to perform all of the steps necessary to scan or import documents, process, verify and export them in one continuous workflow rather than requiring the user to click extra buttons each time to initiate the next step. When taken to the extreme, SimpleIndex is capable of performing all of these tasks automatically with just a single mouse click.

SimpleIndex does this by saving all of the settings for a document capture workflow to a file that can be opened just like an Office document. This file is configured by the administrator so the user doesn’t have to see any of the technical details. Very rarely does the operator need to be able to change, for instance, the export file format and file naming scheme. So why do some applications show you a complicated export settings screen every time you try to save a batch? It is this attention to detail that allows SimpleIndex to process the same batch 35-75% faster than its competitors.

SimpleIndex also has the ability to pre-set index values and run jobs using the Command Line Interface. More on this design feature can be found on our Getting Started page.

Index Automation Features

The two main methods for automating indexing are Barcode Recognition and Optical Character Recognition (OCR).

Barcode recognition is faster and more accurate, but your documents must contain a barcode on the document or a cover page for this to work.

OCR is able to read printed data directly from the page, which means most documents can be processed as-is. However, it is not 100% accurate and usually requires some human review. Handwriting can be recognized as well, using the Cloud OCR option.

If your index data already exists in another database, SimpleIndex has features that can make use of this data to automate processing. The Index Autofill feature matches data read from barcodes or OCR to data in your database, verifying the correct value is read and populating additional search fields automatically.
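The idea behind this kind of lookup can be sketched in Python with an in-memory SQLite database. This is a generic illustration, not how SimpleIndex implements Index Autofill: the `invoices` table, its columns and the `autofill` function are hypothetical names invented for the example.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a small in-memory table standing in for an existing database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices (invoice_no TEXT PRIMARY KEY, customer TEXT, po_number TEXT)"
    )
    conn.executemany(
        "INSERT INTO invoices VALUES (?, ?, ?)",
        [("INV-1001", "Acme Corp", "PO-77"), ("INV-1002", "Globex", "PO-78")],
    )
    return conn


def autofill(conn: sqlite3.Connection, key_value: str):
    """Look up a value read from a barcode or OCR.

    Returns the full set of index fields when the value exists in the
    database, or None when no match is found (a likely misread).
    """
    row = conn.execute(
        "SELECT customer, po_number FROM invoices WHERE invoice_no = ?",
        (key_value,),
    ).fetchone()
    if row is None:
        return None
    return {"Invoice No": key_value, "Customer": row[0], "PO Number": row[1]}


conn = make_demo_db()
print(autofill(conn, "INV-1001"))  # matched: extra fields are populated
print(autofill(conn, "INV-9999"))  # no match: flag the document for review
```

A successful match both confirms the value that was read and fills in the remaining fields; a failed match is a cheap signal that the document needs human review.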

Paper and Electronic Documents

Traditional document capture is focused on digitizing paper documents with a document scanner. However, more and more documents are living their best lives as native PDF and Word files, never once having to enter our physical realm.

SimpleIndex is designed to handle both scanned physical documents and electronic files in their native format seamlessly. The OCR function will use existing text from any PDF file or Office document when it is available, or automatically OCR scanned images when it isn’t.

Use the built-in SimpleView viewer to view most common file types, or use the PDF editor and word processor of your choice to provide full editing capabilities embedded right within the SimpleIndex application.

It can also simultaneously scan and import documents from a hotfolder into a single batch. So if, for example, you receive both paper and email invoices, you can process your day’s work all at once with just one click!

Using Pre-Indexed Batches

The Pre-Index Batch feature of SimpleIndex is what enables 1-click scanning and indexing, as well as command line and unattended processing.

Pre-indexing lets you set fixed values for index fields and apply them to a whole batch. These can be combined with automatic values from barcode recognition, OCR and Autofill to create fully automated batch processes that can be launched from your custom application, a desktop shortcut, scheduled server task or even linked to the scan button on your scanner.
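As a rough Python sketch of how fixed and automatic values might be combined for a batch: the `merge_index_values` function is hypothetical, and the rule that pre-set values take precedence over automatic ones is an assumption made for the example, not documented SimpleIndex behavior.

```python
def merge_index_values(pre_index: dict, *automatic_sources: dict) -> dict:
    """Combine fixed batch values with values read automatically.

    Assumed precedence: pre-set values win, then automatic sources in the
    order given. Empty automatic values are ignored.
    """
    merged = dict(pre_index)
    for source in automatic_sources:
        for field, value in source.items():
            if value and field not in merged:
                merged[field] = value
    return merged


fixed = {"Department": "AP", "Batch": "2018-01"}
from_barcode = {"Invoice No": "INV-1001"}
from_ocr = {"Vendor": "Acme Corp", "Invoice No": ""}

print(merge_index_values(fixed, from_barcode, from_ocr))
```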

KB Articles for Streamlined Interface

  • Features
  • Take control of Sales Tax exemption forms
  • Reduce Click Charges for Data Capture
  • Instant Integration With Any Application
  • Indexing Solutions with Barcode Recognition
  • Automated Processing & 1-Click Interface
  • Full-Page OCR Indexing Demo
  • Video Demos
  • Network Scanners & Copiers
  • The All-In-One Scanning & Sorting Tool
Automatic Data Capture, Barcode Recognition Software, Batch Scanning, Command Line Interface, Database, Document Automation, Document Classification, Document Imaging, Fast Scanning, OCR, Office PDF Text Processing, RPA, Scanning Software, Solution, TWAIN & ISIS Scanning, Unattended, Workflow, Workflow Software

PDF Text Processing Demo

Friday, 12 January 2018 by Simple Software

This sample job demonstrates the PDF text processing capabilities of SimpleIndex by extracting the Document Number, Date, Document Type, Customer and Total from a number of documents without OCR, by processing the text layer of PDF files.

Computer-generated PDF files, such as those created using PDF printer drivers, already contain digitized text. SimpleIndex reads the text and performs Template and Dictionary Matching to locate and extract the correct data values from the text.
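To give a feel for what pattern and keyword matching on a text layer involves, here is a minimal Python sketch using regular expressions. It is not SimpleIndex's matching engine: the `TEMPLATES` patterns, the `DICTIONARY` keyword list, the `extract_fields` function and the sample text are all assumptions made up for illustration.

```python
import re

# Hypothetical "templates": one regular expression per index field.
TEMPLATES = {
    "Document Number": re.compile(
        r"(?:Invoice|Estimate)\s*(?:No\.?|#)\s*:?\s*(\w[\w-]*)", re.I
    ),
    "Date": re.compile(r"\b(\d{1,2}/\d{1,2}/\d{4})\b"),
    "Customer": re.compile(r"\bCustomer\s*:\s*(.+)", re.I),
    "Total": re.compile(r"\bTotal\b\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.I),
}

# Hypothetical "dictionary": allowed keyword values per index field.
DICTIONARY = {"Document Type": ["Invoice", "Estimate"]}


def extract_fields(text: str) -> dict:
    """Return the index fields found in the text of one document."""
    fields = {}
    for name, pattern in TEMPLATES.items():
        match = pattern.search(text)
        if match:
            fields[name] = match.group(1).strip()
    lowered = text.lower()
    for name, keywords in DICTIONARY.items():
        for keyword in keywords:
            if keyword.lower() in lowered:
                fields[name] = keyword
                break
    return fields


sample = "Invoice No: INV-1001\nDate: 01/12/2018\nCustomer: Acme Corp\nTotal: $1,250.00"
print(extract_fields(sample))
```

The patterns describe what a value looks like rather than where it sits on the page, which is why this approach tolerates layout differences between documents.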

Since the existing text is being used, OCR is not performed. This makes processing much faster and 100% accurate, especially compared to solutions using zone OCR.

While this demo runs interactively, text processing jobs can run in unattended mode since the data does not need to be verified.

Full-Page OCR can also be used to get text from scanned PDF files with no existing text. SimpleIndex will also detect when a PDF file has existing text and only perform OCR on the documents that need it to improve performance.
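The "use existing text, otherwise OCR" decision can be sketched in Python as follows. This is an illustration under stated assumptions: the `text_for_indexing` function, the `min_chars` threshold and the stand-in OCR callable are hypothetical, and the source does not describe how SimpleIndex itself detects a usable text layer.

```python
from typing import Callable, Tuple


def text_for_indexing(
    embedded_text: str, run_ocr: Callable[[], str], min_chars: int = 20
) -> Tuple[str, str]:
    """Prefer a PDF's existing text layer and fall back to OCR.

    embedded_text: text already extracted from the file (may be empty).
    run_ocr: callable that performs OCR; only invoked when needed.
    min_chars: assumed threshold below which the text layer is treated
               as missing.
    Returns (text, source) where source is "text layer" or "ocr".
    """
    cleaned = embedded_text.strip()
    if len(cleaned) >= min_chars:
        return cleaned, "text layer"
    return run_ocr(), "ocr"


def fake_ocr() -> str:
    """Stand-in for a real OCR engine."""
    return "Invoice No: INV-2002 (recognized by OCR)"


print(text_for_indexing("Invoice No: INV-1001  Total: $1,250.00", fake_ocr))
print(text_for_indexing("   ", fake_ocr))
```

Passing OCR in as a callable means the expensive step runs only for files that actually need it.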

Find Out More

  • Download or get an Online Demo
  • PDF Text Processing Features in SimpleIndex
  • PDF Features and Settings Wiki Pages
  • Full-Page OCR Wiki Pages
  • OCR Features and Settings Wiki Pages
  • OCR Software Guide on SimpleOCR

FAQ Related to PDF Text Processing

  • Features
  • Patent ID and Title Extraction
  • Take control of Sales Tax exemption forms
  • Instant Integration With Any Application
  • Affordable Document Management
  • Indexing Solutions with Barcode Recognition
  • Document Classification
  • Zone OCR and Dynamic OCR
OCR, Office PDF Document Indexing, Office PDF Text Processing, PDF, PDF Archive Scanning Software, PDF Data Extraction Software, Text Processing, Unattended Processing

Video Demos

Tuesday, 07 November 2017 by dwilder

These videos demonstrate several ways SimpleIndex® can automatically index different types of documents. If you are new to SimpleIndex, watching these videos is the easiest way to see what it can do. You can follow along using the sample files included in the SimpleIndex Trial.

  • Zone OCR with template matching
  • Document barcode recognition
  • PDF OCR text parsing
  • Sort and index MS Office documents
  • Indexing with full-text OCR
  • Running jobs from an icon

The sample files are copied to your Configuration Folder when you run the SimpleIndex Trial for the first time. If you can’t find the samples, copy them with the Global Settings Wizard in the File menu.

Compare Major Scanning Solutions

Compare the SimpleIndex scanning and indexing workflow to 4 leading desktop document imaging applications–Kofax Express™, Kodak Capture Pro™, PaperVision™ Capture Express and Office Gemini DiamondVision™.

Compare SimpleIndex to the competition

University of SimpleSoftware

Extensive online training videos for the SimpleSoftware product line are available at the University of SimpleSoftware. Live versions of each class can also be scheduled with our support staff.

Visit the Simple Software University

Integrated Solutions Built with SimpleIndex

SimpleInvoice

Uses the OCR and dictionary matching functionality of the SimpleIndex scanning and indexing software to automatically scan, name, and organize incoming invoices into your chosen folder structure of searchable PDF files.

SimpleQB

Scan invoices, OCR the key data and automatically receive bills in QuickBooks accounting software. SimpleQB can transfer transaction data from SimpleIndex to QuickBooks, automating your scanning and data entry tasks simultaneously.

LoanStacker for Mortgages

Use OCR with a preconfigured dictionary file to recognize over 300 mortgage origination and closing documents. Automate scanning to popular mortgage applications like Calyx Point and EllieMae Encompass.

Find out more by going to LoanStacker.com.

SimpleIndex with Contentverse Document Management

SimpleIndex is the perfect front-end scanning tool for your document management system. These videos show several ways that SimpleIndex can be configured to automate document capture with the CompuThink Contentverse document management solution.

SharePoint Scanning

Automatically organize files and set custom column metadata in SharePoint 2010 using SimpleIndex index fields.

Screenshot OCR

Use screen captures to get index data from any application.

Patent ID and Title Extraction

Out-of-the-box configuration extracts the Patent ID Number and Title from any US patent application.

Zone OCR with Template Matching

This video shows the Zone OCR Invoice Processing sample job. Zone OCR is the traditional method for extracting index data from printed text that appears in a fixed location on every page.

The video also shows how Zone OCR is enhanced with SimpleIndex’s Template Matching and Dictionary Matching features, giving you much more margin for error than other solutions.

Watch the Zone OCR Video

Document Barcode Recognition

This video shows how barcode recognition can be used with our 1-click processing feature to index files quickly, easily and accurately.

With a single click a batch of documents is imported, barcodes are recognized and files are exported to organized folders and filenames as well as a SimpleSearch document database.

In the second part of the video, a SimpleSearch configuration is used to search and view the files processed in the first part.

Watch the Barcode Recognition Video

PDF OCR Text Parsing

This video demonstrates the PDF OCR text processing capabilities of SimpleIndex by extracting the Document Number, Date, Document Type, Customer and Total from a number of Estimates and Invoices.

All of this information is read automatically using the existing text layer of a computer generated PDF, such as those created using PDF printer drivers. Template and dictionary matching algorithms are used to locate and extract the correct data values from the text.

Since the existing text is being used, OCR is not performed. This makes processing much faster and 100% accurate. OCR can be used to get text from scanned PDF files with no existing text.

Watch the PDF OCR Text Parsing Video

Sort and Index MS Office Documents

This video shows the Read My Documents sample configuration.

Word documents, Excel spreadsheets and PowerPoint presentations are automatically sorted using the SimpleIndex template and dictionary matching algorithms.

The files are reorganized using the Sales Rep, Customer, Document Type and Date extracted from the text.

SimpleSearch is then used to search and view the sorted files.

Watch the MS Office OCR Text Parsing Video

Full Page OCR Invoice Processing

This job configuration uses a 3-step process to automate the OCR processing. First, full-page OCR is performed on each image. Field data is extracted from the full-page OCR using template and dictionary matching algorithms. This is done in Pre-Index mode to allow unattended processing. Data is saved to a database so it can be reviewed and corrected in Step 2.

Step 2 uses Database Update mode to find images with missing index values and allows the user to manually enter the correct data.

Step 3 uses a SimpleSearch configuration to search and view the indexed images, including full text searches.
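The review step in the middle of this process amounts to querying for records whose index values are missing. A minimal Python sketch with SQLite is shown here; the `documents` table, its columns and the `find_incomplete` function are hypothetical and do not reflect the actual SimpleIndex database schema.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory table of index data saved by an unattended run."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents (image_path TEXT, invoice_no TEXT, vendor TEXT, total TEXT)"
    )
    conn.executemany(
        "INSERT INTO documents VALUES (?, ?, ?, ?)",
        [
            ("batch1/0001.tif", "INV-1001", "Acme Corp", "1250.00"),
            ("batch1/0002.tif", None, "Globex", "310.50"),
            ("batch1/0003.tif", "INV-1003", "", "99.00"),
        ],
    )
    return conn


def find_incomplete(conn: sqlite3.Connection) -> list:
    """Return image paths where any required index field is NULL or empty."""
    rows = conn.execute(
        """
        SELECT image_path FROM documents
        WHERE invoice_no IS NULL OR invoice_no = ''
           OR vendor IS NULL OR vendor = ''
           OR total IS NULL OR total = ''
        ORDER BY image_path
        """
    ).fetchall()
    return [row[0] for row in rows]


conn = make_demo_db()
for path in find_incomplete(conn):
    print("needs review:", path)
```

Only the documents returned by the query need an operator's attention; the rest pass straight through to search.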

Watch the Full Page OCR Video

Running Jobs from an Icon

One of the most powerful features of SimpleIndex is its ability to be launched from a command line. This allows you to save job configurations to an icon that can be launched by double-clicking it. Processing can be fully automated so that it runs minimized in the taskbar and requires no user interaction whatsoever.

This video shows what happens when you run the various sample jobs in this way.

Watch the 1-Click Processing Video

KB Articles for Optical Character Recognition

  • Language Pack for Standard/Tesseract OCR
  • Languages Supported in SimpleSoftware OCR Engines
  • What is Document Imaging?
  • Change the Dictionary Separator Value
  • Change the OCR Font or Type
  • Regular Expression (RegEx) - Syntax or Type
  • Autonumber Increment Value
  • I'm using full page OCR. The information is all appearing in the txt file but it is losing format about half way through. Data to the right is ending up at the end of the txt doc. Can this be fixed?
  • Is there a way to just use part of a bar code or OCR value? For example, extract "50" from the value "124450"
  • If I have a form which is filled manually by hand, can SimpleIndex read the data from it?
1-Click Processing, Barcode Recognition Software, Command-Line, Contentverse, File Indexing, Invoice OCR, Mortgage, OCR, Office PDF Text Processing, QuickBooks Document Management, Scanning Software, Screen Scraping OCR, Screenshot OCR, SharePoint Scanning, TWAIN Scanning Software, Zone OCR

Recent KB Articles

  • SimpleIndex OCR Workstation
  • SimpleIndex Barcode Server 1M
  • Simple Software Server Processing Add-on for SimpleIndex
  • SimpleIndex Barcode Workstation
  • Prompt Before Appending with Identical Filename
  • SimpleIndex Professional Workstation
  • SimpleIndex - Affordable document scanning and OCR
  • SimpleIndex Standard Workstation

Online Support Options

Check our Wiki Help, Knowledge Base and Training Videos, or Contact Support if you still need Help

How to Buy

Solutions start at just $500! Buy SimpleIndex online or from an Authorized Dealer in your area.

Authorized Dealers

SimpleIndex is a great addition to any system integrator's product line. Become an Authorized Dealer.

Get a Web Demo

Get a free online demo with a scanning specialist who can configure SimpleIndex on your computer remotely.
Sign up for a demo now!

Download a Trial

30-day trial downloads are available for all Simple Software applications.
Download Now!

SimpleIndex Applications

Packaged apps built with SimpleIndex.
SimpleInvoice for AP
Sales Tax Manager
Mortgage LoanStacker
MSDS and Patents
SimpleIndex

© 2022 Meta Enterprises, LLC | Knoxville, Tennessee | A Family Owned Company
© 2022 SimpleSoftware | Consulting Services in the Field of Software as a Service
