
Automatically extract key data from MS Word documents using advanced pattern matching algorithms. Use that data to organize files automatically into standardized folders and filenames, or export it to CSV, XML or any SQL database.

Index With Non-Latin Character Sets

Monday, 29 July 2019 by Simple Software

By default, SimpleIndex uses the ANSI character set to display and edit captured OCR data, index field values and full-text OCR. This works for all languages based on the Latin alphabet (English, French, Spanish, German, etc.).

To index documents in languages that use non-Latin scripts, such as Chinese, Japanese, Russian or Arabic, set the default character set using the registry key shown below. If the value is not set correctly, Unicode text will show up as ??????????.

Use Notepad to edit the “Charset” value from the sample setting below and save it to a .reg file. Then double-click the .reg file to install (Administrator privileges required).

A sample .reg file is also available for download, but you still need to edit it in Notepad to set the Charset value before installing.

If you are on a 32-bit operating system be sure to remove the extra “\WOW6432Node” from the registry path.

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\SimpleIndex\Misc]
"Charset"="1"

Charset Name                           Charset Value
ANSI_CHARSET (Latin)                   0
DEFAULT_CHARSET                        1
SYMBOL_CHARSET                         2
SHIFTJIS_CHARSET (Japanese)            128
HANGUL_CHARSET (Korean)                129
GB2312_CHARSET (Simplified Chinese)    134
CHINESEBIG5_CHARSET (Chinese)          136
GREEK_CHARSET (Greek)                  161
TURKISH_CHARSET (Turkish)              162
HEBREW_CHARSET (Hebrew)                177
ARABIC_CHARSET (Arabic)                178
BALTIC_CHARSET (Baltic)                186
RUSSIAN_CHARSET (Russian)              204
THAI_CHARSET (Thai)                    222
EE_CHARSET                             238
OEM_CHARSET                            255

The full list of values is at https://msdn.microsoft.com/en-us/library/cc194829.aspx.
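As an alternative to editing the sample by hand, the .reg text can be generated. The sketch below is only an illustration: the helper name build_reg_text and its is_64bit_windows parameter are hypothetical, the registry path and the string-valued "Charset" entry are copied from the sample above, and only a few rows of the table are included.

```python
# Hypothetical helper (not part of SimpleIndex): builds the text of a .reg
# file for one of the charset values listed in the table above.
CHARSETS = {
    "ANSI_CHARSET": 0,
    "DEFAULT_CHARSET": 1,
    "SHIFTJIS_CHARSET": 128,
    "HANGUL_CHARSET": 129,
    "GB2312_CHARSET": 134,
    "CHINESEBIG5_CHARSET": 136,
    "ARABIC_CHARSET": 178,
    "RUSSIAN_CHARSET": 204,
    "THAI_CHARSET": 222,
}

KEY_64BIT = r"HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\SimpleIndex\Misc"
KEY_32BIT = r"HKEY_LOCAL_MACHINE\SOFTWARE\SimpleIndex\Misc"


def build_reg_text(charset_name, is_64bit_windows=True):
    """Return .reg file text that sets Charset to the named charset's value."""
    value = CHARSETS[charset_name]
    # On a 32-bit operating system the WOW6432Node part of the path is dropped.
    key = KEY_64BIT if is_64bit_windows else KEY_32BIT
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        f'"Charset"="{value}"',  # stored as a string, as in the sample above
        "",
    ]
    return "\r\n".join(lines)
```

For example, build_reg_text("RUSSIAN_CHARSET") produces text that can be saved as a .reg file and installed by double-clicking, as described above.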


Is it possible to search for and retrieve documents with Windows desktop search?

Wednesday, 28 February 2018 by dwilder

Windows Search works great with SimpleIndex because all index data can be saved to the folder and file names as well as the file properties, and OCR text can be saved to hidden layers in PDF files. Windows Search will read all of these elements when building its index and will return any matching files when you search.

Using Windows Search on a file server allows for instantaneous searching across terabytes of documents and text for all of the users on your network.

IFilters allow Windows Search to search within file contents.

Here are three popular PDF IFilters that will enable text searching for PDF files:

  • Foxit PDF IFilter (commercial)
  • TET PDF IFilter (free/commercial)
  • Adobe PDF IFilter (32-bit / 64-bit) (free)

If you have issues with PDF text searching in Windows 10, this article has detailed instructions for resolving PDF IFilter issues:

https://fixedit.itxpress.biz/2018/07/05/searching-pdfs-in-windows-10/

  • Published in Database & Retrieval, Export, Office PDF Text Processing

Can the values of 2 or more fields be combined in a single field?

Wednesday, 28 February 2018 by dwilder

Set the Type of the field that will store the combined value to “Fixed”.

In the template setting for that field, you can enter the keyword %FIELD#% (where # is the field number) and the keyword will be replaced with the value of the designated field when it is saved.

For example, to combine your first two fields into the third, with a comma between them, set the template for field 3 to:
%FIELD1%,%FIELD2%
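The substitution described above can be pictured with a short sketch. This is not SimpleIndex's own code; the function name expand_template is hypothetical, and leaving an unknown field number unchanged is an assumption made for the illustration.

```python
import re


def expand_template(template, fields):
    """Replace each %FIELDn% keyword with the value of field n (1-based)."""

    def lookup(match):
        number = int(match.group(1))
        if 1 <= number <= len(fields):
            return fields[number - 1]
        # Assumption: a keyword that names a missing field is left as-is.
        return match.group(0)

    return re.sub(r"%FIELD(\d+)%", lookup, template)
```

With field 1 set to "Smith" and field 2 set to "12345", expand_template("%FIELD1%,%FIELD2%", ["Smith", "12345"]) gives the combined value for field 3.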

  • Published in Indexing & UI

How do you configure OCR to read index information from MS Office or PDF documents?

Wednesday, 28 February 2018 by dwilder

MS Office and PDF files generated by software or PDF printer drivers already contain the text you need to recognize. Scanned documents, by contrast, need OCR to read text from an image of the page. With Office and PDF files, SimpleIndex can simply read the text, which is much faster and more accurate than image OCR.

To recognize index fields from the document text, first create OCR fields on the Index tab as you would normally. Next, on the Zones & OCR options tab, check the “Use Full Page OCR for this Field” option for each OCR field. This tells SimpleIndex to process the existing file text.

If the index value is a unique pattern of digits or list of possible values, use Template or Dictionary matching to locate the value within the text. Please see the manual for details on Template and Dictionary matching.
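To make the two matching styles concrete, here is a rough sketch that uses ordinary Python regular expressions and a plain value list. SimpleIndex's own Template and Dictionary syntax is documented in the manual and may differ; the function names and the sample text are hypothetical.

```python
import re


def match_template(text, pattern):
    """Return the first substring that fits the pattern, or None."""
    found = re.search(pattern, text)
    return found.group(0) if found else None


def match_dictionary(text, values):
    """Return the listed value that appears earliest in the text, or None."""
    lowered = text.lower()
    hits = []
    for value in values:
        position = lowered.find(value.lower())  # case-insensitive lookup
        if position >= 0:
            hits.append((position, value))
    return min(hits)[1] if hits else None
```

Given the text "Invoice No. INV-004512" followed by "Vendor: Acme Supply", the pattern INV-\d{6} locates the invoice number, and a vendor list containing "Acme Supply" locates the vendor name.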

If the value appears in a specific location in each file, coordinates can be used to locate it. When processing text, the X, Y, Width and Height settings correspond to line and column numbers within the file text. This is explained in greater depth in the manual.
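A text zone can be thought of as a rectangle of characters. The sketch below assumes that X is the starting column, Y is the starting line, and both count from 1; these are assumptions for illustration only, and the manual defines the actual behavior. The function name read_text_zone is hypothetical.

```python
def read_text_zone(text, x, y, width, height):
    """Return the characters inside a rectangular zone of the text.

    Assumed meaning: x = first column, y = first line (both 1-based),
    width = number of columns, height = number of lines.
    """
    lines = text.splitlines()
    selected = lines[y - 1 : y - 1 + height]
    return "\n".join(line[x - 1 : x - 1 + width] for line in selected)
```

For a file whose second line reads "Invoice: 004512", a zone with X=10, Y=2, Width=6 and Height=1 picks out the six-digit number.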

SimpleIndex will assume that any TXT file with the same name as a file being processed is the OCR text for that file, so this method can work with any type of file.
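The same-name convention can be sketched as a simple lookup. This only illustrates the idea; the function name find_sidecar_text and the UTF-8 encoding are assumptions, and on a case-sensitive file system a ".TXT" file would not match the lowercase suffix used here.

```python
from pathlib import Path


def find_sidecar_text(document_path):
    """Return the text of a same-named .txt file next to the document, or None."""
    sidecar = Path(document_path).with_suffix(".txt")
    if sidecar.is_file():
        # Encoding is an assumption; undecodable bytes are replaced, not fatal.
        return sidecar.read_text(encoding="utf-8", errors="replace")
    return None
```

For a drawing named report.dwg, this looks for report.txt in the same folder and returns its contents for field extraction.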

Find out more about Optical Character Recognition on the SimpleOCR Guide.

  • Published in OCR, Office PDF Text Processing

How do I configure the output folder and file naming scheme?

Wednesday, 28 February 2018 by dwilder

Use the Folder and Filename check boxes on the Indexing & File Naming step in the Job Settings Wizard to indicate whether field values will be used to generate subfolders or filenames. Each field with the Folder option checked creates a nested subfolder for its value, in the order the fields are listed. The values of all fields with the Filename option checked are concatenated to form the filename.

For example, if Field 1 and Field 3 have the Folder option checked, and Field 2 and Field 3 have the Filename option checked, image filenames will be created in the format:

%OUTPUTFOLDER%\Field 1\Field 3\Field 2 – Field 3.tif

The Filename Separator option on the Advanced tab lets you change the “ – ” between the fields in the filename to anything you want.
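The naming rule above can be sketched as a small path builder. This is an illustration only: the function and parameter names are hypothetical, the default separator is written with a plain hyphen, and the .tif extension is taken from the example.

```python
from pathlib import PureWindowsPath


def build_output_path(output_folder, fields, folder_fields, filename_fields,
                      separator=" - ", extension=".tif"):
    """Build an output path from index field values.

    fields maps field number -> value; folder_fields and filename_fields list
    the field numbers with the Folder / Filename option checked, in order.
    """
    folders = [fields[number] for number in folder_fields]
    name = separator.join(fields[number] for number in filename_fields) + extension
    return str(PureWindowsPath(output_folder, *folders, name))
```

With fields {1: "Acme", 2: "Invoice", 3: "2018"}, Folder checked for fields 1 and 3, and Filename checked for fields 2 and 3, the result under C:\Scans is C:\Scans\Acme\2018\Invoice - 2018.tif.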

Related Pages

  • SimpleIndex Wiki – File Naming Schema
  • SimpleIndex Wiki – Indexing & File Naming
  • Published in Export

Automatic Indexing Using Existing Data

Wednesday, 24 January 2018 by Simple Software

The Autofill feature of SimpleIndex is an easy way to associate many index fields with one document without retyping data that already exists in another database. Autofill uses a database lookup to retrieve records that match a key value entered by the user. Blank index fields are then filled in automatically with the data from this lookup. The result is a document database with many different possible search fields, of which only one needed to be entered during scanning.

The key field may be typed by the user, or it may be read from the document automatically using barcode recognition or OCR. The lookup is performed either when the user changes this field or when the index values are saved. If the lookup finds multiple matching records, the user will be notified and the first set of values will be used by default.
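The lookup behavior described above can be sketched with an in-memory SQLite database. The table name customers, its columns, and the function name autofill are all hypothetical; SimpleIndex's actual database configuration is set up in the job settings, not in code like this.

```python
import sqlite3

LOOKUP_COLUMNS = ("customer_name", "city")  # hypothetical lookup columns


def autofill(conn, key_value, index_fields):
    """Fill blank index fields from the first record matching the key value.

    Returns the filled fields and the number of matching records, so the
    caller can notify the user when there is more than one match.
    """
    rows = conn.execute(
        "SELECT customer_name, city FROM customers "
        "WHERE account_no = ? ORDER BY rowid",
        (key_value,),
    ).fetchall()
    filled = dict(index_fields)
    if rows:
        first = dict(zip(LOOKUP_COLUMNS, rows[0]))  # first match is the default
        for name, value in first.items():
            if not filled.get(name):  # only blank fields are overwritten
                filled[name] = value
    return filled, len(rows)
```

A field the user has already typed is kept as entered, while blank fields take their values from the first matching record.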

When used with pre-index batches, key information can be read automatically from barcodes or OCR and matched to database records with a single click. Search on up to 99 index fields without a single keystroke!

KB Articles for Automatic Indexing

  • Exclude Index Field from Index Log
  • Turn Off Prompts and Pop Ups on Job Configurations
  • Change the Font Size of Index Fields
  • Large documents (>500 pg) Slow to Process - Workaround
  • Regular Expression (RegEx) - Syntax or Type
  • Index With Non-Latin Character Sets
  • Skip to Blank Index on Save Index
  • Stop Autorun When Double Clicking Configuration
  • Autonumber Increment Value
  • Overlap of SimpleView Viewer in SimpleIndex Display

MS Office & PDF Text Parsing

Tuesday, 03 October 2017 by dwilder

The template and dictionary matching capabilities of SimpleIndex's OCR function can be used to extract index information from the text of existing MS Office and PDF files, or any file with an accompanying TXT file. SimpleIndex® will search the document for matches on unique patterns and value lists, then index the document with the matching data. Zone coordinates can be set to limit the search area to pre-defined regions on standard forms. The result is a fully automated indexing and renaming process for all your electronic documents!

Using existing text, SimpleIndex can index and rename hundreds of files each minute and achieve perfect accuracy. These files can then be quickly searched with SimpleIndex Retrieval, SharePoint and Google search engines, or uploaded into your company’s document/content management system or custom business applications.

Enhanced Text Parsing & PDF Support

MS Office and PDF text parsing features are now included in the Basic version of SimpleIndex, making it much more affordable to enable automatic document sorting on the desktop. Additional Office and PDF features include:

  • Convert any MS Office, HTML, XML and image files to PDF before processing
  • Read and write password-protected PDF files
  • Searchable PDF output (Image + Hidden Text)
  • Interactive template builder and tester
  • Easily select PDF or PDF/A output format
  • Native PDF viewer and auto-repair of problematic PDFs
  • Read data from PDF forms
  • Populate blank PDF forms with index data

Batch Convert Office Documents to PDF

If you have Microsoft Office or OpenOffice installed, you can use SimpleIndex to automatically convert MS Office documents to PDF files for archival. PDF files are better for archival than editable formats like Word and Excel. They can be annotated, encrypted, searched and viewed with free PDF readers.

There are many free applications that let you convert documents to PDF one at a time. SimpleIndex lets you convert thousands of files at once while it also extracts data from the text for indexing or data entry automation. This feature is ideal for migrating or archiving Office documents to SharePoint, document management systems and custom web applications.

Quickly Organize Any File on Your Computer

SimpleIndex lets you process any type of file on your computer. If an OLE-enabled viewer is installed, SimpleIndex will display the document on the screen. Other documents can be opened automatically in their default application when they are indexed. Quickly type index field data that can be used to reorganize the files into subfolders and structured filenames for browsing and searching on your network, or uploaded to your document/content management system or custom business application.

If the file has an accompanying text file (*.TXT) with the same name, the text in that file can be used for index field extraction, fully automating the process.

Viewing & Indexing MS Office Documents

SimpleIndex features full support for viewing and editing MS Office documents (Word, PowerPoint and Excel). When MS Office is installed, the full application interface is displayed within the SimpleIndex viewer, letting users view the full content of the documents, edit them with all the features of MS Office and save the changes. Modify privileges can be denied using Windows file security or the SimpleIndex administration wizard to keep out unauthorized changes.

If MS Office is not installed, SimpleIndex can open and display the documents in the built-in viewer in read-only mode.

KB Articles for MS Office & PDF Text Parsing

  • Change the Dictionary Separator Value
  • Regular Expression (RegEx) - Syntax or Type
  • Check and Repair All PDF Files
  • Keep Pages in Original Order when Bookmarking
  • Do Not Combine Pages to 1 Bookmark
  • Can I split a PDF based on bookmark values?
  • Is it possible to search for and retrieve documents with Windows desktop search?
  • Can SimpleIndex read bar codes from existing PDF files?
  • Is there a way to just use part of a bar code or OCR value? For example, extract "50" from the value "124450"
  • How do you configure OCR to read index information from MS Office or PDF documents?

© 2022 Meta Enterprises, LLC | Knoxville, Tennessee | A Family Owned Company
© 2022 SimpleSoftware | Consulting Services in the Field of Software as a Service
