Automatically extract key data from MS Word documents using advanced pattern matching algorithms. Use that data to organize files automatically into standardized folders and filenames, or export it to CSV, XML or any SQL database.
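As a rough illustration of the idea above (not SimpleIndex's actual algorithm), the sketch below pulls two index fields out of already-extracted document text with regular expressions, builds a standardized folder and file name from them, and writes a CSV row. The invoice and date patterns, the function names, and the sample text are all hypothetical and chosen only for this example.

```python
import csv
import io
import re

# Hypothetical patterns: an invoice number like "INV-004217" and an ISO date.
INVOICE_PATTERN = re.compile(r"\bINV-(\d{6})\b")
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def extract_fields(text):
    """Return a dict of index fields found in text, or None if either is missing."""
    invoice = INVOICE_PATTERN.search(text)
    date = DATE_PATTERN.search(text)
    if invoice is None or date is None:
        return None
    return {"invoice": invoice.group(0), "date": date.group(0)}


def standardized_path(fields, extension=".docx"):
    """Build a <year>/<invoice>_<date> path from the extracted fields."""
    year = fields["date"][:4]
    return f"{year}/{fields['invoice']}_{fields['date']}{extension}"


def to_csv(rows):
    """Serialize a list of field dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["invoice", "date", "path"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Sample text standing in for the contents of a Word document.
sample_text = "Invoice INV-004217 issued 2018-07-05 to Example Co."
fields = extract_fields(sample_text)
row = dict(fields, path=standardized_path(fields))
print(to_csv([row]))
```

The same extracted fields drive both the file naming and the export, which is the point of indexing once and reusing the data everywhere.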
Windows Search works great with SimpleIndex because all index data can be saved to the folder and file names as well as the file properties, and OCR text can be saved to hidden layers in PDF files. Windows Search will read all of these elements when building its index and will return any matching files when you search. Using Windows Search on a file server allows for instantaneous searching across terabytes of documents and text for all of the users on your network.

IFilters allow Windows Search to search within file contents. Here are three popular PDF IFilters that will enable text searching for PDF files:

Foxit PDF IFilter (commercial)
TET PDF IFilter (free/commercial)
Adobe PDF IFilter (32-bit / 64-bit) (free)

If you have issues with PDF text searching in Windows 10, this article has detailed instructions for resolving PDF IFilter issues: https://fixedit.itxpress.biz/2018/07/05/searching-pdfs-in-windows-10/
MS Office files, and PDF files generated by software or PDF printer drivers, already contain the text you need to recognize. Scanned documents need OCR to read text from an image of the page. With Office and PDF files, SimpleIndex can just read the text, which is much faster and more accurate than image OCR.

To recognize index fields from the document text, first create OCR fields on the Index tab as you would normally. Next, on the Zones & OCR options tab, check the “Use Full Page OCR for this Field” option for each OCR field. This tells SimpleIndex to process the existing file text.

If the index value is a unique pattern of digits or one of a list of possible values, use Template or Dictionary matching to locate the value within the text. Please see the manual for details on Template and Dictionary matching.

If the value appears in a specific location in each file, coordinates can be used to locate it. When processing text, the X, Y, Width and Height settings correspond to