Automatic archival of Microsoft Office documents to PDF via batch conversion, indexing and document management workflow.
If you have Microsoft Office or OpenOffice installed, you can use SimpleIndex to automatically convert MS Office documents to PDF files for archival. PDF files are better for archival than editable formats like Word and Excel. They can be annotated, encrypted, searched and viewed with free PDF readers.
There are many free applications that let you convert documents to PDF one at a time. SimpleIndex lets you convert thousands of files at once while also extracting data from the text for indexing or data entry automation. This feature is ideal for migrating or archiving Office documents to SharePoint, document management systems and custom web applications.
You can configure SimpleIndex to treat every PDF file as one that needs to be checked and fixed.
Go to this location in the Windows Registry:
Create a new String Value named “FixAllPDF” and set its value to 1.
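If you need to apply the same setting on several workstations, it can be distributed as a .reg file. The sketch below is a hypothetical helper (make_reg_file is not part of SimpleIndex) that builds the text of such a file for a String Value. The key path is a placeholder: substitute the SimpleIndex registry location referred to above, which this sketch does not know.

```python
def escape_reg_string(value: str) -> str:
    """Escape backslashes and double quotes as required inside .reg string data."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def make_reg_file(key_path: str, name: str, value: str) -> str:
    """Return the text of a .reg file that creates a String Value (REG_SZ).

    key_path is the full registry key, supplied by the caller.
    """
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key_path}]",
        f'"{escape_reg_string(name)}"="{escape_reg_string(value)}"',
        "",
    ]
    # Use Windows (CRLF) line endings, as regedit-exported .reg files do.
    return "\r\n".join(lines)


# Hypothetical placeholder: replace with the actual SimpleIndex registry key.
KEY = "<full SimpleIndex registry key>"
print(make_reg_file(KEY, "FixAllPDF", "1"))
```

Saved to a file, the result can be applied by double-clicking it or with the standard Windows command reg import.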
To keep all pages in the order in which they were imported, even when they belong to different bookmarks, do the following:
1. Open the configuration in Notepad.
2. Search for <BOOKMARK_PAGE_ORDER>
3. Change this line from “false” to “true”: <BOOKMARK_PAGE_ORDER>true</BOOKMARK_PAGE_ORDER>
4. Save and close.
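Steps 1 to 4 can also be scripted, for example when the same configuration is deployed to many machines. The sketch below assumes the configuration is a UTF-8 text file containing a single BOOKMARK_PAGE_ORDER element; the function names are hypothetical. It edits the text with a regular expression rather than re-serializing the XML, so the rest of the file is left exactly as it was.

```python
import re
from pathlib import Path

# Matches the element whether it currently holds "true" or "false" (any case).
_PATTERN = re.compile(
    r"(<BOOKMARK_PAGE_ORDER>)\s*(?:true|false)\s*(</BOOKMARK_PAGE_ORDER>)",
    re.IGNORECASE,
)


def set_bookmark_page_order(config_text: str, enabled: bool = True) -> str:
    """Return config_text with BOOKMARK_PAGE_ORDER set to true or false.

    Raises ValueError if the element is not found (step 2 fails).
    """
    value = "true" if enabled else "false"
    new_text, count = _PATTERN.subn(rf"\g<1>{value}\g<2>", config_text)
    if count == 0:
        raise ValueError("<BOOKMARK_PAGE_ORDER> not found in configuration")
    return new_text


def update_config_file(path: str, enabled: bool = True) -> None:
    """Open, edit, save: steps 1 to 4 in one call (assumes UTF-8 encoding)."""
    file = Path(path)
    text = file.read_text(encoding="utf-8")
    file.write_text(set_bookmark_page_order(text, enabled), encoding="utf-8")


sample = "<CONFIG>\n  <BOOKMARK_PAGE_ORDER>false</BOOKMARK_PAGE_ORDER>\n</CONFIG>"
print(set_bookmark_page_order(sample))
```

Back up the configuration before running a script like this against the real file.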
You can tell SimpleIndex which types of files to process and which to ignore. Click “Job Options”; on the “Batch” tab you will find a field labeled “Input file types or mask”. These are the file types that SimpleIndex will input files from. The default types are:
TIF,PDF,JPG,GIF,BMP,DOC,XLS,PPT,DOCX,XLSX,PPTX,VSD,DWG,AVI,MP3
To process all files, enter an asterisk (*). SimpleIndex ignores any file whose extension does not appear on the list.
In SimpleIndex 6 or above you can enter file masks to filter input files. Some examples are:
abc*.pdf (PDF files whose names start with “abc”)
ab??ef.* (files of any type whose names consist of “ab”, any 2 characters, and “ef”)
It is possible to have some file types open automatically in their default application. To do this, insert a pipe “|” into the list. Any file types after the pipe will be opened in their default application. For example: TIF,PDF,JPG|WAV,M
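To make the list rules concrete, here is a sketch of how an entry string like the one above could be interpreted. It is not SimpleIndex's own code and rests on assumptions: entries without a wildcard or dot are bare extensions, other entries are Windows-style masks (* for any run of characters, ? for exactly one), matching is case-insensitive, and everything after the first pipe is a default-application type. A lone * is handled as a mask that matches every file. All function names are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PureWindowsPath
from typing import List, Tuple


def _split_entries(part: str) -> List[str]:
    """Split a comma-separated list, dropping blanks and surrounding spaces."""
    return [entry.strip() for entry in part.split(",") if entry.strip()]


def parse_input_types(spec: str) -> Tuple[List[str], List[str]]:
    """Return (types_to_process, types_to_open_in_default_app)."""
    processed, _, default_app = spec.partition("|")
    return _split_entries(processed), _split_entries(default_app)


def matches(filename: str, entries: List[str]) -> bool:
    """True if filename matches a bare extension (e.g. PDF) or a wildcard mask."""
    name = PureWindowsPath(filename).name.lower()
    ext = PureWindowsPath(name).suffix.lstrip(".")
    for entry in (e.lower() for e in entries):
        if any(ch in entry for ch in "*?."):
            # Mask: * = any run of characters, ? = exactly one character.
            if fnmatchcase(name, entry):
                return True
        elif entry == ext:
            # Bare extension such as "pdf" or "docx".
            return True
    return False


def classify(filename: str, spec: str) -> str:
    """Return "process", "open" (default application) or "ignore"."""
    processed, default_app = parse_input_types(spec)
    if matches(filename, processed):
        return "process"
    if matches(filename, default_app):
        return "open"
    return "ignore"


print(classify("report.PDF", "TIF,PDF,DOC|WAV"))  # process
print(classify("memo.wav", "TIF,PDF,DOC|WAV"))    # open
print(classify("notes.txt", "TIF,PDF,DOC|WAV"))   # ignore
print(classify("abc_123.pdf", "abc*.pdf"))        # process
print(classify("ab12ef.txt", "ab??ef.*"))         # process
```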
MS Office files, and PDF files generated by software or PDF printer drivers, already contain the text you need to recognize. Scanned documents need OCR to read text from an image of the page. With Office and PDF files, SimpleIndex can read the text directly, which is much faster and more accurate than image OCR.
To recognize index fields from the document text, first create OCR fields on the Index tab as you normally would. Next, on the Zones & OCR options tab, check the “Use Full Page OCR for this Field” option for each OCR field. This tells SimpleIndex to process the existing file text.
If the index value follows a unique pattern of digits or comes from a list of possible values, use Template or Dictionary matching to locate it within the text. Please see the manual for details on Template and Dictionary matching.
If the value appears in a specific location in each file, coordinates can be used to locate it. When processing text, the X, Y, Width and Height settings correspond to
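The idea behind pattern and list matching on extracted text can be illustrated with a short sketch. SimpleIndex's own Template syntax is described in its manual and is not reproduced here; this example uses Python regular expressions as a stand-in for Template matching and a whole-word list search as a stand-in for Dictionary matching. The function names and the sample text are made up for illustration.

```python
import re
from typing import List, Optional


def find_by_template(text: str, pattern: str) -> Optional[str]:
    """Return the first match of a regular-expression pattern in the text, or None."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


def find_by_dictionary(text: str, values: List[str]) -> Optional[str]:
    """Return the first value from the list that occurs in the text as a whole word.

    Comparison is case-insensitive; the value is returned as spelled in the list.
    """
    for value in values:
        # (?<!\w) and (?!\w) stop "Pay" from matching inside "Payable".
        if re.search(rf"(?<!\w){re.escape(value)}(?!\w)", text, re.IGNORECASE):
            return value
    return None


# Text as it might be read from an Office or PDF file (made-up sample data).
page_text = (
    "INVOICE\n"
    "Invoice No: 10-48213\n"
    "Department: Accounts Payable\n"
    "Total due: $1,250.00\n"
)

print(find_by_template(page_text, r"\b\d{2}-\d{5}\b"))  # 10-48213
print(find_by_dictionary(page_text, ["Human Resources", "Accounts Payable"]))  # Accounts Payable
```

A pattern that is too loose will pick up the wrong number, so it helps to test templates against several real documents before running a large batch.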