Metadata document indexing.
Can I split a PDF based on bookmark values?
SimpleIndex can create PDF files with bookmarks based on the index data captured in your batch.
Going the other way, splitting an existing PDF file based on its bookmark values, is not a built-in feature of SimpleIndex. However, there are inexpensive command-line utilities that you can integrate with SimpleIndex to accomplish this.
For example, the CoolUtils PDFSplitter and A-PDF Split both offer this function starting around $35.
The command line to split the PDF can be integrated into the Pre-Process setting in SimpleIndex, found under the Advanced Settings section of the Configuration Wizard. An example pre-process using PDFSplitter to split based on the second level bookmark values would be:
PDFSplitter.exe "c:\Images\BookmarkFile.pdf" "%CONFIGFILEFOLDER%\Input" -em bookmarks -b 2
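To make the idea of a bookmark-based split concrete, here is a minimal Python sketch of the page-range arithmetic involved. It is not how PDFSplitter or SimpleIndex works internally. It assumes you have already obtained the bookmarks for one level as (title, start page) pairs, and the function name `bookmark_page_ranges` is hypothetical. Reading the bookmarks and writing the output files would need a separate PDF library.

```python
def bookmark_page_ranges(bookmarks, total_pages):
    """Turn (title, start_page) bookmarks into inclusive page ranges.

    bookmarks:   list of (title, start_page) pairs, 0-based page numbers,
                 all taken from the same bookmark level (assumed input).
    total_pages: number of pages in the source PDF.
    Returns a list of (title, first_page, last_page) tuples.
    Pages before the first bookmark are not assigned to any range.
    """
    ordered = sorted(bookmarks, key=lambda b: b[1])
    ranges = []
    for i, (title, start) in enumerate(ordered):
        if i + 1 < len(ordered):
            # Each section ends on the page before the next bookmark starts.
            end = ordered[i + 1][1] - 1
        else:
            # The last section runs to the end of the document.
            end = total_pages - 1
        if end >= start:  # skip a bookmark that shares its page with the next one
            ranges.append((title, start, end))
    return ranges


sections = bookmark_page_ranges(
    [("Invoice 1001", 0), ("Invoice 1002", 3), ("Invoice 1003", 5)], 8
)
for title, first, last in sections:
    print(f"{title}: pages {first}-{last}")
```

Each resulting range would become one output file, named after its bookmark title.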
Can SimpleIndex integrate with Microsoft SharePoint?
Any document processed with SimpleIndex can be uploaded directly to your SharePoint document library, and any custom columns or metadata tags can be set automatically using the SimpleIndex index field values.
Find out more at our SharePoint Document Scanning page.
Integrated custom metadata is only supported in SharePoint 2010 and above, including SharePoint Online / Office 365. Microsoft .NET 3.5 and the SharePoint 2010 Client Object Model are required. In version 7 they must be installed separately; the SharePoint 2010 Client Object Model is available as a separate download. Version 8 includes a download option in the Global Settings Wizard, and version 8.4 and above include them automatically.
To configure SharePoint export, go to the Advanced Options screen in your Job Options and enter the URL of your document library in the SharePoint Document Library URL setting.
The easiest way to integrate SimpleIndex with SharePoint is to map a network drive to the SharePoint document library and set your Output folder to use this drive. SimpleIndex will create folders and name files automatically using your job settings.
In this configuration, only the Title tag is set. You can also use SimpleIndex's file property feature to set EXIF tags (images) or PDF file properties for the Title, Subject, Author and Keywords tags.
One thing to remember when configuring SimpleIndex jobs for SharePoint is that SharePoint places extra restrictions on filenames. For a detailed list, please visit SharePoint File Name Restrictions. When using the integrated SharePoint feature with SharePoint 2010, these invalid characters are automatically replaced when exporting.
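If you upload through a mapped drive or a third-party tool instead of the integrated feature, you may want to check names yourself. The Python sketch below shows one way to do that. The character list is an assumption based on Microsoft's published restrictions for SharePoint 2010, so verify it against the restrictions page for your version. The underscore replacement and the function name `sanitize_sharepoint_name` are also illustrative choices, not SimpleIndex behavior.

```python
import re

# Assumed set of characters that SharePoint 2010 rejects in file names.
# Check the official restrictions list for your SharePoint version.
INVALID_CHARS = '~"#%&*:<>?/\\{|}'


def sanitize_sharepoint_name(name, replacement="_"):
    """Return a copy of name with assumed-invalid characters replaced."""
    cleaned = "".join(replacement if ch in INVALID_CHARS else ch for ch in name)
    # Consecutive periods are also assumed to be disallowed; collapse them.
    cleaned = re.sub(r"\.{2,}", ".", cleaned)
    # Names should not begin or end with a period or a space.
    return cleaned.strip(". ")


print(sanitize_sharepoint_name("Invoice #1001: A&B..pdf"))
```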
There are also several inexpensive or free applications that allow you to upload documents processed with SimpleIndex to SharePoint. These can be useful when you have a slow connection and need the files to upload in the background without slowing down production. Examples include:
SharePoint 2010 Bulk Document Importer
If your SharePoint integration has requirements not met by these solutions, our Professional Services department will be able to design a SharePoint interface to meet your specifications.
Is it possible to search for and retrieve documents with Windows desktop search?
Windows Search works great with SimpleIndex because all index data can be saved to the folder and file names as well as the file properties, and OCR text can be saved to hidden layers in PDF files. Windows Search will read all of these elements when building its index and will return any matching files when you search.
Running Windows Search on a file server lets every user on your network search across terabytes of documents and text almost instantly.
IFilters allow Windows Search to search within file contents.
Here are three popular PDF IFilters that will enable text searching for PDF files:
- Foxit PDF IFilter (commercial)
- TET PDF IFilter (free/commercial)
- Adobe PDF IFilter (32-bit / 64-bit) (free)
If you have issues with PDF text searching in Windows 10, this article has detailed instructions for resolving PDF IFilter issues:
Can SimpleIndex read bar codes from existing PDF files?
There are two types of PDF files. PDFs created by scanning applications use images, while PDF files created by software or printer drivers use text. SimpleIndex can read bar codes from either type of document.
With image PDFs, SimpleIndex will use normal image barcode recognition. With text PDFs, SimpleIndex can read the value of the barcode from text (if it was created with a font) or convert the PDF to an image and read it (if the bar code is an image).
Reading the barcode from text is much faster, and all versions of SimpleIndex include the ability to parse the text of a PDF file.
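As a rough illustration of what reading a barcode value from text can look like, the Python sketch below searches already-extracted PDF text for Code 39 values. It assumes the barcode was printed with a Code 39 font, where the value is wrapped in asterisk start/stop characters such as `*INV-1001*`. This is not SimpleIndex's own parser, and `find_code39_values` is a hypothetical name.

```python
import re

# Code 39 encodes digits, uppercase letters, space and the symbols - . $ / + %
# A font-based Code 39 barcode appears in the text as *VALUE*.
CODE39_PATTERN = re.compile(r"\*([0-9A-Z\-. $/+%]+)\*")


def find_code39_values(text):
    """Return every Code 39 value found between asterisk delimiters."""
    return CODE39_PATTERN.findall(text)


page_text = "Order form\n*INV-1001*\nTotal: 42.00"
print(find_code39_values(page_text))
```

Other symbologies and fonts use different delimiters and character sets, so the pattern would need to change accordingly.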
Find out more about bar code scanning on our Bar Code Scanning Guide.
How do I export index data to a database?
There are a variety of ways to connect to your database. Detailed instructions are provided in the Manual (check the Help menu). Here is a brief overview of the steps involved:
- Create a job configuration to scan and index files
- On the database tab, set the “Database Mode” to “Insert New Records”
- To use ODBC, enter the data source name or file in Data Source
- To connect directly, select your database type under “Select a Data Source” and click Start. A series of dialogs will prompt you for database connection information.
- Select destination Table or View and click Reload
- For each index field, select the corresponding database field that will receive that field value
- The “Output File Field” will receive the path to the exported file
Once you have created records in your database in “Insert” mode, you can change to “Retrieve and View Records” and use SimpleIndex or SimpleSearch to search and view the files.
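The insert-then-retrieve pattern described above can be sketched in Python with an in-memory SQLite database. This only illustrates the data flow and is not what SimpleIndex runs. The `documents` table, its column names, and the sample paths are all hypothetical; in practice they would be whatever table and fields you mapped on the database tab.

```python
import sqlite3


def insert_record(conn, invoice_no, customer, file_path):
    """Insert one row: the index field values plus the exported file path."""
    conn.execute(
        "INSERT INTO documents (invoice_no, customer, output_file) VALUES (?, ?, ?)",
        (invoice_no, customer, file_path),
    )
    conn.commit()


def find_files(conn, customer):
    """Retrieve the exported file paths for one customer."""
    rows = conn.execute(
        "SELECT output_file FROM documents WHERE customer = ? ORDER BY invoice_no",
        (customer,),
    )
    return [row[0] for row in rows]


# Hypothetical table: two index fields and an output file field.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (invoice_no TEXT, customer TEXT, output_file TEXT)"
)
insert_record(conn, "1001", "Acme", r"C:\Output\Acme\1001.pdf")
insert_record(conn, "1002", "Acme", r"C:\Output\Acme\1002.pdf")
insert_record(conn, "2001", "Globex", r"C:\Output\Globex\2001.pdf")

print(find_files(conn, "Acme"))
```

Storing the file path alongside the index values is what makes the later search step possible: a query on any index field returns the files to open.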