Any document processed with SimpleIndex can be uploaded directly to your SharePoint document library, and any custom columns or metadata tags can be set automatically using the SimpleIndex index field values.
Find out more at our SharePoint Document Scanning page.
Integrated custom metadata is only supported in SharePoint 2010 and above, including SharePoint Online / Office 365. Microsoft .NET 3.5 and the SharePoint 2010 Client Object Model are required; in version 7 they must be installed separately. Version 8 includes a download option in the Global Settings Wizard, and version 8.4 and above include them automatically. Version 7 users can download the SharePoint 2010 Client Object Model from Microsoft.
To configure SharePoint export, go to the Advanced Options screen in your Job Options and enter the URL of your document library in the SharePoint Document Library URL setting.
The easiest way to integrate SimpleIndex with SharePoint is to map a network drive to the SharePoint document library and set your Output folder to that drive. SimpleIndex will then create folders and name files automatically according to your job settings.
In this configuration, only the Title tag is set. You can also use SimpleIndex’s file property feature to set EXIF tags (images) or PDF file properties for the Title, Subject, Author and Keywords tags.
One thing to remember when configuring SimpleIndex jobs for SharePoint is that SharePoint places extra restrictions on filenames. For a detailed list, please visit SharePoint File Name Restrictions. When you use the integrated SharePoint feature with SharePoint 2010, these invalid characters are replaced automatically during export.
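If you build filenames yourself (for example when exporting through a mapped drive instead of the integrated feature), a cleanup step along these lines may help. This is a minimal Python sketch, not SimpleIndex's own logic: the invalid character set is an assumption based on the commonly cited SharePoint 2010 list, and `sanitize_sharepoint_name`, the underscore replacement and the period trimming are illustrative choices. Check SharePoint File Name Restrictions for the rules that apply to your version.

```python
import re

# ASSUMPTION: characters commonly listed as invalid in SharePoint 2010 file names.
# Verify against the SharePoint File Name Restrictions for your version.
INVALID_CHARS = '~"#%&*:<>?/\\{|}'


def sanitize_sharepoint_name(name: str, replacement: str = "_") -> str:
    """Hypothetical helper: replace invalid characters and tidy periods."""
    # Swap each invalid character for the replacement character.
    cleaned = "".join(replacement if ch in INVALID_CHARS else ch for ch in name)
    # Collapse runs of consecutive periods into a single period.
    cleaned = re.sub(r"\.{2,}", ".", cleaned)
    # Remove leading/trailing periods and spaces.
    return cleaned.strip(". ")


print(sanitize_sharepoint_name("Invoice #123: A/B.pdf"))  # Invoice _123_ A_B.pdf
```

Replacing rather than deleting characters keeps the original index value readable in the filename; the replacement character is a parameter so it can be matched to whatever convention your library already uses.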
There are also several inexpensive or free applications that allow you to upload documents processed with SimpleIndex to SharePoint. These can be useful when you have a slow connection and need the files to upload in the background without slowing down production. Here are two of them:
If your SharePoint integration has requirements not met by these solutions, our Professional Services department will be able to design a SharePoint interface to meet your specifications.
There are two types of PDF files. PDFs created by scanning applications contain images, while PDFs created by software or printer drivers contain text. SimpleIndex can read bar codes from either type of document.
With image PDFs, SimpleIndex will use normal image barcode recognition. With text PDFs, SimpleIndex can read the value of the barcode from text (if it was created with a font) or convert the PDF to an image and read it (if the bar code is an image).
Reading the barcode from text is much faster, and all versions of SimpleIndex can parse the text of a PDF file.
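A barcode drawn with a font survives text extraction as ordinary characters, which is why reading it from text is cheap. As a sketch, assuming a Code 39 barcode font (Code 39 frames the data with `*` start/stop characters) and PDF text that has already been extracted, the values can be picked out with a regular expression. The function name is hypothetical; SimpleIndex's own text parser is not described here.

```python
import re
from typing import List

# ASSUMPTION: the barcode uses a Code 39 font, so the extracted text looks like
# *DATA*, where DATA uses the Code 39 set: 0-9, A-Z, space and - . $ / + %
CODE39_IN_TEXT = re.compile(r"\*([0-9A-Z\-. $/+%]+)\*")


def barcodes_from_pdf_text(text: str) -> List[str]:
    """Hypothetical helper: return Code 39 values found in extracted PDF text."""
    return CODE39_IN_TEXT.findall(text)


print(barcodes_from_pdf_text("Invoice *INV-1042* page 1 of 2 *BATCH 7*"))
# ['INV-1042', 'BATCH 7']
```

If the barcode is an image rather than font text, nothing will be found this way, which corresponds to the case where the PDF must be converted to an image first.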
Find out more about bar code scanning in our Bar Code Scanning Guide.
How do I set up SimpleIndex to use a database table field as a list file when the table is not the same as the one I am using on the Database tab?
All you need to do is enter the name of the table and field you want to pull from in the Dictionary Matching & List Fields box, formatted like the example below.
Example: Table Name|Field Name
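To illustrate what a `Table Name|Field Name` entry identifies, here is a Python sketch using the standard `sqlite3` module. It assumes the list is simply the distinct values of that column; SimpleIndex's actual query is not documented here, and the `Vendors` table, `Vendor Name` field and helper names are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def parse_list_source(spec: str) -> Tuple[str, str]:
    """Split a 'Table Name|Field Name' entry into (table, field)."""
    table, sep, field = spec.partition("|")
    if not sep or not table.strip() or not field.strip():
        raise ValueError(f"expected 'Table Name|Field Name', got {spec!r}")
    return table.strip(), field.strip()


def quote_identifier(name: str) -> str:
    # Table and column names cannot be bound as SQL parameters, so quote them.
    return '"' + name.replace('"', '""') + '"'


def load_list_values(conn: sqlite3.Connection, spec: str) -> List[str]:
    """ASSUMPTION: the list is the sorted distinct values of the named field."""
    table, field = parse_list_source(spec)
    sql = (f"SELECT DISTINCT {quote_identifier(field)} "
           f"FROM {quote_identifier(table)} ORDER BY 1")
    return [row[0] for row in conn.execute(sql)]


# Usage with a hypothetical Vendors table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Vendors ("Vendor Name" TEXT)')
conn.executemany("INSERT INTO Vendors VALUES (?)",
                 [("Globex",), ("Acme",), ("Acme",)])
print(load_list_values(conn, "Vendors|Vendor Name"))  # ['Acme', 'Globex']
```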
I’m using full-page OCR. All of the information appears in the .txt file, but it loses its formatting about halfway through: data on the right side of the page ends up at the end of the text file. Can this be fixed?
SimpleIndex version 7 solves this problem by incorporating the FineReader OCR engine, so full text in PDFs now flows with the formatting of the PDF.
Legacy Versions: SimpleIndex can also be used with other OCR applications and servers to improve accuracy, formatting and performance. Use the OCR applications to convert the scanned images to text or searchable PDF, and SimpleIndex can extract index values from the text and automatically sort and organize the files.
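Extracting an index value from OCR text usually comes down to matching a label and capturing what follows it. The Python sketch below assumes a hypothetical invoice field printed as "Invoice No: 12345"; the pattern and function name are illustrative, not SimpleIndex settings.

```python
import re
from typing import Optional

# HYPOTHETICAL field layout: "Invoice No: 12345", "INVOICE NO. 004512" or "Invoice # 77".
INVOICE_NO = re.compile(r"Invoice\s*(?:No|#)\.?\s*:?\s*(\d+)", re.IGNORECASE)


def extract_invoice_number(ocr_text: str) -> Optional[str]:
    """Return the digits after the invoice label, or None if it is absent."""
    match = INVOICE_NO.search(ocr_text)
    return match.group(1) if match else None


print(extract_invoice_number("ACME Corp\nINVOICE NO. 004512\nDate: 2020-01-01"))  # 004512
```

Returning the digits as a string keeps leading zeros, which matters when the value becomes part of a filename or folder name.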
SimpleIndex supports a simple checkbox recognition function that gives a field the value “1” when the number of black pixels in its zone exceeds the specified threshold, and “0” otherwise.
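The rule can be sketched in a few lines of Python. This assumes the zone is a bitonal grid where 1 is a black pixel and 0 is white; the function names and sample boxes are hypothetical.

```python
from typing import List


def black_pixel_count(zone: List[List[int]]) -> int:
    # ASSUMPTION: bitonal zone, 1 = black pixel, 0 = white pixel.
    return sum(sum(row) for row in zone)


def omr_value(zone: List[List[int]], threshold: int) -> str:
    """Return "1" when black pixels exceed the threshold, else "0"."""
    return "1" if black_pixel_count(zone) > threshold else "0"


unchecked = [[1, 1, 1, 1, 1],
             [1, 0, 0, 0, 1],
             [1, 0, 0, 0, 1],
             [1, 0, 0, 0, 1],
             [1, 1, 1, 1, 1]]  # empty box outline: 16 black pixels
checked = [[1, 1, 1, 1, 1],
           [1, 1, 0, 1, 1],
           [1, 0, 1, 0, 1],
           [1, 1, 0, 1, 1],
           [1, 1, 1, 1, 1]]    # box with an X mark: 21 black pixels

print(omr_value(unchecked, 18), omr_value(checked, 18))  # 0 1
```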
To configure a field for Optical Mark Recognition, select “OMR” for the field type on the Index options tab. In the Zones & OCR settings, click “Set Field Coordinates” and follow the on-screen instructions to mark the check box region on the image as you would with an OCR field. Use unchecked checkboxes for your sample image.
After drawing the zone and selecting the OMR field from the list, you will be prompted to enter a pixel threshold for the field. The number shown is the current number of black pixels in the selected zone. Because the sample checkbox is unchecked, set the threshold slightly higher than this number (10-20% above it).
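The 10-20% guideline is simple arithmetic on the count shown for the unchecked sample. A small Python sketch (the function name is hypothetical) that rounds up using integer math to avoid floating-point surprises:

```python
from typing import Tuple


def suggested_threshold_range(unchecked_count: int) -> Tuple[int, int]:
    """Return (low, high): 10% and 20% above the unchecked count, rounded up."""
    low = (unchecked_count * 110 + 99) // 100   # ceil(count * 1.10)
    high = (unchecked_count * 120 + 99) // 100  # ceil(count * 1.20)
    return low, high


print(suggested_threshold_range(16))   # (18, 20)
print(suggested_threshold_range(250))  # (275, 300)
```

Any value in that range leaves room for scanner noise around an empty box while still being exceeded by a real check mark.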
If you enter a negative number for the threshold, images that exceed the threshold will have the OMR zone saved to a separate file. This is useful for signature capture and similar applications.
There are several things you can do to improve OCR accuracy:
- Scan at 300 dpi, black & white for best results.
- Adjust the scan settings to remove background noise and improve the definition of characters.
- For Zone OCR, field recognition can often vary based on the surrounding white space and text in the zone. Try varying the size of the zone to achieve optimal results.
- For template matching, make sure all variations of the field format are included in the template list.
- For dictionary matching, add common variations and OCR mistakes to the “thesaurus” list.
- On the Zones & OCR tab (accessed from the Job Options) you can adjust the Max Errors setting to allow for more mistakes in the dictionary matching process.
- Use the Strip Spaces, Strip Characters, Replace Characters and Case Fixing options to standardize the field format prior to matching.
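Dictionary matching with a Max Errors tolerance can be sketched as normalize-then-compare. This Python example assumes errors are counted as Levenshtein edit distance (insertions, deletions and substitutions); SimpleIndex's exact counting method is not stated here, and `normalize` is only a hypothetical stand-in for the Strip Spaces, Strip Characters and Case Fixing options.

```python
from typing import Iterable, Optional


def normalize(value: str, strip_chars: str = "") -> str:
    """Hypothetical stand-in for Strip Spaces / Strip Characters / Case Fixing."""
    value = value.replace(" ", "")
    value = "".join(ch for ch in value if ch not in strip_chars)
    return value.upper()


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def dictionary_match(ocr_text: str, dictionary: Iterable[str],
                     max_errors: int, strip_chars: str = "") -> Optional[str]:
    """Return the closest entry within max_errors, or None if none qualifies."""
    key = normalize(ocr_text, strip_chars)
    best, best_dist = None, max_errors + 1
    for entry in dictionary:
        dist = edit_distance(key, normalize(entry, strip_chars))
        if dist < best_dist:
            best, best_dist = entry, dist
    return best


vendors = ["ACME CORP", "APEX INC", "ZENITH LLC"]
print(dictionary_match("Acne C0rp.", vendors, max_errors=2, strip_chars="."))  # ACME CORP
```

Normalizing first means spacing, punctuation and case differences do not use up the error budget, so Max Errors only has to absorb genuine OCR misreads such as "0" for "O".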
For details on how to configure these options, please refer to the SimpleIndex Wiki and the related pages below.
- SimpleIndex.com – Zone OCR
- SimpleIndex.com – Dynamic OCR
- SimpleOCR.com – OCR Guide
- SimpleIndex Wiki – OCR
- SimpleIndex Wiki – OCR Options
- SimpleIndex Wiki – Zone OCR
- SimpleIndex Wiki – Full Page OCR
- SimpleIndex Wiki – Zones & OCR Settings
- SimpleIndex Wiki – OCR to Field
- SimpleIndex Wiki – OCR Text View
- SimpleIndex Wiki – Template & Dictionary Matching OCR
- SimpleIndex Wiki – OMR and OCR Document Separation
Yes. On the OCR step of the Job Settings Wizard, you can select the text output format you need in the “Full-page OCR file type” drop-down. By default it is set to PDF, but it can be changed to Text (txt), Word (docx), Rich Text (rtf), OpenOffice (odt), Excel (xlsx), PowerPoint (pptx), ePub Zip (epub), FictionBook (fb2), HTML (htm), XML (xml) or Alto XML (alto.xml).
If the output file type is set to PDF, OCR text will be embedded as hidden text in the PDF file.
Yes, it can. You can configure this in the Job Settings Wizard by going to the OCR step and checking “Enable full-page OCR”. The OCR step has many settings that you can use to customize recognition and output.
SimpleIndex has two different OCR engines (Standard and Professional) that can be used to produce PDF Image + Text files or Searchable PDFs.