PDF Processing Settings

From Simple Wiki


[Screenshot: PDF Processing Settings screen of the SimpleIndex Configuration Wizard]

The PDF Processing options determine how PDF files are converted on input, or whenever they must be rasterized for OCR, OMR and barcode recognition.

PDF Processing Training Video

This video was recorded in a previous version of SimpleIndex. Refer to the wiki documentation for the latest updates.

PDF to Image Conversion

Some operations on PDF files require them to be converted to TIFF images first. SimpleIndex automatically converts PDF to TIFF on the fly for OMR and barcode recognition, so if you are upgrading from an earlier version of SimpleIndex, you no longer need to use this option for most job configurations.

For image-only PDFs, it is much faster to extract the embedded images from the PDF. The resulting TIFF images can be converted back to PDF without loss of quality or data by setting the Image File Type option to PDF.

Select Convert to TIFF (Detect Color) to automatically detect color versus black and white images and save them accordingly. Use Convert to B&W or Convert to Color to convert all pages to one format or the other.
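The three modes can be pictured with a short Python sketch. SimpleIndex's actual color-detection method is not documented on this page, so the heuristic below (a pixel counts as colored when its red, green and blue values differ by more than a tolerance) and the names `is_color_page` and `output_mode` are hypothetical illustrations only.

```python
# Hypothetical illustration: NOT SimpleIndex's documented algorithm.
Pixel = tuple[int, int, int]  # (R, G, B), each 0-255


def is_color_page(pixels: list[Pixel], channel_tolerance: int = 16,
                  min_color_fraction: float = 0.001) -> bool:
    """True if the share of non-gray pixels exceeds min_color_fraction."""
    if not pixels:
        return False
    # A gray (or black/white) pixel has nearly equal R, G and B values.
    colored = sum(1 for r, g, b in pixels
                  if max(r, g, b) - min(r, g, b) > channel_tolerance)
    return colored / len(pixels) > min_color_fraction


def output_mode(pixels: list[Pixel], setting: str) -> str:
    """Map the conversion setting to the mode used for one page."""
    if setting == "Convert to B&W":
        return "bw"      # every page forced to black and white
    if setting == "Convert to Color":
        return "color"   # every page kept in color
    # "Convert to TIFF (Detect Color)": decide page by page
    return "color" if is_color_page(pixels) else "bw"
```

With Detect Color, a scanned letter with a red logo would be saved as color while a plain text page would be saved as black and white; the fixed modes skip the per-page test entirely.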

PDF Resolution (X,Y)

When converting PDF to TIFF, these settings determine the output resolution of the TIFF image. The first value is the X (horizontal) resolution in DPI (dots per inch); the second is the Y (vertical) resolution.
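As a worked example, the pixel size of a rasterized page follows from its PDF page size and these two settings. PDF page sizes are expressed in points (72 per inch), so a Letter page is 612 x 792 points. The sketch below assumes the default 1/72-inch unit and an unrotated page; the function name `raster_size` is hypothetical.

```python
POINTS_PER_INCH = 72  # PDF user-space units (points) per inch


def raster_size(page_width_pt: float, page_height_pt: float,
                dpi_x: int, dpi_y: int) -> tuple[int, int]:
    """Pixel width and height of a PDF page rendered at dpi_x by dpi_y."""
    width_px = round(page_width_pt / POINTS_PER_INCH * dpi_x)
    height_px = round(page_height_pt / POINTS_PER_INCH * dpi_y)
    return width_px, height_px


print(raster_size(612, 792, 300, 300))  # (2550, 3300)
print(raster_size(612, 792, 200, 200))  # (1700, 2200)
```

Because X and Y are independent, unequal values (for example 300,150) produce an image with non-square pixels.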

When saving embedded images from a PDF file, SimpleIndex assumes a default resolution of 300dpi for those images. Use these settings to override the X and Y resolution when extracting PDF images. If these settings do not match the resolution of the original images, the output images will show incorrect page dimensions.
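The page size shown for an extracted image is its pixel count divided by the resolution assigned to it. This minimal sketch (the function name is hypothetical) assumes a Letter page originally scanned at 200dpi and embedded as a 1700 x 2200 pixel image:

```python
def page_size_inches(width_px: int, height_px: int,
                     dpi_x: int, dpi_y: int) -> tuple[float, float]:
    """Physical page size implied by an image's pixel count and its DPI."""
    return width_px / dpi_x, height_px / dpi_y


# Resolution matches the original scan: correct Letter size.
print(page_size_inches(1700, 2200, 200, 200))  # (8.5, 11.0)
# Default 300dpi assumed instead: roughly 5.67 x 7.33 inches, too small.
print(page_size_inches(1700, 2200, 300, 300))
```

Setting PDF Resolution (X,Y) to 200,200 in this case would restore the correct 8.5 x 11 inch page size.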

Prior to version 10, the default resolution for PDF files was 200dpi; it has since been increased to improve OCR and barcode recognition from PDF files with the default settings.

The viewer automatically samples images at a lower resolution (96dpi) for viewing in order to optimize performance and memory usage.
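To give a sense of the savings, the sketch below estimates uncompressed memory for one Letter page. It assumes pages are held as uncompressed 24-bit RGB bitmaps, which is an illustrative assumption rather than documented viewer behavior; the function name is hypothetical.

```python
def uncompressed_bytes(width_in: float, height_in: float, dpi: int,
                       bytes_per_pixel: int = 3) -> int:
    """Memory for an uncompressed bitmap (default: 24-bit RGB)."""
    return round(width_in * dpi) * round(height_in * dpi) * bytes_per_pixel


full = uncompressed_bytes(8.5, 11, 300)    # 2550 x 3300 x 3 = 25,245,000 bytes
preview = uncompressed_bytes(8.5, 11, 96)  # 816 x 1056 x 3 = 2,585,088 bytes
print(full / preview)  # 9.765625, i.e. (300 / 96) ** 2
```

Memory grows with the square of the resolution, so sampling at 96dpi needs roughly a tenth of the memory of a 300dpi page.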

Convert Files to PDF

Converts all MS Office, HTML, XML, text, and image files in the Input folder to PDF before processing. Files are converted and saved in the Input folder before the import step.

If Keep Input Files is unchecked, the original files are deleted after conversion. If it is checked, both the original and the converted PDF remain in the Input folder; to avoid importing both copies, use the Input File Types option.
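The interaction between Keep Input Files and Input File Types can be modeled in a few lines of Python. This is a hypothetical sketch: the function names are invented, and the `*.pdf` pattern assumes a simple filename-mask syntax for Input File Types.

```python
from fnmatch import fnmatch
from pathlib import PurePath


def folder_after_conversion(files: list[str], keep_input_files: bool) -> list[str]:
    """Hypothetical model of the Input folder after Convert Files to PDF."""
    result = []
    for name in files:
        path = PurePath(name)
        if path.suffix.lower() == ".pdf":
            result.append(name)                        # already PDF, left as-is
            continue
        result.append(str(path.with_suffix(".pdf")))   # converted copy
        if keep_input_files:
            result.append(name)                        # original kept alongside it
    return sorted(result)


def apply_input_file_types(files: list[str], patterns: list[str]) -> list[str]:
    """Keep only files matching one of the (assumed) wildcard patterns."""
    return [f for f in files
            if any(fnmatch(f.lower(), p.lower()) for p in patterns)]


folder = folder_after_conversion(["report.docx", "scan.tif"], keep_input_files=True)
print(folder)                                     # ['report.docx', 'report.pdf', 'scan.pdf', 'scan.tif']
print(apply_input_file_types(folder, ["*.pdf"]))  # ['report.pdf', 'scan.pdf']
```

Without the filter, both report.docx and report.pdf would be picked up by the import step.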

This option requires MS Office 2003 or later, or OpenOffice 3 or later. OpenOffice is available for free at OpenOffice.org.

The Automatically repair corrupt and non-compliant PDF files option is enabled by default. It prevents most errors during batch processing that would otherwise occur when non-standard PDF files cannot be read.

There are thousands of applications that generate PDF files, and many of them do not fully conform to the PDF standard. The repair function corrects the internal PDF structures so that the images and text can be processed. However, in rare cases this can result in the loss of graphic elements such as form fields or annotations.

Encryption/Decryption Password

Use the following options when importing or exporting password-protected PDF files.

Password

The password entered here is used to decrypt password-protected PDF files when importing and processing them. If Encrypt PDF on Export is selected, the same password is also used to encrypt PDF files on export.

Use Password from User Login

When working with PDF files that have different passwords, this option lets the user enter a password when signing on to SimpleIndex and uses that password for the rest of the session. All files processed in a single batch must share the same password, but this option lets you change the password between batches without editing the job options.

Encrypt PDF on Export

All PDF files are encrypted with the current password when the batch is exported.
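How the three password options combine can be summarized in a hypothetical Python sketch. The class and function names are invented for illustration, and the assumption that export encryption uses the same effective password as import (including a login password) is an interpretation of "the current password", not documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PasswordOptions:
    """Hypothetical model of the Encryption/Decryption Password options."""
    password: str = ""                     # Password field
    use_password_from_login: bool = False  # Use Password from User Login
    encrypt_pdf_on_export: bool = False    # Encrypt PDF on Export


def batch_password(opts: PasswordOptions, login_password: str) -> str:
    """Single password used to decrypt every PDF in the batch."""
    if opts.use_password_from_login:
        return login_password  # entered when signing on; valid for the session
    return opts.password       # stored in the job options


def export_password(opts: PasswordOptions, login_password: str) -> Optional[str]:
    """Password applied on export, or None if exported PDFs stay unencrypted."""
    if not opts.encrypt_pdf_on_export:
        return None
    return batch_password(opts, login_password)


job = PasswordOptions(password="job-secret", use_password_from_login=True,
                      encrypt_pdf_on_export=True)
print(batch_password(job, "batch-42"))   # batch-42
print(export_password(job, "batch-42"))  # batch-42
```

Because only one password is resolved per batch, files with different passwords have to go into separate batches.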
