SimpleIndex - Processing Existing Files Video

Processing Existing Files with SimpleIndex: A Step-by-Step Guide

This tutorial shows how to process existing files with SimpleIndex, covering the File Input settings and the email input option. Bracketed times such as [00:30] are timestamps in the accompanying video.

1. Accessing Job Configuration

  • Open the "barcode and autofill" job from the healthcare folder [00:30].
  • Navigate to the Job Configuration Wizard [00:42].
  • Go to File Input to access primary settings [00:47].

2. Configuring Input Folder

  • The input folder uses a relative file path (e.g., %config file folder%) [00:52], so the path adjusts automatically when the job is run on a different machine [01:06].
  • To change the input folder, click the Set button and browse to the desired location [02:30].
  • You can choose to use the relative path or a traditional file path [02:40].
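
As a rough illustration of how a placeholder-based relative path can resolve, here is a minimal Python sketch. It is not SimpleIndex's implementation: the function name `resolve_input_folder`, the example paths, and the `job.cfg` file name are hypothetical. Only the `%config file folder%` placeholder text comes from the tutorial.

```python
from pathlib import PureWindowsPath

# Placeholder text as it appears in the tutorial.
PLACEHOLDER = "%config file folder%"


def resolve_input_folder(template: str, config_file: str) -> str:
    """Expand the placeholder to the folder that contains the config file.

    Hypothetical helper: it only demonstrates the idea of a path that is
    stored relative to the job's configuration file.
    """
    config_folder = str(PureWindowsPath(config_file).parent)
    return template.replace(PLACEHOLDER, config_folder)


# The same stored setting resolves differently on two machines.
print(resolve_input_folder(r"%config file folder%\Input", r"C:\Jobs\Healthcare\job.cfg"))
print(resolve_input_folder(r"%config file folder%\Input", r"E:\Shared\Healthcare\job.cfg"))
```

A traditional (absolute) path contains no placeholder, so the sketch returns it unchanged.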

3. Managing Input Files

  • Keep input files: Check this option to retain files in the input folder after processing [03:01]. By default, files are deleted [03:08]. This is useful for testing or when you don't want to move files [03:21].
  • Split multi-page files: Enable this to process each page of a multi-page PDF or TIFF individually [03:54]. Uncheck it if indexing is only needed from the first page [04:12].

4. Processing Subfolders

  • Process subfolders: Check this option to include files located in subfolders within the main input folder [05:29]. Each subfolder will be treated as a separate batch [05:35].
  • Remove empty folders: This option becomes available when "Process subfolders" is checked [06:29]. It cleans up the file system by deleting folders that become empty after files are processed [06:36].
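
The two subfolder options can be pictured with a short Python sketch. This is an assumption-laden illustration of the behavior described above, not SimpleIndex code: `batches_by_subfolder` and `remove_empty_folders` are hypothetical names.

```python
import os
from collections import defaultdict
from pathlib import PurePosixPath


def batches_by_subfolder(file_paths):
    """Group relative file paths so that each folder forms its own batch."""
    batches = defaultdict(list)
    for path in file_paths:
        folder = str(PurePosixPath(path).parent)  # "." means the main input folder
        batches[folder].append(path)
    return dict(batches)


def remove_empty_folders(root):
    """Delete folders under root that contain nothing; return what was removed."""
    removed = []
    # Walk bottom-up so a parent is checked after its children are gone.
    for folder, _dirs, _files in os.walk(root, topdown=False):
        if folder != root and not os.listdir(folder):
            os.rmdir(folder)
            removed.append(folder)
    return removed


print(batches_by_subfolder(["a.pdf", "clinic1/b.pdf", "clinic1/c.pdf", "clinic2/d.tif"]))
```

The main input folder itself is never removed in this sketch, only the subfolders that end up empty.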

5. Advanced Options

  • Click the drop-down menu to reveal more advanced settings [06:51].
  • Sort files by date: Processes files based on their date rather than file name [07:11].
  • Recompress images: Recompresses files to a common compression setting, useful for standardizing file sizes from different sources [07:27].
  • Resample images: Standardizes the resolution of images, ensuring zones for indexing align correctly across files scanned at different DPIs [07:59].
  • Sort folders by date: Processes subfolders based on their date [08:35]. This option is only available when "Process subfolders" is enabled [08:38].
  • Run the job until the input folder is empty: Processes all files in the folder, but in batches limited by the "Max files per batch" setting [08:52].
  • Max files per batch: Limits the number of files processed at one time to prevent overwhelming system memory [08:59].
  • Fast import: Speeds up file import by disabling extra checks and skipping the sorting, recompression, and resampling options [09:39]. It also avoids reading the full text of PDFs into memory when that text is not needed for indexing [10:03].
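
To show how sorting, "Max files per batch", and "Run the job until the input folder is empty" could interact, here is a minimal Python sketch. It assumes the job simply takes files in sorted order and cuts them into fixed-size batches; the function `plan_batches` and its parameters are hypothetical, and the real scheduling inside SimpleIndex may differ.

```python
from typing import Iterator, List, Tuple


def plan_batches(
    files: List[Tuple[str, float]],
    max_files: int,
    sort_by_date: bool = False,
) -> Iterator[List[str]]:
    """Yield batches of file names until no files remain.

    files holds (name, modification timestamp) pairs. Files are ordered by
    name unless sort_by_date is set, then split into batches of at most
    max_files entries.
    """
    if max_files < 1:
        raise ValueError("max_files must be at least 1")
    if sort_by_date:
        ordered = sorted(files, key=lambda item: item[1])
    else:
        ordered = sorted(files, key=lambda item: item[0])
    names = [name for name, _timestamp in ordered]
    for start in range(0, len(names), max_files):
        yield names[start:start + max_files]


files = [("b.pdf", 300.0), ("a.pdf", 200.0), ("c.pdf", 100.0)]
for batch in plan_batches(files, max_files=2, sort_by_date=True):
    print(batch)
```

Capping the batch size keeps only a bounded number of files in play at once, which is the memory concern the setting addresses.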

6. Backup and Exception Handling

  • Backup/Exception folder: Specifies a location for backing up files or moving invalid files [10:34].
  • Backup all input files: Creates a copy of all input files in the backup location [10:54].
  • Move invalid files to backup folder: Moves corrupted or unreadable files to the backup folder, allowing the system to continue processing other documents [11:29].
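
The backup and exception behavior above can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's logic: `import_with_backup` is a hypothetical function, and the `is_readable` check is supplied by the caller because the tutorial does not say how SimpleIndex detects an invalid file.

```python
import shutil
from pathlib import Path
from typing import Callable, Iterable, List


def import_with_backup(
    paths: Iterable[Path],
    backup_dir: Path,
    is_readable: Callable[[Path], bool],
    backup_all: bool = False,
) -> List[Path]:
    """Return the files that can be processed, handling the rest.

    Unreadable files are moved to backup_dir so that processing can continue.
    When backup_all is set, every readable file is also copied there.
    """
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    accepted = []
    for path in map(Path, paths):
        if not is_readable(path):
            # Invalid file: move it out of the way and carry on.
            shutil.move(str(path), str(backup / path.name))
            continue
        if backup_all:
            shutil.copy2(path, backup / path.name)
        accepted.append(path)
    return accepted
```

A caller might pass something as simple as `lambda p: p.stat().st_size > 0` for `is_readable` while testing, then swap in a stricter check later.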

7. File Type Processing

  • Types of files to process: Allows you to specify which file extensions the system will read [11:38]. You can remove or add file types to this list [11:49].
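
Conceptually, this setting acts as an extension filter. A minimal Python sketch, assuming a case-insensitive match on the file extension (the function name `filter_by_extension` is hypothetical):

```python
from pathlib import PurePath
from typing import Iterable, List


def filter_by_extension(names: Iterable[str], extensions: Iterable[str]) -> List[str]:
    """Keep only the names whose extension is in the allowed list."""
    # Normalize entries such as "pdf", ".PDF", or ".tif" to lowercase with a dot.
    allowed = {"." + ext.lower().lstrip(".") for ext in extensions}
    return [name for name in names if PurePath(name).suffix.lower() in allowed]


print(filter_by_extension(["a.PDF", "b.tif", "c.docx", "notes"], ["pdf", ".TIF"]))
```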

8. Email Input (New Feature)

  • Go back to the first step of the wizard where you define the file source [12:10].
  • Select Email as the input source [12:17].
  • Click Next to go to the Download Emails option [12:21].
  • Connect to a file server: Enter the IMAP mail server details, username, and password [12:29].
  • Download email attachments to the input folder: Downloads attachments (e.g., PDFs) to the input folder for processing [12:52].
  • Convert message body to HTML file: Converts the email body into an HTML file for processing [13:05].
  • Convert message body to PDF: Converts the email body into a PDF for processing [13:23].
  • Save message headers and body to text files: Saves email metadata (sender, recipient, subject, etc.) to text files, which can be used for indexing [13:48].
  • Process unread messages: Only imports emails that have not been read yet [14:28].
  • Delete messages after successful import: Deletes emails from the server after they have been successfully imported [14:51].
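
For readers who want to see what these email options amount to at the protocol level, here is a minimal sketch using Python's standard `imaplib` and `email` modules. It is not how SimpleIndex is implemented. The function names, the `INBOX` mailbox, and the choice of headers are assumptions made for illustration.

```python
import email
import imaplib
from email import policy
from pathlib import Path


def extract_attachments(raw_message: bytes):
    """Return (filename, payload bytes) for each named attachment."""
    message = email.message_from_bytes(raw_message, policy=policy.default)
    return [
        (part.get_filename(), part.get_payload(decode=True))
        for part in message.iter_attachments()
        if part.get_filename()
    ]


def headers_as_text(raw_message: bytes) -> str:
    """Render selected headers as text lines that could be used for indexing."""
    message = email.message_from_bytes(raw_message, policy=policy.default)
    wanted = ("From", "To", "Subject", "Date")
    return "\n".join(
        f"{name}: {message[name]}" for name in wanted if message[name] is not None
    )


def download_unread(host, user, password, input_folder, delete_after=False):
    """Save attachments from unread messages into input_folder."""
    target = Path(input_folder)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        conn.select("INBOX")  # assumed mailbox name
        _status, data = conn.search(None, "UNSEEN")  # unread messages only
        for num in data[0].split():
            _status, parts = conn.fetch(num, "(RFC822)")
            raw = parts[0][1]
            for filename, payload in extract_attachments(raw):
                out = target / Path(filename).name  # keep only the final path component
                out.write_bytes(payload)
                saved.append(out)
            if delete_after:
                # Flag for deletion only after the attachments were written.
                conn.store(num, "+FLAGS", "\\Deleted")
        if delete_after:
            conn.expunge()
    return saved
```

Flagging a message as deleted only after its attachments are saved mirrors the "after successful import" wording: if writing a file raises an error, the message stays on the server.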

For further assistance, refer to the Wiki available under the Support tab in SimpleIndex [15:12].