Index With Non-Latin Character Sets
Please refer to the Wiki Documentation for the complete Languages reference.
By default, SimpleIndex uses the ANSI character set to display and edit captured OCR data, index field values, and full-text OCR. This works for all languages based on the Latin alphabet (English, French, Spanish, German, etc.).
To index documents in languages that use non-Latin alphabets, such as Chinese, Japanese, Russian, or Arabic, set the default character set using the registry key below. If the key is not set correctly, Unicode text shows up as ??????????.
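The row of question marks is what a lossy conversion to a single-byte character set typically produces. The Python sketch below illustrates that general mechanism only. It is not SimpleIndex code, and the code pages cp1252 (Western/Latin) and cp1251 (Cyrillic) are assumptions chosen for the example.

```python
# Illustration only: how text degrades to "?" under the wrong code page.
text = "Привет"  # sample Russian text, six Cyrillic letters

# A Western (Latin) code page cannot represent Cyrillic letters,
# so each unencodable character is replaced with "?".
as_latin = text.encode("cp1252", errors="replace").decode("cp1252")

# A Cyrillic code page round-trips the same text without loss.
as_cyrillic = text.encode("cp1251").decode("cp1251")

print(as_latin)
print(as_cyrillic)
```

Choosing the charset that matches the document language plays the role of the second conversion here: the characters survive instead of being replaced.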
Use Notepad to edit the “Charset” value from the sample setting below and save it to a .reg file. Then double-click the .reg file to install (Administrator privileges required).
You can download the .reg file here, but you still need to edit it in Notepad to set the Charset value before installing.
If you are on a 32-bit operating system, be sure to remove the extra “\WOW6432Node” from the registry path.
[HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\SimpleIndex\Misc]
"Charset"="1"
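As an alternative to editing the sample by hand, the text of the .reg file could be generated. The following is a minimal Python sketch under stated assumptions: the function name `build_reg_text` and its parameters are hypothetical, and the `Windows Registry Editor Version 5.00` header is the standard first line of a version 5 .reg file, which the sample above does not show.

```python
def build_reg_text(charset: int, is_64bit_os: bool = True) -> str:
    """Return .reg file text that sets the SimpleIndex Charset value.

    Hypothetical helper for illustration; not part of SimpleIndex.
    """
    # Windows charset identifiers fit in a single byte.
    if not 0 <= charset <= 255:
        raise ValueError("charset must be between 0 and 255")

    # On 64-bit Windows the key lives under WOW6432Node;
    # on 32-bit Windows that path segment is omitted.
    node = "WOW6432Node\\" if is_64bit_os else ""
    key = f"HKEY_LOCAL_MACHINE\\SOFTWARE\\{node}SimpleIndex\\Misc"

    return (
        "Windows Registry Editor Version 5.00\r\n\r\n"
        f"[{key}]\r\n"
        f'"Charset"="{charset}"\r\n'
    )


# Example: Russian (204) on a 64-bit system.
reg_text = build_reg_text(204)
print(reg_text)
```

The returned text can be saved to a file with a .reg extension and installed the same way as the hand-edited version. Passing `is_64bit_os=False` drops the `WOW6432Node` segment for 32-bit systems.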
Charset Name | Charset Value |
ANSI_CHARSET (Latin) | 0 |
DEFAULT_CHARSET | 1 |
SYMBOL_CHARSET | 2 |
SHIFTJIS_CHARSET (Japanese) | 128 |
HANGUL_CHARSET (Korean) | 129 |
GB2312_CHARSET (Simplified Chinese) | 134 |
CHINESEBIG5_CHARSET (Traditional Chinese) | 136 |
GREEK_CHARSET (Greek) | 161 |
TURKISH_CHARSET (Turkish) | 162 |
HEBREW_CHARSET (Hebrew) | 177 |
ARABIC_CHARSET (Arabic) | 178 |
BALTIC_CHARSET (Baltic) | 186 |
RUSSIAN_CHARSET (Russian) | 204 |
THAI_CHARSET (Thai) | 222 |
EE_CHARSET (Eastern European) | 238 |
OEM_CHARSET | 255 |
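For scripted setups, the table above can be kept as a simple lookup. This is a sketch only: the dictionary copies the values from the table, and the helper name `charset_value` is hypothetical.

```python
# Charset names and values copied from the table above.
CHARSETS = {
    "ANSI_CHARSET": 0,
    "DEFAULT_CHARSET": 1,
    "SYMBOL_CHARSET": 2,
    "SHIFTJIS_CHARSET": 128,
    "HANGUL_CHARSET": 129,
    "GB2312_CHARSET": 134,
    "CHINESEBIG5_CHARSET": 136,
    "GREEK_CHARSET": 161,
    "TURKISH_CHARSET": 162,
    "HEBREW_CHARSET": 177,
    "ARABIC_CHARSET": 178,
    "BALTIC_CHARSET": 186,
    "RUSSIAN_CHARSET": 204,
    "THAI_CHARSET": 222,
    "EE_CHARSET": 238,
    "OEM_CHARSET": 255,
}


def charset_value(name: str) -> int:
    """Look up a charset value by name, case-insensitively.

    Accepts either the full name ("RUSSIAN_CHARSET") or the short
    form ("russian"). Raises KeyError for names not in the table.
    """
    key = name.strip().upper()
    if not key.endswith("_CHARSET"):
        key += "_CHARSET"
    return CHARSETS[key]


print(charset_value("russian"))
print(charset_value("SHIFTJIS_CHARSET"))
```

An unknown name raises `KeyError` rather than falling back silently, so a typo does not end up in the registry as a wrong value.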

The full list of values is at https://msdn.microsoft.com/en-us/library/cc194829.aspx.