File Type and Encoding

Even though all files are just a stream of bytes, it is often useful to distinguish text files from binary files.

Text files are accessed using a Reader, and binary files are accessed using an InputStream. Every locale (a system setting that includes the language, number formats, and character set in use) has a default encoding that is used to convert bytes into characters; this matters most in locales that use multibyte characters (Japanese, Chinese, Korean). Most text files are written using this default encoding, but sometimes you must specify explicitly which encoding to use when converting between bytes and characters.

For example, in Japan, files written on Microsoft Windows usually use the Shift_JIS (SJIS) encoding. Internally, Java always represents text in the Unicode character set using the UTF-16 encoding.
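As a minimal sketch of this distinction, the following standard Java code reads the same bytes once as binary data through an InputStream and once as text through a Reader with an explicit encoding. It uses only JDK classes and is not part of the Isight API; the class and method names are illustrative. It uses UTF-8 and ISO-8859-1 because every Java runtime supports them; Shift_JIS would be selected the same way with Charset.forName("Shift_JIS") on runtimes that provide it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: plain JDK classes, not part of the Isight API.
final class TextVersusBinary {

    // Binary access: count the raw bytes without interpreting them.
    static int countBytes(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Text access: decode the bytes into characters using the given encoding.
    static String readText(byte[] data, Charset encoding) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), encoding)) {
            StringBuilder text = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) occupies two bytes in UTF-8.
        byte[] data = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(countBytes(data));                                      // 5 bytes
        System.out.println(readText(data, StandardCharsets.UTF_8).length());      // 4 characters
        System.out.println(readText(data, StandardCharsets.ISO_8859_1).length()); // 5 characters (wrong encoding)
    }
}
```

Decoding with the wrong encoding does not fail here; it silently produces different characters, which is why the encoding sometimes has to be stated explicitly.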

The type of a file parameter (text or binary) can be specified by:

fileValue.setDataType(FileValueType.TYPE_TEXT);

For binary data use FileValueType.TYPE_BINARY.

For text files, you can also specify the encoding of the data. Specifying the encoding is necessary only if your files contain characters outside the US-ASCII character set (the standard 7-bit characters supported on virtually all computers) and the model can be run on computers that use different character sets or encodings. Unless both of these conditions are true, there is no point in setting the encoding: ASCII text is represented identically in all commonly used ASCII-compatible encodings, and a file written in the computer’s default encoding can always be read correctly using that same computer’s default encoding.
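The first condition can be checked with a few lines of standard Java. The sketch below is illustrative only (the class and method names are not part of the Isight API): it tests whether a string is pure US-ASCII and shows that such text yields identical bytes in two ASCII-compatible encodings, while text with a non-ASCII character does not.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: plain JDK classes, not part of the Isight API.
final class EncodingNeedCheck {

    // True if every character is in the 7-bit US-ASCII range.
    static boolean isAscii(String text) {
        return text.chars().allMatch(c -> c < 0x80);
    }

    // True if the text produces identical bytes in both encodings,
    // in which case the choice between them does not matter for this text.
    static boolean sameBytes(String text, Charset a, Charset b) {
        return Arrays.equals(text.getBytes(a), text.getBytes(b));
    }

    public static void main(String[] args) {
        String ascii = "pressure = 101.3";
        String nonAscii = "temp\u00e9rature = 20"; // contains U+00E9

        System.out.println(isAscii(ascii));    // true
        System.out.println(sameBytes(ascii,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));    // true
        System.out.println(isAscii(nonAscii)); // false
        System.out.println(sameBytes(nonAscii,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));    // false
    }
}
```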

There is one special encoding, “(Automatic Local).” This string is available in the constant FileValueType.LOCAL_ENCODING. Automatic Local is the default if no encoding is set explicitly, and it indicates that the text is to be read or written in whatever encoding the computer executing the component uses by default.
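To see which encoding a given computer uses by default, the standard Java call Charset.defaultCharset() can be queried, as in the short sketch below. Whether Isight resolves Automatic Local through exactly this call is an assumption; the class name is illustrative. Note also that on Java 18 and later the JVM default is UTF-8 unless the file.encoding system property overrides it, so the result may not reflect the operating system locale.

```java
import java.nio.charset.Charset;

// Illustrative only: reports the JVM's default encoding on this computer.
final class LocalEncodingProbe {

    // Canonical name of the encoding the JVM uses when none is specified.
    static String localEncodingName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Default encoding on this computer: " + localEncodingName());
    }
}
```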

There are actually two encodings for a TYPE_TEXT file parameter, one associated with the Handler and copied during parameter mapping and one associated with the Local Name and not copied during parameter mapping. If these encodings are different, Isight will convert the file from one to the other while copying the data to/from the Working directory. The Handler encoding is set with method setDataEncoding(String), and the Local Name encoding is set with method setReadWriteEncoding(String). For example:

fileValue.setDataEncoding("SJIS");
fileValue.setReadWriteEncoding(FileValueType.LOCAL_ENCODING);

This setting indicates that the data in the handler are encoded using Shift_JIS, and that the file will be converted into the computer’s default encoding when it is copied into the Working directory.
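Conceptually, such a conversion decodes the bytes with one encoding and re-encodes the resulting characters with the other. The sketch below shows that idea with standard Java streams; it is not Isight's implementation, and the class and method names are illustrative. For the example above, the source encoding would be Charset.forName("Shift_JIS") and the target encoding Charset.defaultCharset(); ISO-8859-1 and UTF-8 are used here because every Java runtime supports them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: plain JDK classes, not part of the Isight API.
final class TranscodeSketch {

    // Decodes the source bytes with sourceEncoding and writes the characters
    // to the target re-encoded with targetEncoding. Both streams are closed.
    static void transcode(InputStream source, Charset sourceEncoding,
                          OutputStream target, Charset targetEncoding) {
        try (Reader in = new InputStreamReader(source, sourceEncoding);
             Writer out = new OutputStreamWriter(target, targetEncoding)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "caf" plus U+00E9, stored in ISO-8859-1 (one byte per character).
        byte[] handlerData = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        ByteArrayOutputStream workingCopy = new ByteArrayOutputStream();

        transcode(new ByteArrayInputStream(handlerData), StandardCharsets.ISO_8859_1,
                  workingCopy, StandardCharsets.UTF_8);

        System.out.println(handlerData.length);  // 4 bytes in ISO-8859-1
        System.out.println(workingCopy.size());  // 5 bytes in UTF-8
    }
}
```

The text is the same before and after; only its byte representation changes, which is why the conversion is safe as long as the target encoding can represent every character in the file.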

You can see the list of supported encodings on the Files tab of the Design Gateway. Use the following procedure to display the list:

  1. Start the Design Gateway.

  2. Select Preferences from the Edit menu.

    The Preferences dialog box appears.

  3. Click the Parameters option on the left side of the dialog box.

  4. Click the Show File Type encoding on the Files Tab check box, and click OK to close the Preferences dialog box.

  5. Click the Files tab on the Design Gateway.

  6. Click Add Parameter, and create an INPUT file parameter.

  7. Select the newly created parameter.

    In the Read From area at the bottom of the tab there are two lists, labeled Type and Encoding. At this point the Encoding list is usually not available.

  8. From the Type list, select Text. The Encoding list becomes available.

  9. From the Encoding list, select the type you want to use. You may need to scroll down to see the entire list.