Specify E-Mail Character Set
$email_charset charset [encoding [headerencoding]]
[In version 7 of CopiaFacts, this command simply specified the character set which is used for the supplied header text and for the body of the message.]
This command causes the body text and headers of the e-mail to be converted (from UTF-8) to the specified character set.
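The conversion step can be illustrated with Python's standard codecs. This is a minimal sketch, not CopiaFacts's implementation, which uses Windows converters. The `CHARSET_TO_CODEC` table and `convert_body` function are hypothetical, the table covers only a few of the names listed below, and replacing unconvertible characters with '?' is an assumed failure behavior.

```python
# Hypothetical mapping from $email_charset names to Python codec names.
# Only a few entries are shown; CopiaFacts itself uses Windows converters.
CHARSET_TO_CODEC = {
    "utf-8": "utf-8",
    "us-ascii": "ascii",
    "iso-8859-1": "cp1252",   # documented as actually using code page 1252
    "iso-8859-2": "iso8859_2",
    "shift-jis": "cp932",
    "gb2312": "cp936",
    "big5": "cp950",
}


def convert_body(utf8_bytes: bytes, charset: str) -> bytes:
    """Decode UTF-8 input and re-encode it in the requested charset.

    An empty charset selects UTF-8, as with the command's charset parameter.
    Characters with no equivalent become '?' (errors="replace").
    """
    name = charset.lower() or "utf-8"
    codec = CHARSET_TO_CODEC.get(name)
    if codec is None:
        raise ValueError(f"charset not in this sketch's table: {charset!r}")
    text = utf8_bytes.decode("utf-8")
    return text.encode(codec, errors="replace")


if __name__ == "__main__":
    print(convert_body("Café – 5€".encode("utf-8"), "iso-8859-1"))
```

Converting "Café – 5€" with the iso-8859-1 name yields `b"Caf\xe9 \x96 5\x80"`: the en dash and euro sign exist in code page 1252 but not in true ISO-8859-1, which is why the iso-8859-1 entry below notes that code page 1252 is actually used.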
The parameters on this command are used as follows:
|charset||Character set name, which must be one of the following IANA names. Note that some of the more obscure names, even if shown at the IANA link below, may not be supported by all mail clients, and CopiaFacts may be unable to find a suitable converter in Windows. We recommend testing any unusual charset names.|
|utf-8||This is the default, and is also selected if this required parameter is set to "". This keyword is required for UTF-8 encoding when you also need to specify the base64 keyword described below.|
|us-ascii||Allows only 7-bit characters|
|iso-8859-1||(Latin1, West European) This replicates the output of CopiaFacts version 7 and actually uses Windows code page 1252. Most e-mail applications treat "iso-8859-1" as describing Windows code page 1252, although the two encodings are not identical. Latin1 covers most West European languages, such as French, Spanish, Catalan, Basque, Portuguese, Italian, Albanian, Rhaeto-Romanic, Dutch, German, Danish, Swedish, Norwegian, Finnish, Faroese, Icelandic, Irish, Scottish, and English, and incidentally also Afrikaans and Swahili; in effect it therefore also covers the entire American continent, Australia and much of Africa.|
|iso-8859-2||(Latin2, Central and Eastern Europe) Latin2 covers Czech, Hungarian, Polish, Romanian, Croatian, Slovak, Slovenian. Windows code page 28592.|
|iso-8859-3||(Latin3, Southern Europe) Latin3 covers Maltese and some Turkish, and is also used for Esperanto. Windows code page 28593.|
|iso-8859-4||(Baltic) Covers Latvian, Lithuanian, Greenlandic, Lappish. Windows code page 28594.|
|iso-8859-5||(Cyrillic) Covers Bulgarian, Byelorussian, Macedonian, Russian, Serbian, Ukrainian. Windows code page 28595.|
|iso-8859-6||(Arabic) Covers basic Arabic only. Windows code page 28596.|
|iso-8859-8||(Hebrew) Covers Hebrew and Yiddish. Windows code page 28598.|
|iso-8859-9||(Latin5) Turkish. Windows code page 28599.|
|iso-2022-jp||ISO 2022 Japanese with no half-width Katakana; Japanese (JIS).|
|iso-2022-kr||ISO 2022 Korean.|
|x-cp50227||ISO 2022 Simplified Chinese.|
|shift-jis||(Japan) Windows code page 932 (note that there is no official IANA name 'windows-932').|
|gb2312||(China) Windows code page 936 (note that there is no official IANA name 'windows-936').|
|big5||(Taiwan) Windows code page 950 (note that there is no official IANA name 'windows-950').|
|windows-1250||(Central Europe) This contains all the printable characters of iso-8859-2 and more, although some of them are at different byte positions.|
|windows-...||You may use some of the Microsoft Windows Code Page identifiers, for example 'windows-874' for Thai. Many of these code pages are not appropriate for e-mail.|
|encoding||One of the following keywords, indicating how body text and HTML attachments are to be encoded when they contain characters with the high bit set. If this parameter is not provided, the default is quoted-printable encoding if fewer than half of the characters have the high bit set, and base64 encoding if more than half have the high bit set or if non-printable ASCII characters are present. If no headerencoding parameter is provided, this parameter also determines the encoding of header lines. It does not affect non-HTML attachments, which are always base64-encoded. See also the encoding options of $email_options.|
|base64||Base64 encoding is used if any non-transportable characters are found.|
|q-p||Quoted-Printable encoding is used if any non-transportable characters are found.|
|8bit||No encoding is applied; 8-bit characters are transmitted unchanged.|
|headerencoding||One of the above keywords, indicating how header text is to be encoded when it contains characters with the high bit set. If not provided, the value of the encoding parameter is used.|
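The default choice between quoted-printable and base64 described for the encoding parameter can be sketched as follows. This is a hypothetical illustration, not CopiaFacts code: the function names are invented, "non-printable ASCII" is assumed to mean control bytes other than TAB, LF and CR, the rule is assumed to be applied to the already-converted bytes, and the exactly-half case, which the description leaves open, is treated as base64.

```python
from typing import Optional

# Assumption: "non-printable ASCII" means control bytes other than TAB, LF, CR.
_ALLOWED_CONTROLS = {0x09, 0x0A, 0x0D}


def _has_nonprintable_ascii(data: bytes) -> bool:
    return any((b < 0x20 and b not in _ALLOWED_CONTROLS) or b == 0x7F for b in data)


def choose_transfer_encoding(data: bytes, encoding: Optional[str] = None) -> str:
    """Return "7bit", "8bit", "q-p" or "base64" for already-converted bytes."""
    high = sum(1 for b in data if b >= 0x80)
    nonprintable = _has_nonprintable_ascii(data)
    if high == 0 and not nonprintable:
        return "7bit"                  # nothing needs encoding
    if encoding in ("base64", "q-p", "8bit"):
        return encoding                # an explicit keyword wins
    # Default rule; the exactly-half case is unspecified, so this sketch picks base64.
    if nonprintable or 2 * high >= len(data):
        return "base64"
    return "q-p"


def resolve_header_encoding(encoding: Optional[str],
                            headerencoding: Optional[str]) -> Optional[str]:
    """headerencoding falls back to encoding when it is not given."""
    return headerencoding if headerencoding is not None else encoding
```

The half-way threshold makes sense because quoted-printable expands each high-bit byte to three characters (`=XX`) but leaves ASCII untouched, while base64 expands every byte by a factor of 4/3; quoted-printable is therefore more compact and more readable when only a few bytes need escaping.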
For further information on MIME character sets and encoding, consult the documents at the following links:
If the body text contains no high-bit characters, the MIME charset is specified as "us-ascii" and the charset value on this command is ignored.
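The rule for the declared MIME charset can be sketched in a few lines. The function name is hypothetical, and the sketch assumes the check is made on the original text rather than on the converted bytes; that distinction matters for 7-bit charsets such as iso-2022-jp, whose output never has the high bit set even when the text is Japanese.

```python
def mime_charset_label(body_text: str, charset: str) -> str:
    """Return the charset to declare in the Content-Type header.

    body_text is the original (UTF-8 decoded) text, not the converted bytes.
    """
    if body_text.isascii():        # no high-bit characters at all
        return "us-ascii"          # the requested charset is ignored
    return charset or "utf-8"      # "" selects the default, utf-8
```

For example, a plain-English body sent with `$email_charset gb2312` would still be labeled us-ascii.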
CopiaFacts does not check whether a single 'encoded-word' in an e-mail header exceeds the 75-character limit specified by RFC 2047. Many mail clients accept longer encoded-words, but in general you should avoid, for example, an e-mail subject containing a long sequence of non-ASCII characters with no embedded white space.
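To see how quickly the limit is reached, the following sketch builds a single base64 ('B') encoded-word with Python's standard `base64` module and checks its length. The function names are illustrative only; the sketch deliberately does not split long text into several encoded-words.

```python
import base64

MAX_ENCODED_WORD = 75  # RFC 2047 limit for one encoded-word


def b_encoded_word(text: str, charset: str, codec: str) -> str:
    """Encode text as one RFC 2047 'B' (base64) encoded-word, without splitting."""
    payload = base64.b64encode(text.encode(codec)).decode("ascii")
    return f"=?{charset}?B?{payload}?="


def exceeds_rfc2047_limit(word: str) -> bool:
    return len(word) > MAX_ENCODED_WORD


if __name__ == "__main__":
    for subject in ("Ü" * 10, "Ü" * 40):
        word = b_encoded_word(subject, "utf-8", "utf-8")
        print(len(word), exceeds_rfc2047_limit(word))
```

Ten copies of "Ü" in UTF-8 (20 bytes) give a 40-character encoded-word, but forty copies (80 bytes) give 120 characters, well over the limit, because base64 adds a third to the byte count and the `=?utf-8?B?` and `?=` delimiters add 12 more characters.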
Special note: The Japanese 'wave dash' character (JIS 8160) has two possible Unicode equivalents, U+301C and U+FF5E (the reference glyph for the first was accidentally printed upside-down in the Unicode standard). As a result, some Windows code page conversions accept one and some accept the other, depending on how the conversion was implemented and which code page is the destination encoding. This often results in a '?' in the output or a conversion failure.
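Python's standard codecs show a similar split, which can make the problem easier to reproduce. In CPython the `shift_jis` codec follows a JIS-based mapping and the `cp932` codec follows Microsoft's code page 932, so each is expected to encode only one of the two candidates to byte pair 0x81 0x60. This sketch only illustrates the general issue; it says nothing about which Windows converters CopiaFacts uses.

```python
# U+301C WAVE DASH and U+FF5E FULLWIDTH TILDE: the two Unicode candidates for JIS 8160.
CANDIDATES = {"U+301C": "\u301c", "U+FF5E": "\uff5e"}


def try_encode(ch: str, codec: str):
    """Return the encoded bytes, or None if the codec has no mapping for ch."""
    try:
        return ch.encode(codec)
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    # shift_jis uses a JIS-based table; cp932 uses Microsoft's code page 932 table.
    for codec in ("shift_jis", "cp932"):
        for label, ch in CANDIDATES.items():
            print(codec, label, try_encode(ch, codec))
```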
Some other Unicode characters may also have no equivalent in certain encodings, and in some cases the JIS standard conflicts with Unicode. When outputting a CJK (or any other) charset, you can set the CopiaFacts variable FIX_UNICODE in FAXFACTS.CFG to recode specific code points to a suitable similar glyph, where one is available. The value of this variable consists of comma-separated triplets, each containing the Unicode code point to be matched (hexadecimal), the target code page (decimal) and the replacement code point (hexadecimal). The default value of this variable is now "FF5E,50220,301C, 301C,932,FF5E, FF0D,50220,002D". If you add further triplets, your value replaces the default, so also include any of the default triplets you still need. Please test your values.
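The triplet format of FIX_UNICODE can be parsed and applied roughly as follows. This is a hypothetical sketch: `parse_fix_unicode` and `apply_fix_unicode` are illustrative names, and the sketch assumes substitutions are made on the text before conversion, using only the triplets whose code page matches the target (932 is Shift-JIS, 50220 is ISO-2022-JP). White space around the commas, as in the default value, is tolerated.

```python
from typing import Dict, List, Tuple

DEFAULT_FIX_UNICODE = "FF5E,50220,301C, 301C,932,FF5E, FF0D,50220,002D"


def parse_fix_unicode(value: str) -> List[Tuple[int, int, int]]:
    """Split the value into (match code point, code page, replacement) triplets."""
    fields = [f.strip() for f in value.split(",") if f.strip()]
    if len(fields) % 3 != 0:
        raise ValueError("FIX_UNICODE must contain complete triplets")
    triplets = []
    for i in range(0, len(fields), 3):
        src = int(fields[i], 16)        # code point to match (hexadecimal)
        page = int(fields[i + 1], 10)   # target code page (decimal)
        dst = int(fields[i + 2], 16)    # replacement code point (hexadecimal)
        triplets.append((src, page, dst))
    return triplets


def apply_fix_unicode(text: str, code_page: int,
                      triplets: List[Tuple[int, int, int]]) -> str:
    """Replace matching code points, using only triplets for this code page."""
    table: Dict[int, int] = {src: dst for src, page, dst in triplets if page == code_page}
    return text.translate(table)
```

Because a configured value replaces the default, an extra triplet in this model would be appended to `DEFAULT_FIX_UNICODE` rather than supplied on its own; otherwise the wave dash fixes would be lost.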
$email_charset iso-8859-1 ; use iso-8859-1 (code page 1252) encoding
$email_charset gb2312 base64 ; use Chinese encoding with base64
$email_charset utf-8 base64 ; use utf-8 encoding with base64