1 | '\" |
---|
2 | '\" Copyright (c) 1998 by Scriptics Corporation. |
---|
3 | '\" |
---|
4 | '\" See the file "license.terms" for information on usage and redistribution |
---|
5 | '\" of this file, and for a DISCLAIMER OF ALL WARRANTIES. |
---|
6 | '\" |
---|
7 | '\" RCS: @(#) $Id: encoding.n,v 1.15 2007/12/13 15:22:32 dgp Exp $ |
---|
8 | '\" |
---|
9 | .so man.macros |
---|
10 | .TH encoding n "8.1" Tcl "Tcl Built-In Commands" |
---|
11 | .BS |
---|
12 | .SH NAME |
---|
13 | encoding \- Manipulate encodings |
---|
14 | .SH SYNOPSIS |
---|
15 | \fBencoding \fIoption\fR ?\fIarg arg ...\fR? |
---|
16 | .BE |
---|
17 | |
---|
18 | .SH INTRODUCTION |
---|
19 | .PP |
---|
20 | Strings in Tcl are encoded using 16-bit Unicode characters. Different |
---|
21 | operating system interfaces or applications may generate strings in |
---|
22 | other encodings such as Shift-JIS. The \fBencoding\fR command helps |
---|
23 | to bridge the gap between Unicode and these other formats. |
---|
24 | .SH DESCRIPTION |
---|
25 | .PP |
---|
26 | Performs one of several encoding-related operations, depending on |
---|
27 | \fIoption\fR. The legal \fIoption\fRs are: |
---|
28 | .TP |
---|
29 | \fBencoding convertfrom\fR ?\fIencoding\fR? \fIdata\fR |
---|
30 | Convert \fIdata\fR to Unicode from the specified \fIencoding\fR. The |
---|
31 | characters in \fIdata\fR are treated as binary data where the lower |
---|
32 | 8 bits of each character are taken as a single byte. The resulting |
---|
33 | sequence of bytes is treated as a string in the specified |
---|
34 | \fIencoding\fR. If \fIencoding\fR is not specified, the current |
---|
35 | system encoding is used. |
---|
36 | .TP |
---|
37 | \fBencoding convertto\fR ?\fIencoding\fR? \fIstring\fR |
---|
38 | Convert \fIstring\fR from Unicode to the specified \fIencoding\fR. |
---|
39 | The result is a sequence of bytes that represents the converted |
---|
40 | string. Each byte is stored in the lower 8 bits of a Unicode |
---|
41 | character. If \fIencoding\fR is not specified, the current |
---|
42 | system encoding is used. |
---|
43 | .TP |
---|
44 | \fBencoding dirs\fR ?\fIdirectoryList\fR? |
---|
45 | .VS 8.5 |
---|
46 | Tcl can load encoding data files from the file system that describe |
---|
47 | additional encodings for it to work with. This command sets the search |
---|
48 | path for \fB*.enc\fR encoding data files to the list of directories |
---|
49 | \fIdirectoryList\fR. If \fIdirectoryList\fR is omitted then the |
---|
50 | command returns the current list of directories that make up the |
---|
51 | search path. It is an error for \fIdirectoryList\fR to not be a valid |
---|
52 | list. If, when a search for an encoding data file is happening, an |
---|
53 | element in \fIdirectoryList\fR does not refer to a readable, |
---|
54 | searchable directory, that element is ignored. |
---|
55 | .VE 8.5 |
---|
56 | .TP |
---|
57 | \fBencoding names\fR |
---|
58 | Returns a list containing the names of all of the encodings that are |
---|
59 | currently available. |
---|
60 | .TP |
---|
61 | \fBencoding system\fR ?\fIencoding\fR? |
---|
62 | Set the system encoding to \fIencoding\fR. If \fIencoding\fR is |
---|
63 | omitted then the command returns the current system encoding. The |
---|
64 | system encoding is used whenever Tcl passes strings to system calls. |
---|
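The subcommands above can be combined as in the following minimal sketch. It assumes the built-in utf-8 encoding and, for \fBencoding dirs\fR, Tcl 8.5 or later. The variable names are illustrative only and are not part of the command's interface.

```tcl
# Encode a Unicode string to UTF-8 bytes, then decode it again.
set text "caf\u00E9"
set bytes [encoding convertto utf-8 $text]

# U+00E9 occupies two bytes in UTF-8, so 4 characters become 5 bytes.
set byteCount [string length $bytes]

# Decoding the bytes with the same encoding recovers the original string.
set roundTrip [encoding convertfrom utf-8 $bytes]
set same [string equal $text $roundTrip]

# Query, without changing, the available encodings, the system
# encoding, and the search path for *.enc files (Tcl 8.5 or later).
set hasUtf8 [expr {[lsearch -exact [encoding names] utf-8] >= 0}]
set sys [encoding system]
set searchPath [encoding dirs]
```

Calling \fBencoding system\fR or \fBencoding dirs\fR with an argument would change process-wide state, so the sketch only reads the current values.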
65 | .SH EXAMPLE |
---|
66 | .PP |
---|
67 | It is common practice to write script files using a text editor that |
---|
68 | produces output in the euc-jp encoding, which represents the ASCII |
---|
69 | characters as single bytes and Japanese characters as two bytes. This |
---|
70 | makes it easy to embed literal strings that correspond to non-ASCII |
---|
71 | characters by simply typing the strings in place in the script. |
---|
72 | However, because the \fBsource\fR command always reads files using the |
---|
73 | current system encoding, Tcl will only source such files correctly |
---|
74 | when the encoding used to write the file is the same. This tends not |
---|
75 | to be true in an internationalized setting. For example, if such a |
---|
76 | file was sourced in North America (where the ISO8859-1 encoding is normally |
---|
77 | used), each byte in the file would be treated as a separate character |
---|
78 | that maps to the 00 page in Unicode. The resulting Tcl strings will |
---|
79 | not contain the expected Japanese characters. Instead, they will |
---|
80 | contain a sequence of Latin-1 characters that correspond to the bytes |
---|
81 | of the original string. The \fBencoding\fR command can be used to |
---|
82 | convert this string to the expected Japanese Unicode characters. For |
---|
83 | example, |
---|
84 | .CS |
---|
85 | set s [\fBencoding convertfrom\fR euc-jp "\exA4\exCF"] |
---|
86 | .CE |
---|
87 | would return the Unicode string |
---|
88 | .QW "\eu306F" , |
---|
89 | which is the Hiragana letter HA. |
---|
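The difference between the two interpretations can be sketched as follows. This assumes the euc-jp encoding data file is available to the interpreter, as it is in a standard Tcl installation. The variable names are illustrative only.

```tcl
# Two bytes as they would appear in an euc-jp encoded script file.
set raw "\xA4\xCF"

# Interpreted as ISO8859-1, each byte becomes its own Latin-1 character.
set wrong [encoding convertfrom iso8859-1 $raw]
set wrongLen [string length $wrong]

# Interpreted as euc-jp, the same two bytes form one Japanese character.
set right [encoding convertfrom euc-jp $raw]
set rightLen [string length $right]

# The decoded character is the Hiragana letter HA, U+306F.
set isHa [string equal $right "\u306F"]
```

Here \fIwrongLen\fR is 2 and \fIrightLen\fR is 1, which is why a file read under the wrong encoding yields strings of unexpected Latin-1 characters rather than an error.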
90 | |
---|
91 | .SH "SEE ALSO" |
---|
92 | Tcl_GetEncoding(3) |
---|
93 | |
---|
94 | .SH KEYWORDS |
---|
95 | encoding |
---|