
source: downloads/tcl8.5.2/doc/string.n @ 25

Last change on this file since 25 was 25, checked in by landauf, 17 years ago

added tcl to libs

File size: 15.9 KB
.\"
.\" Copyright (c) 1993 The Regents of the University of California.
.\" Copyright (c) 1994-1996 Sun Microsystems, Inc.
.\"
.\" See the file "license.terms" for information on usage and redistribution
.\" of this file, and for a DISCLAIMER OF ALL WARRANTIES.
.\"
.\" RCS: @(#) $Id: string.n,v 1.43 2007/12/13 15:22:33 dgp Exp $
.\"
.so man.macros
.TH string n 8.1 Tcl "Tcl Built-In Commands"
.BS
.\" Note:  do not modify the .SH NAME line immediately below!
.SH NAME
string \- Manipulate strings
.SH SYNOPSIS
\fBstring \fIoption arg \fR?\fIarg ...?\fR
.BE

.SH DESCRIPTION
.PP
Performs one of several string operations, depending on \fIoption\fR.
The legal \fIoption\fRs (which may be abbreviated) are:
.TP
\fBstring bytelength \fIstring\fR
Returns a decimal string giving the number of bytes used to represent
\fIstring\fR in memory.  Because UTF\-8 uses one to three bytes to
represent Unicode characters, the byte length will not be the same as
the character length in general.  The cases where a script cares about
the byte length are rare.  In almost all cases, you should use the
\fBstring length\fR operation (including determining the length of a
Tcl ByteArray object).  Refer to the \fBTcl_NumUtfChars\fR manual
entry for more details on the UTF\-8 representation.
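.RS
.PP
As a brief illustration (a sketch only, assuming a Tcl 8.5 interpreter;
the variable name \fIs\fR is arbitrary), a string containing one
non-ASCII character reports different character and byte lengths:
.CS
# Build "caf" followed by U+00E9 from its code point (233)
set s "caf[format %c 233]"
string length $s       ;# 4 characters
string bytelength $s   ;# 5 bytes: U+00E9 takes two bytes in UTF-8
.CE
.RE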
.TP
\fBstring compare\fR ?\fB\-nocase\fR? ?\fB\-length int\fR? \fIstring1 string2\fR
Perform a character-by-character comparison of strings \fIstring1\fR
and \fIstring2\fR.  Returns \-1, 0, or 1, depending on whether
\fIstring1\fR is lexicographically less than, equal to, or greater
than \fIstring2\fR.  If \fB\-length\fR is specified, then only the
first \fIlength\fR characters are used in the comparison.  If
\fB\-length\fR is negative, it is ignored.  If \fB\-nocase\fR is
specified, then the strings are compared in a case-insensitive manner.
.TP
\fBstring equal\fR ?\fB\-nocase\fR? ?\fB\-length int\fR? \fIstring1 string2\fR
Perform a character-by-character comparison of strings \fIstring1\fR
and \fIstring2\fR.  Returns 1 if \fIstring1\fR and \fIstring2\fR are
identical, or 0 otherwise.  If \fB\-length\fR is specified, then only
the first \fIlength\fR characters are used in the comparison.  If
\fB\-length\fR is negative, it is ignored.  If \fB\-nocase\fR is
specified, then the strings are compared in a case-insensitive manner.
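.RS
.PP
The options of \fBstring compare\fR and \fBstring equal\fR can be
sketched as follows; the sample words are arbitrary and chosen only
for illustration:
.CS
string compare apple banana              ;# returns -1
string compare -nocase APPLE apple       ;# returns 0
string compare -length 3 foobar foobaz   ;# returns 0: only "foo" is compared
string equal -nocase Tcl TCL             ;# returns 1
string equal -length 4 foobar foobaz     ;# returns 1: only "foob" is compared
string equal foobar foobaz               ;# returns 0
.CE
.RE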
.TP
\fBstring first \fIneedleString haystackString\fR ?\fIstartIndex\fR?
Search \fIhaystackString\fR for a sequence of characters that exactly matches
the characters in \fIneedleString\fR.  If found, return the index of the
first character in the first such match within \fIhaystackString\fR.  If not
found, return \-1.  If \fIstartIndex\fR is specified (in any of the
forms accepted by the \fBindex\fR method), then the search is
constrained to start with the character in \fIhaystackString\fR specified by
the index.  For example,
.RS
.CS
\fBstring first a 0a23456789abcdef 5\fR
.CE
will return \fB10\fR, but
.CS
\fBstring first a 0123456789abcdef 11\fR
.CE
will return \fB\-1\fR.
.RE
.TP
\fBstring index \fIstring charIndex\fR
Returns the \fIcharIndex\fR'th character of the \fIstring\fR argument.
A \fIcharIndex\fR of 0 corresponds to the first character of the
string.  \fIcharIndex\fR may be specified as follows:
.VS 8.5
.RS
.IP \fIinteger\fR 10
For any index value that passes \fBstring is integer -strict\fR,
the char specified at this integral index
(e.g. \fB2\fR would refer to the
.QW c
in
.QW abcd ).
.IP \fBend\fR 10
The last char of the string
(e.g. \fBend\fR would refer to the
.QW d
in
.QW abcd ).
.IP \fBend\fR\-\fIN\fR 10
The last char of the string minus the specified integer offset \fIN\fR
(e.g. \fBend\fR\-1 would refer to the
.QW c
in
.QW abcd ).
.IP \fBend\fR+\fIN\fR 10
The last char of the string plus the specified integer offset \fIN\fR
(e.g. \fBend\fR+\-1 would refer to the
.QW c
in
.QW abcd ).
.IP \fIM\fR+\fIN\fR 10
The char specified at the integral index that is the sum of
integer values \fIM\fR and \fIN\fR
(e.g. \fB1+1\fR would refer to the
.QW c
in
.QW abcd ).
.IP \fIM\fR\-\fIN\fR 10
The char specified at the integral index that is the difference of
integer values \fIM\fR and \fIN\fR
(e.g. \fB2\-1\fR would refer to the
.QW b
in
.QW abcd ).
.PP
In the specifications above, the integer value \fIM\fR contains no
trailing whitespace and the integer value \fIN\fR contains no
leading whitespace.
.PP
If \fIcharIndex\fR is less than 0 or greater than or equal to the
length of the string then this command returns an empty string.
.RE
.VE
.TP
\fBstring is \fIclass\fR ?\fB\-strict\fR? ?\fB\-failindex \fIvarname\fR? \fIstring\fR
Returns 1 if \fIstring\fR is a valid member of the specified character
class, otherwise returns 0.  If \fB\-strict\fR is specified, then an
empty string returns 0, otherwise an empty string will return 1 on
any class.  If \fB\-failindex\fR is specified, then if the function
returns 0, the index in the string where the class was no longer valid
will be stored in the variable named \fIvarname\fR.  The \fIvarname\fR
will not be set if the function returns 1.  The following character
classes are recognized (the class name can be abbreviated):
.RS
.IP \fBalnum\fR 12
Any Unicode alphabet or digit character.
.IP \fBalpha\fR 12
Any Unicode alphabet character.
.IP \fBascii\fR 12
Any character with a value less than \eu0080 (those that are in the
7\-bit ASCII range).
.IP \fBboolean\fR 12
Any of the forms allowed to \fBTcl_GetBoolean\fR.
.IP \fBcontrol\fR 12
Any Unicode control character.
.IP \fBdigit\fR 12
Any Unicode digit character.  Note that this includes characters
outside of the [0\-9] range.
.IP \fBdouble\fR 12
Any of the valid forms for a double in Tcl, with optional surrounding
whitespace.  In case of under/overflow in the value, 0 is returned and
the \fIvarname\fR will contain \-1.
.IP \fBfalse\fR 12
Any of the forms allowed to \fBTcl_GetBoolean\fR where the value is
false.
.IP \fBgraph\fR 12
Any Unicode printing character, except space.
.IP \fBinteger\fR 12
Any of the valid string formats for a 32-bit integer value in Tcl,
with optional surrounding whitespace.  In case of under/overflow in
the value, 0 is returned and the \fIvarname\fR will contain \-1.
.IP \fBlist\fR 12
Any proper list structure, with optional surrounding whitespace. In
case of improper list structure, 0 is returned and the \fIvarname\fR
will contain the index of the
.QW element
where the list parsing fails, or \-1 if this cannot be determined.
.IP \fBlower\fR 12
Any Unicode lower case alphabet character.
.IP \fBprint\fR 12
Any Unicode printing character, including space.
.IP \fBpunct\fR 12
Any Unicode punctuation character.
.IP \fBspace\fR 12
Any Unicode space character.
.IP \fBtrue\fR 12
Any of the forms allowed to \fBTcl_GetBoolean\fR where the value is
true.
.IP \fBupper\fR 12
Any upper case alphabet character in the Unicode character set.
.VS 8.5
.IP \fBwideinteger\fR 12
Any of the valid forms for a wide integer in Tcl, with optional
surrounding whitespace.  In case of under/overflow in the value, 0 is
returned and the \fIvarname\fR will contain \-1.
.VE 8.5
.IP \fBwordchar\fR 12
Any Unicode word character.  That is, any alphanumeric character, and
any Unicode connector punctuation characters (e.g. underscore).
.IP \fBxdigit\fR 12
Any hexadecimal digit character ([0\-9A\-Fa\-f]).
.PP
In the case of \fBboolean\fR, \fBtrue\fR and \fBfalse\fR, if the
function returns 0, then the \fIvarname\fR will always be set to
0, due to the varied nature of a valid boolean value.
.RE
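.RS
.PP
A short sketch of how \fB\-strict\fR and \fB\-failindex\fR interact
with \fBstring is\fR; the variable name \fIpos\fR is hypothetical and
used only for this illustration:
.CS
string is integer -strict 42          ;# returns 1
string is integer ""                  ;# returns 1: empty string, no -strict
string is integer -strict ""          ;# returns 0
string is boolean -strict yes         ;# returns 1
# -failindex records the index of the first character outside the class
string is digit -failindex pos 12a4   ;# returns 0 and sets pos to 2
.CE
.RE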
.TP
\fBstring last \fIneedleString haystackString\fR ?\fIlastIndex\fR?
Search \fIhaystackString\fR for a sequence of characters that exactly matches
the characters in \fIneedleString\fR.  If found, return the index of the
first character in the last such match within \fIhaystackString\fR.  If there
is no match, then return \-1.  If \fIlastIndex\fR is specified (in any
of the forms accepted by the \fBindex\fR method), then only the
characters in \fIhaystackString\fR at or before the specified \fIlastIndex\fR
will be considered by the search.  For example,
.RS
.CS
\fBstring last a 0a23456789abcdef 15\fR
.CE
will return \fB10\fR, but
.CS
\fBstring last a 0a23456789abcdef 9\fR
.CE
will return \fB1\fR.
.RE
.TP
\fBstring length \fIstring\fR
Returns a decimal string giving the number of characters in
\fIstring\fR.  Note that this is not necessarily the same as the
number of bytes used to store the string.  If the object is a
ByteArray object (such as those returned from reading a binary encoded
channel), then this will return the actual byte length of the object.
.TP
\fBstring map\fR ?\fB\-nocase\fR? \fImapping string\fR
Replaces substrings in \fIstring\fR based on the key-value pairs in
\fImapping\fR.  \fImapping\fR is a list of \fIkey value key value ...\fR
as in the form returned by \fBarray get\fR.  Each instance of a
key in the string will be replaced with its corresponding value.  If
\fB\-nocase\fR is specified, then matching is done without regard to
case differences. Both \fIkey\fR and \fIvalue\fR may be multiple
characters.  Replacement is done in an ordered manner, so the key
appearing first in the list will be checked first, and so on.
\fIstring\fR is only iterated over once, so earlier key replacements
will have no effect on later key matches.  For example,
.RS
.CS
\fBstring map {abc 1 ab 2 a 3 1 0} 1abcaababcabababc\fR
.CE
will return the string \fB01321221\fR.
.PP
Note that if an earlier \fIkey\fR is a prefix of a later one, it will
completely mask the later one.  So if the previous example is
reordered like this,
.CS
\fBstring map {1 0 ab 2 a 3 abc 1} 1abcaababcabababc\fR
.CE
it will return the string \fB02c322c222c\fR.
.RE
.TP
\fBstring match\fR ?\fB\-nocase\fR? \fIpattern\fR \fIstring\fR
See if \fIpattern\fR matches \fIstring\fR; return 1 if it does, 0 if
it does not.  If \fB\-nocase\fR is specified, then the pattern attempts
to match against the string in a case-insensitive manner.  For the two
strings to match, their contents must be identical except that the
following special sequences may appear in \fIpattern\fR:
.RS
.IP \fB*\fR 10
Matches any sequence of characters in \fIstring\fR, including a null
string.
.IP \fB?\fR 10
Matches any single character in \fIstring\fR.
.IP \fB[\fIchars\fB]\fR 10
Matches any character in the set given by \fIchars\fR.  If a sequence
of the form \fIx\fB\-\fIy\fR appears in \fIchars\fR, then any
character between \fIx\fR and \fIy\fR, inclusive, will match.  When
used with \fB\-nocase\fR, the end points of the range are converted to
lower case first.  Whereas {[A\-z]} matches
.QW _
when matching case-sensitively (since
.QW _
falls between the
.QW Z
and
.QW a ),
with \fB\-nocase\fR this is considered like {[A\-Za\-z]} (and
probably what was meant in the first place).
.IP \fB\e\fIx\fR 10
Matches the single character \fIx\fR.  This provides a way of avoiding
the special interpretation of the characters \fB*?[]\e\fR in
\fIpattern\fR.
.RE
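.RS
.PP
The special sequences above can be tried out with a few calls such as
the following sketch.  The sample strings are arbitrary, and patterns
containing brackets are braced so that Tcl does not treat them as
command substitutions:
.CS
string match f* foobar            ;# returns 1
string match {f?o} foo            ;# returns 1
string match {[a-c]*} banana      ;# returns 1
string match f* FOO               ;# returns 0
string match -nocase f* FOO       ;# returns 1
# The range end points are lowered under -nocase, as described above
string match {[A-z]} _            ;# returns 1
string match -nocase {[A-z]} _    ;# returns 0
.CE
.RE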
.TP
\fBstring range \fIstring first last\fR
Returns a range of consecutive characters from \fIstring\fR, starting
with the character whose index is \fIfirst\fR and ending with the
character whose index is \fIlast\fR. An index of 0 refers to the first
character of the string.  \fIfirst\fR and \fIlast\fR may be specified
as for the \fBindex\fR method.  If \fIfirst\fR is less than zero then
it is treated as if it were zero, and if \fIlast\fR is greater than or
equal to the length of the string then it is treated as if it were
\fBend\fR.  If \fIfirst\fR is greater than \fIlast\fR then an empty
string is returned.
.TP
\fBstring repeat \fIstring count\fR
Returns \fIstring\fR repeated \fIcount\fR number of times.
.TP
\fBstring replace \fIstring first last\fR ?\fInewstring\fR?
Removes a range of consecutive characters from \fIstring\fR, starting
with the character whose index is \fIfirst\fR and ending with the
character whose index is \fIlast\fR.  An index of 0 refers to the
first character of the string.  \fIfirst\fR and \fIlast\fR may be
specified as for the \fBindex\fR method.  If \fInewstring\fR is
specified, then it is placed in the removed character range.  If
\fIfirst\fR is less than zero then it is treated as if it were zero,
and if \fIlast\fR is greater than or equal to the length of the string
then it is treated as if it were \fBend\fR.  If \fIfirst\fR is greater
than \fIlast\fR or the length of the initial string, or \fIlast\fR is
less than 0, then the initial string is returned untouched.
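.RS
.PP
The index handling of \fBstring range\fR, \fBstring repeat\fR and
\fBstring replace\fR can be sketched as follows; the variable \fIs\fR
and its contents are chosen only for illustration:
.CS
set s "Hello, World"
string range $s 0 4             ;# Hello
string range $s 7 end           ;# World
string range $s -3 100          ;# Hello, World: both indices are clamped
string repeat ab 3              ;# ababab
string replace $s 0 4 Goodbye   ;# Goodbye, World
string replace $s 5 end         ;# Hello: the range is removed
string replace $s 8 2 x         ;# Hello, World: first is greater than last
.CE
.RE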
.VS 8.5
.TP
\fBstring reverse \fIstring\fR
Returns a string that is the same length as \fIstring\fR but with its
characters in the reverse order.
.VE 8.5
.TP
\fBstring tolower \fIstring\fR ?\fIfirst\fR? ?\fIlast\fR?
Returns a value equal to \fIstring\fR except that all upper (or title)
case letters have been converted to lower case.  If \fIfirst\fR is
specified, it refers to the first char index in the string to start
modifying.  If \fIlast\fR is specified, it refers to the char index in
the string to stop at (inclusive).  \fIfirst\fR and \fIlast\fR may be
specified as for the \fBindex\fR method.
.TP
\fBstring totitle \fIstring\fR ?\fIfirst\fR? ?\fIlast\fR?
Returns a value equal to \fIstring\fR except that the first character
in \fIstring\fR is converted to its Unicode title case variant (or
upper case if there is no title case variant) and the rest of the
string is converted to lower case.  If \fIfirst\fR is specified, it
refers to the first char index in the string to start modifying.  If
\fIlast\fR is specified, it refers to the char index in the string to
stop at (inclusive).  \fIfirst\fR and \fIlast\fR may be specified as
for the \fBindex\fR method.
.TP
\fBstring toupper \fIstring\fR ?\fIfirst\fR? ?\fIlast\fR?
Returns a value equal to \fIstring\fR except that all lower (or title)
case letters have been converted to upper case.  If \fIfirst\fR is
specified, it refers to the first char index in the string to start
modifying.  If \fIlast\fR is specified, it refers to the char index in
the string to stop at (inclusive).  \fIfirst\fR and \fIlast\fR may be
specified as for the \fBindex\fR method.
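.RS
.PP
A brief sketch of the three case conversions, including the optional
\fIfirst\fR and \fIlast\fR indices; the sample strings are arbitrary:
.CS
string tolower "Hello World"         ;# hello world
string toupper "Hello World"         ;# HELLO WORLD
string totitle "hELLO wORLD"         ;# Hello world
string toupper "hello world" 0 0     ;# Hello world: only index 0 changes
string tolower "HELLO WORLD" 6 end   ;# HELLO world
.CE
.RE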
.TP
\fBstring trim \fIstring\fR ?\fIchars\fR?
Returns a value equal to \fIstring\fR except that any leading or
trailing characters present in the string given by \fIchars\fR are removed.  If
\fIchars\fR is not specified then white space is removed (spaces,
tabs, newlines, and carriage returns).
.TP
\fBstring trimleft \fIstring\fR ?\fIchars\fR?
Returns a value equal to \fIstring\fR except that any leading
characters present in the string given by \fIchars\fR are removed.  If
\fIchars\fR is not specified then white space is removed (spaces,
tabs, newlines, and carriage returns).
.TP
\fBstring trimright \fIstring\fR ?\fIchars\fR?
Returns a value equal to \fIstring\fR except that any trailing
characters present in the string given by \fIchars\fR are removed.  If
\fIchars\fR is not specified then white space is removed (spaces,
tabs, newlines, and carriage returns).
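.RS
.PP
The following sketch, using arbitrary sample strings, shows that
\fIchars\fR is treated as a set of characters to strip rather than
as a literal prefix or suffix:
.CS
string trim "  padded  "        ;# padded
string trimleft 000123 0        ;# 123
string trimright "path///" /    ;# path
string trim xxhixx x            ;# hi
string trim abcHELLOcba abc     ;# HELLO: any of a, b, c is stripped
.CE
.RE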
.TP
\fBstring wordend \fIstring charIndex\fR
Returns the index of the character just after the last one in the word
containing character \fIcharIndex\fR of \fIstring\fR.  \fIcharIndex\fR
may be specified as for the \fBindex\fR method.  A word is
considered to be any contiguous range of alphanumeric (Unicode letters
or decimal digits) or underscore (Unicode connector punctuation)
characters, or any single character other than these.
.TP
\fBstring wordstart \fIstring charIndex\fR
Returns the index of the first character in the word containing
character \fIcharIndex\fR of \fIstring\fR.  \fIcharIndex\fR may be
specified as for the \fBindex\fR method.  A word is considered to be any
contiguous range of alphanumeric (Unicode letters or decimal digits)
or underscore (Unicode connector punctuation) characters, or any
single character other than these.
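.RS
.PP
One possible use of \fBstring wordstart\fR and \fBstring wordend\fR
together is to extract the word around a given index.  This is only a
sketch; the variable names \fIs\fR, \fIstart\fR and \fIstop\fR are
hypothetical:
.CS
set s "foo bar_baz qux"
set start [string wordstart $s 6]   ;# 4: index of the "b" in bar_baz
set stop [string wordend $s 6]      ;# 11: index just past the "z"
string range $s $start [expr {$stop - 1}]   ;# bar_baz
string wordend $s 3                 ;# 4: the space at index 3 is its own word
.CE
.RE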
.SH EXAMPLE
Test if the string in the variable \fIstring\fR is a proper non-empty
prefix of the string \fBfoobar\fR.
.CS
set length [\fBstring length\fR $string]
if {$length == 0} {
   set isPrefix 0
} else {
   set isPrefix [\fBstring equal\fR -length $length $string "foobar"]
}
.CE

.SH "SEE ALSO"
expr(n), list(n)

.SH KEYWORDS
case conversion, compare, index, match, pattern, string, word, equal,
ctype, character, reverse

.\" Local Variables:
.\" mode: nroff
.\" End: