/*
 *******************************************************************************
 * Copyright (C) 1996-2006, International Business Machines Corporation and    *
 * others. All Rights Reserved.                                                *
 *******************************************************************************
 */
package com.ibm.icu.text;

import java.util.Hashtable;

/**
 * <p><code>RuleBasedTransliterator</code> is a transliterator
 * that reads a set of rules in order to determine how to perform
 * translations. Rule sets are stored in resource bundles indexed by
 * name. Rules within a rule set are separated by semicolons (';').
 * To include a literal semicolon, prefix it with a backslash ('\').
 * Whitespace, as defined by <code>UCharacterProperty.isRuleWhiteSpace()</code>,
 * is ignored. If the first non-blank character on a line is '#',
 * the entire line is ignored as a comment. </p>
 *
 * <p>Each set of rules consists of two groups, one forward, and one
 * reverse. This is a convention that is not enforced; rules for one
 * direction may be omitted, with the result that translations in
 * that direction will not modify the source text. In addition,
 * bidirectional forward-reverse rules may be specified for
 * symmetrical transformations.</p>
 *
 * <p><b>Rule syntax</b> </p>
 *
 * <p>Rule statements take one of the following forms: </p>
 *
 * <dl>
 * <dt><code>$alefmadda=\u0622;</code></dt>
 * <dd><strong>Variable definition.</strong> The name on the
 * left is assigned the text on the right. In this example,
 * after this statement, instances of the left hand name,
 * &quot;<code>$alefmadda</code>&quot;, will be replaced by
 * the Unicode character U+0622. Variable names must begin
 * with a letter and consist only of letters, digits, and
 * underscores. Case is significant. Duplicate names cause
 * an exception to be thrown, that is, variables cannot be
 * redefined. The right hand side may contain well-formed
 * text of any length, including no text at all (&quot;<code>$empty=;</code>&quot;).
 * The right hand side may contain embedded <code>UnicodeSet</code>
 * patterns, for example, &quot;<code>$softvowel=[eiyEIY]</code>&quot;.</dd>
 * <dd>&nbsp;</dd>
 * <dt><code>ai&gt;$alefmadda;</code></dt>
 * <dd><strong>Forward translation rule.</strong> This rule
 * states that the string on the left will be changed to the
 * string on the right when performing forward
 * transliteration.</dd>
 * <dt>&nbsp;</dt>
 * <dt><code>ai&lt;$alefmadda;</code></dt>
 * <dd><strong>Reverse translation rule.</strong> This rule
 * states that the string on the right will be changed to
 * the string on the left when performing reverse
 * transliteration.</dd>
 * </dl>
 *
 * <dl>
 * <dt><code>ai&lt;&gt;$alefmadda;</code></dt>
 * <dd><strong>Bidirectional translation rule.</strong> This
 * rule states that the string on the left will be changed
 * to the string on the right when performing forward
 * transliteration, and vice versa when performing reverse
 * transliteration.</dd>
 * </dl>
 *
 * <p>Translation rules consist of a <em>match pattern</em> and an <em>output
 * string</em>. The match pattern consists of literal characters,
 * optionally preceded by context, and optionally followed by
 * context. Context characters, like literal pattern characters,
 * must be matched in the text being transliterated. However, unlike
 * literal pattern characters, they are not replaced by the output
 * text. For example, the pattern &quot;<code>abc{def}</code>&quot;
 * indicates the characters &quot;<code>def</code>&quot; must be
 * preceded by &quot;<code>abc</code>&quot; for a successful match.
 * If there is a successful match, &quot;<code>def</code>&quot; will
 * be replaced, but not &quot;<code>abc</code>&quot;. The final '<code>}</code>'
 * is optional, so &quot;<code>abc{def</code>&quot; is equivalent to
 * &quot;<code>abc{def}</code>&quot;. Another example is &quot;<code>{123}456</code>&quot;
 * (or &quot;<code>123}456</code>&quot;) in which the literal
 * pattern &quot;<code>123</code>&quot; must be followed by &quot;<code>456</code>&quot;.
 * </p>
 *
 * <p>The output string of a forward or reverse rule consists of
 * characters to replace the literal pattern characters. If the
 * output string contains the character '<code>|</code>', this is
 * taken to indicate the location of the <em>cursor</em> after
 * replacement. The cursor is the point in the text at which the
 * next replacement, if any, will be applied. The cursor is usually
 * placed within the replacement text; however, it can actually be
 * placed into the preceding or following context by using the
 * special character '<code>@</code>'. Examples:</p>
 *
 * <blockquote>
 * <p><code>a {foo} z &gt; | @ bar; # foo -&gt; bar, move cursor
 * before a<br>
 * {foo} xyz &gt; bar @@|; #&nbsp;foo -&gt; bar, cursor between
 * y and z</code></p>
 * </blockquote>
 *
 * <p><b>UnicodeSet</b></p>
 *
 * <p><code>UnicodeSet</code> patterns may appear anywhere that
 * makes sense. They may appear in variable definitions.
 * Contrariwise, <code>UnicodeSet</code> patterns may themselves
 * contain variable references, such as &quot;<code>$a=[a-z];$not_a=[^$a]</code>&quot;,
 * or &quot;<code>$range=a-z;$ll=[$range]</code>&quot;.</p>
 *
 * <p><code>UnicodeSet</code> patterns may also be embedded directly
 * into rule strings. Thus, the following two rules are equivalent:</p>
 *
 * <blockquote>
 * <p><code>$vowel=[aeiou]; $vowel&gt;'*'; # One way to do this<br>
 * [aeiou]&gt;'*';
 * &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;#
 * Another way</code></p>
 * </blockquote>
 *
 * <p>See {@link UnicodeSet} for more documentation and examples.</p>
 *
 * <p><b>Segments</b></p>
 *
 * <p>Segments of the input string can be matched and copied to the
 * output string. This makes certain sets of rules simpler and more
 * general, and makes reordering possible. For example:</p>
 *
 * <blockquote>
 * <p><code>([a-z]) &gt; $1 $1;
 * &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;#
 * double lowercase letters<br>
 * ([:Lu:]) ([:Ll:]) &gt; $2 $1; # reverse order of Lu-Ll pairs</code></p>
 * </blockquote>
 *
 * <p>The segment of the input string to be copied is delimited by
 * &quot;<code>(</code>&quot; and &quot;<code>)</code>&quot;. Up to
 * nine segments may be defined. Segments may not overlap. In the
 * output string, &quot;<code>$1</code>&quot; through &quot;<code>$9</code>&quot;
 * represent the input string segments, in left-to-right order of
 * definition.</p>
 *
 * <p><b>Anchors</b></p>
 *
 * <p>Patterns can be anchored to the beginning or the end of the text. This is done with the
 * special characters '<code>^</code>' and '<code>$</code>'. For example:</p>
 *
 * <blockquote>
 * <p><code>^ a&nbsp;&nbsp; &gt; 'BEG_A'; &nbsp;&nbsp;# match 'a' at start of text<br>
 * &nbsp; a&nbsp;&nbsp; &gt; 'A';&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; # match other instances
 * of 'a'<br>
 * &nbsp; z $ &gt; 'END_Z'; &nbsp;&nbsp;# match 'z' at end of text<br>
 * &nbsp; z&nbsp;&nbsp; &gt; 'Z';&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; # match other instances
 * of 'z'</code></p>
 * </blockquote>
 *
 * <p>It is also possible to match the beginning or the end of the text using a <code>UnicodeSet</code>.
 * This is done by including a virtual anchor character '<code>$</code>' at the end of the
 * set pattern. Although this is usually the match character for the end anchor, the set will
 * match either the beginning or the end of the text, depending on its placement. For
 * example:</p>
 *
 * <blockquote>
 * <p><code>$x = [a-z$]; &nbsp;&nbsp;# match 'a' through 'z' OR anchor<br>
 * $x 1&nbsp;&nbsp;&nbsp; &gt; 2;&nbsp;&nbsp; # match '1' after a-z or at the start<br>
 * &nbsp;&nbsp; 3 $x &gt; 4; &nbsp;&nbsp;# match '3' before a-z or at the end</code></p>
 * </blockquote>
 *
 * <p><b>Example</b> </p>
 *
 * <p>The following example rules illustrate many of the features of
 * the rule language. </p>
 *
 * <table border="0" cellpadding="4">
 * <tr>
 * <td valign="top">Rule 1.</td>
 * <td valign="top" nowrap><code>abc{def}&gt;x|y</code></td>
 * </tr>
 * <tr>
 * <td valign="top">Rule 2.</td>
 * <td valign="top" nowrap><code>xyz&gt;r</code></td>
 * </tr>
 * <tr>
 * <td valign="top">Rule 3.</td>
 * <td valign="top" nowrap><code>yz&gt;q</code></td>
 * </tr>
 * </table>
 *
 * <p>Applying these rules to the string &quot;<code>adefabcdefz</code>&quot;
 * yields the following results: </p>
 *
 * <table border="0" cellpadding="4">
 * <tr>
 * <td valign="top" nowrap><code>|adefabcdefz</code></td>
 * <td valign="top">Initial state, no rules match. Advance
 * cursor.</td>
 * </tr>
 * <tr>
 * <td valign="top" nowrap><code>a|defabcdefz</code></td>
 * <td valign="top">Still no match. Rule 1 does not match
 * because the preceding context is not present.</td>
 * </tr>
 * <tr>
 * <td valign="top" nowrap><code>ad|efabcdefz</code></td>
 * <td valign="top">Still no match. Keep advancing until
 * there is a match...</td>
 * </tr>
 * <tr>
 * <td valign="top" nowrap><code>ade|fabcdefz</code></td>
 * <td valign="top">...</td>
 * </tr>
 * <tr>
 * <td valign="top" nowrap><code>adef|abcdefz</code></td>
 * <td valign="top">...</td>
 * </tr>
 * <tr>
 * <td valign="top" nowrap><code>adefa|bcdefz</code></td>
 * <td valign="top">...</td>
 * </tr>
 * <tr>
 * <td valign="top" nowrap><code>adefab|cdefz</code></td>
 * <td valign="top">...</td>
 * </tr>
 * <tr>
 * <td valign="top" nowrap><code>adefabc|defz</code></td>
 * <td valign="top">Rule 1 matches; replace &quot;<code>def</code>&quot;
 * with &quot;<code>xy</code>&quot; and back up the cursor
 * to before the '<code>y</code>'.</td>
 * </tr>
 * <tr>
 * <td valign="top" nowrap><code>adefabcx|yz</code></td>
 * <td valign="top">Although &quot;<code>xyz</code>&quot; is
 * present, rule 2 does not match because the cursor is
 * before the '<code>y</code>', not before the '<code>x</code>'.
 * Rule 3 does match. Replace &quot;<code>yz</code>&quot;
 * with &quot;<code>q</code>&quot;.</td>
 * </tr>
 * <tr>
 * <td valign="top" nowrap><code>adefabcxq|</code></td>
 * <td valign="top">The cursor is at the end;
 * transliteration is complete.</td>
 * </tr>
 * </table>
 *
 * <p>The order of rules is significant. If multiple rules may match
 * at some point, the first matching rule is applied. </p>
 *
 * <p>Forward and reverse rules may have an empty output string.
 * Otherwise, an empty left or right hand side of any statement is a
 * syntax error. </p>
 *
 * <p>Single quotes are used to quote any character other than a
 * digit or letter. To specify a single quote itself, inside or
 * outside of quotes, use two single quotes in a row. For example,
 * the rule &quot;<code>'&gt;'&gt;o''clock</code>&quot; changes the
 * string &quot;<code>&gt;</code>&quot; to the string &quot;<code>o'clock</code>&quot;.
 * </p>
 *
 * <p><b>Notes</b> </p>
 *
 * <p>While a RuleBasedTransliterator is being built, it checks that
 * the rules are added in proper order. For example, if the rule
 * &quot;a&gt;x&quot; is followed by the rule &quot;ab&gt;y&quot;,
 * then the second rule will throw an exception. The reason is that
 * the second rule can never be triggered, since the first rule
 * always matches anything it matches. In other words, the first
 * rule <em>masks</em> the second rule. </p>
 *
 * <p>Copyright (c) IBM Corporation 1999-2000. All rights reserved.</p>
 *
 * @author Alan Liu
 * @internal
 * @deprecated This API is ICU internal only.
 */
public class RuleBasedTransliterator extends Transliterator {

    private Data data;

    private static final String COPYRIGHT =
        "\u00A9 IBM Corporation 1999. All rights reserved.";

    /**
     * Constructs a new transliterator from the given rules.
     * @param ID the identifier for this transliterator
     * @param rules rules, separated by ';'
     * @param direction either FORWARD or REVERSE.
     * @param filter the filter passed to the superclass constructor;
     * may be null
     * @exception IllegalArgumentException if rules are malformed
     * or direction is invalid.
     * @internal
     * @deprecated This API is ICU internal only.
     */
    public RuleBasedTransliterator(String ID, String rules, int direction,
                                   UnicodeFilter filter) {
        super(ID, filter);
        if (direction != FORWARD && direction != REVERSE) {
            throw new IllegalArgumentException("Invalid direction");
        }

        TransliteratorParser parser = new TransliteratorParser();
        parser.parse(rules, direction);
        if (parser.idBlockVector.size() != 0 ||
            parser.compoundFilter != null) {
            throw new IllegalArgumentException("::ID blocks illegal in RuleBasedTransliterator constructor");
        }

        data = (Data)parser.dataVector.get(0);
        setMaximumContextLength(data.ruleSet.getMaximumContextLength());
    }

    /**
     * Constructs a new transliterator from the given rules in the
     * <code>FORWARD</code> direction, with no filter.
     * @param ID the identifier for this transliterator
     * @param rules rules, separated by ';'
     * @exception IllegalArgumentException if rules are malformed.
     * @internal
     * @deprecated This API is ICU internal only.
     */
    public RuleBasedTransliterator(String ID, String rules) {
        this(ID, rules, FORWARD, null);
    }

    RuleBasedTransliterator(String ID, Data data, UnicodeFilter filter) {
        super(ID, filter);
        this.data = data;
        setMaximumContextLength(data.ruleSet.getMaximumContextLength());
    }

    /**
     * Implements {@link Transliterator#handleTransliterate}.
     * @internal
     * @deprecated This API is ICU internal only.
     */
    protected synchronized void handleTransliterate(Replaceable text,
                                       Position index, boolean incremental) {
        /* We keep start and limit fixed the entire time,
         * relative to the text -- limit may move numerically if text is
         * inserted or removed.  The cursor moves from start to limit, with
         * replacements happening under it.
         *
         * Example: rules 1. ab>x|y
         *                2. yc>z
         *
         * |eabcd   start - no match, advance cursor
         * e|abcd   match rule 1 - change text & adjust cursor
         * ex|ycd   match rule 2 - change text & adjust cursor
         * exz|d    no match, advance cursor
         * exzd|    done
         */

        /* A rule like
         *   a>b|a
         * creates an infinite loop. To prevent that, we put an arbitrary
         * limit on the number of iterations that we take, one that is
         * high enough that any reasonable rules are ok, but low enough to
         * prevent a server from hanging.  The limit is 16 times the
         * number of characters n, unless 16n overflows to a negative
         * int, in which case the limit is Integer.MAX_VALUE.
         */
        int loopCount = 0;
        int loopLimit = (index.limit - index.start) << 4;
        if (loopLimit < 0) {
            loopLimit = 0x7FFFFFFF;
        }

        while (index.start < index.limit &&
               loopCount <= loopLimit &&
               data.ruleSet.transliterate(text, index, incremental)) {
            ++loopCount;
        }
    }


    static class Data {
        public Data() {
            variableNames = new Hashtable();
            ruleSet = new TransliterationRuleSet();
        }

        /**
         * Rule table.  May be empty.
         */
        public TransliterationRuleSet ruleSet;

        /**
         * Map variable name (String) to variable (char[]).  A variable name
         * corresponds to zero or more characters, stored in a char[] array in
         * this hash.  One or more of these chars may also correspond to a
         * UnicodeSet, in which case the character in the char[] in this hash is
         * a stand-in: it is an index for a secondary lookup in
         * data.variables.  The stand-in also represents the UnicodeSet in
         * the stored rules.
         */
        Hashtable variableNames;

        /**
         * Map category variable (Character) to UnicodeMatcher or UnicodeReplacer.
         * Variables that correspond to a set of characters are mapped
         * from variable name to a stand-in character in data.variableNames.
         * The stand-in then serves as a key in this hash to lookup the
         * actual UnicodeSet object.  In addition, the stand-in is
         * stored in the rule text to represent the set of characters.
         * variables[i] represents character (variablesBase + i).
         */
        Object[] variables;

        /**
         * The character that represents variables[0].  Characters
         * variablesBase through variablesBase +
         * variables.length - 1 represent UnicodeSet objects.
         */
        char variablesBase;

        /**
         * Return the UnicodeMatcher represented by the given character, or
         * null if none.
         */
        public UnicodeMatcher lookupMatcher(int standIn) {
            int i = standIn - variablesBase;
            return (i >= 0 && i < variables.length)
                ? (UnicodeMatcher) variables[i] : null;
        }

        /**
         * Return the UnicodeReplacer represented by the given character, or
         * null if none.
         */
        public UnicodeReplacer lookupReplacer(int standIn) {
            int i = standIn - variablesBase;
            return (i >= 0 && i < variables.length)
                ? (UnicodeReplacer) variables[i] : null;
        }
    }


    /**
     * Return a representation of this transliterator as source rules.
     * These rules will produce an equivalent transliterator if used
     * to construct a new transliterator.
     * @param escapeUnprintable if true, convert unprintable
     * characters to their hex escape representations, \\uxxxx or
     * \\Uxxxxxxxx.  Unprintable characters are those other than
     * U+000A, U+0020..U+007E.
     * @return rules string
     * @internal
     * @deprecated This API is ICU internal only.
     */
    public String toRules(boolean escapeUnprintable) {
        return data.ruleSet.toRules(escapeUnprintable);
    }

    /**
     * Return the set of all characters that may be modified by this
     * Transliterator, ignoring the effect of our filter.
     * @internal
     * @deprecated This API is ICU internal only.
     */
    protected UnicodeSet handleGetSourceSet() {
        return data.ruleSet.getSourceTargetSet(false);
    }

    /**
     * Returns the set of all characters that may be generated as
     * replacement text by this transliterator.
     * @internal
     * @deprecated This API is ICU internal only.
     */
    public UnicodeSet getTargetSet() {
        return data.ruleSet.getSourceTargetSet(true);
    }
}
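
The worked example in the class comment (rules `abc{def}>x|y`, `xyz>r`, `yz>q` applied to `adefabcdefz`) can be traced with a small stand-alone model. The sketch below is not the ICU implementation: `CursorRuleSketch`, its `Rule` class, and `exampleRules()` are hypothetical names invented for illustration, and the model handles only a preceding context, a literal key, and a cursor offset inside the output. It ignores following context, `UnicodeSet` patterns, segments, anchors, and `@`.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of cursor-driven rule application; not the ICU implementation. */
class CursorRuleSketch {

    /** Hypothetical rule shape: preceding context, key to replace, output, cursor offset in output. */
    static final class Rule {
        final String before;
        final String key;
        final String output;
        final int cursorOffset;

        Rule(String before, String key, String output, int cursorOffset) {
            this.before = before;
            this.key = key;
            this.output = output;
            this.cursorOffset = cursorOffset;
        }
    }

    /** Rules 1-3 from the class comment: abc{def}>x|y, xyz>r, yz>q. */
    static List<Rule> exampleRules() {
        return Arrays.asList(
            new Rule("abc", "def", "xy", 1),
            new Rule("", "xyz", "r", 1),
            new Rule("", "yz", "q", 1));
    }

    /** True if s occurs in text starting exactly at pos. */
    static boolean matchesAt(CharSequence text, int pos, String s) {
        if (pos < 0 || pos + s.length() > text.length()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (text.charAt(pos + i) != s.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    static String apply(List<Rule> rules, String input) {
        StringBuilder text = new StringBuilder(input);
        int cursor = 0;
        int steps = 0;
        int stepLimit = input.length() * 16; // arbitrary guard against rules that never advance
        while (cursor < text.length() && steps <= stepLimit) {
            steps++;
            boolean matched = false;
            for (Rule r : rules) {
                if (matchesAt(text, cursor - r.before.length(), r.before)
                        && matchesAt(text, cursor, r.key)) {
                    text.replace(cursor, cursor + r.key.length(), r.output);
                    cursor += r.cursorOffset; // the position marked by '|' in the output
                    matched = true;
                    break; // the first matching rule wins
                }
            }
            if (!matched) {
                cursor++; // no rule matched here; advance
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(apply(exampleRules(), "adefabcdefz")); // prints adefabcxq
    }
}
```

Because rule 1 leaves the cursor before the `y`, rule 2 never sees `xyz` starting at the cursor, so rule 3 fires instead, matching the trace in the class comment.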

/**
 * Revision 1.61 2004/02/25 01:26:23 alan
 * jitterbug 3517: make concrete transilterators package private and @internal
 *
 * Revision 1.60 2003/06/03 18:49:35 alan
 * jitterbug 2959: update copyright dates to include 2003
 *
 * Revision 1.59 2003/05/14 19:03:30 rviswanadha
 * jitterbug 2836: fix compiler warnings
 *
 * Revision 1.58 2002/12/03 18:57:36 alan
 * jitterbug 2087: fix @ tags
 *
 * Revision 1.57 2002/07/26 21:12:36 alan
 * jitterbug 1997: use UCharacterProperty.isRuleWhiteSpace() in parsers
 *
 * Revision 1.56 2002/06/28 19:15:52 alan
 * jitterbug 1434: improve method names; minor cleanup
 *
 * Revision 1.55 2002/06/26 18:12:39 alan
 * jitterbug 1434: initial public implementation of getSourceSet and getTargetSet
 *
 * Revision 1.54 2002/02/25 22:43:58 ram
 * Move Utility class to icu.impl
 *
 * Revision 1.53 2002/02/16 03:06:13 Mohan
 * ICU4J reorganization
 *
 * Revision 1.52 2002/02/07 00:53:54 alan
 * jitterbug 1234: make output side of RBTs object-oriented; rewrite ID parsers and modularize them; implement &Any-Lower() support
 *
 * Revision 1.51 2001/11/29 22:31:18 alan
 * jitterbug 1560: add source-set methods and TransliteratorUtility class
 *
 * Revision 1.50 2001/11/27 22:07:33 alan
 * jitterbug 1389: incorporate Mark's review comments - comments only
 *
 * Revision 1.49 2001/10/10 20:26:27 alan
 * jitterbug 81: initial implementation of compound filters in IDs and ::ID blocks
 *
 * Revision 1.48 2001/10/05 18:15:54 alan
 * jitterbug 74: finish port of Source-Target/Variant code incl. TransliteratorRegistry and tests
 *
 * Revision 1.47 2001/10/03 00:14:22 alan
 * jitterbug 73: finish quantifier and supplemental char support
 *
 * Revision 1.46 2001/09/26 18:00:06 alan
 * jitterbug 67: sync parser with icu4c, allow unlimited, nested segments
 *
 * Revision 1.45 2001/09/24 19:57:17 alan
 * jitterbug 60: implement toPattern in UnicodeSet; update UnicodeFilter.contains to take an int; update UnicodeSet to support code points to U+10FFFF
 *
 * Revision 1.44 2001/09/21 21:24:04 alan
 * jitterbug 64: allow ::ID blocks in rules
 *
 * Revision 1.43 2001/09/19 17:43:37 alan
 * jitterbug 60: initial implementation of toRules()
 *
 * Revision 1.42 2001/02/20 17:59:40 alan4j
 * Remove backslash-u from log
 *
 * Revision 1.41 2001/02/16 18:53:55 alan4j
 * Handle backslash-u escapes
 *
 * Revision 1.40 2001/02/03 00:46:21 alan4j
 * Load RuleBasedTransliterator files from UTF8 files instead of ResourceBundles
 *
 * Revision 1.39 2000/08/31 17:11:42 alan4j
 * Implement anchors.
 *
 * Revision 1.38 2000/08/30 20:40:30 alan4j
 * Implement anchors.
 *
 * Revision 1.37 2000/07/12 16:31:36 alan4j
 * Simplify loop limit logic
 *
 * Revision 1.36 2000/06/29 21:59:23 alan4j
 * Fix handling of Transliterator.Position fields
 *
 * Revision 1.35 2000/06/28 20:49:54 alan4j
 * Fix handling of Positions fields
 *
 * Revision 1.34 2000/06/28 20:36:32 alan4j
 * Clean up Transliterator::Position - rename temporary names
 *
 * Revision 1.33 2000/06/28 20:31:43 alan4j
 * Clean up Transliterator::Position and rename fields (related to jitterbug 450)
 *
 * Revision 1.32 2000/05/24 22:21:00 alan4j
 * Compact UnicodeSets
 *
 * Revision 1.31 2000/05/23 16:48:27 alan4j
 * Fix doc; remove unused auto
 *
 * Revision 1.30 2000/05/18 22:49:51 alan
 * Update docs
 *
 * Revision 1.29 2000/04/28 00:25:42 alan
 * Improve error reporting
 *
 * Revision 1.28 2000/04/25 17:38:00 alan
 * Minor parser cleanup.
 *
 * Revision 1.27 2000/04/25 01:42:58 alan
 * Allow arbitrary length variable values. Clean up Data API. Update javadocs.
 *
 * Revision 1.26 2000/04/22 01:25:10 alan
 * Add support for cursor positioner '@'; update javadoc
 *
 * Revision 1.25 2000/04/22 00:08:43 alan
 * Narrow range to 21 - 7E for mandatory quoting.
 *
 * Revision 1.24 2000/04/22 00:03:54 alan
 * Disallow unquoted special chars. Report multiple errors at once.
 *
 * Revision 1.23 2000/04/21 22:23:40 alan
 * Clean up parseReference. Previous log should read 'delegate', not 'delete'.
 *
 * Revision 1.22 2000/04/21 22:16:29 alan
 * Delete variable name parsing to SymbolTable interface to consolidate parsing code.
 *
 * Revision 1.21 2000/04/21 21:16:40 alan
 * Modify rule syntax
 *
 * Revision 1.20 2000/04/19 17:35:23 alan
 * Update javadoc; fix compile error
 *
 * Revision 1.19 2000/04/19 16:34:18 alan
 * Add segment support.
 *
 * Revision 1.18 2000/04/12 20:17:45 alan
 * Delegate replace operation to rule object
 *
 * Revision 1.17 2000/03/10 04:07:23 johnf
 * Copyright update
 *
 * Revision 1.16 2000/02/24 20:46:49 liu
 * Add infinite loop check
 *
 * Revision 1.15 2000/02/10 07:36:25 johnf
 * fixed imports for com.ibm.icu.impl.Utility
 *
 * Revision 1.14 2000/02/03 18:18:42 Alan
 * Use array rather than hashtable for char-to-set map
 *
 * Revision 1.13 2000/01/27 18:59:19 Alan
 * Use Position rather than int[] and move all subclass overrides to one method (handleTransliterate)
 *
 * Revision 1.12 2000/01/18 17:51:09 Alan
 * Remove "keyboard" from method names. Make maximum context a field of Transliterator, and have subclasses set it.
 *
 * Revision 1.11 2000/01/18 02:30:49 Alan
 * Add Jamo-Hangul, Hangul-Jamo, fix rules, add compound ID support
 *
 * Revision 1.10 2000/01/13 23:53:23 Alan
 * Fix bugs found during ICU port
 *
 * Revision 1.9 2000/01/11 04:12:06 Alan
 * Cleanup, embellish comments
 *
 * Revision 1.8 2000/01/11 02:25:03 Alan
 * Rewrite UnicodeSet and RBT parsers for better performance and new syntax
 *
 * Revision 1.7 2000/01/06 01:36:36 Alan
 * Allow string arrays in rule resource bundles
 *
 * Revision 1.6 2000/01/04 21:43:57 Alan
 * Add rule indexing, and move masking check to TransliterationRuleSet.
 *
 * Revision 1.5 1999/12/22 01:40:54 Alan
 * Consolidate rule pattern anteContext, key, and postContext into one string.
 *
 * Revision 1.4 1999/12/22 01:05:54 Alan
 * Improve masking checking; turn it off by default, for better performance
 */
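
The infinite-loop guard in `handleTransliterate` computes its iteration limit with a left shift and a sign check. The sketch below isolates that arithmetic so its behaviour at the edges is easy to see. `LoopLimitSketch` and both method names are hypothetical. `loopLimit` mirrors the expression in the source, while `saturatingLoopLimit` is an alternative suggested here for comparison and is not part of ICU.

```java
/** Isolates the loop-limit arithmetic used as an infinite-loop guard. */
class LoopLimitSketch {

    /** Same expression as in handleTransliterate: 16 iterations per character, clamped if negative. */
    static int loopLimit(int start, int limit) {
        int n = (limit - start) << 4;
        return n < 0 ? Integer.MAX_VALUE : n;
    }

    /** Alternative (an assumption, not ICU code): do the multiplication in long and saturate. */
    static int saturatingLoopLimit(int start, int limit) {
        long n = ((long) limit - start) * 16L;
        return (int) Math.min(n, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        System.out.println(loopLimit(0, 10));                // 160
        System.out.println(loopLimit(0, 1 << 27));           // 2147483647: the shift went negative
        System.out.println(loopLimit(0, 1 << 28));           // 0: the shift wrapped past the sign bit
        System.out.println(saturatingLoopLimit(0, 1 << 28)); // 2147483647
    }
}
```

The sign check catches an overflow only when the shifted value lands in the negative range. For a span of 2^28 characters or more the shift can wrap to a small non-negative number, which is unlikely to matter for realistic text lengths but is worth knowing when reusing the pattern.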

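
The `Data` class maps stand-in characters to objects by offsetting into an array: `variables[i]` represents the character `variablesBase + i`. A minimal sketch of that lookup follows, assuming plain strings in place of the `UnicodeMatcher` and `UnicodeReplacer` objects so that it runs without ICU. `StandInLookupSketch` and the base value `\uE000` are illustrative choices, not values taken from the source.

```java
/** Offset-based stand-in lookup, modelled on Data.lookupMatcher and Data.lookupReplacer. */
class StandInLookupSketch {
    private final char variablesBase;   // the character that represents variables[0]
    private final Object[] variables;   // variables[i] represents character (variablesBase + i)

    StandInLookupSketch(char variablesBase, Object[] variables) {
        this.variablesBase = variablesBase;
        this.variables = variables;
    }

    /** Returns the object represented by the stand-in character, or null if it is out of range. */
    Object lookup(int standIn) {
        int i = standIn - variablesBase;
        return (i >= 0 && i < variables.length) ? variables[i] : null;
    }

    public static void main(String[] args) {
        // Strings stand in for set objects here; the base character is an arbitrary example.
        StandInLookupSketch data =
            new StandInLookupSketch('\uE000', new Object[] {"[aeiou]", "[a-z]"});
        System.out.println(data.lookup(0xE000)); // [aeiou]
        System.out.println(data.lookup(0xE001)); // [a-z]
        System.out.println(data.lookup('a'));    // null: 'a' is an ordinary character
    }
}
```

Using an array indexed by offset avoids boxing each stand-in into a `Character` key, which is consistent with the revision note about replacing the hashtable with an array for the char-to-set map.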