/*
 *******************************************************************************
 * Copyright (C) 1996-2006, International Business Machines Corporation and *
 * others. All Rights Reserved. *
 *******************************************************************************
 */

package com.ibm.icu.text;

import java.lang.ref.SoftReference;
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;
import java.util.Locale;
import java.util.MissingResourceException;

import com.ibm.icu.impl.ICUDebug;
import com.ibm.icu.util.ULocale;

/**
 * A class that locates boundaries in text. This class defines a protocol for
 * objects that break up a piece of natural-language text according to a set
 * of criteria. Instances or subclasses of BreakIterator can be provided, for
 * example, to break a piece of text into words, sentences, or logical characters
 * according to the conventions of some language or group of languages.
 *
 * We provide five built-in types of BreakIterator:
 * <ul><li>getTitleInstance() returns a BreakIterator that locates title
 * boundaries, as used for title casing.
 * <li>getSentenceInstance() returns a BreakIterator that locates boundaries
 * between sentences. This is useful for triple-click selection, for example.
 * <li>getWordInstance() returns a BreakIterator that locates boundaries between
 * words. This is useful for double-click selection or "find whole words" searches.
 * This type of BreakIterator makes sure there is a boundary position at the
 * beginning and end of each legal word. (Numbers count as words, too.) Whitespace
 * and punctuation are kept separate from real words.
 * <li>getLineInstance() returns a BreakIterator that locates positions where it is
 * legal for a text editor to wrap lines. This is similar to word breaking, but
 * not the same: punctuation and whitespace are generally kept with words (you don't
 * want a line to start with whitespace, for example), and some special characters
 * can force a position to be considered a line-break position or prevent a position
 * from being a line-break position.
 * <li>getCharacterInstance() returns a BreakIterator that locates boundaries between
 * logical characters. Because of the structure of the Unicode encoding, a logical
 * character may be stored internally as more than one Unicode code point. (An "a" with an
 * umlaut may be stored as an "a" followed by a separate combining umlaut character,
 * for example, but the user still thinks of it as one character.) This iterator allows
 * various processes (especially text editors) to treat as characters the units of text
 * that a user would think of as characters, rather than the units of text that the
 * computer sees as "characters".</ul>
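A minimal sketch of the logical-character case. It is written against the JDK's `java.text.BreakIterator`, which follows the same first()/next()/DONE protocol as this class; that an ICU character iterator segments this input identically is an assumption. `LogicalCharacterCount` is a hypothetical name. The sample string is an "a" followed by U+0308 COMBINING DIAERESIS, then "bc": four `char`s, but three logical characters.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper, not part of ICU: counts logical characters using
// the JDK's java.text.BreakIterator, which shares this class's protocol.
final class LogicalCharacterCount {
    // Each pair of consecutive boundaries encloses one logical character.
    static int count(String text) {
        BreakIterator chars = BreakIterator.getCharacterInstance(Locale.ROOT);
        chars.setText(text);
        int n = 0;
        chars.first();
        while (chars.next() != BreakIterator.DONE) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // "a" + U+0308 COMBINING DIAERESIS displays as a single umlauted a.
        String s = "a\u0308bc";
        System.out.println(s.length() + " chars, " + count(s) + " logical characters");
    }
}
```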
 *
 * BreakIterator's interface follows an "iterator" model (hence the name), meaning it
 * has a concept of a "current position" and methods like first(), last(), next(),
 * and previous() that update the current position. All BreakIterators uphold the
 * following invariants:
 * <ul><li>The beginning and end of the text are always treated as boundary positions.
 * <li>The current position of the iterator is always a boundary position (random-access
 * methods move the iterator to the nearest boundary position before or
 * after the specified position, not <i>to</i> the specified position).
 * <li>DONE is used as a flag to indicate when iteration has stopped. next() returns
 * DONE only when the current position is the end of the text, and previous() returns
 * DONE only when the current position is the beginning of the text.
 * <li>Break positions are numbered by the positions of the characters that follow
 * them. Thus, under normal circumstances, the position before the first character
 * is 0, the position after the first character is 1, and the position after the
 * last character is the length of the string.
 * <li>The client can change the position of an iterator, or the text it analyzes,
 * at will, but cannot change its behavior. To get different behavior, the client
 * must instantiate a new iterator.</ul>
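The invariants can be observed with a short sketch. It uses the JDK's `java.text.BreakIterator` as a stand-in, since it exposes the same first()/next()/DONE protocol; whether an ICU word iterator reports exactly the same boundaries for this input is an assumption. `BoundaryInvariants` is a hypothetical name. With the JDK word iterator, "Hello world" yields boundaries 0, 5, 6 and 11: the first is 0 and the last equals the string length.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo of the invariants above, using the JDK's
// java.text.BreakIterator (same protocol as this class).
final class BoundaryInvariants {
    // Walks from first() to DONE and records every boundary visited.
    static List<Integer> boundaries(BreakIterator it, String text) {
        it.setText(text);
        List<Integer> result = new ArrayList<>();
        for (int b = it.first(); b != BreakIterator.DONE; b = it.next()) {
            result.add(b);
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Hello world";
        BreakIterator words = BreakIterator.getWordInstance(Locale.US);
        // Starts at 0 and ends at text.length(): the end of the text is a boundary.
        System.out.println(boundaries(words, text) + ", length " + text.length());
        // The iterator is parked at the end, so next() keeps reporting DONE.
        System.out.println(words.next() == BreakIterator.DONE);
    }
}
```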
 *
 * BreakIterator accesses the text it analyzes through a CharacterIterator, which makes
 * it possible to use BreakIterator to analyze text in any text-storage vehicle that
 * provides a CharacterIterator interface.
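A sketch of analyzing only part of a string by handing the iterator a bounded CharacterIterator. The JDK's `java.text.BreakIterator` stands in for this class (same protocol); `SubrangeBoundaries` is hypothetical. Boundaries are reported as offsets into the underlying string, so first() returns the range's begin index rather than 0.

```java
import java.text.BreakIterator;
import java.text.StringCharacterIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo using the JDK's java.text.BreakIterator: analyze only
// text[begin, end) by handing the iterator a bounded CharacterIterator.
final class SubrangeBoundaries {
    static List<Integer> wordBoundaries(String text, int begin, int end) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.US);
        // Arguments: text, begin index, end index, initial position.
        words.setText(new StringCharacterIterator(text, begin, end, begin));
        List<Integer> result = new ArrayList<>();
        for (int b = words.first(); b != BreakIterator.DONE; b = words.next()) {
            result.add(b); // offsets index the whole string, not the range
        }
        return result;
    }
}
```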
 *
 * <b>NOTE:</b> Some types of BreakIterator can take a long time to create, and
 * instances of BreakIterator are not currently cached by the system. For
 * optimal performance, keep instances of BreakIterator around as long as it makes
 * sense. For example, when word-wrapping a document, don't create and destroy a
 * new BreakIterator for each line. Create one break iterator for the whole document
 * (or whatever stretch of text you're wrapping) and use it to do the whole job of
 * wrapping the text.
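In practice this advice means creating the iterator once and calling setText() for each new stretch of text. A minimal sketch, using the JDK's `java.text.BreakIterator` (same protocol) and a hypothetical `ReusedIterator` helper that counts word-break segments in several strings with a single instance:

```java
import java.text.BreakIterator;
import java.util.List;
import java.util.Locale;

// Hypothetical demo using the JDK's java.text.BreakIterator: one iterator
// is created once and re-pointed at each string with setText().
final class ReusedIterator {
    static int[] segmentCounts(List<String> texts) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.US);
        int[] counts = new int[texts.size()];
        for (int i = 0; i < texts.size(); i++) {
            words.setText(texts.get(i)); // also resets the position to the start
            int n = 0;
            words.first();
            while (words.next() != BreakIterator.DONE) {
                n++;
            }
            counts[i] = n;
        }
        return counts;
    }
}
```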
 *
 * <P>
 * <strong>Examples</strong>:<P>
 * Creating and using text boundaries
 * <blockquote>
 * <pre>
 * public static void main(String args[]) {
 *     if (args.length == 1) {
 *         String stringToExamine = args[0];
 *         //print each word in order
 *         BreakIterator boundary = BreakIterator.getWordInstance();
 *         boundary.setText(stringToExamine);
 *         printEachForward(boundary, stringToExamine);
 *         //print each sentence in reverse order
 *         boundary = BreakIterator.getSentenceInstance(Locale.US);
 *         boundary.setText(stringToExamine);
 *         printEachBackward(boundary, stringToExamine);
 *         printFirst(boundary, stringToExamine);
 *         printLast(boundary, stringToExamine);
 *     }
 * }
 * </pre>
 * </blockquote>
 *
 * Print each element in order
 * <blockquote>
 * <pre>
 * public static void printEachForward(BreakIterator boundary, String source) {
 *     int start = boundary.first();
 *     for (int end = boundary.next();
 *          end != BreakIterator.DONE;
 *          start = end, end = boundary.next()) {
 *         System.out.println(source.substring(start,end));
 *     }
 * }
 * </pre>
 * </blockquote>
 *
 * Print each element in reverse order
 * <blockquote>
 * <pre>
 * public static void printEachBackward(BreakIterator boundary, String source) {
 *     int end = boundary.last();
 *     for (int start = boundary.previous();
 *          start != BreakIterator.DONE;
 *          end = start, start = boundary.previous()) {
 *         System.out.println(source.substring(start,end));
 *     }
 * }
 * </pre>
 * </blockquote>
 *
 * Print first element
 * <blockquote>
 * <pre>
 * public static void printFirst(BreakIterator boundary, String source) {
 *     int start = boundary.first();
 *     int end = boundary.next();
 *     System.out.println(source.substring(start,end));
 * }
 * </pre>
 * </blockquote>
 *
 * Print last element
 * <blockquote>
 * <pre>
 * public static void printLast(BreakIterator boundary, String source) {
 *     int end = boundary.last();
 *     int start = boundary.previous();
 *     System.out.println(source.substring(start,end));
 * }
 * </pre>
 * </blockquote>
 *
 * Print the element at a specified position
 * <blockquote>
 * <pre>
 * public static void printAt(BreakIterator boundary, int pos, String source) {
 *     int end = boundary.following(pos);
 *     int start = boundary.previous();
 *     System.out.println(source.substring(start,end));
 * }
 * </pre>
 * </blockquote>
 *
 * Find the next word
 * <blockquote>
 * <pre>
 * public static int nextWordStartAfter(int pos, String text) {
 *     BreakIterator wb = BreakIterator.getWordInstance();
 *     wb.setText(text);
 *     int last = wb.following(pos);
 *     int current = wb.next();
 *     while (current != BreakIterator.DONE) {
 *         for (int p = last; p < current; p++) {
 *             if (Character.isLetter(text.charAt(p)))
 *                 return last;
 *         }
 *         last = current;
 *         current = wb.next();
 *     }
 *     return BreakIterator.DONE;
 * }
 * </pre>
 * (The iterator returned by BreakIterator.getWordInstance() is unique in that
 * the break positions it returns don't represent both the start and end of the
 * thing being iterated over. By contrast, a sentence-break iterator returns breaks
 * that each represent the end of one sentence and the beginning of the next.
 * With the word-break iterator, the characters between two boundaries might be a
 * word, or they might be the punctuation or whitespace between two words. The
 * above code uses a simple heuristic to determine which boundary is the beginning
 * of a word: if the characters between this boundary and the next boundary
 * include at least one letter (this can be an alphabetical letter, a CJK ideograph,
 * a Hangul syllable, a Kana character, etc.), then the text between this boundary
 * and the next is a word; otherwise, it's the material between words.)
 * </blockquote>
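The letter heuristic described above can also collect every word rather than just the next one. A sketch with the JDK's `java.text.BreakIterator` standing in for this class; `WordList` is hypothetical. Like the heuristic, it skips segments that contain no letter, so a pure number such as "42" is dropped (testing with `Character::isLetterOrDigit` instead would keep numbers).

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: keeps only word-break segments that contain a letter.
final class WordList {
    static List<String> words(String text) {
        BreakIterator wb = BreakIterator.getWordInstance(Locale.US);
        wb.setText(text);
        List<String> result = new ArrayList<>();
        int start = wb.first();
        for (int end = wb.next(); end != BreakIterator.DONE; start = end, end = wb.next()) {
            String segment = text.substring(start, end);
            // Checks code points, so letters outside the BMP are recognized too.
            if (segment.codePoints().anyMatch(Character::isLetter)) {
                result.add(segment);
            }
        }
        return result;
    }
}
```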
 *
 * @see CharacterIterator
 * @stable ICU 2.0
 *
 */

public abstract class BreakIterator implements Cloneable
{

    private static final boolean DEBUG = ICUDebug.enabled("breakiterator");

    /**
     * Default constructor. There is no state that is carried by this abstract
     * base class.
     * @stable ICU 2.0
     */
    protected BreakIterator()
    {
    }

    /**
     * Clone method. Creates another BreakIterator with the same behavior and
     * current state as this one.
     * @return The clone.
     * @stable ICU 2.0
     */
    public Object clone()
    {
        try {
            return super.clone();
        }
        catch (CloneNotSupportedException e) {
            ///CLOVER:OFF
            throw new IllegalStateException();
            ///CLOVER:ON
        }
    }

    /**
     * DONE is returned by previous() and next() after all valid
     * boundaries have been returned, and by following() and preceding()
     * when there is no boundary in the requested direction.
     * @stable ICU 2.0
     */
    public static final int DONE = -1;

    /**
     * Return the first boundary position. This is always the beginning
     * index of the text this iterator iterates over. For example, if
     * the iterator iterates over a whole string, this function will
     * always return 0. This function also updates the iteration position
     * to point to the beginning of the text.
     * @return The character offset of the beginning of the stretch of text
     * being broken.
     * @stable ICU 2.0
     */
    public abstract int first();

    /**
     * Return the last boundary position. This is always the "past-the-end"
     * index of the text this iterator iterates over. For example, if the
     * iterator iterates over a whole string (call it "text"), this function
     * will always return text.length(). This function also updates the
     * iteration position to point to the end of the text.
     * @return The character offset of the end of the stretch of text
     * being broken.
     * @stable ICU 2.0
     */
    public abstract int last();

    /**
     * Advances the specified number of steps forward in the text (a negative
     * number, therefore, advances backwards). If this causes the iterator
     * to advance off either end of the text, this function returns DONE;
     * otherwise, this function returns the position of the appropriate
     * boundary. Calling this function is equivalent to calling next() or
     * previous() n times.
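The equivalence with repeated next() calls can be checked with a small sketch covering n >= 0. It uses the JDK's `java.text.BreakIterator`, which shares this protocol; `NextN` and its methods are hypothetical names.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical demo using the JDK's java.text.BreakIterator.
final class NextN {
    // Position reached by a single next(n) call from the start of the text.
    static int jump(String text, int n) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.US);
        it.setText(text);
        return it.next(n);
    }

    // Position reached by calling next() n times, stopping early at DONE.
    static int step(String text, int n) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.US);
        it.setText(text);
        int pos = it.current();
        for (int i = 0; i < n && pos != BreakIterator.DONE; i++) {
            pos = it.next();
        }
        return pos;
    }
}
```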
     * @param n The number of boundaries to advance over (if positive, moves
     * forward; if negative, moves backwards).
     * @return The position of the boundary n boundaries from the current
     * iteration position, or DONE if moving n boundaries causes the iterator
     * to advance off either end of the text.
     * @stable ICU 2.0
     */
    public abstract int next(int n);

    /**
     * Advances the iterator forward one boundary. The current iteration
     * position is updated to point to the next boundary position after the
     * current position, and this is also the value that is returned. If
     * the current position is equal to the value returned by last(), or to
     * DONE, this function returns DONE and sets the current position to
     * DONE.
     * @return The first boundary position following the iteration
     * position, or DONE if there is none.
     * @stable ICU 2.0
     */
    public abstract int next();

    /**
     * Advances the iterator backward one boundary. The current iteration
     * position is updated to point to the last boundary position before
     * the current position, and this is also the value that is returned. If
     * the current position is equal to the value returned by first(), or to
     * DONE, this function returns DONE and sets the current position to
     * DONE.
     * @return The last boundary position preceding the iteration
     * position, or DONE if there is none.
     * @stable ICU 2.0
     */
    public abstract int previous();

    /**
     * Sets the iterator's current iteration position to be the first
     * boundary position following the specified position. (Whether the
     * specified position is itself a boundary position or not doesn't
     * matter; this function always moves the iteration position to the
     * first boundary after the specified position.) If the specified
     * position is the past-the-end position, returns DONE.
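A sketch of following() with the JDK's `java.text.BreakIterator` as a stand-in; `FollowingDemo` is hypothetical. With the JDK word iterator, "Hello world" has boundaries 0, 5, 6 and 11, so following(0) and following(3) both land on 5, while following(5) moves past the boundary at 5 to 6.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical demo using the JDK's java.text.BreakIterator.
final class FollowingDemo {
    // First word boundary strictly after offset, even if offset is a boundary.
    static int after(String text, int offset) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.US);
        words.setText(text);
        return words.following(offset);
    }
}
```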
     * @param offset The character position to start searching from.
     * @return The position of the first boundary position following
     * "offset" (whether or not "offset" itself is a boundary position),
     * or DONE if "offset" is the past-the-end offset.
     * @stable ICU 2.0
     */
    public abstract int following(int offset);

    /**
     * Sets the iterator's current iteration position to be the last
     * boundary position preceding the specified position. (Whether the
     * specified position is itself a boundary position or not doesn't
     * matter; this function always moves the iteration position to the
     * last boundary before the specified position.) If the specified
     * position is the starting position, returns DONE.
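preceding() and following() together recover the segment that contains a given character: the last boundary at or before index i is preceding(i + 1), and the segment ends at following(i). A sketch using the JDK's `java.text.BreakIterator` as a stand-in; `SegmentAt` is hypothetical and assumes 0 <= i < text.length().

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper using the JDK's java.text.BreakIterator.
final class SegmentAt {
    // Returns the word-break segment containing text.charAt(index);
    // requires 0 <= index < text.length().
    static String segmentAt(String text, int index) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.US);
        words.setText(text);
        int start = words.preceding(index + 1); // last boundary <= index
        int end = words.following(index);       // first boundary > index
        return text.substring(start, end);
    }
}
```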
     * @param offset The character position to start searching from.
     * @return The position of the last boundary position preceding
     * "offset" (whether or not "offset" itself is a boundary position),
     * or DONE if "offset" is the starting offset of the iterator.
     * @stable ICU 2.0
     */
    public int preceding(int offset) {
        // NOTE: This implementation is here solely because we can't add new
        // abstract methods to an existing class. There is almost ALWAYS a
        // better, faster way to do this.
        int pos = following(offset);
        while (pos >= offset && pos != DONE)
            pos = previous();
        return pos;
    }

    /**
     * Return true if the specified position is a boundary position. If the
     * function returns true, the current iteration position is set to the
     * specified position; if the function returns false, the current
     * iteration position is set as though following() had been called.
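A common use of isBoundary() is snapping a caret or selection edge to a boundary. A sketch with the JDK's `java.text.BreakIterator` standing in for this class; `Snap` is hypothetical.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper using the JDK's java.text.BreakIterator.
final class Snap {
    // Returns offset itself if it is a word boundary, otherwise the next one.
    static int forward(String text, int offset) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.US);
        words.setText(text);
        return words.isBoundary(offset) ? offset : words.following(offset);
    }
}
```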
     * @param offset the offset to check.
     * @return True if "offset" is a boundary position.
     * @stable ICU 2.0
     */
    public boolean isBoundary(int offset) {
        // Again, this is the default implementation, which is provided solely because
        // we couldn't add a new abstract method to an existing class. The real
        // implementations will usually need to do a little more work.
        if (offset == 0) {
            return true;
        }
        else
            return following(offset - 1) == offset;
    }

    /**
     * Return the iterator's current position.
     * @return The iterator's current position.
     * @stable ICU 2.0
     */
    public abstract int current();

    /**
     * Returns a CharacterIterator over the text being analyzed.
     * For at least some subclasses of BreakIterator, this is a reference
     * to the <b>actual iterator being used</b> by the BreakIterator,
     * and therefore, this function's return value should be treated as
     * read-only. No guarantees are made about the current position
     * of this iterator when it is returned. If you need to move that
     * position to examine the text, clone this function's return value first.
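Following that advice, a sketch that reads the analyzed text through a clone, so the break iterator's own position is left alone. It uses the JDK's `java.text.BreakIterator` as a stand-in; `TextInspector` is hypothetical.

```java
import java.text.BreakIterator;
import java.text.CharacterIterator;
import java.util.Locale;

// Hypothetical helper using the JDK's java.text.BreakIterator.
final class TextInspector {
    // Counts spaces in the analyzed text. The CharacterIterator is cloned
    // first, so walking it cannot disturb the break iterator's position.
    static int countSpaces(BreakIterator it) {
        CharacterIterator copy = (CharacterIterator) it.getText().clone();
        int spaces = 0;
        for (char c = copy.first(); c != CharacterIterator.DONE; c = copy.next()) {
            if (c == ' ') {
                spaces++;
            }
        }
        return spaces;
    }
}
```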
     * @return A CharacterIterator over the text being analyzed.
     * @stable ICU 2.0
     */
    public abstract CharacterIterator getText();

    /**
     * Sets the iterator to analyze a new piece of text. The new
     * piece of text is passed in as a String, and the current
     * iteration position is reset to the beginning of the string.
     * (The old text is dropped.)
     * @param newText A String containing the text to analyze with
     * this BreakIterator.
     * @stable ICU 2.0
     */
    public void setText(String newText)
    {
        setText(new StringCharacterIterator(newText));
    }

    /**
     * Sets the iterator to analyze a new piece of text. The
     * BreakIterator is passed a CharacterIterator through which
     * it will access the text itself. The current iteration
     * position is reset to the CharacterIterator's start index.
     * (The old iterator is dropped.)
     * @param newText A CharacterIterator referring to the text
     * to analyze with this BreakIterator (the iterator's current
     * position is ignored, but its other state is significant).
     * @stable ICU 2.0
     */
    public abstract void setText(CharacterIterator newText);

    /** @stable ICU 2.4 */
    public static final int KIND_CHARACTER = 0;
    /** @stable ICU 2.4 */
    public static final int KIND_WORD = 1;
    /** @stable ICU 2.4 */
    public static final int KIND_LINE = 2;
    /** @stable ICU 2.4 */
    public static final int KIND_SENTENCE = 3;
    /** @stable ICU 2.4 */
    public static final int KIND_TITLE = 4;

    /** @since ICU 2.8 */
    private static final int KIND_COUNT = 5;

    /** @internal */
    private static final SoftReference[] iterCache = new SoftReference[5];

    /**
     * Returns a new instance of BreakIterator that locates word boundaries.
     * This function assumes that the text being analyzed is in the default
     * locale's language.
     * @return An instance of BreakIterator that locates word boundaries.
     * @stable ICU 2.0
     */
    public static BreakIterator getWordInstance()
    {
        return getWordInstance(ULocale.getDefault());
    }

    /**
     * Returns a new instance of BreakIterator that locates word boundaries.
     * @param where A locale specifying the language of the text to be
     * analyzed.
     * @return An instance of BreakIterator that locates word boundaries.
     * @stable ICU 2.0
     */
    public static BreakIterator getWordInstance(Locale where)
    {
        return getBreakInstance(ULocale.forLocale(where), KIND_WORD);
    }

    /**
     * Returns a new instance of BreakIterator that locates word boundaries.
     * @param where A locale specifying the language of the text to be
     * analyzed.
     * @return An instance of BreakIterator that locates word boundaries.
     * @draft ICU 3.2
     * @provisional This API might change or be removed in a future release.
     */
    public static BreakIterator getWordInstance(ULocale where)
    {
        return getBreakInstance(where, KIND_WORD);
    }

    /**
     * Returns a new instance of BreakIterator that locates legal
     * line-wrapping positions. This function assumes the text being broken
     * is in the default locale's language.
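A greedy word-wrap sketch built on a line-break iterator, using the JDK's `java.text.BreakIterator` as a stand-in. `GreedyWrap` is hypothetical, and measuring width in `char`s is a simplifying assumption (real layout would measure rendered width). Trailing whitespace is not counted against the width, since it may hang past the margin; a single segment longer than the width is left to overflow.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical greedy wrapper using the JDK's java.text.BreakIterator.
final class GreedyWrap {
    static List<String> wrap(String text, int width) {
        BreakIterator lb = BreakIterator.getLineInstance(Locale.US);
        lb.setText(text);
        List<String> lines = new ArrayList<>();
        int lineStart = lb.first();
        int lastFit = lineStart; // most recent break opportunity; cut here on overflow
        for (int end = lb.next(); end != BreakIterator.DONE; end = lb.next()) {
            // Trailing spaces may hang past the margin, so ignore them here.
            int used = text.substring(lineStart, end).stripTrailing().length();
            if (used > width && lastFit > lineStart) {
                lines.add(text.substring(lineStart, lastFit).stripTrailing());
                lineStart = lastFit;
            }
            lastFit = end;
        }
        if (lineStart < text.length()) {
            lines.add(text.substring(lineStart).stripTrailing());
        }
        return lines;
    }
}
```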
     * @return A new instance of BreakIterator that locates legal
     * line-wrapping positions.
     * @stable ICU 2.0
     */
    public static BreakIterator getLineInstance()
    {
        return getLineInstance(ULocale.getDefault());
    }

    /**
     * Returns a new instance of BreakIterator that locates legal
     * line-wrapping positions.
     * @param where A Locale specifying the language of the text being broken.
     * @return A new instance of BreakIterator that locates legal
     * line-wrapping positions.
     * @stable ICU 2.0
     */
    public static BreakIterator getLineInstance(Locale where)
    {
        return getBreakInstance(ULocale.forLocale(where), KIND_LINE);
    }

    /**
     * Returns a new instance of BreakIterator that locates legal
     * line-wrapping positions.
     * @param where A Locale specifying the language of the text being broken.
     * @return A new instance of BreakIterator that locates legal
     * line-wrapping positions.
     * @draft ICU 3.2
     * @provisional This API might change or be removed in a future release.
     */
    public static BreakIterator getLineInstance(ULocale where)
    {
        return getBreakInstance(where, KIND_LINE);
    }

    /**
     * Returns a new instance of BreakIterator that locates logical-character
     * boundaries. This function assumes that the text being analyzed is
     * in the default locale's language.
     * @return A new instance of BreakIterator that locates logical-character
     * boundaries.
     * @stable ICU 2.0
     */
    public static BreakIterator getCharacterInstance()
    {
        return getCharacterInstance(ULocale.getDefault());
    }

    /**
     * Returns a new instance of BreakIterator that locates logical-character
     * boundaries.
     * @param where A Locale specifying the language of the text being analyzed.
     * @return A new instance of BreakIterator that locates logical-character
     * boundaries.
     * @stable ICU 2.0
     */
    public static BreakIterator getCharacterInstance(Locale where)
    {
        return getBreakInstance(ULocale.forLocale(where), KIND_CHARACTER);
    }

    /**
     * Returns a new instance of BreakIterator that locates logical-character
     * boundaries.
     * @param where A Locale specifying the language of the text being analyzed.
     * @return A new instance of BreakIterator that locates logical-character
     * boundaries.
     * @draft ICU 3.2
     * @provisional This API might change or be removed in a future release.
     */
    public static BreakIterator getCharacterInstance(ULocale where)
    {
        return getBreakInstance(where, KIND_CHARACTER);
    }

    /**
     * Returns a new instance of BreakIterator that locates sentence boundaries.
     * This function assumes the text being analyzed is in the default locale's
     * language.
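A sketch that splits text into sentences with a sentence iterator. The JDK's `java.text.BreakIterator` stands in for this class, and `Sentences` is hypothetical. Each sentence segment keeps its trailing whitespace, so the sketch trims it.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper using the JDK's java.text.BreakIterator.
final class Sentences {
    static List<String> split(String text) {
        BreakIterator sb = BreakIterator.getSentenceInstance(Locale.US);
        sb.setText(text);
        List<String> result = new ArrayList<>();
        int start = sb.first();
        for (int end = sb.next(); end != BreakIterator.DONE; start = end, end = sb.next()) {
            result.add(text.substring(start, end).strip()); // drop trailing spaces
        }
        return result;
    }
}
```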
     * @return A new instance of BreakIterator that locates sentence boundaries.
     * @stable ICU 2.0
     */
    public static BreakIterator getSentenceInstance()
    {
        return getSentenceInstance(ULocale.getDefault());
    }

    /**
     * Returns a new instance of BreakIterator that locates sentence boundaries.
     * @param where A Locale specifying the language of the text being analyzed.
     * @return A new instance of BreakIterator that locates sentence boundaries.
     * @stable ICU 2.0
     */
    public static BreakIterator getSentenceInstance(Locale where)
    {
        return getBreakInstance(ULocale.forLocale(where), KIND_SENTENCE);
    }

    /**
     * Returns a new instance of BreakIterator that locates sentence boundaries.
     * @param where A Locale specifying the language of the text being analyzed.
     * @return A new instance of BreakIterator that locates sentence boundaries.
     * @draft ICU 3.2
     * @provisional This API might change or be removed in a future release.
     */
    public static BreakIterator getSentenceInstance(ULocale where)
    {
        return getBreakInstance(where, KIND_SENTENCE);
    }

    /**
     * Returns a new instance of BreakIterator that locates title boundaries.
     * This function assumes the text being analyzed is in the default locale's
     * language. The iterator returned locates title boundaries as described for
     * Unicode 3.2 only. For title boundary iteration under Unicode 4.0 and later,
     * use a word boundary iterator ({@link #getWordInstance}).
     * @return A new instance of BreakIterator that locates title boundaries.
     * @stable ICU 2.0
     */
    public static BreakIterator getTitleInstance()
    {
        return getTitleInstance(ULocale.getDefault());
    }

    /**
     * Returns a new instance of BreakIterator that locates title boundaries.
     * The iterator returned locates title boundaries as described for
     * Unicode 3.2 only. For title boundary iteration under Unicode 4.0 and later,
     * use a word boundary iterator ({@link #getWordInstance}).
     * @param where A Locale specifying the language of the text being analyzed.
     * @return A new instance of BreakIterator that locates title boundaries.
     * @stable ICU 2.0
     */
    public static BreakIterator getTitleInstance(Locale where)
    {
        return getBreakInstance(ULocale.forLocale(where), KIND_TITLE);
    }

    /**
     * Returns a new instance of BreakIterator that locates title boundaries.
     * The iterator returned locates title boundaries as described for
     * Unicode 3.2 only. For title boundary iteration under Unicode 4.0 and later,
     * use a word boundary iterator ({@link #getWordInstance}).
     * @param where A Locale specifying the language of the text being analyzed.
     * @return A new instance of BreakIterator that locates title boundaries.
     * @draft ICU 3.2
     * @provisional This API might change or be removed in a future release.
     */
    public static BreakIterator getTitleInstance(ULocale where)
    {
        return getBreakInstance(where, KIND_TITLE);
    }

    /**
     * Register a new break iterator of the indicated kind, to use in the given locale.
     * Clones of the iterator will be returned
     * if a request for a break iterator of the given kind matches or falls back to
     * this locale.
     * @param iter the BreakIterator instance to adopt.
     * @param locale the Locale for which this instance is to be registered
     * @param kind the type of iterator for which this instance is to be registered
     * @return a registry key that can be used to unregister this instance
     * @stable ICU 2.4
     */
    public static Object registerInstance(BreakIterator iter, Locale locale, int kind) {
        return registerInstance(iter, ULocale.forLocale(locale), kind);
    }

    /**
     * Register a new break iterator of the indicated kind, to use in the given locale.
     * Clones of the iterator will be returned
     * if a request for a break iterator of the given kind matches or falls back to
     * this locale.
     * @param iter the BreakIterator instance to adopt.
     * @param locale the Locale for which this instance is to be registered
     * @param kind the type of iterator for which this instance is to be registered
     * @return a registry key that can be used to unregister this instance
     * @draft ICU 3.2
     * @provisional This API might change or be removed in a future release.
     */
    public static Object registerInstance(BreakIterator iter, ULocale locale, int kind) {
        // If the registered object matches the one in the cache, then
        // flush the cached object.
        if (iterCache[kind] != null) {
            BreakIteratorCache cache = (BreakIteratorCache) iterCache[kind].get();
            if (cache != null) {
                if (cache.getLocale().equals(locale)) {
                    iterCache[kind] = null;
                }
            }
        }
        return getShim().registerInstance(iter, locale, kind);
    }

    /**
     * Unregister a previously-registered BreakIterator using the key returned from the
     * register call. Key becomes invalid after this call and should not be used again.
     * @param key the registry key returned by a previous call to registerInstance
     * @return true if the iterator for the key was successfully unregistered
     * @stable ICU 2.4
     */
    public static boolean unregister(Object key) {
        if (key == null) {
            throw new IllegalArgumentException("registry key must not be null");
        }
        // TODO: we don't do code coverage for the following lines
        // because in getBreakInstance we always instantiate the shim,
        // and test execution is such that we always instantiate a
        // breakiterator before we get to the break iterator tests.
        // this is for modularization, and we could remove the
        // dependencies in getBreakInstance by rewriting part of the
        // LocaleData code, or perhaps by accepting it into the
        // module.
        ///CLOVER:OFF
        if (shim != null) {
            // Unfortunately, we don't know what is being unregistered
            // -- what `kind' and what locale -- so we flush all
            // caches. This is safe but inefficient if people are
            // actively registering and unregistering.
            for (int kind=0; kind<KIND_COUNT; ++kind) {
                iterCache[kind] = null;
            }
            return shim.unregister(key);
        }
        return false;
        ///CLOVER:ON
    }

    // end of registration

    /**
     * Get a particular kind of BreakIterator for a locale.
     * Avoids writing a switch statement with getXYZInstance(where) calls.
     * @internal
     * @deprecated This API is ICU internal only.
     */
    public static BreakIterator getBreakInstance(ULocale where, int kind) {

        if (iterCache[kind] != null) {
            BreakIteratorCache cache = (BreakIteratorCache) iterCache[kind].get();
            if (cache != null) {
                if (cache.getLocale().equals(where)) {
                    return cache.createBreakInstance();
                }
            }
        }

        // sigh, all to avoid linking in ICULocaleData...
        BreakIterator result = getShim().createBreakIterator(where, kind);

        BreakIteratorCache cache = new BreakIteratorCache(where, result);
        iterCache[kind] = new SoftReference(cache);
        return result;
    }

    /**
     * Returns a list of locales for which BreakIterators can be used.
     * @return An array of Locales. All of the locales in the array can
     * be used when creating a BreakIterator.
     * @stable ICU 2.6
     */
    public static synchronized Locale[] getAvailableLocales()
    {
        // to avoid linking ICULocaleData
        return getShim().getAvailableLocales();
    }

    /**
     * Returns a list of locales for which BreakIterators can be used.
     * @return An array of ULocales. All of the locales in the array can
     * be used when creating a BreakIterator.
     * @draft ICU 3.2
     * @provisional This API might change or be removed in a future release.
     */
    public static synchronized ULocale[] getAvailableULocales()
    {
        // to avoid linking ICULocaleData
        return getShim().getAvailableULocales();
    }

    private static final class BreakIteratorCache {

        private BreakIterator iter;
        private ULocale where;

        BreakIteratorCache(ULocale where, BreakIterator iter) {
            this.where = where;
            this.iter = (BreakIterator) iter.clone();
        }

        ULocale getLocale() {
            return where;
        }

        BreakIterator createBreakInstance() {
            return (BreakIterator) iter.clone();
        }
    }

    static abstract class BreakIteratorServiceShim {
        public abstract Object registerInstance(BreakIterator iter, ULocale l, int k);
        public abstract boolean unregister(Object key);
        public abstract Locale[] getAvailableLocales();
        public abstract ULocale[] getAvailableULocales();
        public abstract BreakIterator createBreakIterator(ULocale l, int k);
    }

    private static BreakIteratorServiceShim shim;
    private static BreakIteratorServiceShim getShim() {
        // Note: this instantiation is safe on loose-memory-model configurations
        // despite the lack of synchronization, since the shim instance has no
        // state; it's all in the class init. The worst case is that we
        // instantiate two shim instances, but they share the same state, so
        // that's OK.
        if (shim == null) {
            try {
                Class cls = Class.forName("com.ibm.icu.text.BreakIteratorFactory");
                shim = (BreakIteratorServiceShim) cls.newInstance();
            }
            catch (MissingResourceException e)
            {
                throw e;
            }
            catch (Exception e) {
                ///CLOVER:OFF
                if (DEBUG) {
                    e.printStackTrace();
                }
                throw new RuntimeException(e.getMessage());
                ///CLOVER:ON
            }
        }
        return shim;
    }

    // -------- BEGIN ULocale boilerplate --------

    /**
     * Return the locale that was used to create this object, or null.
     * This may differ from the locale requested at the time of
     * this object's creation. For example, if an object is created
     * for locale <tt>en_US_CALIFORNIA</tt>, the actual data may be
     * drawn from <tt>en</tt> (the <i>actual</i> locale), and
     * <tt>en_US</tt> may be the most specific locale that exists (the
     * <i>valid</i> locale).
     *
     * <p>Note: This method will be implemented in ICU 3.0; ICU 2.8
     * contains a partial preview implementation. The <i>actual</i>
     * locale is returned correctly, but the <i>valid</i> locale is
     * not, in most cases.
     * @param type type of information requested, either {@link
     * com.ibm.icu.util.ULocale#VALID_LOCALE} or {@link
     * com.ibm.icu.util.ULocale#ACTUAL_LOCALE}.
     * @return the information specified by <i>type</i>, or null if
     * this object was not constructed from locale data.
     * @see com.ibm.icu.util.ULocale
     * @see com.ibm.icu.util.ULocale#VALID_LOCALE
     * @see com.ibm.icu.util.ULocale#ACTUAL_LOCALE
     * @draft ICU 2.8 (retain)
     * @provisional This API might change or be removed in a future release.
     */
    public final ULocale getLocale(ULocale.Type type) {
        return type == ULocale.ACTUAL_LOCALE ?
            this.actualLocale : this.validLocale;
    }

    /**
     * Set information about the locales that were used to create this
     * object. If the object was not constructed from locale data,
     * both arguments should be set to null. Otherwise, neither
     * should be null. The actual locale must be at the same level as,
     * or less specific than, the valid locale. This method is intended
     * for use by factories or other entities that create objects of
     * this class.
     * @param valid the most specific locale containing any resource
     * data, or null
     * @param actual the locale containing data used to construct this
     * object, or null
     * @see com.ibm.icu.util.ULocale
     * @see com.ibm.icu.util.ULocale#VALID_LOCALE
     * @see com.ibm.icu.util.ULocale#ACTUAL_LOCALE
     * @internal
     */
    final void setLocale(ULocale valid, ULocale actual) {
        // Change the following to an assertion later
        if ((valid == null) != (actual == null)) {
            ///CLOVER:OFF
            throw new IllegalArgumentException();
            ///CLOVER:ON
        }
        // Another check we could do is that the actual locale is at
        // the same level as, or less specific than, the valid locale.
        this.validLocale = valid;
        this.actualLocale = actual;
    }

    /**
     * The most specific locale containing any resource data, or null.
     * @see com.ibm.icu.util.ULocale
     * @internal
     */
    private ULocale validLocale;

    /**
     * The locale containing data used to construct this object, or
     * null.
     * @see com.ibm.icu.util.ULocale
     * @internal
     */
    private ULocale actualLocale;

    // -------- END ULocale boilerplate --------

}
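The caching in `getBreakInstance` and `BreakIteratorCache` above keeps, for each kind, a single soft-referenced prototype keyed by locale, returns clones of it on a hit, and is cleared wholesale by `unregister`. The following is a minimal sketch of that pattern, assuming the JDK's `java.text.BreakIterator` as a stand-in for the ICU class; `BreakCacheSketch`, `Entry`, `KIND_WORD`, `KIND_SENTENCE`, `misses` and `flushAll` are hypothetical names, not ICU API.

```java
import java.lang.ref.SoftReference;
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical sketch of a one-slot-per-kind, soft-referenced prototype cache.
// Uses the JDK BreakIterator as a stand-in for com.ibm.icu.text.BreakIterator.
final class BreakCacheSketch {
    static final int KIND_WORD = 0;
    static final int KIND_SENTENCE = 1;
    static final int KIND_COUNT = 2;

    // Counts how often a new iterator had to be built (exposed for testing).
    static int misses = 0;

    // One soft-referenced slot per kind; the GC may clear it under memory pressure.
    @SuppressWarnings("unchecked")
    private static final SoftReference<Entry>[] cache = new SoftReference[KIND_COUNT];

    private static final class Entry {
        final Locale where;
        final BreakIterator prototype;

        Entry(Locale where, BreakIterator iter) {
            this.where = where;
            // Keep a private copy so the caller's later use of iter cannot affect it.
            this.prototype = (BreakIterator) iter.clone();
        }
    }

    private static BreakIterator create(Locale where, int kind) {
        switch (kind) {
            case KIND_WORD:
                return BreakIterator.getWordInstance(where);
            case KIND_SENTENCE:
                return BreakIterator.getSentenceInstance(where);
            default:
                throw new IllegalArgumentException("unknown kind: " + kind);
        }
    }

    static BreakIterator get(Locale where, int kind) {
        SoftReference<Entry> ref = cache[kind];
        Entry entry = (ref != null) ? ref.get() : null;
        if (entry != null && entry.where.equals(where)) {
            // Hit: hand out a fresh clone so callers never share iteration state.
            return (BreakIterator) entry.prototype.clone();
        }
        misses++;
        BreakIterator result = create(where, kind);
        // Overwrite the single slot for this kind; any previous entry is dropped.
        cache[kind] = new SoftReference<>(new Entry(where, result));
        return result;
    }

    // Like unregister in the listing: without knowing which slot changed, clear them all.
    static void flushAll() {
        for (int kind = 0; kind < KIND_COUNT; ++kind) {
            cache[kind] = null;
        }
    }
}
```

Because each kind has only one slot, alternating between two locales misses on every call. The design trades hit rate for a tiny, GC-reclaimable footprint, and cloning keeps the cached prototype safe from callers that call `setText` on the returned iterator.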
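`getShim` above loads `com.ibm.icu.text.BreakIteratorFactory` by name, so `BreakIterator` does not link against the locale-data code at compile time. The sketch below shows the same lazy, reflective loading. It is an illustration under stated assumptions: `LazyShimSketch`, `Shim` and `DefaultShim` are hypothetical, and the class name is passed in rather than hard-coded so the example runs on its own.

```java
// Hypothetical sketch of lazily loading a service implementation by class name.
final class LazyShimSketch {
    // Stand-in for BreakIteratorServiceShim.
    static abstract class Shim {
        abstract String name();
    }

    // Stand-in for BreakIteratorFactory; needs an accessible no-arg constructor.
    static final class DefaultShim extends Shim {
        public DefaultShim() {}

        @Override
        String name() {
            return "default";
        }
    }

    private static Shim shim;

    static Shim getShim(String className) {
        // Unsynchronized check-then-assign, as in the listing: two threads may each
        // create an instance, which is harmless only if instances are stateless.
        if (shim == null) {
            try {
                Class<?> cls = Class.forName(className);
                shim = (Shim) cls.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException | ClassCastException e) {
                throw new RuntimeException("cannot load shim " + className, e);
            }
        }
        return shim;
    }
}
```

One difference from the listing: `Class.newInstance`, which `getShim` uses, propagates exceptions thrown by the constructor unwrapped, which is why the listing can catch and rethrow `MissingResourceException` directly. `getDeclaredConstructor().newInstance()`, used in the sketch, wraps such exceptions in `InvocationTargetException` instead.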
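`setLocale` above requires `valid` and `actual` to be both null or both non-null, and its comment suggests a further check that the actual locale is no more specific than the valid one. A sketch of both checks follows, using `java.util.Locale` in place of `ULocale`. The `Type` enum and the `isSameOrLessSpecific` helper are hypothetical, and the specificity test assumes plain `language_COUNTRY_VARIANT` identifiers; script and extension suffixes are not handled.

```java
import java.util.Locale;

// Hypothetical sketch of the valid/actual locale bookkeeping.
final class LocaleInfoSketch {
    // Stand-in for ULocale.Type (VALID_LOCALE / ACTUAL_LOCALE).
    enum Type { VALID, ACTUAL }

    private Locale validLocale;
    private Locale actualLocale;

    void setLocale(Locale valid, Locale actual) {
        // Reject the case where exactly one argument is null.
        if ((valid == null) != (actual == null)) {
            throw new IllegalArgumentException("valid and actual must both be null or both non-null");
        }
        // The optional extra check described in the listing's comment.
        if (valid != null && !isSameOrLessSpecific(actual, valid)) {
            throw new IllegalArgumentException(actual + " is more specific than " + valid);
        }
        this.validLocale = valid;
        this.actualLocale = actual;
    }

    Locale getLocale(Type type) {
        return type == Type.ACTUAL ? actualLocale : validLocale;
    }

    // True if actual equals valid or is a truncation of it (en_US -> en -> root).
    static boolean isSameOrLessSpecific(Locale actual, Locale valid) {
        String a = actual.toString();
        String v = valid.toString();
        return a.isEmpty() || v.equals(a) || v.startsWith(a + "_");
    }
}
```

For example, `setLocale(Locale.US, Locale.ENGLISH)` is accepted (data drawn from `en` for a request resolved to `en_US`), while swapping the arguments is rejected.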