package javolution.xml.pull;

import j2me.lang.CharSequence;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;

/**
 * <p> This interface represents an XML Pull parser with non-constant
 * {@link String} replaced with {@link CharSequence} for greater
 * flexibility and speed.</p>
 *
 * XML Pull Parser is an interface that defines the parsing functionality provided
 * in the <a HREF="http://www.xmlpull.org/">XMLPULL V1 API</a> (visit this website to
 * learn more about the API and its implementations).
 *
 * <p>There are the following different
 * kinds of parser, depending on which features are set:<ul>
 * <li><b>non-validating</b> parser as defined in the XML 1.0 spec when
 * FEATURE_PROCESS_DOCDECL is set to true
 * <li><b>validating parser</b> as defined in the XML 1.0 spec when
 * FEATURE_VALIDATION is true (which implies that FEATURE_PROCESS_DOCDECL is true)
 * <li>when FEATURE_PROCESS_DOCDECL is false (this is the default; if a
 * different value is required, it must be set before parsing starts),
 * the parser behaves like an XML 1.0 compliant non-validating parser, provided that
 * <em>no DOCDECL is present</em> in the XML documents
 * (internal entities can still be defined with defineEntityReplacementText()).
 * This mode of operation is intended <b>for operation in constrained environments</b> such as J2ME.
 * </ul>
 *
 * <p>There are two key methods: next() and nextToken(). While next() provides
 * access to high level parsing events, nextToken() allows access to lower
 * level tokens.
 *
 * <p>The current event state of the parser
 * can be determined by calling the
 * <a HREF="#getEventType()">getEventType()</a> method.
 * Initially, the parser is in the <a HREF="#START_DOCUMENT">START_DOCUMENT</a>
 * state.
 *
 * <p>The method <a HREF="#next()">next()</a> advances the parser to the
 * next event. The int value returned from next() determines the current parser
 * state and is identical to the value returned from subsequent calls to
 * getEventType().
 *
 * <p>The following event types are seen by next():<dl>
 * <dt><a HREF="#START_TAG">START_TAG</a><dd> An XML start tag was read.
 * <dt><a HREF="#TEXT">TEXT</a><dd> Text content was read;
 * the text content can be retrieved using the getText() method.
 * (In validating mode, next() will not report ignorable whitespace; use nextToken() instead.)
 * <dt><a HREF="#END_TAG">END_TAG</a><dd> An end tag was read.
 * <dt><a HREF="#END_DOCUMENT">END_DOCUMENT</a><dd> No more events are available.
 * </dl>
 *
 * <p>After the first call to next() or nextToken() (or any other next*() method),
 * the application can obtain the
 * XML version, standalone flag and encoding from the XML declaration
 * in the following ways:<ul>
 * <li><b>version</b>:
 * getProperty(&quot;<a HREF="http://xmlpull.org/v1/doc/properties.html#xmldecl-version">http://xmlpull.org/v1/doc/properties.html#xmldecl-version</a>&quot;)
 * returns a String ("1.0"), or null if the XMLDecl was not read or if the property is not supported
 * <li><b>standalone</b>:
 * getProperty(&quot;<a HREF="http://xmlpull.org/v1/doc/features.html#xmldecl-standalone">http://xmlpull.org/v1/doc/features.html#xmldecl-standalone</a>&quot;)
 * returns a Boolean: null if there was no standalone declaration
 * or if the property is not supported;
 * otherwise Boolean(true) if standalone="yes" and Boolean(false) if standalone="no"
 * <li><b>encoding</b>: obtained from getInputEncoding();
 * null if the stream had an unknown encoding (not set in setInput(InputStream, String))
 * and it was not declared in the XMLDecl
 * </ul>
 *
 * A minimal example for using this API may look as follows:
 * <pre>
 * import java.io.IOException;
 * import java.io.StringReader;
 *
 * import javolution.xml.pull.XmlPullParser;
 * import javolution.xml.pull.XmlPullParserException;
 * import javolution.xml.pull.XmlPullParserImpl;
 *
 * public class SimpleXmlPullApp {
 *
 *     public static void main(String[] args) throws XmlPullParserException, IOException {
 *         XmlPullParser xpp = new XmlPullParserImpl();
 *
 *         xpp.<a HREF="#setInput">setInput</a>(new StringReader("&lt;foo>Hello World!&lt;/foo>"));
 *         int eventType = xpp.getEventType();
 *         // The loop stops at END_DOCUMENT, so that event itself is never printed.
 *         while (eventType != XmlPullParser.END_DOCUMENT) {
 *             if (eventType == XmlPullParser.START_DOCUMENT) {
 *                 System.out.println("Start document");
 *             } else if (eventType == XmlPullParser.START_TAG) {
 *                 System.out.println("Start tag " + xpp.<a HREF="#getName()">getName()</a>);
 *             } else if (eventType == XmlPullParser.END_TAG) {
 *                 System.out.println("End tag " + xpp.getName());
 *             } else if (eventType == XmlPullParser.TEXT) {
 *                 System.out.println("Text " + xpp.<a HREF="#getText()">getText()</a>);
 *             }
 *             eventType = xpp.next();
 *         }
 *     }
 * }</pre>
 *
 * <p>The above example will generate the following output:
 * <pre>
 * Start document
 * Start tag foo
 * Text Hello World!
 * End tag foo
 * </pre>
 *
 * <p>For more details on API usage, please refer to the
 * quick introduction available at <a HREF="http://www.xmlpull.org">http://www.xmlpull.org</a>.
 *
 * @see #defineEntityReplacementText
 * @see #getName
 * @see #getNamespace
 * @see #getText
 * @see #next
 * @see #nextToken
 * @see #setInput
 * @see #FEATURE_PROCESS_DOCDECL
 * @see #FEATURE_VALIDATION
 * @see #START_DOCUMENT
 * @see #START_TAG
 * @see #TEXT
 * @see #END_TAG
 * @see #END_DOCUMENT
 *
 * @author <a HREF="http://www-ai.cs.uni-dortmund.de/PERSONAL/haustein.html">Stefan Haustein</a>
 * @author <a HREF="http://www.extreme.indiana.edu/~aslom/">Aleksander Slominski</a>
 */
public interface XmlPullParser {

    /** This constant represents the default namespace (empty string ""). */
    String NO_NAMESPACE = "";

    // ----------------------------------------------------------------------------
    // EVENT TYPES as reported by next()

    /**
     * Signals that the parser is at the very beginning of the document
     * and nothing has been read yet.
     * This event type can only be observed by calling getEventType()
     * before the first call to next(), nextToken(), or nextTag().
     *
     * @see #next
     * @see #nextToken
     */
    int START_DOCUMENT = 0;

    /**
     * Logical end of the XML document. Returned from getEventType(), next()
     * and nextToken()
     * when the end of the input document has been reached.
     * <p><strong>NOTE:</strong> calling
     * <a HREF="#next()">next()</a> or <a HREF="#nextToken()">nextToken()</a>
     * again will result in an exception being thrown.
     *
     * @see #next
     * @see #nextToken
     */
    int END_DOCUMENT = 1;

    /**
     * Returned from getEventType(),
     * <a HREF="#next()">next()</a>, <a HREF="#nextToken()">nextToken()</a> when
     * a start tag was read.
     * The name of the start tag is available from getName(); its namespace and prefix are
     * available from getNamespace() and getPrefix()
     * if <a HREF='#FEATURE_PROCESS_NAMESPACES'>namespaces are enabled</a>.
     * See getAttribute* methods to retrieve element attributes.
     * See getNamespace* methods to retrieve newly declared namespaces.
     *
     * @see #next
     * @see #nextToken
     * @see #getName
     * @see #getPrefix
     * @see #getNamespace
     * @see #getAttributeCount
     * @see #getDepth
     * @see #getNamespaceCount
     * @see #getNamespace
     * @see #FEATURE_PROCESS_NAMESPACES
     */
    int START_TAG = 2;

    /**
     * Returned from getEventType(), <a HREF="#next()">next()</a>, or
     * <a HREF="#nextToken()">nextToken()</a> when an end tag was read.
     * The name of the element is available from getName(); its
     * namespace and prefix are
     * available from getNamespace() and getPrefix().
     *
     * @see #next
     * @see #nextToken
     * @see #getName
     * @see #getPrefix
     * @see #getNamespace
     * @see #FEATURE_PROCESS_NAMESPACES
     */
    int END_TAG = 3;

    /**
     * Character data was read and is available by calling getText().
     * <p><strong>Please note:</strong> <a HREF="#next()">next()</a> will
     * accumulate multiple
     * events into one TEXT event, skipping IGNORABLE_WHITESPACE,
     * PROCESSING_INSTRUCTION and COMMENT events.
     * In contrast, <a HREF="#nextToken()">nextToken()</a> will stop reading
     * text when any other event is observed.
     * Also, when the state was reached by calling next(), the text value will
     * be normalized, whereas getText() will
     * return unnormalized content in the case of nextToken(). This allows
     * an exact roundtrip without changing line ends when examining low
     * level events, whereas for high level applications the text is
     * normalized appropriately.
     *
     * @see #next
     * @see #nextToken
     * @see #getText
     */
    int TEXT = 4;

    // ----------------------------------------------------------------------------
    // additional events exposed by lower level nextToken()

    /**
     * A CDATA section was just read;
     * this token is available only from calls to <a HREF="#nextToken()">nextToken()</a>.
     * A call to next() will accumulate various text events into a single event
     * of type TEXT. The text contained in the CDATA section is available
     * by calling getText().
     *
     * @see #nextToken
     * @see #getText
     */
    int CDSECT = 5;

    /**
     * An entity reference was just read;
     * this token is available from <a HREF="#nextToken()">nextToken()</a>
     * only. The entity name is available by calling getName(). If available,
     * the replacement text can be obtained by calling getText(); otherwise,
     * the user is responsible for resolving the entity reference.
     * This event type is never returned from next(); next() will
     * accumulate the replacement text and other text
     * events into a single TEXT event.
     *
     * @see #nextToken
     * @see #getText
     */
    int ENTITY_REF = 6;

    /**
     * Ignorable whitespace was just read.
     * This token is available only from <a HREF="#nextToken()">nextToken()</a>.
     * For non-validating
     * parsers, this event is only reported by nextToken() when outside
     * the root element.
     * Validating parsers may be able to detect ignorable whitespace at
     * other locations.
     * The ignorable whitespace string is available by calling getText().
     *
     * <p><strong>NOTE:</strong> this is different from calling the
     * isWhitespace() method, since text content
     * may be whitespace but not ignorable.
     *
     * Ignorable whitespace is skipped by next() automatically; this event
     * type is never returned from next().
     *
     * @see #nextToken
     * @see #getText
     */
    int IGNORABLE_WHITESPACE = 7;

    /**
     * An XML processing instruction was just read. This
     * event type is available only via <a HREF="#nextToken()">nextToken()</a>.
     * getText() will return the text that is inside the processing instruction.
     * Calls to next() will skip processing instructions automatically.
     *
     * @see #nextToken
     * @see #getText
     */
    int PROCESSING_INSTRUCTION = 8;

    /**
     * An XML comment was just read. This token is
     * available via <a HREF="#nextToken()">nextToken()</a> only;
     * calls to next() will skip comments automatically.
     * The content of the comment can be accessed using the getText()
     * method.
     *
     * @see #nextToken
     * @see #getText
     */
    int COMMENT = 9;

    /**
     * An XML document type declaration was just read. This token is
     * available from <a HREF="#nextToken()">nextToken()</a> only.
     * The unparsed text inside the doctype is available via
     * the getText() method.
     *
     * @see #nextToken
     * @see #getText
     */
    int DOCDECL = 10;

    /**
     * This array can be used to convert the event type integer constants
     * such as START_TAG or TEXT
     * to a string. For example, the value of TYPES[START_TAG] is
     * the string "START_TAG".
     *
     * This array is intended for diagnostic output only. Relying
     * on the contents of the array may be dangerous since malicious
     * applications may alter the array elements, although the array reference
     * is final, due to limitations of the Java language.
     */
    String[] TYPES = { "START_DOCUMENT", "END_DOCUMENT", "START_TAG",
            "END_TAG", "TEXT", "CDSECT", "ENTITY_REF", "IGNORABLE_WHITESPACE",
            "PROCESSING_INSTRUCTION", "COMMENT", "DOCDECL" };

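    /*
     Illustration only, not part of the XMLPULL API. Because the elements of
     the shared TYPES array can be overwritten by any caller, diagnostic code
     may prefer its own read-only copy of the names. The sketch below assumes
     the index-to-name mapping documented above (0 = START_DOCUMENT through
     10 = DOCDECL); the class name EventNames and the "UNKNOWN(n)" placeholder
     are hypothetical choices made for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of javolution or XMLPULL.
final class EventNames {

    // Unmodifiable private copy of the names; index == event type constant.
    private static final List<String> NAMES = Collections.unmodifiableList(Arrays.asList(
            "START_DOCUMENT", "END_DOCUMENT", "START_TAG", "END_TAG", "TEXT",
            "CDSECT", "ENTITY_REF", "IGNORABLE_WHITESPACE",
            "PROCESSING_INSTRUCTION", "COMMENT", "DOCDECL"));

    private EventNames() {
    }

    // Returns the symbolic name of an event type, or a placeholder for
    // values outside the documented 0..10 range.
    static String nameOf(int eventType) {
        if (eventType < 0 || eventType >= NAMES.size()) {
            return "UNKNOWN(" + eventType + ")";
        }
        return NAMES.get(eventType);
    }

    public static void main(String[] args) {
        System.out.println(nameOf(2));  // START_TAG
        System.out.println(nameOf(10)); // DOCDECL
        System.out.println(nameOf(42)); // UNKNOWN(42)
    }
}
```

     The bounds check means a corrupted or future event code produces a
     readable placeholder instead of an ArrayIndexOutOfBoundsException.
    */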
    // ----------------------------------------------------------------------------
    // namespace related features

    /**
     * This feature determines whether the parser processes
     * namespaces. As for all features, the default value is false.
     * <p><strong>NOTE:</strong> The value cannot be changed during
     * parsing and must be set before parsing.
     *
     * @see #getFeature
     * @see #setFeature
     */
    String FEATURE_PROCESS_NAMESPACES = "http://xmlpull.org/v1/doc/features.html#process-namespaces";

    /**
     * This feature determines whether namespace attributes are
     * exposed via the attribute access methods. Like all features,
     * the default value is false. This feature cannot be changed
     * during parsing.
     *
     * @see #getFeature
     * @see #setFeature
     */
    String FEATURE_REPORT_NAMESPACE_ATTRIBUTES = "http://xmlpull.org/v1/doc/features.html#report-namespace-prefixes";

    /**
     * This feature determines whether the document declaration
     * is processed. If set to false,
     * the DOCDECL event type is reported by nextToken()
     * and ignored by next().
     *
     * If this feature is activated, then the document declaration
     * must be processed by the parser.
     *
     * <p><strong>Please note:</strong> If the document type declaration
     * was ignored, entity references may cause exceptions
     * later in the parsing process.
     * The default value of this feature is false. It cannot be changed
     * during parsing.
     *
     * @see #getFeature
     * @see #setFeature
     */
    String FEATURE_PROCESS_DOCDECL = "http://xmlpull.org/v1/doc/features.html#process-docdecl";

    /**
     * If this feature is activated, all validation errors as
     * defined in the XML 1.0 specification are reported.
     * This implies that FEATURE_PROCESS_DOCDECL is true and that both the
     * internal and external document type declarations will be processed.
     * <p><strong>Please note:</strong> This feature cannot be changed
     * during parsing. The default value is false.
     *
     * @see #getFeature
     * @see #setFeature
     */
    String FEATURE_VALIDATION = "http://xmlpull.org/v1/doc/features.html#validation";

    /**
     * Use this call to change the general behaviour of the parser,
     * such as namespace processing or doctype declaration handling.
     * This method must be called before the first call to next() or
     * nextToken(). Otherwise, an exception is thrown.
     * <p>Example: call setFeature(FEATURE_PROCESS_NAMESPACES, true) in order
     * to switch on namespace processing. The initial settings correspond
     * to the properties requested from the XML Pull Parser factory.
     * If none were requested, all features are deactivated by default.
     *
     * @exception XmlPullParserException If the feature is not supported or cannot be set
     * @exception IllegalArgumentException If the feature name is null
     */
    void setFeature(String name, boolean state) throws XmlPullParserException;

    /**
     * Returns the current value of the given feature.
     * <p><strong>Please note:</strong> unknown features are
     * <strong>always</strong> returned as false.
     *
     * @param name The name of the feature to be retrieved.
     * @return The value of the feature.
     * @exception IllegalArgumentException If the feature name is null
     */
    boolean getFeature(String name);

    /**
     * Set the value of a property.
     *
     * The property name is any fully-qualified URI.
     *
     * @exception XmlPullParserException If the property is not supported or cannot be set
     * @exception IllegalArgumentException If the property name is null
     */
    void setProperty(String name, Object value) throws XmlPullParserException;

    /**
     * Look up the value of a property.
     *
     * The property name is any fully-qualified URI.
     * <p><strong>NOTE:</strong> unknown properties are <strong>always</strong>
     * returned as null.
     *
     * @param name The name of the property to be retrieved.
     * @return The value of the named property.
     */
    Object getProperty(String name);

    /**
     * Sets the input source for the parser to the given reader and
     * resets the parser. The event type is set to the initial value
     * START_DOCUMENT.
     * Setting the reader to null will just stop parsing and
     * reset the parser state,
     * allowing the parser to free internal resources
     * such as parsing buffers.
     */
    void setInput(Reader in) throws XmlPullParserException;

    /**
     * Sets the input stream the parser is going to process.
     * This call resets the parser state and sets the event type
     * to the initial value START_DOCUMENT.
     *
     * <p><strong>NOTE:</strong> If an input encoding string is passed,
     * it MUST be used. Otherwise,
     * if inputEncoding is null, the parser SHOULD try to determine
     * the input encoding following the XML 1.0 specification.
     * If encoding detection is supported, then the feature
     * <a HREF="http://xmlpull.org/v1/doc/features.html#detect-encoding">
     * http://xmlpull.org/v1/doc/features.html#detect-encoding</a>
     * MUST be true; otherwise it must be false.
     *
     * @param inputStream contains a raw byte input stream of possibly
     * unknown encoding (when inputEncoding is null).
     *
     * @param inputEncoding if not null it MUST be used as encoding for inputStream
     */
    void setInput(InputStream inputStream, String inputEncoding)
            throws XmlPullParserException;

    /**
     * Returns the input encoding if known, null otherwise.
     * If setInput(InputStream, inputEncoding) was called with an inputEncoding
     * value other than null, this value must be returned
     * from this method. Otherwise, if inputEncoding is null and
     * the parser supports the encoding detection feature
     * (http://xmlpull.org/v1/doc/features.html#detect-encoding),
     * it must return the detected encoding.
     * If setInput(Reader) was called, null is returned.
     * After the first call to next(), if an XML declaration was present, this method
     * will return the declared encoding.
     */
    String getInputEncoding();

    /**
     * Set a new value for entity replacement text as defined in
     * <a HREF="http://www.w3.org/TR/REC-xml#intern-replacement">XML 1.0 Section 4.5
     * Construction of Internal Entity Replacement Text</a>.
     * If FEATURE_PROCESS_DOCDECL or FEATURE_VALIDATION is set, calling this
     * function will result in an exception: when processing of DOCDECL is
     * enabled, there is no need to set the entity replacement text manually.
     *
     * <p>The motivation for this function is to allow very small
     * implementations of XMLPULL that will work in J2ME environments.
     * Though these implementations may not be able to process the document type
     * declaration, they still can work with known DTDs by using this function.
     *
     * <p><b>Please note:</b> The given value is used literally as replacement text
     * and it corresponds to declaring an entity in a DTD that has all special characters
     * escaped: left angle bracket is replaced with &amp;lt;, ampersand with &amp;amp;
     * and so on.
     *
     * <p><b>Note:</b> The given value is the literal replacement text and must not
     * contain any other entity reference (if it contains any entity reference
     * there will be no further replacement).
     *
     * <p><b>Note:</b> The list of pre-defined entity names will
     * always contain standard XML entities such as
     * amp (&amp;amp;), lt (&amp;lt;), gt (&amp;gt;), quot (&amp;quot;), and apos (&amp;apos;).
     * Those cannot be redefined by this method!
     *
     * @see #setInput
     * @see #FEATURE_PROCESS_DOCDECL
     * @see #FEATURE_VALIDATION
     */
    void defineEntityReplacementText(CharSequence entityName, CharSequence replacementText)
            throws XmlPullParserException;

    /**
     * Returns the number of elements in the namespace stack for the given
     * depth.
     * If namespaces are not enabled, 0 is returned.
     *
     * <p><b>NOTE:</b> when the parser is on END_TAG, it is allowed to call
     * this function with getDepth()+1 as argument to retrieve the position of namespace
     * prefixes and URIs that were declared on the corresponding START_TAG.
     * <p><b>NOTE:</b> to retrieve the list of namespaces declared in the current element:<pre>
     * XmlPullParser pp = ...
     * int nsStart = pp.getNamespaceCount(pp.getDepth()-1);
     * int nsEnd = pp.getNamespaceCount(pp.getDepth());
     * for (int i = nsStart; i &lt; nsEnd; i++) {
     *     // Both accessors return CharSequence in this interface.
     *     CharSequence prefix = pp.getNamespacePrefix(i);
     *     CharSequence ns = pp.getNamespaceUri(i);
     *     // ...
     * }
     * </pre>
     *
     * @see #getNamespacePrefix
     * @see #getNamespaceUri
     * @see #getNamespace()
     */
    int getNamespaceCount(int depth) throws XmlPullParserException;

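    /*
     Illustration only, not part of the XMLPULL API. The relationship between
     element depth and getNamespaceCount(depth) can be easier to follow with a
     toy model. The sketch below is a hypothetical stand-in for a parser (the
     class NamespaceStackSketch and its startTag helper are invented for this
     example and use plain Strings instead of CharSequence); it only mimics
     the stack bookkeeping described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the namespace stack; not a real parser.
final class NamespaceStackSketch {

    private final List<String> prefixes = new ArrayList<>();
    private final List<String> uris = new ArrayList<>();
    // counts.get(d) is the number of stack entries visible at depth d.
    private final List<Integer> counts = new ArrayList<>();

    NamespaceStackSketch() {
        counts.add(0); // depth 0: outside the root element, nothing declared
    }

    // Simulates a START_TAG carrying the given declarations, passed as
    // (prefix, uri) pairs; a null prefix stands for xmlns='...'.
    void startTag(String... prefixUriPairs) {
        for (int i = 0; i < prefixUriPairs.length; i += 2) {
            prefixes.add(prefixUriPairs[i]);
            uris.add(prefixUriPairs[i + 1]);
        }
        counts.add(prefixes.size());
    }

    int getDepth() {
        return counts.size() - 1;
    }

    int getNamespaceCount(int depth) {
        return counts.get(depth);
    }

    String getNamespacePrefix(int pos) {
        return prefixes.get(pos);
    }

    String getNamespaceUri(int pos) {
        return uris.get(pos);
    }

    // Same loop as in the Javadoc: entries added by the current element.
    List<String> declaredInCurrentElement() {
        List<String> result = new ArrayList<>();
        int nsStart = getNamespaceCount(getDepth() - 1);
        int nsEnd = getNamespaceCount(getDepth());
        for (int i = nsStart; i < nsEnd; i++) {
            result.add(getNamespacePrefix(i) + "=" + getNamespaceUri(i));
        }
        return result;
    }

    public static void main(String[] args) {
        NamespaceStackSketch pp = new NamespaceStackSketch();
        pp.startTag("a", "urn:a");                // <root xmlns:a="urn:a">
        pp.startTag("b", "urn:b", null, "urn:d"); // <child xmlns:b="urn:b" xmlns="urn:d">
        System.out.println(pp.getNamespaceCount(1));       // 1
        System.out.println(pp.getNamespaceCount(2));       // 3
        System.out.println(pp.declaredInCurrentElement()); // [b=urn:b, null=urn:d]
    }
}
```

     The count is cumulative: the entries belonging to one element are the
     slice between the counts of its parent depth and its own depth.
    */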
    /**
     * Returns the namespace prefix for the given position
     * in the namespace stack.
     * A default namespace declaration (xmlns='...') will have null as prefix.
     * If the given index is out of range, an exception is thrown.
     * <p><b>Please note:</b> when the parser is on an END_TAG,
     * namespace prefixes that were declared
     * in the corresponding START_TAG are still accessible
     * although they are no longer in scope.
     */
    CharSequence getNamespacePrefix(int pos) throws XmlPullParserException;

    /**
     * Returns the namespace URI for the given position in the
     * namespace stack.
     * If the position is out of range, an exception is thrown.
     * <p><b>NOTE:</b> when the parser is on END_TAG, namespaces that were declared
     * in the corresponding START_TAG are still accessible even though they are no longer in scope.
     */
    CharSequence getNamespaceUri(int pos) throws XmlPullParserException;

    /**
     * Returns the URI corresponding to the given prefix,
     * depending on the current state of the parser.
     *
     * <p>If the prefix was not declared in the current scope,
     * null is returned. The default namespace is included
     * in the namespace table and is available via
     * getNamespace(null).
     *
     * <p>This method is a convenience method for
     *
     * <pre>
     * for (int i = getNamespaceCount(getDepth()) - 1; i >= 0; i--) {
     *     CharSequence p = getNamespacePrefix(i);
     *     // A null prefix denotes the default namespace declaration.
     *     if (p == null ? prefix == null : p.equals(prefix)) {
     *         return getNamespaceUri(i);
     *     }
     * }
     * return null;
     * </pre>
     *
     * <p><strong>Please note:</strong> parser implementations
     * may provide more efficient lookup, e.g. using a Hashtable.
     * The 'xml' prefix is bound to "http://www.w3.org/XML/1998/namespace", as
     * defined in the
     * <a HREF="http://www.w3.org/TR/REC-xml-names/#ns-using">Namespaces in XML</a>
     * specification. Analogously, the 'xmlns' prefix is resolved to
     * <a HREF="http://www.w3.org/2000/xmlns/">http://www.w3.org/2000/xmlns/</a>.
     *
     * @see #getNamespaceCount
     * @see #getNamespacePrefix
     * @see #getNamespaceUri
     */
    CharSequence getNamespace(CharSequence prefix);

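    /*
     Illustration only, not part of the XMLPULL API. The backwards search
     described above has two properties worth seeing in isolation: inner
     declarations shadow outer ones, and a null prefix selects the default
     namespace. The sketch assumes the namespace stack is given as two
     parallel String arrays (a simplification; the interface itself uses
     CharSequence), and the class name PrefixLookupSketch is hypothetical.

```java
import java.util.Objects;

// Hypothetical helper showing the lookup order; not a real parser.
final class PrefixLookupSketch {

    private PrefixLookupSketch() {
    }

    // Scans the first 'count' stack entries from the newest to the oldest
    // and returns the URI of the first matching prefix, or null if the
    // prefix is not declared. Objects.equals handles the null prefix.
    static String lookup(String[] prefixes, String[] uris, int count, String prefix) {
        for (int i = count - 1; i >= 0; i--) {
            if (Objects.equals(prefixes[i], prefix)) {
                return uris[i];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] prefixes = {"a", null, "a"};
        String[] uris = {"urn:outer", "urn:default", "urn:inner"};
        System.out.println(lookup(prefixes, uris, 3, "a"));  // urn:inner
        System.out.println(lookup(prefixes, uris, 3, null)); // urn:default
        System.out.println(lookup(prefixes, uris, 1, "a"));  // urn:outer
        System.out.println(lookup(prefixes, uris, 3, "x"));  // null
    }
}
```

     Passing a smaller count models an outer scope in which the inner
     redeclaration of prefix "a" is not yet visible.
    */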
    // --------------------------------------------------------------------------
    // miscellaneous reporting methods

    /**
     * Returns the current depth of the element.
     * Outside the root element, the depth is 0. The
     * depth is incremented by 1 when a start tag is reached.
     * The depth is decremented AFTER the end tag
     * event was observed.
     *
     * <pre>
     * &lt;!-- outside --&gt;     0
     * &lt;root&gt;                  1
     *   sometext                 1
     *     &lt;foobar&gt;         2
     *     &lt;/foobar&gt;        2
     * &lt;/root&gt;              1
     * &lt;!-- outside --&gt;     0
     * </pre>
     */
    int getDepth();

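    /*
     Illustration only, not part of the XMLPULL API. The rule "increment on
     START_TAG, decrement after END_TAG has been observed" can be checked
     against the table above with a few lines of bookkeeping. The sketch
     assumes a pre-recorded sequence of event codes (using the constant
     values documented in this interface); the class name DepthSketch is
     hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper that replays the depth rule over recorded events.
final class DepthSketch {

    static final int START_TAG = 2;
    static final int END_TAG = 3;
    static final int TEXT = 4;
    static final int COMMENT = 9;

    private DepthSketch() {
    }

    // Returns the depth that would be reported while each event is current.
    static int[] depths(int[] events) {
        int[] result = new int[events.length];
        int depth = 0;
        boolean pendingDecrement = false;
        for (int i = 0; i < events.length; i++) {
            if (pendingDecrement) {
                depth--; // the previous END_TAG has now been left behind
                pendingDecrement = false;
            }
            if (events[i] == START_TAG) {
                depth++;
            }
            result[i] = depth;
            if (events[i] == END_TAG) {
                pendingDecrement = true;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // comment, <root>, sometext, <foobar>, </foobar>, </root>, comment
        int[] events = {COMMENT, START_TAG, TEXT, START_TAG, END_TAG, END_TAG, COMMENT};
        System.out.println(Arrays.toString(depths(events))); // [0, 1, 1, 2, 2, 1, 0]
    }
}
```

     A start tag and its matching end tag therefore always report the same
     depth, which is what makes depth comparisons usable for skipping subtrees.
    */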
    /**
     * Returns a short text describing the current parser state, including
     * the position, a
     * description of the current event and the data source if known.
     * This method is especially useful to provide meaningful
     * error messages and for debugging purposes.
     */
    CharSequence getPositionDescription();

    /**
     * Returns the current line number, starting from 1.
     * When the parser does not know the current line number
     * or cannot determine it, -1 is returned (e.g. for WBXML).
     *
     * @return current line number or -1 if unknown.
     */
    int getLineNumber();

    /**
     * Returns the current column number, starting from 0.
     * When the parser does not know the current column number
     * or cannot determine it, -1 is returned (e.g. for WBXML).
     *
     * @return current column number or -1 if unknown.
     */
    int getColumnNumber();

    // --------------------------------------------------------------------------
    // TEXT related methods

    /**
     * Checks whether the current TEXT event contains only whitespace
     * characters.
     * For IGNORABLE_WHITESPACE, this is always true.
     * For TEXT and CDSECT, false is returned when the current event text
     * contains at least one non-whitespace character. For any other
     * event type an exception is thrown.
     *
     * <p><b>Please note:</b> non-validating parsers are not
     * able to distinguish whitespace from ignorable whitespace,
     * except for whitespace outside the root element. Ignorable
     * whitespace is reported as a separate event, which is exposed
     * via nextToken() only.
     */
    boolean isWhitespace() throws XmlPullParserException;

    /**
     * Returns the text content of the current event as a CharSequence.
     * The value returned depends on the current event type;
     * for example, for a TEXT event it is the element content
     * (this is the typical case when next() is used).
     *
     * See the description of nextToken() for a detailed description of
     * possible returned values for different types of events.
     *
     * <p><strong>NOTE:</strong> in case of ENTITY_REF, this method returns
     * the entity replacement text (or null if not available). This is
     * the only case where
     * getText() and getTextCharacters() return different values.
     *
     * @see #getEventType
     * @see #next
     * @see #nextToken
     */
    CharSequence getText();

    /**
     * Returns the buffer that contains the text of the current event,
     * as well as the start offset and length relevant for the current
     * event. See getText(), next() and nextToken() for a description of possible returned values.
     *
     * <p><strong>Please note:</strong> this buffer must not
     * be modified and its content MAY change after a call to
     * next() or nextToken(). This method will always return the
     * same value as getText(), except for ENTITY_REF. In the case
     * of ENTITY_REF, getText() returns the replacement text and
     * this method returns the actual input buffer containing the
     * entity name.
     * If getText() returns null, this method returns null as well and
     * the values returned in the holder array MUST be -1 (both start
     * and length).
     *
     * @see #getText
     * @see #next
     * @see #nextToken
     *
     * @param holderForStartAndLength Must hold a 2-element int array
     * into which the start offset and length values will be written.
     * @return char buffer that contains the text of the current event
     * (null if the current event has no text associated).
     */
    char[] getTextCharacters(int[] holderForStartAndLength);

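    /*
     Illustration only, not part of the XMLPULL API. The holder-array calling
     convention of getTextCharacters() is unusual enough that a caller-side
     sketch may help. The class TextBufferSketch below is a hypothetical
     stand-in for a parser positioned on a TEXT event; it only reproduces the
     documented contract (shared buffer plus start and length, or null with
     both holder values set to -1).

```java
// Hypothetical stand-in for a parser positioned on one event.
final class TextBufferSketch {

    private final char[] buffer; // shared input buffer, or null if no text
    private final int start;
    private final int length;

    TextBufferSketch(char[] buffer, int start, int length) {
        this.buffer = buffer;
        this.start = start;
        this.length = length;
    }

    // Mirrors the documented contract of getTextCharacters().
    char[] getTextCharacters(int[] holderForStartAndLength) {
        if (buffer == null) {
            holderForStartAndLength[0] = -1;
            holderForStartAndLength[1] = -1;
            return null;
        }
        holderForStartAndLength[0] = start;
        holderForStartAndLength[1] = length;
        return buffer;
    }

    public static void main(String[] args) {
        char[] input = "<foo>Hello World!</foo>".toCharArray();
        TextBufferSketch onText = new TextBufferSketch(input, 5, 12);

        int[] holder = new int[2];
        char[] buf = onText.getTextCharacters(holder);
        // Copy the slice out right away: the buffer may be reused later.
        String text = new String(buf, holder[0], holder[1]);
        System.out.println(text); // Hello World!
    }
}
```

     Copying the slice into a String immediately is the conservative choice,
     since the buffer content may change after the next call to next() or
     nextToken().
    */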
    // --------------------------------------------------------------------------
    // START_TAG / END_TAG shared methods

    /**
     * Returns the namespace URI of the current element.
     * The default namespace is represented
     * as an empty string.
     * If namespaces are not enabled, an empty String ("") is always returned.
     * The current event must be START_TAG or END_TAG; otherwise,
     * null is returned.
     */
    CharSequence getNamespace();

    /**
     * For START_TAG or END_TAG events, the (local) name of the current
     * element is returned when namespaces are enabled. When namespace
     * processing is disabled, the raw name is returned.
     * For ENTITY_REF events, the entity name is returned.
     * If the current event is not START_TAG, END_TAG, or ENTITY_REF,
     * null is returned.
     * <p><b>Please note:</b> To reconstruct the raw element name
     * when namespaces are enabled and the prefix is not null,
     * you will need to add the prefix and a colon to the local name.
     */
    CharSequence getName();

738     /**
739      * Returns the prefix of the current element.
740      * If the element is in the default namespace (has no prefix),
741      * null is returned.
742      * If namespaces are not enabled, or the current event
743      * is not START_TAG or END_TAG, null is returned.
744      */

745     CharSequence getPrefix();
746
747     /**
748      * Returns true if the current event is START_TAG and the tag
749      * is an empty-element tag
750      * (e.g. &lt;foobar/&gt;).
751      * <p><b>NOTE:</b> if the parser is not on START_TAG, an exception
752      * will be thrown.
753      */

754     boolean isEmptyElementTag() throws XmlPullParserException;
755
756     // --------------------------------------------------------------------------
757
// START_TAG Attributes retrieval methods
758

759     /**
760      * Returns the number of attributes of the current start tag, or
761      * -1 if the current event type is not START_TAG.
762      *
763      * @see #getAttributeNamespace
764      * @see #getAttributeName
765      * @see #getAttributePrefix
766      * @see #getAttributeValue
767      */

768     int getAttributeCount();
769
770     /**
771      * Returns the namespace URI of the attribute
772      * with the given index (starts from 0).
773      * Returns an empty string ("") if namespaces are not enabled
774      * or the attribute has no namespace.
775      * Throws an IndexOutOfBoundsException if the index is out of range
776      * or the current event type is not START_TAG.
777      *
778      * <p><strong>NOTE:</strong> if FEATURE_REPORT_NAMESPACE_ATTRIBUTES is set
779      * then namespace attributes (xmlns:ns='...') must be reported
780      * with namespace
781      * <a HREF="http://www.w3.org/2000/xmlns/">http://www.w3.org/2000/xmlns/</a>
782      * (visit this URL for description!).
783      * The default namespace attribute (xmlns="...") will be reported with empty namespace.
784      * <p><strong>NOTE:</strong> The xml prefix is bound as defined in
785      * <a HREF="http://www.w3.org/TR/REC-xml-names/#ns-using">Namespaces in XML</a>
786      * specification to "http://www.w3.org/XML/1998/namespace".
787      *
788      * @param index zero based index of attribute
789      * @return attribute namespace,
790      * empty string ("") is returned if namespace processing is not enabled or
791      * namespace processing is enabled but the attribute has no namespace (it has no prefix).
792      */

793     CharSequence getAttributeNamespace(int index);
794
795     /**
796      * Returns the local name of the specified attribute
797      * if namespaces are enabled or just attribute name if namespaces are disabled.
798      * Throws an IndexOutOfBoundsException if the index is out of range
799      * or current event type is not START_TAG.
800      *
801      * @param index zero based index of attribute
802      * @return attribute name (null is never returned)
803      */

804     CharSequence getAttributeName(int index);
805
806     /**
807      * Returns the prefix of the specified attribute.
808      * Returns null if the attribute has no prefix.
809      * If namespaces are disabled it will always return null.
810      * Throws an IndexOutOfBoundsException if the index is out of range
811      * or current event type is not START_TAG.
812      *
813      * @param index zero based index of attribute
814      * @return attribute prefix or null if namespaces processing is not enabled.
815      */

816     CharSequence getAttributePrefix(int index);
817
818     /**
819      * Returns the type of the specified attribute.
820      * If parser is non-validating it MUST return CDATA.
821      *
822      * @param index zero based index of attribute
823      * @return attribute type (null is never returned)
824      */

825     String getAttributeType(int index);
826
827     /**
828      * Returns true if the specified attribute was not in the input but was declared in the XML (a defaulted attribute).
829      * If parser is non-validating it MUST always return false.
830      * This information is part of the XML infoset.
831      *
832      * @param index zero based index of attribute
833      * @return false if attribute was in input
834      */

835     boolean isAttributeDefault(int index);
836
837     /**
838      * Returns the given attribute's value.
839      * Throws an IndexOutOfBoundsException if the index is out of range
840      * or current event type is not START_TAG.
841      *
842      * <p><strong>NOTE:</strong> attribute value must be normalized
843      * (including entity replacement text if PROCESS_DOCDECL is false) as described in
844      * <a HREF="http://www.w3.org/TR/REC-xml#AVNormalize">XML 1.0 section
845      * 3.3.3 Attribute-Value Normalization</a>
846      *
847      * @see #defineEntityReplacementText
848      *
849      * @param index zero based index of attribute
850      * @return value of attribute (null is never returned)
851      */

852     CharSequence getAttributeValue(int index);
853
854     /**
855      * Returns the attribute's value identified by namespace URI and local name.
856      * If namespaces are disabled, namespace must be null.
857      * If current event type is not START_TAG then IndexOutOfBoundsException will be thrown.
858      *
859      * <p><strong>NOTE:</strong> attribute value must be normalized
860      * (including entity replacement text if PROCESS_DOCDECL is false) as described in
861      * <a HREF="http://www.w3.org/TR/REC-xml#AVNormalize">XML 1.0 section
862      * 3.3.3 Attribute-Value Normalization</a>
863      *
864      * @see #defineEntityReplacementText
865      *
866      * @param namespace Namespace of the attribute if namespaces are enabled otherwise must be null
867      * @param name If namespaces enabled local name of attribute otherwise just attribute name
868      * @return value of attribute or null if attribute with given name does not exist
869      */

870     CharSequence getAttributeValue(CharSequence namespace, CharSequence name);
871
872     // --------------------------------------------------------------------------
873
// actual parsing methods
874

875     /**
876      * Returns the type of the current event (START_TAG, END_TAG, TEXT, etc.)
877      *
878      * @see #next()
879      * @see #nextToken()
880      */

881     int getEventType() throws XmlPullParserException;
882
883     /**
884      * Gets the next parsing event. Element content will be coalesced and only one
885      * TEXT event must be returned for the whole element content
886      * (comments and processing instructions will be ignored, and entity references
887      * must be expanded or an exception must be thrown if an entity reference cannot be expanded).
888      * If element content is empty (content is "") then no TEXT event will be reported.
889      *
890      * <p><b>NOTE:</b> an empty element (such as &lt;tag/>) will be reported
891      * with two separate events: START_TAG, END_TAG. This preserves the
892      * parsing equivalence of an empty element to &lt;tag>&lt;/tag>
893      * (see isEmptyElementTag()).
894      *
895      * @see #isEmptyElementTag
896      * @see #START_TAG
897      * @see #TEXT
898      * @see #END_TAG
899      * @see #END_DOCUMENT
900      */

901
902     int next() throws XmlPullParserException, IOException;
903
904     /**
905      * This method works similarly to next() but will expose
906      * additional event types (COMMENT, CDSECT, DOCDECL, ENTITY_REF, PROCESSING_INSTRUCTION, or
907      * IGNORABLE_WHITESPACE) if they are available in input.
908      *
909      * <p>If the special feature
910      * <a HREF="http://xmlpull.org/v1/doc/features.html#xml-roundtrip">FEATURE_XML_ROUNDTRIP</a>
911      * (identified by URI: http://xmlpull.org/v1/doc/features.html#xml-roundtrip)
912      * is enabled, it is possible to round-trip an XML document, i.e. to reproduce
913      * the XML input exactly on output using getText():
914      * returned content is always unnormalized (exactly as in input).
915      * Otherwise returned content is end-of-line normalized as described in
916      * <a HREF="http://www.w3.org/TR/REC-xml#sec-line-ends">XML 1.0 End-of-Line Handling</a>.
917      * Also, when this feature is enabled, the exact content of START_TAG, END_TAG,
918      * DOCDECL and PROCESSING_INSTRUCTION is available.
919      *
920      * <p>Here is the list of tokens that can be returned from nextToken()
921      * and what getText() and getTextCharacters() return:<dl>
922      * <dt>START_DOCUMENT<dd>null
923      * <dt>END_DOCUMENT<dd>null
924      * <dt>START_TAG<dd>null unless FEATURE_XML_ROUNDTRIP
925      * is enabled, in which case the XML tag is returned, e.g. &lt;tag attr='val'>
926      * <dt>END_TAG<dd>null unless FEATURE_XML_ROUNDTRIP
927      * is enabled, in which case the XML tag is returned, e.g. &lt;/tag>
928      * <dt>TEXT<dd>returns element content.
929      * <br>Note that element content may be delivered in multiple consecutive TEXT events.
930      * <dt>IGNORABLE_WHITESPACE<dd>returns characters that are determined to be ignorable white
931      * space. If FEATURE_XML_ROUNDTRIP is enabled, all whitespace content outside the root
932      * element will always be reported as IGNORABLE_WHITESPACE; otherwise reporting is optional.
933      * <br>Note that element content may be delivered in multiple consecutive IGNORABLE_WHITESPACE events.
934      * <dt>CDSECT<dd>
935      * returns text <em>inside</em> CDATA
936      * (e.g. 'fo&lt;o' from &lt;![CDATA[fo&lt;o]]>)
937      * <dt>PROCESSING_INSTRUCTION<dd>
938      * if FEATURE_XML_ROUNDTRIP is true
939      * return exact PI content ex: 'pi foo' from &lt;?pi foo?>
940      * otherwise it may be exact PI content or concatenation of PI target,
941      * space and data so for example for
942      * &lt;?target data?> string &quot;target data&quot; may
943      * be returned if FEATURE_XML_ROUNDTRIP is false.
944      * <dt>COMMENT<dd>return comment content ex. 'foo bar' from &lt;!--foo bar-->
945      * <dt>ENTITY_REF<dd>getText() MUST return entity replacement text if PROCESS_DOCDECL is false
946      * otherwise getText() MAY return null,
947      * additionally getTextCharacters() MUST return entity name
948      * (for example 'entity_name' for &amp;entity_name;).
949      * <br><b>NOTE:</b> this is the only place where the values returned from getText() and
950      * getTextCharacters() <b>are different</b>.
951      * <br><b>NOTE:</b> it is the user's responsibility to resolve the entity reference
952      * if PROCESS_DOCDECL is false and there is no entity replacement text set via the
953      * defineEntityReplacementText() method (getText() will be null).
954      * <br><b>NOTE:</b> character entities (e.g. &amp;#32;) and standard entities such as
955      * &amp;amp; &amp;lt; &amp;gt; &amp;quot; &amp;apos; are reported as well
956      * and are <b>not</b> reported as TEXT tokens but as ENTITY_REF tokens!
957      * This requirement was added to allow round-tripping of XML documents.
958      * <dt>DOCDECL<dd>
959      * if FEATURE_XML_ROUNDTRIP is true or PROCESS_DOCDECL is false
960      * then return what is inside of DOCDECL for example it returns:<pre>
961      * &quot; titlepage SYSTEM "http://www.foo.bar/dtds/typo.dtd"
962      * [&lt;!ENTITY % active.links "INCLUDE">]&quot;</pre>
963      * <p>for input document that contained:<pre>
964      * &lt;!DOCTYPE titlepage SYSTEM "http://www.foo.bar/dtds/typo.dtd"
965      * [&lt;!ENTITY % active.links "INCLUDE">]></pre>
966      * otherwise if FEATURE_XML_ROUNDTRIP is false and PROCESS_DOCDECL is true
967      * then what is returned is undefined (it may be even null)
968      * </dd>
969      * </dl>
970      *
971      * <p><strong>NOTE:</strong> there is no guarantee that there will be only one TEXT or
972      * IGNORABLE_WHITESPACE event from nextToken(), as the parser may choose to deliver element content in
973      * multiple tokens (dividing element content into chunks).
974      *
975      * <p><strong>NOTE:</strong> whether the returned text of a token is end-of-line normalized
976      * depends on FEATURE_XML_ROUNDTRIP.
977      *
978      * <p><strong>NOTE:</strong> XMLDecl (&lt;?xml ...?&gt;) is not reported but its content
979      * is available through optional properties (see class description above).
980      *
981      * @see #next
982      * @see #START_TAG
983      * @see #TEXT
984      * @see #END_TAG
985      * @see #END_DOCUMENT
986      * @see #COMMENT
987      * @see #DOCDECL
988      * @see #PROCESSING_INSTRUCTION
989      * @see #ENTITY_REF
990      * @see #IGNORABLE_WHITESPACE
991      */

992     int nextToken() throws XmlPullParserException, IOException;
993
994     //-----------------------------------------------------------------------------
995
// utility methods to make XML parsing easier ...
996

997     /**
998      * Test if the current event is of the given type and if the
999      * namespace and name do match. null will match any namespace
1000     * and any name. If the test is not passed, an exception is
1001     * thrown. The exception text indicates the parser position,
1002     * the expected event and the current event that is not meeting the
1003     * requirement.
1004     *
1005     * <p>Essentially it does this
1006     * <pre>
1007     * if (type != getEventType()
1008     * || (namespace != null &amp;&amp; !namespace.equals( getNamespace () ) )
1009     * || (name != null &amp;&amp; !name.equals( getName() ) ) )
1010     * throw new XmlPullParserException( "expected "+ TYPES[ type ]+getPositionDescription());
1011     * </pre>
1012     */

1013    void require(int type, CharSequence namespace, CharSequence name)
1014            throws XmlPullParserException, IOException;
1015
1016    /**
1017     * If the current event is START_TAG, then if the next event is TEXT the element content is returned,
1018     * or if the next event is END_TAG an empty string is returned; otherwise an exception is thrown.
1019     * After calling this function successfully, the parser will be positioned on END_TAG.
1020     *
1021     * <p>The motivation for this function is to allow consistent parsing of both
1022     * empty elements and elements that have non-empty content, for example for input: <ol>
1023     * <li>&lt;tag&gt;foo&lt;/tag&gt;
1024     * <li>&lt;tag&gt;&lt;/tag&gt; (which is equivalent to &lt;tag/&gt;)
1025     * </ol>both inputs can be parsed with the same code:
1026     * <pre>
1027     * p.nextTag();
1028     * p.require(p.START_TAG, "", "tag");
1029     * CharSequence content = p.nextText();
1030     * p.require(p.END_TAG, "", "tag");
1031     * </pre>
1032     * This function together with nextTag make it very easy to parse XML that has
1033     * no mixed content.
1034     *
1035     *
1036     * <p>Essentially it does this
1037     * <pre>
1038     * if(getEventType() != START_TAG) {
1039     * throw new XmlPullParserException(
1040     * "parser must be on START_TAG to read next text", this, null);
1041     * }
1042     * int eventType = next();
1043     * if(eventType == TEXT) {
1044     * String result = getText();
1045     * eventType = next();
1046     * if(eventType != END_TAG) {
1047     * throw new XmlPullParserException(
1048     * "event TEXT it must be immediately followed by END_TAG", this, null);
1049     * }
1050     * return result;
1051     * } else if(eventType == END_TAG) {
1052     * return "";
1053     * } else {
1054     * throw new XmlPullParserException(
1055     * "parser must be on START_TAG or TEXT to read text", this, null);
1056     * }
1057     * </pre>
1058     */

1059    CharSequence nextText() throws XmlPullParserException, IOException;
1060
1061    /**
1062     * Call next() and return event if it is START_TAG or END_TAG
1063     * otherwise throw an exception.
1064     * It will skip whitespace TEXT before actual tag if any.
1065     *
1066     * <p>Essentially it does this:
1067     * <pre>
1068     * int eventType = next();
1069     * if(eventType == TEXT &amp;&amp; isWhitespace()) { // skip whitespace
1070     * eventType = next();
1071     * }
1072     * if (eventType != START_TAG &amp;&amp; eventType != END_TAG) {
1073     * throw new XmlPullParserException("expected start or end tag", this, null);
1074     * }
1075     * return eventType;
1076     * </pre>
1077     */

1078    int nextTag() throws XmlPullParserException, IOException;
1079
1080}
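
The holder-array contract documented for getTextCharacters() can be illustrated with a small sketch. It does not use a real parser: the TextSource interface, the slice() and noText() factories, and the copyText() helper are all hypothetical names invented for this example. Only the method name getTextCharacters, its int[] holder parameter, and the "-1 for both start and length when there is no text" rule come from the Javadoc above.

```java
final class TextCharactersSketch {

    /** Hypothetical stand-in exposing only the method discussed above. */
    interface TextSource {
        char[] getTextCharacters(int[] holderForStartAndLength);
    }

    /** Copies the current text out of the shared buffer, or returns null if there is none. */
    static String copyText(TextSource source) {
        int[] holder = new int[2]; // holder[0] = start offset, holder[1] = length
        char[] buffer = source.getTextCharacters(holder);
        if (buffer == null) {
            return null; // per the contract, holder is {-1, -1} in this case
        }
        return new String(buffer, holder[0], holder[1]);
    }

    /** Assumed toy source: reports a slice of the given input as the current text. */
    static TextSource slice(String input, int start, int length) {
        final char[] buffer = input.toCharArray();
        return holder -> {
            holder[0] = start;
            holder[1] = length;
            return buffer;
        };
    }

    /** Assumed toy source for an event that has no text. */
    static TextSource noText() {
        return holder -> {
            holder[0] = -1;
            holder[1] = -1;
            return null;
        };
    }

    public static void main(String[] args) {
        System.out.println(copyText(slice("<tag>foo</tag>", 5, 3)));
        System.out.println(copyText(noText()));
    }
}
```

The point of the design is that the caller reads directly from the parser's buffer and only copies characters when it needs to keep them.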
1081
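
The "essentially it does this" descriptions of nextTag() and nextText() can be exercised with a toy, in-memory event list. This is a sketch under stated assumptions, not an implementation of this interface: ToyParser, wrappedElement() and readChildText() are hypothetical names, the event constants are local stand-ins, Strings replace CharSequence, and IllegalStateException replaces XmlPullParserException to keep the example self-contained.

```java
final class PullHelpersSketch {
    // Local stand-ins for the event type constants (assumed values, for illustration only).
    static final int START_DOCUMENT = 0, END_DOCUMENT = 1, START_TAG = 2, END_TAG = 3, TEXT = 4;

    /** Hypothetical event source backed by arrays; it does not parse XML. */
    static final class ToyParser {
        private final int[] types;
        private final String[] texts;
        private int pos = 0; // index 0 holds START_DOCUMENT

        ToyParser(int[] types, String[] texts) {
            this.types = types;
            this.texts = texts;
        }

        int getEventType() { return types[pos]; }
        String getText() { return texts[pos]; }
        boolean isWhitespace() { return texts[pos] != null && texts[pos].trim().isEmpty(); }

        int next() {
            if (pos < types.length - 1) pos++;
            return types[pos];
        }

        /** Mirrors the documented nextTag() logic: skip one whitespace TEXT, then expect a tag. */
        int nextTag() {
            int eventType = next();
            if (eventType == TEXT && isWhitespace()) {
                eventType = next();
            }
            if (eventType != START_TAG && eventType != END_TAG) {
                throw new IllegalStateException("expected start or end tag");
            }
            return eventType;
        }

        /** Mirrors the documented nextText() logic: returns content, or "" for an empty element. */
        String nextText() {
            if (getEventType() != START_TAG) {
                throw new IllegalStateException("parser must be on START_TAG to read next text");
            }
            int eventType = next();
            if (eventType == TEXT) {
                String result = getText();
                if (next() != END_TAG) {
                    throw new IllegalStateException("TEXT must be immediately followed by END_TAG");
                }
                return result;
            } else if (eventType == END_TAG) {
                return "";
            }
            throw new IllegalStateException("expected TEXT or END_TAG after START_TAG");
        }
    }

    /** Events for a root element holding whitespace and one child; null or "" means an empty child. */
    static ToyParser wrappedElement(String content) {
        if (content == null || content.isEmpty()) {
            return new ToyParser(
                new int[] {START_DOCUMENT, START_TAG, TEXT, START_TAG, END_TAG, END_TAG, END_DOCUMENT},
                new String[] {null, null, "\n  ", null, null, null, null});
        }
        return new ToyParser(
            new int[] {START_DOCUMENT, START_TAG, TEXT, START_TAG, TEXT, END_TAG, END_TAG, END_DOCUMENT},
            new String[] {null, null, "\n  ", null, content, null, null, null});
    }

    /** The same calling code handles both the empty and the non-empty child element. */
    static String readChildText(ToyParser p) {
        p.nextTag();                     // root START_TAG
        p.nextTag();                     // whitespace skipped, child START_TAG
        String content = p.nextText();   // leaves the parser on the child's END_TAG
        if (p.getEventType() != END_TAG) {
            throw new IllegalStateException("expected END_TAG");
        }
        return content;
    }

    public static void main(String[] args) {
        System.out.println("[" + readChildText(wrappedElement("foo")) + "]");
        System.out.println("[" + readChildText(wrappedElement(null)) + "]");
    }
}
```

With a real parser the same calling pattern would use require() in place of the manual END_TAG check.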