// Jericho HTML Parser - Java based library for analysing and manipulating HTML
// Version 2.2
// Copyright (C) 2006 Martin Jericho
// http://sourceforge.net/projects/jerichohtml/
//
// This library is free software; you can redistribute it and/or
// modify it under the terms of the GNU Lesser General Public
// License as published by the Free Software Foundation; either
// version 2.1 of the License, or (at your option) any later version.
// http://www.gnu.org/copyleft/lesser.html
//
// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
// Lesser General Public License for more details.
//
// You should have received a copy of the GNU Lesser General Public
// License along with this library; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

package au.id.jericho.lib.html;

import java.util.*;

/**
 * Represents either a {@link StartTag} or {@link EndTag} in a specific {@linkplain Source source} document.
 *
 * <h3><a name="ParsingProcess">Tag Parsing Process</a></h3>
 * The following process describes how each tag is identified by the parser:
 * <ol class="Separated">
 * <li>
 * Every '<code>&lt;</code>' character found in the source document is considered to be the start of a tag.
 * The characters following it are compared with the {@linkplain TagType#getStartDelimiter() start delimiters}
 * of all the {@linkplain TagType#register() registered} {@linkplain TagType tag types}, and a list of matching tag types
 * is determined.
 * <li>
 * A more detailed analysis of the source is performed according to the features of each matching tag type from the first step,
 * in order of <a HREF="TagType.html#Precedence">precedence</a>, until a valid tag is able to be constructed.
 * <p>
 * The analysis performed in relation to each candidate tag type is a two-stage process:
 * <ol>
 * <li>
 * The position of the tag is checked to determine whether it is {@linkplain TagType#isValidPosition(Source,int) valid}.
 * In theory, a {@linkplain TagType#isServerTag() server tag} is valid in any position, but a non-server tag is not valid inside another non-server tag.
 * <p>
 * The {@link TagType#isValidPosition(Source, int pos)} method is responsible for this check and has a common default implementation for all tag types
 * (although <a HREF="TagType.html#custom">custom</a> tag types can override it if necessary).
 * Its behaviour differs depending on whether or not a {@linkplain Source#fullSequentialParse() full sequential parse} is performed.
 * See the documentation of the {@link TagType#isValidPosition(Source,int) isValidPosition} method for full details.
 * <li>
 * A final analysis is performed by the {@link TagType#constructTagAt(Source, int pos)} method of the candidate tag type.
 * This method returns a valid {@link Tag} object if all conditions of the candidate tag type are met, otherwise it returns
 * <code>null</code> and the process continues with the next candidate tag type.
 * </ol>
 * <li>
 * If the source does not match the start delimiter or syntax of any registered tag type, the segment spanning it and the next
 * '<code>&gt;</code>' character is taken to be an {@linkplain #isUnregistered() unregistered} tag.
 * Some tag search methods ignore unregistered tags. See the {@link #isUnregistered()} method for more information.
 * </ol>
 * <p>
 * See the documentation of the {@link TagType} class for more details on how tags are recognised.
 *
 * <h3><a name="TagSearchMethods">Tag Search Methods</a></h3>
 * <p>
 * Methods that find tags in a source document are collectively referred to as <i>Tag Search Methods</i>.
 * They are found mostly in the {@link Source} and {@link Segment} classes, and can be generally categorised as follows:
 * <dl class="Separated">
 * <dt><a name="OpenSearch">Open Search:</a>
 * <dd>These methods search for tags of any {@linkplain #getName() name} and {@linkplain #getTagType() type}.
 * <ul class="Unseparated">
 * <li>{@link Tag#findNextTag()}
 * <li>{@link Tag#findPreviousTag()}
 * <li>{@link Segment#findAllElements()}
 * <li>{@link Segment#findAllTags()}
 * <li>{@link Source#getTagAt(int pos)}
 * <li>{@link Source#findPreviousTag(int pos)}
 * <li>{@link Source#findNextTag(int pos)}
 * <li>{@link Source#findEnclosingTag(int pos)}
 * <li>{@link Segment#findAllStartTags()}
 * <li>{@link Source#findPreviousStartTag(int pos)}
 * <li>{@link Source#findNextStartTag(int pos)}
 * <li>{@link Source#findPreviousEndTag(int pos)}
 * <li>{@link Source#findNextEndTag(int pos)}
 * </ul>
 * <dt><a name="NamedSearch">Named Search:</a>
 * <dd>These methods usually include a parameter called <code>name</code> which is used to specify the {@linkplain #getName() name} of the
 * tag to search for. In some cases named search methods do not require this parameter because the context or name of the method implies
 * the name to search for.
 * In tag search methods specifically looking for start tags, specifying a name that ends in a colon (<code>:</code>)
 * searches for all start tags in the specified XML namespace.
 * <ul class="Unseparated">
 * <li>{@link Segment#findAllElements(String name)}
 * <li>{@link Segment#findAllStartTags(String name)}
 * <li>{@link Source#findPreviousStartTag(int pos, String name)}
 * <li>{@link Source#findNextStartTag(int pos, String name)}
 * <li>{@link Source#findPreviousEndTag(int pos, String name)}
 * <li>{@link Source#findNextEndTag(int pos, String name)}
 * <li>{@link Source#findNextEndTag(int pos, String name, EndTagType)}
 * </ul>
 * <dt><a name="TagTypeSearch">Tag Type Search:</a>
 * <dd>These methods usually include a parameter called <code>tagType</code> which is used to specify the {@linkplain #getTagType() type} of the
 * tag to search for. In some methods the search parameter is restricted to the {@link StartTagType} subclass of <code>TagType</code>.
 * <ul class="Unseparated">
 * <li>{@link Segment#findAllElements(StartTagType)}
 * <li>{@link Segment#findAllTags(TagType)}
 * <li>{@link Source#findPreviousTag(int pos, TagType)}
 * <li>{@link Source#findNextTag(int pos, TagType)}
 * <li>{@link Source#findEnclosingTag(int pos, TagType)}
 * <li>{@link Source#findNextEndTag(int pos, String name, EndTagType)}
 * </ul>
 * <dt><a name="OtherSearch">Other Search:</a>
 * <dd>A small number of methods do not fall into any of the above categories, such as the methods that search on
 * {@linkplain Source#findNextStartTag(int pos, String attributeName, String value, boolean valueCaseSensitive) attribute values}.
 * <ul class="Unseparated">
 * <li>{@link Segment#findAllStartTags(String attributeName, String value, boolean valueCaseSensitive)}
 * <li>{@link Source#findNextStartTag(int pos, String attributeName, String value, boolean valueCaseSensitive)}
 * </ul>
 * </dl>
 */
public abstract class Tag extends Segment implements HTMLElementName {
    String name=null; // always lower case, can always use == operator to compare with constants in HTMLElementName interface
    Element element=Element.NOT_CACHED; // cache
    int allTagsArrayIndex=-1;
    private Object userData=null;

    /**
     * {@linkplain StartTagType#XML_PROCESSING_INSTRUCTION XML processing instruction}
     * @deprecated Use {@link StartTagType#XML_PROCESSING_INSTRUCTION} in combination with <a HREF="#TagTypeSearch">tag type search</a> methods instead.
     */
    public static final String PROCESSING_INSTRUCTION=StartTagType.XML_PROCESSING_INSTRUCTION.getNamePrefixForTagConstant();

    /**
     * {@linkplain StartTagType#XML_DECLARATION XML declaration}
     * @deprecated Use {@link StartTagType#XML_DECLARATION} in combination with <a HREF="#TagTypeSearch">tag type search</a> methods instead.
     */
    public static final String XML_DECLARATION=StartTagType.XML_DECLARATION.getNamePrefixForTagConstant();

    /**
     * {@linkplain StartTagType#DOCTYPE_DECLARATION document type declaration}
     * @deprecated Use {@link StartTagType#DOCTYPE_DECLARATION} in combination with <a HREF="#TagTypeSearch">tag type search</a> methods instead.
     */
    public static final String DOCTYPE_DECLARATION=StartTagType.DOCTYPE_DECLARATION.getNamePrefixForTagConstant();

    /**
     * {@linkplain PHPTagTypes#PHP_STANDARD Standard PHP} tag (<code>&lt;&#63;php &#46;&#46;&#46; &#63;&gt;</code>)
     * @deprecated Use {@link PHPTagTypes#PHP_STANDARD} in combination with <a HREF="#TagTypeSearch">tag type search</a> methods instead.
     */
    public static final String SERVER_PHP=PHPTagTypes.PHP_STANDARD.getNamePrefixForTagConstant();

    /**
     * Common {@linkplain StartTagType#SERVER_COMMON server} tag (<code>&lt;% &#46;&#46;&#46; %&gt;</code>)
     * @deprecated Use {@link StartTagType#SERVER_COMMON} in combination with <a HREF="#TagTypeSearch">tag type search</a> methods instead.
     */
    public static final String SERVER_COMMON=StartTagType.SERVER_COMMON.getNamePrefixForTagConstant();

    /**
     * {@linkplain MasonTagTypes#MASON_NAMED_BLOCK Mason named block} (<code>&lt;%<i>name</i> &#46;&#46;&#46; &gt; &#46;&#46;&#46; &lt;/%<i>name</i>&gt;</code>)
     * @deprecated Use {@link MasonTagTypes#MASON_NAMED_BLOCK} in combination with <a HREF="#TagTypeSearch">tag type search</a> methods instead.
     */
    public static final String SERVER_MASON_NAMED_BLOCK=MasonTagTypes.MASON_NAMED_BLOCK.getNamePrefixForTagConstant(); // NOTE: this value is the same value as SERVER_COMMON

    /**
     * {@linkplain MasonTagTypes#MASON_COMPONENT_CALL Mason component call} (<code>&lt;&amp; &#46;&#46;&#46; &amp;&gt;</code>)
     * @deprecated Use {@link MasonTagTypes#MASON_COMPONENT_CALL} in combination with <a HREF="#TagTypeSearch">tag type search</a> methods instead.
     */
    public static final String SERVER_MASON_COMPONENT_CALL=MasonTagTypes.MASON_COMPONENT_CALL.getNamePrefixForTagConstant();

    /**
     * {@linkplain MasonTagTypes#MASON_COMPONENT_CALLED_WITH_CONTENT Mason component called with content} (<code>&lt;&amp;| &#46;&#46;&#46; &amp;&gt; &#46;&#46;&#46; &lt;/&amp;&gt;</code>)
     * @deprecated Use {@link MasonTagTypes#MASON_COMPONENT_CALLED_WITH_CONTENT} in combination with <a HREF="#TagTypeSearch">tag type search</a> methods instead.
     */
    public static final String SERVER_MASON_COMPONENT_CALLED_WITH_CONTENT=MasonTagTypes.MASON_COMPONENT_CALLED_WITH_CONTENT.getNamePrefixForTagConstant();

    private static final boolean INCLUDE_UNREGISTERED_IN_SEARCH=false; // determines whether unregistered tags are included in searches

    Tag(final Source source, final int begin, final int end, final String name) {
        super(source, begin, end);
        this.name=HTMLElements.getConstantElementName(name.toLowerCase());
    }

    /**
     * Returns the {@linkplain Element element} that is started or ended by this tag.
     * <p>
     * {@link StartTag#getElement()} is guaranteed not to return <code>null</code>.
     * <p>
     * {@link EndTag#getElement()} can return <code>null</code> if the end tag is not properly matched to a start tag.
     *
     * @return the {@linkplain Element element} that is started or ended by this tag.
     */
    public abstract Element getElement();

    /**
     * Returns the name of this tag, always in lower case.
     * <p>
     * The name always starts with the {@linkplain TagType#getNamePrefix() name prefix} defined in this tag's {@linkplain TagType type}.
     * For some tag types, the name consists only of this prefix, while in others it must be followed by a valid
     * <a target="_blank" HREF="http://www.w3.org/TR/REC-xml/#NT-Name">XML name</a>
     * (see {@link StartTagType#isNameAfterPrefixRequired()}).
     * <p>
     * If the name is equal to one of the constants defined in the {@link HTMLElementName} interface, this method is guaranteed to return
     * the constant itself.
     * This allows comparisons to be performed using the <code>==</code> operator instead of the less efficient
     * <code>String.equals(Object)</code> method.
     * <p>
     * For example, the following expression can be used to test whether a {@link StartTag} is from a
     * <code><a target="_blank" HREF="http://www.w3.org/TR/html401/interact/forms.html#edef-SELECT">SELECT</a></code> element:
     * <br /><code>startTag.getName()==HTMLElementName.SELECT</code>
     * <p>
     * To get the name of this tag in its original case, use {@link #getNameSegment()}<code>.toString()</code>.
     *
     * @return the name of this tag, always in lower case.
     */
    public final String getName() {
        return name;
    }

    /**
     * Returns the segment spanning the {@linkplain #getName() name} of this tag.
     * <p>
     * The code <code>getNameSegment().toString()</code> can be used to retrieve the name of this tag in its original case.
     * <p>
     * Every call to this method constructs a new <code>Segment</code> object.
     *
     * @return the segment spanning the {@linkplain #getName() name} of this tag.
     * @see #getName()
     */
    public Segment getNameSegment() {
        final int nameSegmentBegin=begin+getTagType().startDelimiterPrefix.length();
        return new Segment(source,nameSegmentBegin,nameSegmentBegin+name.length());
    }

    /**
     * Returns the {@linkplain TagType type} of this tag.
     * @return the {@linkplain TagType type} of this tag.
     */
    public abstract TagType getTagType();

    /**
     * Returns the general purpose user data object that has previously been associated with this tag via the {@link #setUserData(Object)} method.
     * <p>
     * If {@link #setUserData(Object)} has not been called, this method returns <code>null</code>.
     *
     * @return the generic data object that has previously been associated with this tag via the {@link #setUserData(Object)} method.
     */
    public Object getUserData() {
        return userData;
    }

    /**
     * Associates the specified general purpose user data object with this tag.
     * <p>
     * This property can be useful for applications that need to associate extra information with tags.
     * The object can be retrieved later via the {@link #getUserData()} method.
     *
     * @param userData general purpose user data of any type.
     */
    public void setUserData(final Object userData) {
        this.userData=userData;
    }

    /**
     * Returns the next tag in the source document.
     * <p>
     * If a {@linkplain Source#fullSequentialParse() full sequential parse} has been performed, this method is very efficient.
     * <p>
     * If not, it is equivalent to <code>source.</code>{@link Source#findNextTag(int) findNextTag}<code>(</code>{@link #getBegin()}<code>+1)</code>.
     * <p>
     * See the {@link Tag} class documentation for more details about the behaviour of this method.
     *
     * @return the next tag in the source document, or <code>null</code> if this is the last tag.
     */
    public Tag findNextTag() {
        final Tag[] allTagsArray=source.allTagsArray;
        if (allTagsArray!=null) {
            final int nextAllTagsArrayIndex=allTagsArrayIndex+1;
            if (allTagsArray.length==nextAllTagsArrayIndex) return null;
            return allTagsArray[nextAllTagsArrayIndex];
        } else {
            return source.findNextTag(begin+1);
        }
    }

    /**
     * Returns the previous tag in the source document.
     * <p>
     * If a {@linkplain Source#fullSequentialParse() full sequential parse} has been performed, this method is very efficient.
     * <p>
     * If not, it is equivalent to <code>source.</code>{@link Source#findPreviousTag(int) findPreviousTag}<code>(</code>{@link #getBegin()}<code>-1)</code>.
     * <p>
     * See the {@link Tag} class documentation for more details about the behaviour of this method.
     *
     * @return the previous tag in the source document, or <code>null</code> if this is the first tag.
     */
    public Tag findPreviousTag() {
        final Tag[] allTagsArray=source.allTagsArray;
        if (allTagsArray!=null) {
            if (allTagsArrayIndex==0) return null;
            return allTagsArray[allTagsArrayIndex-1];
        } else {
            if (begin==0) return null;
            return source.findPreviousTag(begin-1);
        }
    }

    /**
     * Indicates whether this tag has a syntax that does not match any of the {@linkplain TagType#register() registered} {@linkplain TagType tag types}.
     * <p>
     * The only requirement of an unregistered tag type is that it {@linkplain TagType#getStartDelimiter() starts} with
     * '<code>&lt;</code>' and there is a {@linkplain TagType#getClosingDelimiter() closing} '<code>&gt;</code>' character
     * at some position after it in the source document.
     * <p>
     * The absence or presence of a '<code>/</code>' character after the initial '<code>&lt;</code>' determines whether an
     * unregistered tag is respectively a
     * {@link StartTag} with a {@linkplain #getTagType() type} of {@link StartTagType#UNREGISTERED} or an
     * {@link EndTag} with a {@linkplain #getTagType() type} of {@link EndTagType#UNREGISTERED}.
     * <p>
     * There are no restrictions on the characters that might appear between these delimiters, including other '<code>&lt;</code>'
     * characters. This may result in a '<code>&gt;</code>' character that is identified as the closing delimiter of two
     * separate tags, one an unregistered tag, and the other a tag of any type that {@linkplain #getBegin() begins} in the middle
     * of the unregistered tag. As explained below, unregistered tags are usually only found when specifically looking for them,
     * so it is up to the user to detect and deal with any such nonsensical results.
     * <p>
     * Unregistered tags are only returned by the {@link Source#getTagAt(int pos)} method,
     * <a HREF="Tag.html#NamedSearch">named search</a> methods, where the specified <code>name</code>
     * matches the first characters inside the tag, and by <a HREF="Tag.html#TagTypeSearch">tag type search</a> methods, where the
     * specified <code>tagType</code> is either {@link StartTagType#UNREGISTERED} or {@link EndTagType#UNREGISTERED}.
     * <p>
     * <a HREF="Tag.html#OpenSearch">Open</a> tag searches and <a HREF="Tag.html#OtherSearch">other</a> searches always ignore
     * unregistered tags, although every discovery of an unregistered tag is {@linkplain Source#setLogWriter(Writer) logged} by the parser.
     * <p>
     * The logic behind this design is that unregistered tag types are usually the result of a '<code>&lt;</code>' character
     * in the text that was mistakenly left {@linkplain CharacterReference#encode(CharSequence) unencoded}, or a less-than
     * operator inside a script, or some other occurrence which is of no interest to the user.
     * By returning unregistered tags in <a HREF="Tag.html#NamedSearch">named</a> and <a HREF="Tag.html#TagTypeSearch">tag type</a>
     * search methods, the library allows the user to specifically search for tags with a certain syntax that does not match any
     * existing {@link TagType}. This expediency feature avoids the need for the user to create a
     * <a HREF="TagType.html#Custom">custom tag type</a> to define the syntax before searching for these tags.
     * By not returning unregistered tags in the less specific search methods, it is providing only the information that
     * most users are interested in.
     *
     * @return <code>true</code> if this tag has a syntax that does not match any of the {@linkplain TagType#register() registered} {@linkplain TagType tag types}, otherwise <code>false</code>.
     */
    public abstract boolean isUnregistered();

    /**
     * Returns an XML representation of this tag.
     * <p>
     * This is an abstract method which is implemented in the {@link StartTag} and {@link EndTag} subclasses.
     * See the documentation of the {@link StartTag#tidy()} and {@link EndTag#tidy()} methods for details.
     *
     * @return an XML representation of this tag.
     */
    public abstract String tidy();

    /**
     * Indicates whether the specified text is a valid <a target="_blank" HREF="http://www.w3.org/TR/REC-xml/#NT-Name">XML Name</a>.
     * <p>
     * This implementation first checks that the first character of the specified text is a valid XML Name start character
     * as defined by the {@link #isXMLNameStartChar(char)} method, and then checks that the rest of the characters are valid
     * XML Name characters as defined by the {@link #isXMLNameChar(char)} method.
     * <p>
     * Note that this implementation does not exactly adhere to the
     * <a target="_blank" HREF="http://www.w3.org/TR/REC-xml/#NT-Name">formal definition of an XML Name</a>,
     * but the differences are unlikely to be significant in real-world XML or HTML documents.
     *
     * @param text the text to test.
     * @return <code>true</code> if the specified text is a valid <a target="_blank" HREF="http://www.w3.org/TR/REC-xml/#NT-Name">XML Name</a>, otherwise <code>false</code>.
     * @see Source#findNameEnd(int pos)
     */
    public static final boolean isXMLName(final CharSequence text) {
        if (text==null || text.length()==0 || !isXMLNameStartChar(text.charAt(0))) return false;
        for (int i=1; i<text.length(); i++)
            if (!isXMLNameChar(text.charAt(i))) return false;
        return true;
    }

    /**
     * Indicates whether the specified character is valid at the start of an
     * <a target="_blank" HREF="http://www.w3.org/TR/REC-xml/#NT-Name">XML Name</a>.
     * <p>
     * The <a target="_blank" HREF="http://www.w3.org/TR/REC-xml/#sec-common-syn">XML 1.0 specification section 2.3</a> defines a
     * <code><a target="_blank" HREF="http://www.w3.org/TR/REC-xml/#NT-Name">Name</a></code> as starting with one of the characters
     * <br /><code>(<a target="_blank" HREF="http://www.w3.org/TR/REC-xml/#NT-Letter">Letter</a> | '_' | ':')</code>.
     * <p>
     * This method uses the expression
     * <br /><code>Character.isLetter(ch) || ch=='_' || ch==':'</code>.
     * <p>
     * Note that there are many differences between the <code>Character.isLetter()</code> definition of a Letter and the
     * <a target="_blank" HREF="http://www.w3.org/TR/REC-xml/#NT-Letter">XML definition of a Letter</a>,
     * but these differences are unlikely to be significant in real-world XML or HTML documents.
     *
     * @param ch the character to test.
     * @return <code>true</code> if the specified character is valid at the start of an <a target="_blank" HREF="http://www.w3.org/TR/REC-xml/#NT-Name">XML Name</a>, otherwise <code>false</code>.
     * @see Source#findNameEnd(int pos)
     */
    public static final boolean isXMLNameStartChar(final char ch) {
        return Character.isLetter(ch) || ch=='_' || ch==':';
    }

    /**
     * Indicates whether the specified character is valid anywhere in an
     * <a target="_blank" HREF="http://www.w3.org/TR/REC-xml/#NT-Name">XML Name</a>.
     * <p>
     * The <a target="_blank" HREF="http://www.w3.org/TR/REC-xml/#sec-common-syn">XML 1.0 specification section 2.3</a> uses the
     * entity <code><a target="_blank" HREF="http://www.w3.org/TR/REC-xml/#NT-NameChar">NameChar</a></code> to represent this set of
     * characters, which is defined as
     * <br /><code>(<a target="_blank" HREF="http://www.w3.org/TR/REC-xml/#NT-Letter">Letter</a>
     * | <a target="_blank" HREF="http://www.w3.org/TR/REC-xml/#NT-Digit">Digit</a> | '.' | '-' | '_' | ':'
     * | <a target="_blank" HREF="http://www.w3.org/TR/REC-xml/#NT-CombiningChar">CombiningChar</a>
     * | <a target="_blank" HREF="http://www.w3.org/TR/REC-xml/#NT-Extender">Extender</a>)</code>.
     * <p>
     * This method uses the expression
     * <br /><code>Character.isLetterOrDigit(ch) || ch=='.' || ch=='-' || ch=='_' || ch==':'</code>.
     * <p>
     * Note that there are many differences between these definitions,
     * but these differences are unlikely to be significant in real-world XML or HTML documents.
     *
     * @param ch the character to test.
     * @return <code>true</code> if the specified character is valid anywhere in an <a target="_blank" HREF="http://www.w3.org/TR/REC-xml/#NT-Name">XML Name</a>, otherwise <code>false</code>.
     * @see Source#findNameEnd(int pos)
     */
    public static final boolean isXMLNameChar(final char ch) {
        return Character.isLetterOrDigit(ch) || ch=='.' || ch=='-' || ch=='_' || ch==':';
    }

    /**
     * Regenerates the HTML text of this tag.
     * <p>
     * This method has been deprecated as of version 2.2 and replaced with the exactly equivalent {@link #tidy()} method.
     *
     * @return the regenerated HTML text of this tag.
     * @deprecated Use {@link #tidy()} instead.
     */
    public abstract String regenerateHTML();

    final boolean includeInSearch() {
        return INCLUDE_UNREGISTERED_IN_SEARCH || !isUnregistered();
    }

    static final Tag findPreviousOrNextTag(final Source source, final int pos, final boolean previous) {
        // returns null if pos is out of range.
        return source.useAllTypesCache
            ? source.cache.findPreviousOrNextTag(pos,previous)
            : findPreviousOrNextTagUncached(source,pos,previous,ParseText.NO_BREAK);
    }

    static final Tag findPreviousOrNextTagUncached(final Source source, final int pos, final boolean previous, final int breakAtPos) {
        // returns null if pos is out of range.
        try {
            final ParseText parseText=source.getParseText();
            int begin=pos;
            do {
                begin=previous?parseText.lastIndexOf('<',begin,breakAtPos):parseText.indexOf('<',begin,breakAtPos); // this assumes that all tags start with '<'
                // parseText.lastIndexOf and indexOf return -1 if pos is out of range.
                if (begin==-1) return null;
                final Tag tag=getTagAt(source,begin);
                if (tag!=null && tag.includeInSearch()) return tag;
            } while (previous ? (begin-=1)>=0 : (begin+=1)<source.end);
        } catch (IndexOutOfBoundsException ex) {
            // this should only happen when the end of file is reached in the middle of a tag.
            // we don't have to do anything to handle it as there are no more tags anyway.
        }
        return null;
    }

    static final Tag findPreviousOrNextTag(final Source source, final int pos, final TagType tagType, final boolean previous) {
        // returns null if pos is out of range.
        if (source.useSpecialTypesCache) return source.cache.findPreviousOrNextTag(pos,tagType,previous);
        return findPreviousOrNextTagUncached(source,pos,tagType,previous,ParseText.NO_BREAK);
    }

    static final Tag findPreviousOrNextTagUncached(final Source source, final int pos, final TagType tagType, final boolean previous, final int breakAtPos) {
        // returns null if pos is out of range.
        if (tagType==null) return findPreviousOrNextTagUncached(source,pos,previous,breakAtPos);
        final char[] startDelimiterCharArray=tagType.getStartDelimiterCharArray();
        try {
            final ParseText parseText=source.getParseText();
            int begin=pos;
            do {
                begin=previous?parseText.lastIndexOf(startDelimiterCharArray,begin,breakAtPos):parseText.indexOf(startDelimiterCharArray,begin,breakAtPos);
                // parseText.lastIndexOf and indexOf return -1 if pos is out of range.
                if (begin==-1) return null;
                final Tag tag=getTagAt(source,begin);
                if (tag!=null && tag.getTagType()==tagType) return tag;
            } while (previous ? (begin-=1)>=0 : (begin+=1)<source.end);
        } catch (IndexOutOfBoundsException ex) {
            // this should only happen when the end of file is reached in the middle of a tag.
            // we don't have to do anything to handle it as there are no more tags anyway.
        }
        return null;
    }

    static final Tag getTagAt(final Source source, final int pos) {
        // returns null if pos is out of range.
        return source.useAllTypesCache
            ? source.cache.getTagAt(pos)
            : getTagAtUncached(source,pos);
    }

    static final Tag getTagAtUncached(final Source source, final int pos) {
        // returns null if pos is out of range.
        return TagType.getTagAt(source,pos,false);
    }

    static final Tag[] parseAll(final Source source, final boolean assumeNoNestedTags) {
        int registeredTagCount=0;
        int registeredStartTagCount=0;
        final ArrayList list=new ArrayList();
        if (source.end!=0) {
            final ParseText parseText=source.getParseText();
            Tag tag=parseAllFindNextTag(source,parseText,0,assumeNoNestedTags);
            while (tag!=null) {
                list.add(tag);
                if (!tag.isUnregistered()) {
                    registeredTagCount++;
                    if (tag instanceof StartTag) registeredStartTagCount++;
                }
                // Look for the next tag after the end of the current tag if we're assuming tags don't appear inside other tags, as long as the last tag found was not an unregistered tag:
                final int pos=(assumeNoNestedTags && !tag.isUnregistered()) ? tag.end : tag.begin+1;
                if (pos==source.end) break;
                tag=parseAllFindNextTag(source,parseText,pos,assumeNoNestedTags);
            }
        }
        final Tag[] allRegisteredTags=new Tag[registeredTagCount];
        final StartTag[] allRegisteredStartTags=new StartTag[registeredStartTagCount];
        source.cache.loadAllTags(list,allRegisteredTags,allRegisteredStartTags);
        source.allTagsArray=allRegisteredTags;
        source.allTags=Arrays.asList(allRegisteredTags);
        source.allStartTags=Arrays.asList(allRegisteredStartTags);
        for (int i=0; i<allRegisteredTags.length; i++) allRegisteredTags[i].allTagsArrayIndex=i;
        return allRegisteredTags;
    }

    private static final Tag parseAllFindNextTag(final Source source, final ParseText parseText, final int pos, final boolean assumeNoNestedTags) {
        try {
            int begin=pos;
            do {
                begin=parseText.indexOf('<',begin); // this assumes that all tags start with '<'
                if (begin==-1) return null;
                final Tag tag=TagType.getTagAt(source,begin,assumeNoNestedTags);
                if (tag!=null) {
                    if (!assumeNoNestedTags) {
                        final TagType tagType=tag.getTagType();
                        if (tag.end>source.endOfLastTagIgnoringEnclosedMarkup
                                && !tagType.isServerTag()
                                && tagType!=StartTagType.DOCTYPE_DECLARATION
                                && tagType!=StartTagType.UNREGISTERED && tagType!=EndTagType.UNREGISTERED)
                            source.endOfLastTagIgnoringEnclosedMarkup=tag.end;
                    }
                    return tag;
                }
            } while ((begin+=1)<source.end);
        } catch (IndexOutOfBoundsException ex) {
            // this should only happen when the end of file is reached in the middle of a tag.
            // we don't have to do anything to handle it as there are no more tags anyway.
        }
        return null;
    }

    // delete when deprecated Source.getNextTagIterator method is removed
    static Iterator getNextTagIterator(final Source source, final int pos) {
        return new NextTagIterator(source,pos);
    }

    private static final class NextTagIterator implements Iterator {
        private Tag nextTag=null;

        public NextTagIterator(final Source source, final int pos) {
            nextTag=findPreviousOrNextTag(source,pos,false);
        }

        public boolean hasNext() {
            return nextTag!=null;
        }

        public Object next() {
            final Tag result=nextTag;
            try {
                nextTag=findPreviousOrNextTag(result.source,result.begin+1,false);
            } catch (NullPointerException ex) {
                throw new NoSuchElementException();
            }
            return result;
        }

        public void remove() {
            throw new UnsupportedOperationException();
        }
    }
}
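
The XML name checks documented on `isXMLName`, `isXMLNameStartChar` and `isXMLNameChar` in the listing above can be tried without the rest of the parser. The following is a minimal standalone sketch, not part of the library: the class name `XmlNameCheck`, its method names and the sample strings are assumptions made for illustration, and only the three boolean expressions are copied from `Tag`.

```java
// Hypothetical standalone class for illustration; only the boolean
// expressions below are taken from the Tag class in the listing.
final class XmlNameCheck {
    private XmlNameCheck() {}

    // Same expression as Tag.isXMLNameStartChar(char).
    static boolean isXmlNameStartChar(final char ch) {
        return Character.isLetter(ch) || ch=='_' || ch==':';
    }

    // Same expression as Tag.isXMLNameChar(char).
    static boolean isXmlNameChar(final char ch) {
        return Character.isLetterOrDigit(ch) || ch=='.' || ch=='-' || ch=='_' || ch==':';
    }

    // Same logic as Tag.isXMLName(CharSequence): the first character must be
    // a valid start character, every later character a valid name character.
    static boolean isXmlName(final CharSequence text) {
        if (text==null || text.length()==0 || !isXmlNameStartChar(text.charAt(0))) return false;
        for (int i=1; i<text.length(); i++)
            if (!isXmlNameChar(text.charAt(i))) return false;
        return true;
    }

    public static void main(final String[] args) {
        // Sample inputs chosen for illustration only.
        final String[] samples={"div", "xsl:template", "_private", "h1", "1st", "-dash", ""};
        for (final String sample : samples)
            System.out.println("\""+sample+"\" -> "+isXmlName(sample));
    }
}
```

In this sketch `1st` and `-dash` are rejected because their first character is not a letter, underscore or colon, while `h1` is accepted because digits are allowed after the first character. As the Javadoc notes, this approximates the formal XML `Name` production and does not implement it exactly.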