/*
 * Copyright 2001-2005 (C) MetaStuff, Ltd. All Rights Reserved.
 *
 * This software is open source.
 * See the bottom of this file for the licence.
 */

package org.dom4j.io;

import java.io.IOException;
import java.io.OutputStream;
import java.io.StringWriter;
import java.io.UnsupportedEncodingException;
import java.io.Writer;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.Stack;

import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.Entity;
import org.dom4j.Node;

import org.xml.sax.SAXException;

/**
 * <p>
 * <code>HTMLWriter</code> takes a DOM4J tree and formats it to a stream as
 * HTML. This formatter is similar to XMLWriter but it outputs the text of CDATA
 * and Entity sections rather than the serialised format as in XML, it has an
 * XHTML mode, it retains whitespace in certain elements such as &lt;PRE&gt;,
 * and it supports certain elements which have no corresponding close tag such
 * as &lt;BR&gt; and &lt;P&gt;.
 * </p>
 *
 * <p>
 * The OutputFormat passed in to the constructor is checked for isXHTML() and
 * isExpandEmptyElements(). See {@link OutputFormat OutputFormat} for details.
 * Here are the rules for <b>this class </b> based on an OutputFormat, "format",
 * passed in to the constructor: <br/><br/>
 *
 * <ul>
 * <li>If an element is in {@link #getOmitElementCloseSet()
 * getOmitElementCloseSet}, then it is treated specially:
 *
 * <ul>
 * <li>It never expands, since some browsers treat this as two separate
 * Horizontal Rules: &lt;HR&gt;&lt;/HR&gt;</li>
 * <li>If {@link org.dom4j.io.OutputFormat#isXHTML() format.isXHTML()}, then
 * it has a space before the closing single-tag slash, since Netscape 4.x-
 * treats this: &lt;HR /&gt; as an element named "HR" with an attribute named
 * "/", but that's better than when it refuses to recognize this: &lt;hr/&gt;
 * which it thinks is an element named "HR/".</li>
 * </ul>
 *
 * </li>
 * <li>If {@link org.dom4j.io.OutputFormat#isXHTML() format.isXHTML()}, all
 * elements must have either a close element, or be a closed single tag.</li>
 * <li>If {@link org.dom4j.io.OutputFormat#isExpandEmptyElements()
 * format.isExpandEmptyElements()} is true, all elements are expanded except
 * as above.</li>
 * </ul>
 *
 * <b>Examples </b>
 * </p>
 *
 * <p>
 * If isXHTML == true, CDATA sections look like this:
 *
 * <PRE>
 *
 * <b>&lt;myelement&gt;&lt;![CDATA[My data]]&gt;&lt;/myelement&gt; </b>
 *
 * </PRE>
 *
 * Otherwise, they look like this:
 *
 * <PRE>
 *
 * <b>&lt;myelement&gt;My data&lt;/myelement&gt; </b>
 *
 * </PRE>
 *
 * </p>
 *
 * <p>
 * Basically, {@link OutputFormat#isXHTML() OutputFormat.isXHTML()} ==
 * <code>true</code> will produce valid XML, while {@link
 * org.dom4j.io.OutputFormat#isExpandEmptyElements()
 * format.isExpandEmptyElements()} determines whether empty elements are
 * expanded if isXHTML is true, excepting the special HTML single tags.
 * </p>
 *
 * <p>
 * Also, HTMLWriter handles tags whose contents should be preformatted, that is,
 * whitespace-preserved. By default, this set includes the tags &lt;PRE&gt;,
 * &lt;SCRIPT&gt;, &lt;STYLE&gt;, and &lt;TEXTAREA&gt;, case insensitively. It
 * does not include &lt;IFRAME&gt;. Other tags, such as &lt;CODE&gt;,
 * &lt;KBD&gt;, &lt;TT&gt;, &lt;VAR&gt;, are usually rendered in a different
 * font in most browsers, but don't preserve whitespace, so they also don't
 * appear in the default list. HTML Comments are always whitespace-preserved.
 * However, the parser you use may store comments with linefeed-only text nodes
 * (\n) even if your platform uses another line.separator character, and
 * HTMLWriter outputs Comment nodes exactly as the DOM is set up by the parser.
 * See examples and discussion here: {@link #setPreformattedTags(java.util.Set)
 * setPreformattedTags}
 * </p>
 *
 * <p>
 * <b>Examples </b>
 * </p>
 * <blockquote>
 * <p>
 * <b>Pretty Printing </b>
 * </p>
 *
 * <p>
 * This example shows how to pretty print a string containing a valid HTML
 * document to a string. You can also just call the static methods of this
 * class: <br>
 * {@link #prettyPrintHTML(String) prettyPrintHTML(String)} or <br>
 * {@link #prettyPrintHTML(String,boolean,boolean,boolean,boolean)
 * prettyPrintHTML(String,boolean,boolean,boolean,boolean)} or, <br>
 * {@link #prettyPrintXHTML(String) prettyPrintXHTML(String)} for XHTML (note
 * the X)
 * </p>
 *
 * <pre>
 * String testPrettyPrint(String html) throws IOException, DocumentException {
 *     StringWriter sw = new StringWriter();
 *     OutputFormat format = OutputFormat.createPrettyPrint();
 *     // These are the default values for createPrettyPrint,
 *     // so you needn't set them:
 *     // format.setNewlines(true);
 *     // format.setTrimText(true);
 *     format.setXHTML(true);
 *     HTMLWriter writer = new HTMLWriter(sw, format);
 *     Document document = DocumentHelper.parseText(html);
 *     writer.write(document);
 *     writer.flush();
 *     return sw.toString();
 * }
 * </pre>
 *
 * <p>
 * This example shows how to create a "squeezed" document, but one that will
 * work in browsers even if the browser line length is limited. No newlines are
 * included, no extra whitespace at all, except where it is required by
 * {@link #setPreformattedTags(java.util.Set) setPreformattedTags}.
 * </p>
 *
 * <pre>
 * String testCrunch(String html) throws IOException, DocumentException {
 *     StringWriter sw = new StringWriter();
 *     OutputFormat format = OutputFormat.createPrettyPrint();
 *     format.setNewlines(false);
 *     format.setTrimText(true);
 *     format.setIndent(&quot;&quot;);
 *     format.setXHTML(true);
 *     format.setExpandEmptyElements(false);
 *     format.setNewLineAfterNTags(20);
 *     org.dom4j.io.HTMLWriter writer = new HTMLWriter(sw, format);
 *     org.dom4j.Document document = DocumentHelper.parseText(html);
 *     writer.write(document);
 *     writer.flush();
 *     return sw.toString();
 * }
 * </pre>
 *
 * </blockquote>
 *
 * @author <a HREF="mailto:james.strachan@metastuff.com">James Strachan </a>
 * @author Laramie Crocker
 * @version $Revision: 1.21 $
 */
public class HTMLWriter extends XMLWriter {
    private static String lineSeparator = System.getProperty("line.separator");

    protected static final HashSet DEFAULT_PREFORMATTED_TAGS;

    static {
        // If you change this list, update the javadoc examples, above in the
        // class javadoc, in writeElement, and in setPreformattedTags().
        DEFAULT_PREFORMATTED_TAGS = new HashSet();
        DEFAULT_PREFORMATTED_TAGS.add("PRE");
        DEFAULT_PREFORMATTED_TAGS.add("SCRIPT");
        DEFAULT_PREFORMATTED_TAGS.add("STYLE");
        DEFAULT_PREFORMATTED_TAGS.add("TEXTAREA");
    }

    protected static final OutputFormat DEFAULT_HTML_FORMAT;

    static {
        DEFAULT_HTML_FORMAT = new OutputFormat(" ", true);
        DEFAULT_HTML_FORMAT.setTrimText(true);
        DEFAULT_HTML_FORMAT.setSuppressDeclaration(true);
    }

    private Stack formatStack = new Stack();

    private String lastText = "";

    private int tagsOuput = 0;

    // legal values are 0+, but -1 signifies lazy initialization.
    private int newLineAfterNTags = -1;

    private HashSet preformattedTags = DEFAULT_PREFORMATTED_TAGS;

    /**
     * Used to store the qualified element names which should have no close
     * element tag
     */
    private HashSet omitElementCloseSet;

    public HTMLWriter(Writer writer) {
        super(writer, DEFAULT_HTML_FORMAT);
    }

    public HTMLWriter(Writer writer, OutputFormat format) {
        super(writer, format);
    }

    public HTMLWriter() throws UnsupportedEncodingException {
        super(DEFAULT_HTML_FORMAT);
    }

    public HTMLWriter(OutputFormat format) throws UnsupportedEncodingException {
        super(format);
    }

    public HTMLWriter(OutputStream out) throws UnsupportedEncodingException {
        super(out, DEFAULT_HTML_FORMAT);
    }

    public HTMLWriter(OutputStream out, OutputFormat format)
            throws UnsupportedEncodingException {
        super(out, format);
    }

    public void startCDATA() throws SAXException {
    }

    public void endCDATA() throws SAXException {
    }

    // Overridden methods
    // added isXHTML() stuff so you get the CDATA brackets if you desire.
    protected void writeCDATA(String text) throws IOException {
        // XXX: Should we escape entities?
        // writer.write( escapeElementEntities( text ) );
        if (getOutputFormat().isXHTML()) {
            super.writeCDATA(text);
        } else {
            writer.write(text);
        }

        lastOutputNodeType = Node.CDATA_SECTION_NODE;
    }

    protected void writeEntity(Entity entity) throws IOException {
        writer.write(entity.getText());
        lastOutputNodeType = Node.ENTITY_REFERENCE_NODE;
    }

    protected void writeDeclaration() throws IOException {
    }

    protected void writeString(String text) throws IOException {
        /*
         * DOM stores \n at the end of text nodes that are newlines. This is
         * significant if we are in a PRE section. However, we only want to
         * output the system line.separator, not \n. This is a little brittle,
         * but this function appears to be called with these lineseparators as a
         * separate TEXT_NODE. If we are in a preformatted section, output the
         * right line.separator, otherwise ditch. If the single \n character is
         * not the text, then do the super thing to output the text.
         *
         * Also, we store the last text that was not a \n since it may be used
         * by writeElement in this class to line up preformatted tags.
         */
        if (text.equals("\n")) {
            if (!formatStack.empty()) {
                super.writeString(lineSeparator);
            }

            return;
        }

        lastText = text;

        if (formatStack.empty()) {
            super.writeString(text.trim());
        } else {
            super.writeString(text);
        }
    }

    /**
     * Overridden method to not close certain element names to avoid weird
     * behaviour from browsers for versions up to 5.x
     *
     * @param qualifiedName
     *            the qualified name of the element whose close tag is written
     *
     * @throws IOException
     *             if the underlying writer could not be written to
     */
    protected void writeClose(String qualifiedName) throws IOException {
        if (!omitElementClose(qualifiedName)) {
            super.writeClose(qualifiedName);
        }
    }

    protected void writeEmptyElementClose(String qualifiedName)
            throws IOException {
        if (getOutputFormat().isXHTML()) {
            // xhtml, always check with format object whether to expand or not.
            if (omitElementClose(qualifiedName)) {
                // it was a special omit tag, do it the XHTML way: "<br/>",
                // ignoring the expansion option, since <br></br> is OK XML,
                // but produces twice the linefeeds desired in the browser.
                // for netscape 4.7, though all are fine with it, write a space
                // before the close slash.
                writer.write(" />");
            } else {
                super.writeEmptyElementClose(qualifiedName);
            }
        } else {
            // html, not xhtml
            if (omitElementClose(qualifiedName)) {
                // it was a special omit tag, do it the old html way: "<br>".
                writer.write(">");
            } else {
                // it was NOT a special omit tag, check with format object
                // whether to expand or not.
                super.writeEmptyElementClose(qualifiedName);
            }
        }
    }

    protected boolean omitElementClose(String qualifiedName) {
        return internalGetOmitElementCloseSet().contains(
                qualifiedName.toUpperCase());
    }

    private HashSet internalGetOmitElementCloseSet() {
        if (omitElementCloseSet == null) {
            omitElementCloseSet = new HashSet();
            loadOmitElementCloseSet(omitElementCloseSet);
        }

        return omitElementCloseSet;
    }

    // If you change this, change the javadoc for getOmitElementCloseSet.
    protected void loadOmitElementCloseSet(Set set) {
        set.add("AREA");
        set.add("BASE");
        set.add("BR");
        set.add("COL");
        set.add("HR");
        set.add("IMG");
        set.add("INPUT");
        set.add("LINK");
        set.add("META");
        set.add("P");
        set.add("PARAM");
    }

    // let the people see the set, but not modify it.

    /**
     * A clone of the Set of elements that can have their close-tags omitted. By
     * default it should be "AREA", "BASE", "BR", "COL", "HR", "IMG", "INPUT",
     * "LINK", "META", "P", "PARAM"
     *
     * @return A clone of the Set.
     */
    public Set getOmitElementCloseSet() {
        return (Set) (internalGetOmitElementCloseSet().clone());
    }

    /**
     * To use the empty set, pass an empty Set, or null:
     *
     * <pre>
     * setOmitElementCloseSet(new HashSet());
     * or
     * setOmitElementCloseSet(null);
     * </pre>
     *
     * @param newSet
     *            the element names whose close tags should be omitted; each
     *            non-null entry is stored upper-cased, and null is treated as
     *            an empty set
     */
    public void setOmitElementCloseSet(Set newSet) {
        // resets, and safely empties it out if newSet is null.
        omitElementCloseSet = new HashSet();

        if (newSet != null) {
            Object aTag;
            Iterator iter = newSet.iterator();

            while (iter.hasNext()) {
                aTag = iter.next();

                if (aTag != null) {
                    omitElementCloseSet.add(aTag.toString().toUpperCase());
                }
            }
        }
    }

    /**
     * @see #setPreformattedTags(java.util.Set) setPreformattedTags
     */
    public Set getPreformattedTags() {
        return (Set) (preformattedTags.clone());
    }

    /**
     * <p>
     * Override the default set, which includes PRE, SCRIPT, STYLE, and
     * TEXTAREA, case insensitively.
     * </p>
     *
     * <p>
     * <b>Setting Preformatted Tags </b>
     * </p>
     *
     * <p>
     * Pass in a Set of Strings, one for each tag name that should be treated
     * like a PRE tag. You may pass in null or an empty Set to assign the empty
     * set, in which case no tags will be treated as preformatted, except that
     * HTML Comments will continue to be preformatted. If a tag is included in
     * the set of preformatted tags, all whitespace within the tag will be
     * preserved, including whitespace on the same line preceding the close tag.
     * This will generally make the close tag not line up with the start tag,
     * but it preserves the intention of the whitespace within the tag.
     * </p>
     *
     * <p>
     * The browser considers leading whitespace before the close tag to be
     * significant, but leading whitespace before the open tag to be
     * insignificant. For example, if the HTML author doesn't put the close
     * TEXTAREA tag flush to the left margin, then the TEXTAREA control in the
     * browser will have spaces on the last line inside the control. This may be
     * the HTML author's intent. Similarly, in a PRE, the browser treats a
     * flushed left close PRE tag as different from a close tag with leading
     * whitespace. Again, this must be left up to the HTML author.
     * </p>
     *
     * <p>
     * <b>Examples </b>
     * </p>
     * <blockquote>
     * <p>
     * Here is an example of how you can set the PreformattedTags list using
     * setPreformattedTags to include IFRAME, as well as the default set, if you
     * have an instance of this class named myHTMLWriter:
     *
     * <pre>
     * Set current = myHTMLWriter.getPreformattedTags();
     * current.add(&quot;IFRAME&quot;);
     * myHTMLWriter.setPreformattedTags(current);
     *
     * //The set is now &lt;b&gt;PRE, SCRIPT, STYLE, TEXTAREA, IFRAME&lt;/b&gt;
     * </pre>
     *
     * Similarly, you can simply replace it with your own:
     *
     * <pre>
     * HashSet newset = new HashSet();
     * newset.add(&quot;PRE&quot;);
     * newset.add(&quot;TEXTAREA&quot;);
     * myHTMLWriter.setPreformattedTags(newset);
     *
     * //The set is now &lt;b&gt;{PRE, TEXTAREA}&lt;/b&gt;
     * </pre>
     *
     * You can remove all tags from the preformatted tags list, with an empty
     * set, like this:
     *
     * <pre>
     * myHTMLWriter.setPreformattedTags(new HashSet());
     *
     * //The set is now &lt;b&gt;{}&lt;/b&gt;
     * </pre>
     *
     * or with null, like this:
     *
     * <pre>
     * myHTMLWriter.setPreformattedTags(null);
     *
     * //The set is now &lt;b&gt;{}&lt;/b&gt;
     * </pre>
     *
     * </p>
     * </blockquote>
     *
     * @param newSet
     *            the tag names to treat as preformatted; each non-null entry
     *            is stored upper-cased, and null is treated as an empty set
     */
    public void setPreformattedTags(Set newSet) {
        // no fancy merging, just set it, assuming they did a
        // getPreformattedTags() first if they wanted to preserve the default
        // set.
        // resets, and safely empties it out if newSet is null.
        preformattedTags = new HashSet();

        if (newSet != null) {
            Object aTag;
            Iterator iter = newSet.iterator();

            while (iter.hasNext()) {
                aTag = iter.next();

                if (aTag != null) {
                    preformattedTags.add(aTag.toString().toUpperCase());
                }
            }
        }
    }

    /**
     * Tests whether the given element name is in the preformatted tags set.
     *
     * @param qualifiedName
     *            the qualified name of the element to look up
     *
     * @return true if the qualifiedName passed in matched (case-insensitively)
     *         a tag in the preformattedTags set, or false if not found or if
     *         the set is empty or null.
     *
     * @see #setPreformattedTags(java.util.Set) setPreformattedTags
     */
    public boolean isPreformattedTag(String qualifiedName) {
        // A null set implies that the user called setPreformattedTags(null),
        // which means they want no tags to be preformatted.
        return (preformattedTags != null)
                && (preformattedTags.contains(qualifiedName.toUpperCase()));
    }

    /**
     * This override handles any elements that should not remove whitespace,
     * such as &lt;PRE&gt;, &lt;SCRIPT&gt;, &lt;STYLE&gt;, and &lt;TEXTAREA&gt;.
     * Note: the close tags won't line up with the open tag, but we can't alter
     * that. See javadoc note at setPreformattedTags.
     *
     * @param element
     *            the element to write, including its content
     *
     * @throws IOException
     *             When the stream could not be written to.
     *
     * @see #setPreformattedTags(java.util.Set) setPreformattedTags
     */
    protected void writeElement(Element element) throws IOException {
        if (newLineAfterNTags == -1) { // lazy initialization check
            lazyInitNewLinesAfterNTags();
        }

        if (newLineAfterNTags > 0) {
            if ((tagsOuput > 0) && ((tagsOuput % newLineAfterNTags) == 0)) {
                super.writer.write(lineSeparator);
            }
        }

        tagsOuput++;

        String qualifiedName = element.getQualifiedName();
        String saveLastText = lastText;
        int size = element.nodeCount();

        if (isPreformattedTag(qualifiedName)) {
            OutputFormat currentFormat = getOutputFormat();
            boolean saveNewlines = currentFormat.isNewlines();
            boolean saveTrimText = currentFormat.isTrimText();
            String currentIndent = currentFormat.getIndent();

            // You could have nested PREs, or SCRIPTS within PRE... etc.,
            // therefore use push and pop.
            formatStack.push(new FormatState(saveNewlines, saveTrimText,
                    currentIndent));

            try {
                // do this manually, since it won't be done while outputting
                // the tag.
                super.writePrintln();

                if ((saveLastText.trim().length() == 0)
                        && (currentIndent != null)
                        && (currentIndent.length() > 0)) {
                    // We are indenting, but we want to line up with the close
                    // tag. lastText was the indent (whitespace, no \n) before
                    // the preformatted start tag. So write it out instead of
                    // the current indent level. This makes it line up with its
                    // close tag.
                    super.writer.write(justSpaces(saveLastText));
                }

                // actually, newlines are handled in this class by writeString,
                // depending on if the stack is empty.
                currentFormat.setNewlines(false);
                currentFormat.setTrimText(false);
                currentFormat.setIndent("");

                // This line is the recursive one:
                super.writeElement(element);
            } finally {
                FormatState state = (FormatState) formatStack.pop();
                currentFormat.setNewlines(state.isNewlines());
                currentFormat.setTrimText(state.isTrimText());
                currentFormat.setIndent(state.getIndent());
            }
        } else {
            super.writeElement(element);
        }
    }

    private String justSpaces(String text) {
        int size = text.length();
        StringBuffer res = new StringBuffer(size);
        char c;

        for (int i = 0; i < size; i++) {
            c = text.charAt(i);

            switch (c) {
                case '\r':
                case '\n':
                    continue;

                default:
                    res.append(c);
            }
        }

        return res.toString();
    }

    private void lazyInitNewLinesAfterNTags() {
        if (getOutputFormat().isNewlines()) {
            // don't bother, newlines are going to happen anyway.
            newLineAfterNTags = 0;
        } else {
            newLineAfterNTags = getOutputFormat().getNewLineAfterNTags();
        }
    }

    // Convenience methods, static, with bunch-o-defaults

    /**
     * Convenience method to just get a String result.
     *
     * @param html
     *            the source markup; it is parsed with
     *            DocumentHelper.parseText, so it must be well-formed XML
     *
     * @return a pretty printed String from the source string, preserving
     *         whitespace in the defaultPreformattedTags set, and leaving the
     *         close tags off of the default omitElementCloseSet set. Use one of
     *         the write methods if you want stream output.
     *
     * @throws java.io.IOException
     * @throws java.io.UnsupportedEncodingException
     * @throws org.dom4j.DocumentException
     */
    public static String prettyPrintHTML(String html)
            throws java.io.IOException, java.io.UnsupportedEncodingException,
            org.dom4j.DocumentException {
        return prettyPrintHTML(html, true, true, false, true);
    }

    /**
     * Convenience method to just get a String result, but <b>As XHTML </b>.
     *
     * @param html
     *            the source markup; it is parsed with
     *            DocumentHelper.parseText, so it must be well-formed XML
     *
     * @return a pretty printed String from the source string, preserving
     *         whitespace in the defaultPreformattedTags set, but conforming to
     *         XHTML: no close tags are omitted (though if empty, they will be
     *         converted to XHTML empty tags: &lt;HR/&gt;). Use one of the write
     *         methods if you want stream output.
     *
     * @throws java.io.IOException
     * @throws java.io.UnsupportedEncodingException
     * @throws org.dom4j.DocumentException
     */
    public static String prettyPrintXHTML(String html)
            throws java.io.IOException, java.io.UnsupportedEncodingException,
            org.dom4j.DocumentException {
        return prettyPrintHTML(html, true, true, true, false);
    }

    /**
     * Convenience method to get a String result with explicit formatter
     * options.
     *
     * @param html
     *            the source markup; it is parsed with
     *            DocumentHelper.parseText, so it must be well-formed XML
     * @param newlines
     *            passed to OutputFormat.setNewlines
     * @param trim
     *            passed to OutputFormat.setTrimText
     * @param isXHTML
     *            passed to OutputFormat.setXHTML
     * @param expandEmpty
     *            passed to OutputFormat.setExpandEmptyElements
     *
     * @return a pretty printed String from the source string, preserving
     *         whitespace in the defaultPreformattedTags set, and leaving the
     *         close tags off of the default omitElementCloseSet set. This
     *         override allows you to specify various formatter options. Use one
     *         of the write methods if you want stream output.
     *
     * @throws java.io.IOException
     * @throws java.io.UnsupportedEncodingException
     * @throws org.dom4j.DocumentException
     */
    public static String prettyPrintHTML(String html, boolean newlines,
            boolean trim, boolean isXHTML, boolean expandEmpty)
            throws java.io.IOException, java.io.UnsupportedEncodingException,
            org.dom4j.DocumentException {
        StringWriter sw = new StringWriter();
        OutputFormat format = OutputFormat.createPrettyPrint();
        format.setNewlines(newlines);
        format.setTrimText(trim);
        format.setXHTML(isXHTML);
        format.setExpandEmptyElements(expandEmpty);

        HTMLWriter writer = new HTMLWriter(sw, format);
        Document document = DocumentHelper.parseText(html);
        writer.write(document);
        writer.flush();

        return sw.toString();
    }

    // Allows us to store the current state of the format in this struct on
    // the formatStack.
    private class FormatState {
        private boolean newlines = false;

        private boolean trimText = false;

        private String indent = "";

        public FormatState(boolean newLines, boolean trimText, String indent) {
            this.newlines = newLines;
            this.trimText = trimText;
            this.indent = indent;
        }

        public boolean isNewlines() {
            return newlines;
        }

        public boolean isTrimText() {
            return trimText;
        }

        public String getIndent() {
            return indent;
        }
    }
}

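/*
 * Illustrative sketch, not part of dom4j: the decision made by
 * writeEmptyElementClose, reduced to a pure function so the four cases can be
 * read side by side. The class EmptyCloseSketch, the method closeEmpty and
 * its boolean parameters are hypothetical names invented for this example.
 * The non-omitted branch assumes the superclass writes "/>" unless
 * OutputFormat.isExpandEmptyElements() is set; check XMLWriter before relying
 * on that. Locale.ROOT is used for upper-casing here, whereas the class
 * itself calls toUpperCase() with the default locale.
 */

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical stand-alone sketch; it does not use any dom4j class. */
final class EmptyCloseSketch {
    // The same names that loadOmitElementCloseSet registers by default.
    static final Set<String> OMIT_CLOSE = new HashSet<>(Arrays.asList(
            "AREA", "BASE", "BR", "COL", "HR", "IMG", "INPUT", "LINK",
            "META", "P", "PARAM"));

    /**
     * Returns the text that terminates the start tag of an empty element.
     * The xhtml flag stands in for OutputFormat.isXHTML() and expandEmpty
     * for OutputFormat.isExpandEmptyElements().
     */
    static String closeEmpty(String name, boolean xhtml, boolean expandEmpty) {
        boolean omit = OMIT_CLOSE.contains(name.toUpperCase(Locale.ROOT));
        if (omit) {
            // Special single tags never expand: " />" in XHTML, ">" in HTML.
            return xhtml ? " />" : ">";
        }
        // Assumed superclass behaviour: defer to the expand-empty setting.
        return expandEmpty ? "></" + name + ">" : "/>";
    }

    public static void main(String[] args) {
        System.out.println("<br" + closeEmpty("br", true, true));
        System.out.println("<hr" + closeEmpty("hr", false, false));
        System.out.println("<div" + closeEmpty("div", true, true));
        System.out.println("<div" + closeEmpty("div", false, false));
    }
}
```
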
/*
 * <html> <head> <title>My Title </title> <style> .foo { text-align: Right; }
 * </style> <script> function mojo(){ return "bar"; } </script> <script
 * language="JavaScript"> <!-- //this is the canonical javascript hiding.
 * function foo(){ return "foo"; } //--> </script> </head> <!-- this is a
 * comment --> <body bgcolor="#A4BFDD" mojo="&amp;"> entities: &#160; &amp;
 * &quot; &lt; &gt; %23 <p></p> <mojo> </mojo> <foo /> <table border="1"> <tr>
 * <td><pre> line0 <hr /> line1 <b>line2, should line up, indent-wise </b> line
 * 3 line 4 </pre></td><td></td></tr> </table> <myCDATAElement> <![CDATA[My
 * data]]> </myCDATAElement> </body> </html>
 */

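/*
 * Illustrative sketch, not part of dom4j: the normalisation performed by
 * setPreformattedTags and the lookup in isPreformattedTag, written with
 * generics. It shows why "pre", "Pre" and "PRE" all match, and why passing
 * null yields the empty set. PreformattedTagsSketch, setTags and
 * isPreformatted are hypothetical names; Locale.ROOT is an assumption of
 * this sketch, since the class itself uses the default locale.
 */

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical stand-alone sketch; it does not use any dom4j class. */
final class PreformattedTagsSketch {
    private Set<String> tags = new HashSet<>();

    /** Replaces the set with an upper-cased copy; null means "no tags". */
    void setTags(Set<?> newSet) {
        Set<String> copy = new HashSet<>();
        if (newSet != null) {
            for (Object tag : newSet) {
                if (tag != null) { // null entries are skipped, as in the class
                    copy.add(tag.toString().toUpperCase(Locale.ROOT));
                }
            }
        }
        tags = copy;
    }

    /** Case-insensitive membership test against the stored set. */
    boolean isPreformatted(String qualifiedName) {
        return tags.contains(qualifiedName.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        PreformattedTagsSketch sketch = new PreformattedTagsSketch();
        Set<Object> input = new HashSet<>();
        input.add("pre");
        input.add("TextArea");
        input.add(null);
        sketch.setTags(input);
        System.out.println(sketch.isPreformatted("PRE"));
        System.out.println(sketch.isPreformatted("script"));
    }
}
```
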
/*
 * Redistribution and use of this software and associated documentation
 * ("Software"), with or without modification, are permitted provided that the
 * following conditions are met:
 *
 * 1. Redistributions of source code must retain copyright statements and
 * notices. Redistributions must also contain a copy of this document.
 *
 * 2. Redistributions in binary form must reproduce the above copyright notice,
 * this list of conditions and the following disclaimer in the documentation
 * and/or other materials provided with the distribution.
 *
 * 3. The name "DOM4J" must not be used to endorse or promote products derived
 * from this Software without prior written permission of MetaStuff, Ltd. For
 * written permission, please contact dom4j-info@metastuff.com.
 *
 * 4. Products derived from this Software may not be called "DOM4J" nor may
 * "DOM4J" appear in their names without prior written permission of MetaStuff,
 * Ltd. DOM4J is a registered trademark of MetaStuff, Ltd.
 *
 * 5. Due credit should be given to the DOM4J Project - http://www.dom4j.org
 *
 * THIS SOFTWARE IS PROVIDED BY METASTUFF, LTD. AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL METASTUFF, LTD. OR ITS CONTRIBUTORS BE
 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 * POSSIBILITY OF SUCH DAMAGE.
 *
 * Copyright 2001-2005 (C) MetaStuff, Ltd. All Rights Reserved.
 */
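/*
 * Illustrative sketch, not part of dom4j: the counting rule that writeElement
 * applies when newLineAfterNTags is positive. A break is written before an
 * element once at least one tag has been output and the running count is a
 * multiple of N. NewLineAfterNTagsSketch and join are hypothetical names, the
 * tags are a flat array rather than a tree, and "\n" replaces the platform
 * line.separator so the result is the same on every system.
 */

```java
/** Hypothetical stand-alone sketch; it does not use any dom4j class. */
final class NewLineAfterNTagsSketch {
    /** Concatenates the tags, breaking the line after every n of them. */
    static String join(String[] tags, int n) {
        StringBuilder out = new StringBuilder();
        int tagsOutput = 0;
        for (String tag : tags) {
            // Same test as writeElement: never before the first tag, and
            // only when the count so far is a multiple of n.
            if (n > 0 && tagsOutput > 0 && tagsOutput % n == 0) {
                out.append('\n');
            }
            tagsOutput++;
            out.append(tag);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] tags = {"<a/>", "<b/>", "<c/>", "<d/>", "<e/>"};
        System.out.println(join(tags, 2));
    }
}
```

/*
 * With n == 0 the sketch never breaks the line, which mirrors what
 * lazyInitNewLinesAfterNTags selects when the format already emits newlines.
 */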