- HotSAX is a small, fast SAX2 parser for HTML, XHTML and XML.
SAX parsers parse XML by generating events for start tags, text, and end tags, which trigger event handlers in your code. They are meant to be faster and use less memory than an equivalent DOM parser. SAX2 adds lexical handling extensions for events such as comments and CDATA blocks.
- HTMLParser - a super-fast real-time parser for real-world HTML. What has attracted most developers to HTMLParser is its simple design, its speed, and its ability to handle streaming real-world HTML.
J.A.D.E. Java Addition to Default Environment
- Javolution's real-time goals are simple: to make your application faster and more time-predictable.
This is accomplished through:
* Safe/transparent object recycling (including thread reuse with concurrent contexts).
* Class initialization and object preallocation at start-up or at any time of your convenience.
* Fast and highly deterministic util / lang / io / xml base classes (e.g. Text insertion/deletion in O(log n) instead of O(n) for standard String/StringBuffer/StringBuilder).
* The first (and only) RTSJ-compliant collections classes.
- JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML. In addition, JTidy provides a DOM interface to the document being processed, which effectively lets you use JTidy as a DOM parser for real-world HTML.
- NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces. The parser can scan HTML files and "fix up" many common mistakes that human (and computer) authors make in writing HTML documents. NekoHTML adds missing parent elements; automatically closes elements with optional end tags; and can handle mismatched inline element tags.
- TagSoup is a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: nasty and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. By providing a SAX interface, it allows standard XML tools to be applied to even the worst HTML.
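
The SAX event model described under HotSAX (start tags, text, end tags, plus the SAX2 lexical events for comments and CDATA) can be illustrated with a minimal sketch. It uses only the parser bundled with the JDK, not any of the libraries listed above; the class name `SaxEventDemo`, the helper `trace`, and the event label strings are hypothetical names chosen for this example. Parsers that expose a SAX interface, such as TagSoup or NekoHTML, are generally driven through the same handler callbacks, though their setup code differs.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

public class SaxEventDemo {

    /** Records a readable trace of the events the parser fires. */
    static class TraceHandler extends DefaultHandler2 {
        final List<String> events = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            events.add("start:" + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end:" + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // A parser may deliver text in several chunks; skip whitespace-only ones.
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) {
                events.add("text:" + text);
            }
        }

        // The callbacks below come from the SAX2 LexicalHandler extension.
        @Override
        public void comment(char[] ch, int start, int length) {
            events.add("comment:" + new String(ch, start, length).trim());
        }

        @Override
        public void startCDATA() {
            events.add("startCDATA");
        }

        @Override
        public void endCDATA() {
            events.add("endCDATA");
        }
    }

    /** Parses well-formed XML and returns the list of events observed. */
    static List<String> trace(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TraceHandler handler = new TraceHandler();
        // Lexical events are only reported if the handler is registered explicitly.
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : trace("<p>Hello<!-- note --><![CDATA[a < b]]></p>")) {
            System.out.println(event);
        }
    }
}
```

Because the handler only appends to a list as events arrive, no document tree is ever built, which is the source of the memory advantage over DOM mentioned above. Note that the JDK parser requires well-formed input; handling malformed real-world HTML is what the HTML-specific parsers in this list add.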