haskell@gentoo.org
Gentoo Haskell
TagSoup is a library for parsing HTML and XML. It supports the HTML5 specification
and can parse either well-formed XML or unstructured, malformed HTML
from the web. The library also provides functions for extracting information
from an HTML document, which makes it well suited to screen scraping.
Users should start from the "Text.HTML.TagSoup" module.
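As a rough illustration of the extraction use case described above, the following minimal sketch collects link targets from a malformed HTML fragment. The function name extractLinks and the sample input are hypothetical and chosen only for this example; parseTags, isTagOpenName, and fromAttrib are exported by "Text.HTML.TagSoup".

```haskell
import Text.HTML.TagSoup (fromAttrib, isTagOpenName, parseTags)

-- Hypothetical helper (not part of TagSoup): collect the href value
-- of every opening <a> tag, skipping anchors that have no href.
extractLinks :: String -> [String]
extractLinks html =
  [ href
  | tag <- parseTags html          -- tokenise the input; malformed markup is tolerated
  , isTagOpenName "a" tag          -- keep only opening <a> tags
  , let href = fromAttrib "href" tag
  , not (null href)                -- fromAttrib yields "" when the attribute is absent
  ]

main :: IO ()
main =
  -- Sample input is an assumption for illustration: note the unclosed <b> and <a>.
  mapM_ putStrLn
    (extractLinks "<p>See <a href='https://example.org'>this <b>page</p>")
```

Because parseTags produces a flat list of tags rather than a tree, the unclosed elements in the sample input need no special handling.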
ndmitchell/tagsoup