What's the advantage of using the libxml2 library for parsing HTML?
Swish-e can be linked with libxml2, a library for parsing HTML and XML documents. The libxml2 parser is better than Swish-e’s built-in HTML parser. It offers more features, and it does a much better job of extracting the text from a web page. In addition, you can use the ParserWarningLevel configuration setting to find structural errors in your documents that could (and, with Swish-e’s HTML parser, would) cause documents to be indexed incorrectly.

Libxml2 is not required, but it is strongly recommended for parsing HTML documents. It’s also recommended for parsing XML, because it offers many more features than the internal Expat xml.c parser.

The internal HTML parser has limited support and a number of bugs. For example, HTML entities are not always converted correctly, and entities in properties are not converted at all. The internal parser tends to get confused when invalid HTML is parsed where the libxml