Category: Tim’s Weblog
-
Turn HTML into plain text with proper whitespace (in XSLT and PHP)
Turning HTML into (unformatted) plain text seems simple at first: PHP has <a href="http://www.php.net/strip_tags">strip_tags()</a>, and XSLT has <a href="https://www.w3.org/TR/xslt#value-of">xsl:value-of</a>. In practice, though, words that should be separated by whitespace often end up glued together, because both simply drop the markup and put nothing in its place. Take this example, formatted extra weirdly to get the point across: If you select and copy this text in the…
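To make the gluing problem concrete, here is a minimal PHP sketch. The markup in `$html` is a made-up sample (not the example from this post), chosen so that adjacent block elements have no whitespace between them in the source:

```php
<?php
// Hypothetical sample markup: block-level elements with no whitespace
// between the closing tag of one and the opening tag of the next.
$html = '<h1>Title</h1><p>First paragraph.</p><ul><li>one</li><li>two</li></ul>';

// strip_tags() removes the tags but inserts nothing where they were,
// so text from neighbouring elements runs together.
$plain = strip_tags($html);

echo $plain, PHP_EOL; // TitleFirst paragraph.onetwo
```

A browser would render these elements on separate lines, which is why the missing whitespace only shows up once the tags are gone.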