Turn HTML into plain text with proper whitespace (in XSLT and PHP)
Turning HTML into (unformatted) plain text seems simple at first: PHP has strip_tags(), XSLT has xsl:value-of. In practice, though, you’ll frequently find that words which should have whitespace between them end up glued together.
Take this example – extra weirdly-formatted to get the point across:
If you select and copy this text in the browser, the result will look similar to the following:
Hello
World
First line
Second line.
1
2
Now look what we get if we feed the same HTML source code into strip_tags() or xsl:value-of:
HelloWorld
First lineSecond line.
12
Words (“HelloWorld” instead of “Hello World”) and lines are glued together!
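The gluing effect is easy to reproduce in PHP. The snippet below is a minimal sketch: the markup in $html is a hypothetical stand-in chosen to give the same kind of output, not the original example, and strip_tags_spaced() is an illustrative helper name showing one naive workaround rather than a definitive fix.

```php
<?php
// Hypothetical markup (an assumption, not the article's original example).
$html = "<div>Hello</div><div>World</div>\n"
      . "<p>First line<br>Second line.</p>\n"
      . "<ul><li>1</li><li>2</li></ul>";

// strip_tags() removes the tags but puts no whitespace in their place,
// so text from adjacent elements ends up glued together.
$glued = strip_tags($html);
echo $glued, PHP_EOL;

// Naive workaround (illustrative only): put a space in front of every tag,
// strip the tags, then collapse the extra spaces and trim each line.
function strip_tags_spaced(string $html): string
{
    $spaced = preg_replace('/</', ' <', $html);
    $text = strip_tags($spaced);
    $text = preg_replace('/[ \t]+/', ' ', $text);

    return implode("\n", array_map('trim', explode("\n", $text)));
}

echo strip_tags_spaced($html), PHP_EOL;
```

Padding every tag with a space is deliberately crude: it also separates text around inline elements, so something like H<sub>2</sub>O would come out as "H 2 O". A more careful approach would add whitespace only around block-level elements and line breaks.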
Thu, 10 Mar 2016 12:42:00 +0000