Tim’s Weblog
Tim Strehle’s links and thoughts on Web apps, managing software development and Digital Asset Management, since 2002.

Turn HTML into plain text with proper whitespace (in XSLT and PHP)

Turning HTML into (unformatted) plain text seems simple at first: PHP has strip_tags(), XSLT has xsl:value-of. In practice, though, you’ll frequently find that words which should be separated by whitespace end up glued together.

Take this example, deliberately formatted in an odd way to get the point across:

If you select and copy this text in the browser, the result will look similar to the following:


First line
Second line.


Now look what we get if we feed the same HTML source code into strip_tags() or xsl:value-of:

First lineSecond line.

Words (“HelloWorld” instead of “Hello World”) and lines are glued together!
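
To make the difference concrete, here is a minimal sketch in Python (used purely for illustration; it is neither PHP nor XSLT). The sample markup, the BLOCK_TAGS set and the function names naive_strip and html_to_text are all assumptions made up for this sketch, not taken from the full article. The first function only drops tags, which roughly mimics plain tag stripping; the second emits a line break wherever a block-level element starts or ends.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample markup, not the example from the original post.
SAMPLE = "<div><p>First line</p><p>Second <b>line</b>.</p></div>"

# Assumption: a small, incomplete set of elements that should break the line.
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "tr"}


def naive_strip(html):
    """Remove tags and keep only the text between them."""
    return re.sub(r"<[^>]*>", "", html)


class TextExtractor(HTMLParser):
    """Collect text, inserting a newline at block-level tag boundaries."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        # Whitespace in the HTML source (including newlines) becomes one space.
        self.parts.append(re.sub(r"\s+", " ", data))


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Trim each line, collapse repeated spaces, and drop empty lines.
    lines = [re.sub(r" +", " ", line).strip() for line in text.split("\n")]
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    print(repr(naive_strip(SAMPLE)))   # 'First lineSecond line.'
    print(repr(html_to_text(SAMPLE)))  # 'First line\nSecond line.'
```

Inline elements such as b are deliberately left out of BLOCK_TAGS, so "Second line." stays on one line. A real converter would need a fuller tag list and more careful whitespace rules than this sketch has.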


Thu, 10 Mar 2016 12:42:00 +0000