2016-03-10

Turn HTML into plain text with proper whitespace (in XSLT and PHP)

Turning HTML into (unformatted) plain text seems simple at first: PHP has strip_tags(), XSLT has xsl:value-of. In practice, though, you’ll frequently find that words which should be separated by whitespace end up glued together.

Take this example, deliberately formatted in an odd way to get the point across:

If you select and copy this text in the browser, the result will look similar to the following:

Hello
World

First line
Second line.

    1
    2

Now look at what we get when we feed the same HTML source code into strip_tags() or xsl:value-of:

HelloWorld
First lineSecond line.
12

Words (“HelloWorld” instead of “Hello World”) and lines are glued together!
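
The effect is easy to reproduce in PHP. The sketch below uses a made-up HTML string (an assumption, not the example markup from this article) and a hypothetical helper, strip_tags_spaced(), that shows one crude workaround: put a space in front of every tag before stripping, then collapse runs of whitespace. It is not necessarily the approach the full article takes.

```php
<?php
// Hypothetical input chosen to produce the same symptom as above;
// it is not the article's original markup.
$html = '<div>Hello</div><div>World</div>'
      . '<p>First line<br>Second line.</p>'
      . '<ol><li>1</li><li>2</li></ol>';

// strip_tags() removes the markup and puts nothing in its place,
// so text from neighbouring elements runs together.
$glued = strip_tags($html);
echo $glued, "\n"; // HelloWorldFirst lineSecond line.12

// Crude workaround (illustrative helper, not a built-in function):
// insert a space before every "<", strip the tags, then collapse
// all whitespace runs into single spaces.
function strip_tags_spaced(string $html): string
{
    $spaced = str_replace('<', ' <', $html);
    $text = strip_tags($spaced);
    return trim(preg_replace('/\s+/', ' ', $text));
}

echo strip_tags_spaced($html), "\n"; // Hello World First line Second line. 1 2
```

This keeps words apart, but it treats every tag alike: line breaks and list structure are lost, and inline markup inside a word (for example wor<b>ld</b>) would gain an unwanted space. Getting "proper" whitespace requires distinguishing block-level from inline elements.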

Read the full article…

Thu, 10 Mar 2016 12:42:00 +0000