Turning HTML into (unformatted) plain text seems simple at first: PHP has strip_tags(), XSLT has xsl:value-of. In practice, though, you’ll frequently find that words which should have whitespace between them end up glued together.
Take this example – extra weirdly-formatted to get the point across:
If you select and copy this text in the browser, the result will look similar to the following:
Now look what we get if we feed the same HTML source code into
First lineSecond line.
Words (“HelloWorld” instead of “Hello World”) and lines are glued together!
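
The effect is easy to reproduce with a minimal Python sketch. It assumes a naive tag stripper that deletes anything between `<` and `>` and inserts nothing in its place. That roughly approximates what strip_tags() does on simple markup, but it is not its actual implementation. The sample markup and the name `naive_strip_tags` are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical sample markup (an assumption for illustration):
# a line break inside a div, followed by two paragraphs.
SAMPLE_HTML = "<div>Hello<br>World</div><p>First line</p><p>Second line.</p>"


def naive_strip_tags(html: str) -> str:
    """Delete everything that looks like a tag, inserting nothing in its place."""
    return re.sub(r"<[^>]*>", "", html)


print(naive_strip_tags(SAMPLE_HTML))
# prints: HelloWorldFirst lineSecond line.
```

A browser renders `<br>` and the paragraph boundaries as line breaks, but none of that survives once the tags are simply deleted. Whitespace that already exists in the source between tags is kept, which is why the problem mostly shows up in compact or generated markup.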