On the Goodness of Unicode
Tim Bray, in "On the Goodness of Unicode":
- "Embrace Unicode, don't fight it; it's probably the right thing to do, and if it weren't you'd probably have to anyhow.
- Inside your software, store text as UTF-8 or UTF-16; that is to say, pick one of the two and stick with it.
- Interchange data with the outside world using XML whenever possible; this makes a whole bunch of potential problems go away.
- Try to make your application browser-based rather than write your own client; the browsers are getting really quite good at dealing with the texts of the world.
- If you're using someone else's library code (and of course you are), assume its Unicode handling is broken until proved to be correct.
- If you're doing search, try to hand the linguistic and character-handling problems off to someone who understands them."
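
Two of these points lend themselves to a small illustration: keeping a single text representation inside the program, and distrusting library code until it is shown to handle Unicode. What follows is a minimal sketch in Python, not anything from Bray's article. The function names (`read_text`, `write_text`, `survives_round_trip`) are hypothetical, and Python's `str` hides its internal encoding, so the "pick one and stick with it" rule shows up here as decoding once on input and encoding once on output.

```python
import unicodedata


def read_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes once, at the boundary. Raises UnicodeDecodeError on bad input."""
    return raw.decode(encoding)


def write_text(text: str) -> bytes:
    """Encode once, on the way out, always as UTF-8."""
    return text.encode("utf-8")


def survives_round_trip(transform, sample: str) -> bool:
    """Check whether a library function (the hypothetical `transform`)
    returns the sample unchanged, up to NFC normalization."""
    before = unicodedata.normalize("NFC", sample)
    after = unicodedata.normalize("NFC", transform(sample))
    return before == after


# A sample with a Latin-1 letter, CJK characters, and a character outside the BMP.
SAMPLE = "na\u00efve \u65e5\u672c\u8a9e \U0001d11e"

if __name__ == "__main__":
    # The boundary functions should be exact inverses for valid text.
    assert read_text(write_text(SAMPLE)) == SAMPLE

    # A stand-in for a "broken" library call: it silently drops non-ASCII.
    def ascii_only(s: str) -> str:
        return s.encode("ascii", "ignore").decode("ascii")

    print(survives_round_trip(lambda s: s, SAMPLE))  # identity transform
    print(survives_round_trip(ascii_only, SAMPLE))   # lossy transform
```

A check like `survives_round_trip` only proves a function is safe for the samples it is given, so it is a starting point for the "assume it's broken" rule rather than a guarantee.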