Tim's Weblog
Tim Strehle’s links and thoughts on Web apps, software development and Digital Asset Management, since 2002.
2003-10-14

On the Goodness of Unicode

Tim Bray On the Goodness of Unicode:

  • "Embrace Unicode, don't fight it; it's probably the right thing to do, and if it weren't you'd probably have to anyhow.
  • Inside your software, store text as UTF-8 or UTF-16; that is to say, pick one of the two and stick with it.
  • Interchange data with the outside world using XML whenever possible; this makes a whole bunch of potential problems go away.
  • Try to make your application browser-based rather than write your own client; the browsers are getting really quite good at dealing with the texts of the world.
  • If you're using someone else's library code (and of course you are), assume its Unicode handling is broken until proved to be correct.
  • If you're doing search, try to hand the linguistic and character-handling problems off to someone who understands them."