Tuesday, May 09, 2006

Zen of Unicode

I attended David Goodger's Unicode talk at PyCon earlier this year and thought I was well on my way to Unicode enlightenment. It turns out I still need to chop a lot of wood, carry a lot of water before I attain this particular Zen... In the hope that other people will find it useful, here's a mini-tutorial on Unicode in the form of an email message from David, who responded in excruciating detail to some Unicode-related questions I sent him. I tried to copy and paste the text into the Blogger editor, only to get all sorts of markup-related errors, so I just put it on a Trac wiki. Hopefully David will soon publish his Unicode tutorial on the Web. Until then, happy Unicode hacking!

1 comment:

Florian said...

Now consider this:

- You send UTF-8-encoded links to a browser.
- The user clicks on a link.
- What do you get back in the request URL on your webserver?

I'll enlighten you:
- In general, the URL gets URL-quoted (percent-encoded), so you have to URL-unquote it.
- IE will return the URL in the encoding you sent it in.
- Mozilla/Firefox will send you Latin-1 if there are no non-Latin-1 characters in your link. If there are, however, it will send you UTF-8...
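Here is a minimal sketch of the round trip described above, written in Python 3 rather than the Python 2 of the time. It assumes the server only ever receives UTF-8 or Latin-1 bytes. The helper names `make_link_path` and `decode_request_path` are hypothetical. The sketch percent-encodes a link as UTF-8, then on the way back unquotes to raw bytes and guesses the encoding.

```python
from urllib.parse import quote, unquote_to_bytes


def make_link_path(text):
    """Hypothetical helper: percent-encode a Unicode path segment as UTF-8
    so it can be embedded in a link sent to the browser."""
    return "/" + quote(text, safe="", encoding="utf-8")


def decode_request_path(raw_path):
    """Hypothetical helper: undo percent-encoding, then guess the charset.

    Assumption: the browser sent either UTF-8 or Latin-1 bytes.
    UTF-8 is tried first because byte strings that are not valid UTF-8
    are rejected, whereas Latin-1 decoding never fails (every byte maps
    to a code point), so it works as the fallback.
    """
    raw_bytes = unquote_to_bytes(raw_path)  # URL-unquote to bytes, not str
    try:
        return raw_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return raw_bytes.decode("latin-1")


link = make_link_path("café")          # "/caf%C3%A9"
print(decode_request_path(link))       # "/café" (UTF-8 came back)
print(decode_request_path("/caf%E9"))  # "/café" (Latin-1 fallback)
```

The guess is a heuristic, not a guarantee. Some Latin-1 byte sequences also happen to be valid UTF-8 (for example, Latin-1 "Ã©" is the bytes C3 A9, which decode as UTF-8 "é"). Ambiguous input can therefore be misread.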