dsandler.org

New essay!

November 18th, 2003

New essay! In which I invoke the unholy gods of ASCII art in the name of programmer education:

1100 0011 | 1010 1001  <= this is how it looks in the 
                          page content.
110x xxxx | 10xx xxxx  <= this is the UTF-8 template for
                          "character between 0x80 and 
                          0x7FF".
---0 0011 | --10 1001     To reconstruct the Unicode code
   | || \_   || |  |     point for that character, take
    \||  \_\  || |  |     all the x's and mush them together
     \   \ || |  |     at the end of a 16-bit field.
0000 0000 | 1110 1001  <= Lo, it is Unicode 0x00E9, commonly 
                          written "U+00E9", which is "é"
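
For the code-inclined, the same bit-mushing can be sketched in a few lines of Python. This is only an illustration, not anything from the essay itself: the function name decode_two_byte_utf8 is made up here, and it handles just the two-byte template from the diagram (characters between 0x80 and 0x7FF), not UTF-8 in general.

```python
def decode_two_byte_utf8(b1: int, b2: int) -> int:
    """Return the code point packed into a two-byte UTF-8 sequence.

    Hypothetical helper for illustration; it covers only the
    110x xxxx | 10xx xxxx template.
    """
    # The first byte must look like 110x xxxx.
    if (b1 & 0b1110_0000) != 0b1100_0000:
        raise ValueError("first byte does not match 110x xxxx")
    # The second byte must look like 10xx xxxx.
    if (b2 & 0b1100_0000) != 0b1000_0000:
        raise ValueError("second byte does not match 10xx xxxx")

    # Keep only the x bits (five from the first byte, six from the
    # second) and mush them together at the low end of the result.
    code_point = ((b1 & 0b0001_1111) << 6) | (b2 & 0b0011_1111)

    # Anything below 0x80 fits in one byte, so a two-byte form of it
    # is an invalid "overlong" encoding.
    if code_point < 0x80:
        raise ValueError("overlong encoding")
    return code_point


# The two bytes from the diagram: 1100 0011 | 1010 1001
cp = decode_two_byte_utf8(0b1100_0011, 0b1010_1001)
print(f"U+{cp:04X}")
```

Passing the result to chr() gives the character itself, and it should agree with what Python's own decoder produces for bytes([0xC3, 0xA9]).decode("utf-8").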

The full essay is here: “Why can't Amar read (Unicode)?”
