
What’s the algorithm to convert from UTF-16 to character codes?


The Unicode Standard used to contain a short algorithm; now there is just a bit distribution table. Here are three short code snippets that translate the information from the bit distribution table into C code that will convert to and from UTF-16.

Start with the following type definitions (using the fixed-width types from <stdint.h>):

    #include <stdint.h>

    typedef uint16_t UTF16;
    typedef uint32_t UTF32;

The first snippet calculates the high (or leading) surrogate from a character code C:

    const UTF16 HI_SURROGATE_START = 0xD800;

    UTF16 X = (UTF16) C;
    UTF32 U = (C >> 16) & ((1 << 5) - 1);
    UTF16 W = (UTF16) U - 1;
    UTF16 HiSurrogate = HI_SURROGATE_START | (W << 6) | (X >> 10);

where X, U, and W correspond to the labels used in Table 3-4, UTF-16 Bit Distribution. The next snippet does the same for the low (or trailing) surrogate:

    const UTF16 LO_SURROGATE_START = 0xDC00;

    UTF16 X = (UTF16) C;
    UTF16 LoSurrogate = (UTF16) (LO_SURROGATE_START | (X & ((1 << 10) - 1)));

Finally, the reverse, where hi and lo are the high and low surrogates and C is the resulting character code:

    UTF32 X = ((hi & ((1 << 6) - 1)) << 10) | (lo & ((1 << 10) - 1));
    UTF32 W = (hi >> 6) & ((1 << 5) - 1);
    UTF32 U = W + 1;
    UTF32 C = (U << 16) | X;

A caller would need to ensure that C is a supplementary code point (0x10000..0x10FFFF) and that hi and lo form a valid surrogate pair; code points below 0x10000 are encoded directly as a single UTF-16 code unit, with no surrogates.
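To see the three snippets work end to end, here is a minimal, self-contained sketch that wraps them in two helper functions and round-trips one supplementary code point. The function names code_point_to_surrogates and surrogates_to_code_point are my own, not from the Unicode Standard; the test value U+10437 (DESERET SMALL LETTER YEE) encodes as the surrogate pair D801 DC37.

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uint16_t UTF16;
    typedef uint32_t UTF32;

    /* Encode a supplementary-plane code point C (0x10000..0x10FFFF)
       as a surrogate pair, following the snippets above. */
    static void code_point_to_surrogates(UTF32 C, UTF16 *hi, UTF16 *lo)
    {
        UTF16 X = (UTF16) C;                  /* low 16 bits of C   */
        UTF32 U = (C >> 16) & ((1 << 5) - 1); /* plane bits uuuuu   */
        UTF16 W = (UTF16) U - 1;              /* wwww = uuuuu - 1   */
        *hi = (UTF16) (0xD800 | (W << 6) | (X >> 10));
        *lo = (UTF16) (0xDC00 | (X & ((1 << 10) - 1)));
    }

    /* Decode a surrogate pair back to the code point. */
    static UTF32 surrogates_to_code_point(UTF16 hi, UTF16 lo)
    {
        UTF32 X = ((hi & ((1 << 6) - 1)) << 10) | (lo & ((1 << 10) - 1));
        UTF32 W = (hi >> 6) & ((1 << 5) - 1);
        UTF32 U = W + 1;
        return (U << 16) | X;
    }

    int main(void)
    {
        UTF16 hi, lo;
        code_point_to_surrogates(0x10437, &hi, &lo);
        printf("U+10437 -> %04X %04X\n", (unsigned) hi, (unsigned) lo); /* D801 DC37 */
        assert(surrogates_to_code_point(hi, lo) == 0x10437);
        return 0;
    }

The assert verifies the round trip; a production version would also reject inputs outside the ranges noted above rather than assume well-formed arguments.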


Asked in Computers & Technology at 2:08 PM on November 10, 2008. Tags: algorithm, convert, character, codes
