
Why doesn't the Unicode Standard adopt a compositional model for encoding Han ideographs? Wouldn't that save a large number of code points?


The Han ideographic script is largely compositional in nature. The overwhelming majority of characters created over the centuries (and still being coined) are made by adjoining two or more older characters in simple geometric relationships. For example, the Cantonese-specific character U+55F0 嗰 was created by placing two older characters, U+53E3 口 and U+500B 個, side by side. The compositional nature of the script (and, more to the point, the fact that this compositional nature is well known) means that over time tens of thousands of ideographs have been created, and these are currently encoded in Unicode with one code point per ideograph. As a result, some 71,000 code points are consumed by ideographs in Unicode 5.0, nearly three-quarters of the characters encoded. The compositional nature of the script makes it attractive to propose a compositional encoding model, such as the one that can be used for Hangul. Such a mechanism would result in the savings of thousands of code points...
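The contrast between Hangul and Han can be seen in a minimal Python sketch using the standard `unicodedata` module. It assumes a particular Hangul example (the jamo U+1112, U+1161, U+11AB) that does not come from the answer above. For the Han side, it uses U+2FF0, the Ideographic Description Character for a left-to-right arrangement, to describe 嗰 from its two parts. That description sequence is only an illustration of what a composed form could look like; Unicode does not treat it as equivalent to U+55F0.

```python
import unicodedata


def compose(text: str) -> str:
    """Apply Unicode canonical composition (NFC) to text."""
    return unicodedata.normalize("NFC", text)


# Hangul: three conjoining jamo are canonically equivalent to one
# precomposed syllable, so NFC merges them. (Example chosen for this sketch.)
jamo = "\u1112\u1161\u11AB"
syllable = compose(jamo)

# Han: U+2FF0 followed by U+53E3 and U+500B *describes* the shape of
# U+55F0, but it is not canonically equivalent, so NFC leaves it untouched.
description = "\u2FF0\u53E3\u500B"
han_result = compose(description)
precomposed = "\u55F0"

print(len(jamo), "->", len(syllable), hex(ord(syllable)))
# prints: 3 -> 1 0xd55c
print(len(description), "->", len(han_result), han_result == precomposed)
# prints: 3 -> 3 False
```

The Hangul sequence collapses to a single code point, while the Han description sequence stays three code points long and never compares equal to the encoded ideograph.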
