
What are Unicode and UTF-8?

April 26, 2017 | Unicode, utf-8

What are Unicode and UTF-8?

1 Answer

Unicode is a standard for the font-independent and orthographically accurate digital representation of written language using character codes. The role Unicode plays for languages in general is analogous to the role ASCII plays for English. In particular, Bangla text encoded in Unicode corresponds one-to-one with written Bangla, so all spellings are preserved. In this way, Unicode can be viewed as an extension of ASCII that encodes the characters of all other languages. In fact, one specific Unicode encoding scheme, UTF-8, is designed to be a direct superset of ASCII: every ASCII character is encoded in UTF-8 as the same single byte it occupies in ASCII. Thus a UTF-8 text document can contain ASCII characters, and an ASCII text document is simply a special case of a UTF-8 text document. To learn more, see the UTF-8 and Unicode FAQ for Unix/Linux. On recent Linux systems, you can also look up the man pages unicode(7), utf-8(7), and charsets(7).
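As a rough illustration of the ASCII/UTF-8 relationship described above, here is a minimal Python sketch. The sample strings and variable names are my own choices for demonstration and are not taken from any of the references mentioned. It checks that plain ASCII text yields the same bytes under both encodings, and that a Bangla word maps to Unicode code points which UTF-8 stores as multi-byte sequences.

```python
# ASCII text: the ASCII and UTF-8 encodings give identical bytes.
ascii_text = "Hello"
ascii_bytes = ascii_text.encode("ascii")
utf8_bytes = ascii_text.encode("utf-8")
same_bytes = ascii_bytes == utf8_bytes

# Illustrative sample: the word "Bangla" written in Bangla script,
# spelled out with escapes so the code points are explicit.
bangla_text = "\u09AC\u09BE\u0982\u09B2\u09BE"

# Each character has one Unicode code point...
code_points = [f"U+{ord(ch):04X}" for ch in bangla_text]

# ...and UTF-8 encodes each of these particular code points as 3 bytes.
bangla_utf8 = bangla_text.encode("utf-8")

# Decoding the bytes recovers exactly the original text.
round_trip = bangla_utf8.decode("utf-8")

print(same_bytes)                          # True
print(code_points)                         # ['U+09AC', 'U+09BE', 'U+0982', 'U+09B2', 'U+09BE']
print(len(bangla_text), len(bangla_utf8))  # 5 15
print(round_trip == bangla_text)           # True
```

The 5 characters become 15 bytes because code points in the range U+0800 to U+FFFF take three bytes each in UTF-8, while ASCII characters keep their single-byte form.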