What are Unicode and UTF-8?
Unicode is a standard for the font-independent and orthographically accurate digital representation of written language using character codes. The role Unicode plays for languages in general is analogous to the role ASCII plays for English. In particular, there is a one-to-one correspondence between Bangla Unicode and written Bangla which preserves all spellings. In this way, Unicode can be viewed as an extension of ASCII that encodes the characters of all other languages. In fact, a specific Unicode encoding scheme called UTF-8 is designed to be a direct superset of ASCII: every ASCII character is encoded in UTF-8 by the same single byte, and all other characters are encoded as sequences of two or more bytes. Thus a UTF-8 text document can contain ASCII characters, and an ASCII text document is simply a special type of UTF-8 text document. To learn more, see the UTF-8 and Unicode FAQ for Unix/Linux. On recent Linux systems, you can also look up the manpages unicode(7), utf-8(7), and charsets(7).
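The relationship between ASCII and UTF-8 can be checked with a few lines of Python. This is only a small sketch: the variable names are illustrative, and the Bangla letter KA (code point U+0995) is picked as an arbitrary example character.

```python
# ASCII text produces exactly the same bytes under ASCII and UTF-8.
ascii_text = "Hello"
ascii_bytes = ascii_text.encode("ascii")
utf8_bytes_of_ascii = ascii_text.encode("utf-8")
same_bytes = ascii_bytes == utf8_bytes_of_ascii

# An ASCII file is therefore already a valid UTF-8 file.
decoded_as_utf8 = ascii_bytes.decode("utf-8")

# A Bangla letter has one Unicode code point, independent of any font.
ka = "\u0995"                      # BENGALI LETTER KA (example character)
ka_code_point = ord(ka)            # 0x0995
ka_utf8 = ka.encode("utf-8")       # three bytes in UTF-8

# ASCII and Bangla characters can be mixed freely in one UTF-8 string.
mixed_utf8 = ("A" + ka).encode("utf-8")

print(same_bytes)                  # True
print(hex(ka_code_point))          # 0x995
print(ka_utf8)                     # b'\xe0\xa6\x95'
print(len(mixed_utf8))             # 4 (1 byte for "A", 3 for KA)
```

The ASCII letter still takes one byte while the Bangla letter takes three, which is how UTF-8 stays compatible with ASCII and still covers other scripts.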