What is a Character Encoding?
The underlying question for the digital exchange of text is essentially this: how do you get from a character to an identifiable digital representation of that character, and back again? Here is a simple approach. Associate each character with a unique integer, and then give each of these integers a unique digital representation. Roughly, the first step creates a coded character set, and the second step creates a character encoding scheme.

We will ignore two complications. First, it is apparently not unheard of for a coded character set to assign more than one number to a single character (Connolly 1995). Second, we ignore a problem in the definition of a character. For example, are the minus sign and the hyphen different characters? We will say: no in some character repertoires (e.g., ASCII and Latin-1, which have only a single hyphen-minus), and yes in others (e.g., UCS, which distinguishes HYPHEN from MINUS SIGN)!

We define a coded character set to be a one-to-one mapping from a set of characters to a set of integers. (Calling this mapping “one-to-one” just means that we c