Can I version control files with characters in the full Unicode spectrum?
Darcs is almost agnostic to character encodings. The big exception is that you have to use some sort of 8-bit encoding for darcs to treat your files as text. UTF-8 works, as such files would usually contain no embedded ^Z or \0. See the manual section on character sets: “UTF-8 will work if you set DARCS_DONT_ESCAPE_8BIT to 1”; otherwise all non-ASCII characters will be escaped when output.

Apart from output issues, UTF-8 works largely because of its compatibility with ASCII: the common end-of-line markers (U+000A and U+000D) are identified, so files can be treated as text, whilst the rest of the Unicode range is encoded using only bytes with the high bit set.

UTF-16 is not well supported, as it is usually treated as binary because of \0 bytes. Even the basic European alphabet ranges include \0; for example, ‘A’ is encoded as 00 41 in big-endian order.
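To make the byte-level difference concrete, here is a minimal Python sketch. The function `looks_like_text` is a hypothetical helper that only mirrors the rule described above (no embedded \0 or ^Z); it is not darcs's actual implementation, and the sample string is arbitrary.

```python
def looks_like_text(data: bytes) -> bool:
    """Hypothetical check mirroring the rule above: a file counts as
    text when it contains no NUL (\\0) byte and no ^Z (0x1A) byte."""
    return b"\x00" not in data and b"\x1a" not in data


sample = "A\u00e9\u20ac\n"  # 'A', e-acute, euro sign, line feed

utf8 = sample.encode("utf-8")
utf16 = sample.encode("utf-16-be")  # big-endian, no byte order mark

# UTF-8: ASCII characters stay single bytes; every byte belonging to a
# non-ASCII character has the high bit set (value >= 0x80).
print(utf8.hex(" "))           # 41 c3 a9 e2 82 ac 0a
print(looks_like_text(utf8))   # True

# UTF-16: even plain 'A' carries a \0 byte, so the check fails.
print(utf16.hex(" "))          # 00 41 00 e9 20 ac 00 0a
print(looks_like_text(utf16))  # False
```

The same sample therefore passes the text test as UTF-8 and fails it as UTF-16, which matches the behaviour described above.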