Can I version control files with characters in the full Unicode spectrum?

April 26, 2017. Tags: characters, control, files, spectrum, Unicode, version



1 Answer


Darcs is almost agnostic to character encodings. The big exception is that you have to use some sort of 8-bit encoding for darcs to treat your files as text. UTF-8 works, because such files usually contain no embedded ^Z or \0 bytes (characters outside the ASCII range are encoded using only bytes with the high bit set).

See the manual section on character sets: “UTF-8 will work if you set DARCS_DONT_ESCAPE_8BIT to 1”; otherwise all non-ASCII characters will be escaped when output.

Apart from output issues, UTF-8 works largely because of its compatibility with ASCII: the common end-of-line markers (U+000A and U+000D) are encoded as the same single bytes as in ASCII, so they are recognized and files can be treated as text, whilst the rest of the Unicode range is encoded using only bytes with the high bit set. UTF-16 is not well supported, as it is usually treated as binary because of its \0 bytes (even the basic European alphabet ranges include \0; e.g. ‘A’ is encoded as 00 41 in big-endian UTF-16).
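To make the byte-level argument concrete, here is a minimal Python sketch. The function `looks_like_text` is a hypothetical helper and is not darcs's actual code; it only mimics the rule the answer implies (a file with an embedded \0 or ^Z is not treated as text), and the sample string is made up for illustration.

```python
# Hypothetical sketch: NOT darcs's real implementation, only the heuristic
# the answer describes (embedded NUL or ^Z means "not text").
BINARY_MARKERS = (b"\x00", b"\x1a")  # NUL (\0) and Ctrl-Z (^Z)


def looks_like_text(data: bytes) -> bool:
    """Return True if the data contains neither a NUL nor a ^Z byte."""
    return not any(marker in data for marker in BINARY_MARKERS)


# Made-up sample mixing ASCII, Latin-1 range, and CJK characters.
sample = "Grüße, 世界! A\n"

utf8 = sample.encode("utf-8")
utf16 = sample.encode("utf-16-be")

print(looks_like_text(utf8))    # UTF-8: no NUL or ^Z bytes appear
print(looks_like_text(utf16))   # UTF-16: ASCII characters bring NUL bytes

# 'A' in big-endian UTF-16 is the byte pair 00 41.
print("A".encode("utf-16-be").hex(" "))

# Every byte of a non-ASCII character in UTF-8 has the high bit set.
non_ascii_bytes = [b for ch in "üß世界" for b in ch.encode("utf-8")]
print(all(b >= 0x80 for b in non_ascii_bytes))
```

Under this assumed rule, the UTF-8 bytes pass as text while the UTF-16 bytes do not, which matches the behaviour described above.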