Unicode doesn't seem to distinguish between tréma and umlaut, but I need to distinguish. What shall I do?

April 26, 2017

Tags: distinguish, shall, umlaut, Unicode

1 Answer

A. For some purposes, it may be necessary to maintain a distinction between tréma and umlaut, for example in bibliographic records kept by the German library network. For the Latin script, the Unicode Standard does not distinguish identically appearing diacritical marks with different functions. Doing so would result in confusion in implementations and among users. The character U+034F COMBINING GRAPHEME JOINER (CGJ) may be used to make the sorting, searching, and data mapping distinctions required for umlaut versus tréma. The semantics of CGJ are such that it should affect only searching and sorting, in systems that have been tailored to distinguish it, while being otherwise ignored in interpretation. The CGJ character was encoded with this purpose in mind. The sequences <a, umlaut> and <a, CGJ, umlaut> are not canonically equivalent. This means that the distinction will not be normalized away on conversion in and out of bibliographic systems. This eases the interoperabili
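
The point about normalization can be checked with a short Python sketch using the standard `unicodedata` module. Which of the two marks carries the CGJ is an assumption made here purely for illustration (the sketch marks the tréma), and the names `UMLAUT_A`, `TREMA_A`, and `strip_cgj` are hypothetical helpers, not part of any standard or library.

```python
import unicodedata

CGJ = "\u034F"        # U+034F COMBINING GRAPHEME JOINER
DIAERESIS = "\u0308"  # U+0308 COMBINING DIAERESIS (used for both umlaut and tréma)

# Assumed convention for this sketch only: the plain sequence is the umlaut,
# and the CGJ-marked sequence is the tréma.
UMLAUT_A = "a" + DIAERESIS
TREMA_A = "a" + CGJ + DIAERESIS


def strip_cgj(text: str) -> str:
    """Drop CGJ for systems that do not maintain the distinction."""
    return text.replace(CGJ, "")


for form in ("NFC", "NFD"):
    umlaut_norm = unicodedata.normalize(form, UMLAUT_A)
    trema_norm = unicodedata.normalize(form, TREMA_A)
    # The CGJ survives normalization, so the two sequences stay distinct.
    print(form, "distinct:", umlaut_norm != trema_norm, "CGJ kept:", CGJ in trema_norm)

# A system that ignores the distinction can remove CGJ before comparing.
print("equal after stripping CGJ:", strip_cgj(TREMA_A) == UMLAUT_A)
```

A system tailored to the distinction would compare the strings as they are, while one that does not care could call something like `strip_cgj` before searching or sorting.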