What does do shell script do with non-ASCII text (accented characters, Japanese, etc.)?

April 26, 2017
Tags: accented characters, Japanese, non-ASCII, Script, Shell, text

1 Answer

From AppleScript’s point of view, do shell script accepts and produces Unicode text. It passes the command to the shell as UTF-8 and interprets the command’s output as UTF-8. If a command produces bytes that are not valid UTF-8, do shell script interprets them using the primary system encoding.

Keep in mind that most shell commands are completely ignorant of Unicode and UTF-8. UTF-8 looks like ASCII for ASCII characters (for example, A is the byte 0x41 in both ASCII and UTF-8), but any non-ASCII character is represented as a sequence of two to four bytes. As far as the shell commands are concerned, however, one byte equals one character, and they make no attempt to interpret anything outside the ASCII range. This means that they will preserve UTF-8 sequences and can do exact byte-for-byte matches: for example, echo "™" will produce a trademark symbol, and grep "©" will find every line with a copyright symbol. However, they cannot intelligently sort, alter, or compare UTF-8 sequences: for example,