What does do shell script do with non-ASCII text (accented characters, Japanese, etc.)?
From AppleScript’s point of view, do shell script accepts and produces Unicode text. It passes commands to the shell and interprets their output as UTF-8. If a command produces bytes that are not valid UTF-8, do shell script interprets them using the primary system encoding.

Keep in mind that most shell commands are completely ignorant of Unicode and UTF-8. UTF-8 looks like ASCII for ASCII characters (for example, A is the byte 0x41 in both ASCII and UTF-8), but any non-ASCII character is represented as a sequence of two or more bytes. As far as the shell commands are concerned, however, one byte equals one character, and they make no attempt to interpret anything outside the ASCII range. This means that they preserve UTF-8 sequences and can do exact byte-for-byte matches: for example, echo "™" produces a trademark symbol, and grep "©" finds every line with a copyright symbol. However, they cannot intelligently sort, alter, or compare UTF-8 sequences.
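A minimal sketch of the byte-versus-character point above, written as plain shell commands of the kind that could be passed to do shell script. The helper name byte_count and the sample text are illustrative assumptions, not part of the original answer. The octal escapes spell out the UTF-8 bytes of the trademark sign (U+2122) so the example does not depend on how the script file itself is encoded.

```shell
# byte_count TEXT: print the number of bytes in TEXT.
# (Hypothetical helper for illustration; wc -c counts bytes, not characters.)
byte_count() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

# The trademark sign is the three-byte UTF-8 sequence E2 84 A2.
tm=$(printf '\342\204\242')

byte_count 'A'      # prints 1: ASCII characters are one byte in UTF-8
byte_count "$tm"    # prints 3: one character, but three bytes

# Exact byte-for-byte matching still works: count the lines containing the sequence.
printf 'Acme %s\nplain text\n' "$tm" | grep -c -F "$tm"    # prints 1
```

The tr -d ' ' step only strips the padding that some wc implementations add around the count. grep -F is used so the byte sequence is matched literally rather than as a regular expression.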