What can Perl do with a UTF-8 string?
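As a quick illustration of the character semantics described in this answer, here is a minimal sketch. It assumes Perl 5.8 or later, where the Encode module and the utf8::is_utf8 function ship with the core; the sample strings and variable names are invented for the example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 bytes for "café": the é is the two bytes 0xC3 0xA9.
my $bytes = "caf\xC3\xA9";

# Undecoded, the value is a byte string, so length counts bytes.
print length($bytes), "\n";                                   # 5
print utf8::is_utf8($bytes) ? "flag on\n" : "flag off\n";     # flag off

# Decoding produces a character string with the UTF-8 flag set.
my $chars = decode('UTF-8', $bytes);
print length($chars), "\n";                                   # 4
print utf8::is_utf8($chars) ? "flag on\n" : "flag off\n";     # flag on

# ord can return values above 255 for characters outside Latin-1.
my $euro = decode('UTF-8', "\xE2\x82\xAC");                   # U+20AC EURO SIGN
printf "U+%04X\n", ord($euro);                                # U+20AC

# The regex engine matches whole characters and knows Unicode properties.
print "four letters\n" if $chars =~ /^\p{L}{4}$/;
```

The same five bytes are reported as five units or four depending only on whether the string has been decoded, which is the distinction the internal flag records.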
Perl versions prior to 5.6 had no knowledge of UTF-8 encoded characters. You can still work with UTF-8 data in these older versions, but you’ll probably need the help of a module such as Unicode::String to deal with the non-ASCII characters.
The built-in functions in Perl 5.6 and later are UTF-8 aware. For example, length returns the number of characters rather than the number of bytes in a string, and ord can return values greater than 255. The regular expression engine also matches multi-byte characters correctly, and character classes have been extended to include Unicode properties and block ranges.
None of this added functionality comes at the expense of support for binary data. Perl’s internal SV data structure (used to represent scalar values) includes a flag that indicates whether the string value is UTF-8 encoded. If this flag is not set, every function that operates on the string uses byte semantics, e.g. length will return the number of bytes regardless