What can Perl do with a UTF-8 string?

April 26, 2017 | Tags: Perl, String, UTF, utf-8

1 Answer

Perl versions prior to 5.6 had no knowledge of UTF-8 encoded characters. You can still work with UTF-8 data in these older versions, but you’ll probably need a module such as Unicode::String to handle the non-ASCII characters.

The built-in functions in Perl 5.6 and later are UTF-8 aware. For example, length returns the number of characters in a string rather than the number of bytes, and ord can return values greater than 255. The regular expression engine also matches multi-byte characters correctly, and character classes have been extended to include Unicode properties and block ranges.

None of this added functionality comes at the expense of support for binary data. Perl’s internal SV data structure (used to represent scalar values) includes a flag that indicates whether the string value is UTF-8 encoded. If this flag is not set, all functions that operate on the string use byte semantics, e.g. length will return the number of bytes regardless
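The difference between byte semantics and character semantics described above can be illustrated with a short script. This is only a sketch under some assumptions: it relies on the core Encode module and on utf8::is_utf8, both of which require Perl 5.8 or later rather than 5.6, and the sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw bytes: the UTF-8 encoding of U+00E9 (e with acute accent)
# followed by U+20AC (euro sign). Sample data chosen for illustration.
my $bytes = "\xC3\xA9\xE2\x82\xAC";

# The UTF-8 flag is off, so length uses byte semantics.
my $byte_len   = length $bytes;                    # 5
my $bytes_flag = utf8::is_utf8($bytes) ? 1 : 0;    # 0

# Decoding produces a character string with the UTF-8 flag set.
my $chars      = decode('UTF-8', $bytes);
my $char_len   = length $chars;                    # 2
my $chars_flag = utf8::is_utf8($chars) ? 1 : 0;    # 1

# ord can now return a code point above 255.
my $euro_ord = ord substr($chars, 1, 1);           # 8364 (0x20AC)

# Unicode properties in a regex: only the accented letter matches \p{L}.
my $letter_count = () = $chars =~ /\p{L}/g;        # 1

printf "bytes: length=%d flag=%d\n", $byte_len, $bytes_flag;
printf "chars: length=%d flag=%d\n", $char_len, $chars_flag;
printf "ord of second character: %d\n", $euro_ord;
printf "letters matched by \\p{L}: %d\n", $letter_count;
```

The script prints only numbers, which avoids "Wide character" warnings that would otherwise appear when writing decoded text to a filehandle without an encoding layer.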