Why are my supplementary characters rejected by MySQL?

MySQL does not support supplementary characters (that is, characters that need more than 3 bytes) for UTF-8. We support only what Unicode calls the Basic Multilingual Plane (BMP), or Plane 0. Only a few very rare Han characters are supplementary; support for them is uncommon. This limitation has led to reports such as Bug#12600, which we rejected as “not a bug”. With utf8, we must truncate an input string at the first byte sequence that we don’t understand, because otherwise we wouldn’t know how long the bad multi-byte character is. One possible workaround is to use ucs2 instead of utf8, in which case the “bad” characters are changed to question marks and no truncation takes place. You can also change the data type to BLOB or BINARY, which perform no validity checking. We intend at some point in the future to add support for UTF-16, which would solve such issues by allowing 4-byte characters. However, we have not yet set a definite timetable for doing so.
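To check input before it reaches the database, the two behaviours described above can be imitated in a few lines of Python. This is only a rough sketch: the function names are hypothetical, and the truncation and question-mark logic simulates what the answer describes rather than reproducing MySQL's actual code.

```python
from typing import List, Tuple


def is_supplementary(ch: str) -> bool:
    """True if ch lies outside the Basic Multilingual Plane (above U+FFFF)."""
    return ord(ch) > 0xFFFF


def find_supplementary(text: str) -> List[Tuple[int, str]]:
    """Return (index, character) pairs for every supplementary character."""
    return [(i, ch) for i, ch in enumerate(text) if is_supplementary(ch)]


def truncate_like_utf8(text: str) -> str:
    """Simulation (assumption): cut the string at the first supplementary character."""
    for i, ch in enumerate(text):
        if is_supplementary(ch):
            return text[:i]
    return text


def replace_like_ucs2(text: str) -> str:
    """Simulation (assumption): swap each supplementary character for '?'."""
    return "".join("?" if is_supplementary(ch) else ch for ch in text)


sample = "ab\U00020000cd"  # U+20000 is a Han character outside the BMP
print(len("\U00020000".encode("utf-8")))  # 4 bytes in UTF-8
print(find_supplementary(sample))
print(repr(truncate_like_utf8(sample)))   # 'ab'
print(repr(replace_like_ucs2(sample)))    # 'ab?cd'
```

Running a check like `find_supplementary` on incoming text makes it possible to reject or log such characters explicitly instead of discovering a silently shortened value later.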

Before MySQL 6.0.4, MySQL does not support supplementary characters (that is, characters that need more than 3 bytes) for UTF-8. We support only what Unicode calls the Basic Multilingual Plane (BMP), or Plane 0. Only a few very rare Han characters are supplementary; support for them is uncommon. This limitation has led to reports such as Bug#12600, which we rejected as “not a bug”. With utf8, we must truncate an input string at the first byte sequence that we don’t understand, because otherwise we wouldn’t know how long the bad multi-byte character is. One possible workaround is to use ucs2 instead of utf8, in which case the “bad” characters are changed to question marks and no truncation takes place. You can also change the data type to BLOB or BINARY, which perform no validity checking. As of MySQL 6.0.4, Unicode support is extended to include supplementary characters by means of additional Unicode character sets: utf16, utf32, and 4-byte utf8. These character sets support supplementary Unicode characters.
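To see why a 3-byte limit excludes supplementary characters while UTF-16, UTF-32, and 4-byte UTF-8 can hold them, it may help to compare encoded sizes. The sketch below uses Python's built-in codecs as a stand-in for the Unicode encodings themselves; `encoded_sizes` is a hypothetical helper, and nothing here calls MySQL.

```python
def encoded_sizes(ch: str) -> dict:
    """Byte length of one character in UTF-8, UTF-16, and UTF-32 (no BOM)."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-be")),
        "utf-32": len(ch.encode("utf-32-be")),
    }


bmp_char = "\u4e2d"               # U+4E2D, a Han character inside the BMP
supplementary_char = "\U00020000"  # U+20000, a Han character outside the BMP

print(encoded_sizes(bmp_char))            # {'utf-8': 3, 'utf-16': 2, 'utf-32': 4}
print(encoded_sizes(supplementary_char))  # {'utf-8': 4, 'utf-16': 4, 'utf-32': 4}
```

The BMP character fits in 3 bytes of UTF-8, while the supplementary one needs 4 bytes in UTF-8 and a surrogate pair (also 4 bytes) in UTF-16.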

