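MySQL has the following UTF-8-related character sets, which differ in how many bytes they can use per character:
- utf8: 1 to 3 bytes (currently an alias for utf8mb3)
- utf8mb3: 1 to 3 bytes
- utf8mb4: 1 to 4 bytes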
Which characters can and cannot be stored with utf8?
Any character whose code point is above U+FFFF cannot be stored with utf8, because UTF-8 needs 4 bytes to encode such a character.
Character | Code point | UTF-8 bytes | Can be stored with utf8? |
---|---|---|---|
崏 | U+5D0E | 3 | Yes |
﨓 | U+FA13 | 3 | Yes |
😁 | U+1F601 | 4 | No |
𩸱 | U+29E31 | 4 | No |
If you need to store emoji in a MySQL database, you must use utf8mb4.
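The rule above can be checked in application code before text is written to a utf8 (utf8mb3) column. The following is a minimal Python sketch; the helper names `unsupported_by_utf8mb3` and `fits_utf8mb3` are hypothetical and not part of any MySQL API. It relies only on the fact that a character needs 4 bytes in UTF-8 exactly when its code point is above U+FFFF.

```python
from typing import List


def unsupported_by_utf8mb3(text: str) -> List[str]:
    """Return the characters in `text` that MySQL's utf8 (utf8mb3) cannot store.

    utf8mb3 uses at most 3 bytes per character, so any code point above
    U+FFFF (which needs 4 bytes in UTF-8) cannot be stored.
    """
    return [ch for ch in text if ord(ch) > 0xFFFF]


def fits_utf8mb3(text: str) -> bool:
    """True if every character in `text` can be stored in a utf8mb3 column."""
    return not unsupported_by_utf8mb3(text)


if __name__ == "__main__":
    # The four characters from the table above.
    samples = ["\u5d0e", "\ufa13", "\U0001F601", "\U00029E31"]
    for ch in samples:
        verdict = "can" if fits_utf8mb3(ch) else "can't"
        print(f"U+{ord(ch):04X}: {verdict} be stored with utf8")
```

Checking `ord(ch) > 0xFFFF` is equivalent to checking `len(ch.encode("utf-8")) == 4` for ordinary characters, but avoids encoding the string.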
MySQL's utf8 is not UTF-8
UTF-8 is one of the character encodings for Unicode, the standard designed to represent the characters of all the world's writing systems.
UTF-8 can encode all 1,112,064 valid Unicode code points, using one to four bytes per code point.
So MySQL's utf8 is not really UTF-8: because it uses at most 3 bytes per character, it can only represent code points up to U+FFFF. In that sense it's a fake UTF-8.
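The numbers above can be verified with a short Python sketch using only the built-in UTF-8 codec; the constant and function names are illustrative. The 1,112,064 figure is every code point from U+0000 to U+10FFFF minus the 2,048 surrogate code points (U+D800 to U+DFFF), which UTF-8 never encodes. Capping at 3 bytes stops at U+FFFF.

```python
# Unicode code points run from U+0000 to U+10FFFF.
TOTAL_CODE_POINTS = 0x10FFFF + 1                      # 1,114,112
# U+D800..U+DFFF are surrogates reserved for UTF-16; UTF-8 never encodes them.
SURROGATE_COUNT = 0xDFFF - 0xD800 + 1                 # 2,048
UTF8_ENCODABLE = TOTAL_CODE_POINTS - SURROGATE_COUNT  # 1,112,064

# Code points reachable with at most 3 UTF-8 bytes: U+0000..U+FFFF minus surrogates.
MAX_3_BYTE_ENCODABLE = (0xFFFF + 1) - SURROGATE_COUNT  # 63,488


def utf8_length(code_point: int) -> int:
    """Bytes UTF-8 uses for one (non-surrogate) code point."""
    return len(chr(code_point).encode("utf-8"))


if __name__ == "__main__":
    # First and last code point of each UTF-8 length class.
    ranges = [(0x0, 0x7F), (0x80, 0x7FF), (0x800, 0xFFFF), (0x10000, 0x10FFFF)]
    for first, last in ranges:
        print(f"U+{first:04X}..U+{last:04X}: {utf8_length(first)} byte(s)")
    print(f"UTF-8 can encode {UTF8_ENCODABLE:,} code points")
    print(f"At most 3 bytes covers only {MAX_3_BYTE_ENCODABLE:,} of them")
```

The gap between the two counts is exactly the set of 4-byte characters, such as emoji, that utf8mb3 cannot store.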
I only learned about this rule recently.
Even though it's called utf8, it supports only up to 3 bytes per character.
utf8mb4, on the other hand, supports up to 4 bytes, so shouldn't that one be called UTF-8? I hope they change the names to avoid confusion.
That was my thought, and I'm probably not the only one who has thought so.
In short, if you are going to create an application using MySQL in the future, you should choose utf8mb4 instead of utf8.
MySQL specs may change in the future
Note
The utf8mb3 character set is deprecated and you should expect it to be removed in a future MySQL release. Please use utf8mb4 instead. utf8 is currently an alias for utf8mb3, but it is now deprecated as such, and utf8 is expected subsequently to become a reference to utf8mb4. Beginning with MySQL 8.0.28, utf8mb3 is also displayed in place of utf8 in columns of Information Schema tables, and in the output of SQL SHOW statements.
To avoid ambiguity about the meaning of utf8, consider specifying utf8mb4 explicitly for character set references.
https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-utf8.html
So, according to the MySQL documentation, utf8mb3 is deprecated and utf8mb4 should be used instead. In a future release, utf8 is expected to become an alias for utf8mb4 instead of utf8mb3.