This topic can be confusing because the concepts of Unicode, UTF-8, hexadecimal and binary are all mixed together. To clarify this topic, I am going to start with this:
If English were the only language on the planet, we wouldn't need the concepts of Unicode and UTF-8. ASCII would be enough.
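A quick check in Python illustrates the point: plain English text fits in ASCII, while a character like 汉 does not. This is a minimal sketch using only the built-in `str.encode`; the helper name `fits_in_ascii` is hypothetical, chosen for illustration.

```python
def fits_in_ascii(text: str) -> bool:
    """Hypothetical helper: True if every character has an ASCII code (0-127)."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        # At least one character has no ASCII code.
        return False

print(fits_in_ascii("Hello"))  # True
print(fits_in_ascii("汉"))     # False
```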
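But there are many languages, so Unicode gives every character a unique number, called a code point. The bits of that number are then packed (think of a box) as one unit. This "boxing" method is called UTF-8.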
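To keep the two ideas apart, here is a small Python sketch (built-ins only) that prints the Unicode code point of 汉 and then the UTF-8 bytes it is packed into. `bytes.hex` with a separator argument assumes Python 3.8 or later.

```python
ch = "汉"

# Unicode: the character's number (its code point).
code_point = ord(ch)
print(f"U+{code_point:04X}")  # U+6C49

# UTF-8: how that number is packed into bytes.
utf8_bytes = ch.encode("utf-8")
print(utf8_bytes.hex(" "))    # e6 b1 89
```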
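| Number of bytes | Bits for code point (empty spaces) | Byte 1 | Byte 2 | Byte 3 | Byte 4 |
|---|---|---|---|---|---|
| 1 | 7 | 0xxxxxxx | | | |
| 2 | 11 | 110xxxxx | 10xxxxxx | | |
| 3 | 16 | 1110xxxx | 10xxxxxx | 10xxxxxx | |
| 4 | 21 | 11110xxx | 10xxxxxx | 10xxxxxx | 10xxxxxx |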
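As you can see above, each x is an empty space: one bit you can use for storing the character's code point. Think of the fixed 0s and 1s as headers: the leading 1s of byte 1 tell how many bytes are in the box, and 10 marks every byte that follows.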
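The table's "empty spaces" column is enough to decide how many bytes a character needs: pick the smallest row whose x slots can hold the code point's bits. Below is a minimal sketch of that rule; `utf8_byte_count` is a hypothetical name, and the sketch ignores real UTF-8's extra limits (the U+10FFFF maximum and the excluded surrogate range).

```python
# Bits available for the code point (the x's) in each row of the table.
CAPACITY = [(1, 7), (2, 11), (3, 16), (4, 21)]

def utf8_byte_count(code_point: int) -> int:
    """Hypothetical helper: smallest UTF-8 length whose x slots fit the code point."""
    bits = code_point.bit_length()
    for n_bytes, capacity in CAPACITY:
        if bits <= capacity:
            return n_bytes
    raise ValueError("code point needs more than 21 bits")

# Compare the rule with Python's own UTF-8 encoder.
for ch in ["A", "é", "汉", "😀"]:
    print(ch, utf8_byte_count(ord(ch)), len(ch.encode("utf-8")))
```

For each character, the two printed numbers agree (1, 2, 3 and 4 bytes respectively).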
Take a Chinese character: 汉
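Its code point is U+6C49, which is 0110110001001001 in binary, so 16 bits are needed for packing this character. That is more than the 11 empty spaces of a 2-byte box, so according to the UTF-8 format table above, 3 bytes (16 empty spaces) are needed.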
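Here is the same step in Python, as a small sketch: write the code point as 16 binary digits and cut them into the 4 + 6 + 6 empty spaces that a 3-byte box offers.

```python
code_point = ord("汉")          # 0x6C49
bits = f"{code_point:016b}"     # pad to the 16 empty spaces of a 3-byte box
print(bits)                     # 0110110001001001

# The 16 spaces are split 4 + 6 + 6 across the three bytes.
groups = [bits[0:4], bits[4:10], bits[10:16]]
print(groups)                   # ['0110', '110001', '001001']
```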
| Byte 1 | Byte 2 | Byte 3 |
|---|---|---|
| 1110xxxx | 10xxxxxx | 10xxxxxx |
| 11100110 | 10110001 | 10001001 |
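The whole boxing method can be sketched as a short encoder that fills the x slots with bit shifts and masks, following the format table row by row. This is an illustrative sketch (`encode_utf8` is a hypothetical name, not a library function); it does not reject surrogate code points (U+D800 to U+DFFF), which a strict UTF-8 encoder would refuse. The result is checked against Python's built-in encoder.

```python
def encode_utf8(code_point: int) -> bytes:
    """Hypothetical helper: pack a code point into UTF-8 by filling the x slots."""
    if code_point < 0x80:        # 1 byte: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0x10FFFF:   # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (code_point >> 18),
                      0x80 | ((code_point >> 12) & 0x3F),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    raise ValueError("code point is outside the Unicode range")

packed = encode_utf8(ord("汉"))
print(" ".join(f"{b:08b}" for b in packed))  # 11100110 10110001 10001001
print(packed == "汉".encode("utf-8"))         # True
```

Each `0x80 | (... & 0x3F)` adds the 10 header and keeps the next six bits, which is exactly the 10xxxxxx column of the table.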