Do UTF-8, UTF-16, And UTF-32 Differ In The Number Of Characters They Can Store?

Okay. I know this looks like the typical "Why didn't he just Google it or go to www.unicode.org and look it up?" question, but for such a simple question the answer still eludes me after checking both sources.

I am pretty sure that all three of these encoding systems support all of the Unicode characters, but I need to confirm it before I make that claim in a presentation.

Bonus question: Do these encodings differ in the number of characters they can be extended to support?

Answer

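No, they're simply different encoding methods. They all support encoding the same set of characters: every Unicode code point from U+0000 to U+10FFFF, excluding the surrogate range U+D800 to U+DFFF.

As for the bonus question: the bit patterns of UTF-8 and UTF-32 could in principle be stretched to cover more code points, but UTF-16's surrogate mechanism tops out at U+10FFFF, and the Unicode standard caps all three at that limit so they stay interchangeable.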
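
If you want something concrete to back the claim in a presentation, here is a minimal Python sketch that round-trips code points through all three encodings. The helper names `scalar_values` and `round_trips` are made up for this example, and it samples every 101st code point just to keep the run short; that step size is an arbitrary choice.

```python
from itertools import islice

ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")

def scalar_values():
    """Yield every code point from U+0000 to U+10FFFF, skipping surrogates."""
    for cp in range(0x110000):
        if not 0xD800 <= cp <= 0xDFFF:  # surrogates are not encodable characters
            yield cp

def round_trips(cp, encoding):
    """True if the code point survives an encode/decode cycle unchanged."""
    ch = chr(cp)
    return ch.encode(encoding).decode(encoding) == ch

# Sample every 101st code point (arbitrary step, chosen only for speed).
sample = islice(scalar_values(), 0, None, 101)
all_ok = all(round_trips(cp, enc) for cp in sample for enc in ENCODINGS)
print(all_ok)
```

Replacing `sample` with `scalar_values()` checks all 1,112,064 encodable code points; it just takes a few seconds longer.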

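UTF-8 uses anywhere from one to four bytes per character depending on the character you're encoding. Characters in the ASCII range take only one byte, characters up to U+07FF (such as Greek, Cyrillic, Hebrew, and Arabic letters) take two, the rest of the Basic Multilingual Plane (including most CJK) takes three, and characters outside it, such as emoji and rare historic scripts, take four.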

UTF-32 uses four bytes per character regardless of what character it is, so it never takes less space than UTF-8 for the same string and usually takes considerably more. Its main advantage is the fixed width: you can get the number of code points in a UTF-32 string by dividing its byte length by four.
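
A small sketch of that counting trick in Python, assuming the bytes carry no byte order mark (which is why it uses the `utf-32-le` codec rather than plain `utf-32`). The sample string is arbitrary.

```python
text = "na\u00efve \U0001F600"      # "naïve" plus a space and an emoji

raw = text.encode("utf-32-le")       # the -le codec writes no byte order mark
count = len(raw) // 4                # fixed width: four bytes per code point

print(len(raw), count, len(text))    # 28 7 7
```

Note that this counts code points, which is not always what a reader would call a character: a letter followed by a combining accent is two code points in any of the three encodings.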

UTF-16 uses two bytes for most characters (everything in the Basic Multilingual Plane) and four bytes, in the form of a surrogate pair, for the rest.
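
To see the sizes side by side, this Python sketch encodes one sample character from each UTF-8 length class. The function name `encoded_sizes` and the choice of sample characters are just for illustration; the `-le` codecs are used so that no byte order mark inflates the counts.

```python
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")

def encoded_sizes(ch):
    """Return the byte length of ch in UTF-8, UTF-16, and UTF-32."""
    return tuple(len(ch.encode(enc)) for enc in ENCODINGS)

# One sample per UTF-8 length: A, e-acute, euro sign, grinning-face emoji.
for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))

# U+0041 (1, 2, 4)
# U+00E9 (2, 2, 4)
# U+20AC (3, 2, 4)
# U+1F600 (4, 4, 4)
```

The last row is the one case where all three encodings cost the same: a character outside the Basic Multilingual Plane takes four bytes everywhere.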

http://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings

source: stackoverflow.com