Unicode is an international standard, developed by the Unicode Consortium, that defines nearly every character used in the written languages of the world. The first version of the standard was published in 1991 and covered over 7,000 characters. The number of characters has increased significantly since then: Unicode currently contains 112,956 characters from the world's modern languages (alphabetic scripts of Europe, the Middle East, Asia and Africa), ancient languages (such as Latin, Sanskrit and classical Greek) and many other archaic and historic scripts. Furthermore, the standard encodes many important symbol sets, including punctuation marks, mathematical symbols, technical symbols, geometric shapes, dingbats and emoji.
Unicode may be seen as a character superset. It combines the character sets defined in many international and national standards, such as those of ISO and ANSI/NISO. It also includes character sets from Adobe, Apple, Fujitsu, IBM, Lotus, Microsoft and many other vendors. The Unicode Standard therefore offers the most complete character set in the world, and one of the largest. Nearly all characters are encoded in Unicode, unambiguously defined and represented independently of the computer system or application used.
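The superset idea can be illustrated with a short Python sketch. It assumes three legacy encodings chosen purely as examples (ISO 8859-1, IBM code page 437 and Mac OS Roman, under Python's codec names `latin-1`, `cp437` and `mac_roman`). Each one stores the letter "é" as a different byte, yet all three decode to the same Unicode character.

```python
# The same letter is stored as a different byte in each legacy character set.
# The codec names are Python's; the choice of encodings is only illustrative.
LEGACY_BYTES = {
    "latin-1": b"\xe9",    # ISO 8859-1
    "cp437": b"\x82",      # IBM PC code page 437
    "mac_roman": b"\x8e",  # Apple Mac OS Roman
}

def decode_all(samples: dict) -> dict:
    """Decode each legacy byte string into a Unicode string."""
    return {codec: raw.decode(codec) for codec, raw in samples.items()}

decoded = decode_all(LEGACY_BYTES)

# Every legacy byte maps to the single Unicode character U+00E9.
code_points = {f"U+{ord(ch):04X}" for ch in decoded.values()}
print(code_points)
```

Because each legacy byte maps to one Unicode character, text from different systems can be compared or merged once it has been decoded.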
Unicode defines a name and a numerical value (the code point) for each character. The numerical value can be represented in three encoding forms: a 32-bit form (UTF-32), a 16-bit form (UTF-16), and an 8-bit form (UTF-8). These forms make it easy to handle data in units of a byte, a word or a double word.
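As a rough illustration, the following Python sketch looks up the name and numerical value of a character and shows its bytes in each of the three encoding forms. The helper `describe` is a hypothetical name introduced here, and the big-endian byte order (`utf-16-be`, `utf-32-be`) is an assumption made only to keep the output free of a byte order mark.

```python
import unicodedata

def describe(ch: str) -> dict:
    """Return the name, code point and encoded bytes of one character.

    Hypothetical helper for illustration; big-endian byte order is assumed.
    """
    return {
        "name": unicodedata.name(ch),
        "code_point": f"U+{ord(ch):04X}",
        "utf8": ch.encode("utf-8").hex(),       # 8-bit code units
        "utf16": ch.encode("utf-16-be").hex(),  # 16-bit code units
        "utf32": ch.encode("utf-32-be").hex(),  # 32-bit code units
    }

euro = describe("\u20ac")       # EURO SIGN
face = describe("\U0001F600")   # GRINNING FACE, outside the 16-bit range

print(euro)
print(face)
```

The euro sign occupies three bytes in UTF-8, two in UTF-16 and four in UTF-32. The emoji lies beyond the 16-bit range, so UTF-16 represents it with two 16-bit units (a surrogate pair).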
The character codes of the Unicode Standard and the standard ISO/IEC 10646 (Universal Character Set) are identical and fully compatible with each other.
- The Unicode Consortium. (2014). The Unicode Standard: Version 7.0 – Core Specification [Online]. Available: http://www.unicode.org/versions/Unicode7.0.0/
- The Unicode Consortium