Universal Coded Character Set
Abbreviation: UCS
Scope: Standard for character encoding of text documents
Topic: Character encoding
Standard body: ISO
Keywords: international character, character encoding, character set, character, Unicode, UTF-8
Description:

ISO/IEC 10646 is an international standard developed jointly by ISO and IEC. It defines the Universal Coded Character Set (UCS). The aim of the standardisation is to gradually collect and define all characters used in all of the world's written languages. Currently it covers 120585 characters from the world's scripts and a wide range of additional common technical symbols.

Standardising the characters should make it easier to collect and exchange data between computer systems and applications. The ISO/IEC 10646:2014 standard provides:

  • a specification of the architecture for ISO/IEC 10646;
  • a definition of the terms that are used in ISO/IEC 10646;
  • a description of the general structure of the UCS codespace;
  • a specification of the characters defined in the various planes: the Basic Multilingual Plane (BMP, covering the majority of characters from scripts in active modern use), the Supplementary Multilingual Plane (SMP, for multilingual characters that are not included in the BMP), the Supplementary Ideographic Plane (SIP, for ideographic symbols, for example Chinese characters), the Tertiary Ideographic Plane (TIP, for ancient Chinese characters), and the Supplementary Special-purpose Plane (SSP, for special characters);
  • a definition of a set of graphic characters used in scripts and in the written form of languages on a world-wide scale;
  • a specification of the names for the graphic characters and format characters of the BMP, SMP, SIP, SSP and their coded representations within the UCS codespace;
  • a specification of the coded representations for control characters and private use characters;
  • a specification of the three UCS encoding forms: UTF-8, UTF-16, and UTF-32;
  • a specification of the seven UCS encoding schemes: UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, and UTF-32LE;
  • a specification of the management of future additions to this coded character set.

(see Abstract of ISO/IEC 10646:2014)
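
The plane structure named in the list above can be illustrated with a short Python sketch. It assumes the usual layout of the codespace, in which each plane holds 65,536 code points, so the plane number is the code point divided by 0x10000. The names `PLANE_NAMES`, `plane_of` and `plane_name` are hypothetical helpers introduced only for this illustration; they are not part of the standard.

```python
# Plane numbers for the planes named in the description above.
# Assumption: each plane spans 0x10000 (65,536) code points.
PLANE_NAMES = {
    0: "Basic Multilingual Plane (BMP)",
    1: "Supplementary Multilingual Plane (SMP)",
    2: "Supplementary Ideographic Plane (SIP)",
    3: "Tertiary Ideographic Plane (TIP)",
    14: "Supplementary Special-purpose Plane (SSP)",
}


def plane_of(char):
    """Return the plane number (0-16) of a single character."""
    # ord() gives the code point; shifting right by 16 bits
    # is the same as integer division by 0x10000.
    return ord(char) >> 16


def plane_name(char):
    """Return the plane name, or a generic label for planes not listed above."""
    return PLANE_NAMES.get(plane_of(char), "other plane")


if __name__ == "__main__":
    for ch in ["A", "\U0001F600", "\U00020000", "\U000E0001"]:
        print(f"U+{ord(ch):04X} -> plane {plane_of(ch)}: {plane_name(ch)}")
```

Characters in plane 0 fit in 16 bits; every other plane needs a supplementary code point, which is why the encoding forms treat the BMP differently from the rest of the codespace.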

Since 1991, the ISO 10646 Working Group (SC 2/WG 2) and the Unicode Consortium have worked closely together. As a result, the UCS is fully compatible with the Unicode Standard: all character codes and encoding forms are synchronized between the two standards. Although ISO/IEC 10646 and Unicode are very similar, they are not equivalent. To a certain degree, the UCS uses its own form of reference and its own terminology.
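
Because the encoding forms are shared with Unicode, any Unicode-aware language can show how the encoding forms and byte-ordered encoding schemes listed in the description differ. The following is a minimal Python sketch using the standard library codecs; the function name `encodings` is a hypothetical helper for illustration.

```python
def encodings(char):
    """Return the hex-encoded bytes of one character in several encoding schemes."""
    # Codec names as registered in Python's standard library.
    codecs = ["utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"]
    return {name: char.encode(name).hex() for name in codecs}


if __name__ == "__main__":
    # U+20AC EURO SIGN lies in the BMP; U+1D11E MUSICAL SYMBOL G CLEF
    # lies outside it and therefore needs a surrogate pair in UTF-16.
    for ch in ["\u20ac", "\U0001D11E"]:
        print(f"U+{ord(ch):04X}")
        for name, hexbytes in encodings(ch).items():
            print(f"  {name:10} {hexbytes}")
```

The BE and LE variants fix the byte order explicitly. Python's plain `utf-16` and `utf-32` codecs instead prepend a byte order mark and use the platform's native byte order, so they are left out here to keep the output platform-independent.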

Related Standard(s):
  • IPA

    The UCS contains an IPA section

  • Unicode
Other standards in the same topic(s):

Version Title: Information technology -- Universal Coded Character Set
Abbreviation: UCS-2014
Version Number: ISO/IEC 10646:2014
Status: International Standard
Release Date: 2014-08-29
Editor:
  1. ISO/IEC JTC 1/SC 2