- Darwin Information Typing Architecture
- Guidelines for Electronic Text Encoding and Interchange
- Journal Article Tag Suite
- Language Resources Management — Multilingual Information Framework
- Language resource management — Linguistic annotation framework
- Language resource management — Word segmentation of written texts
- NLM Journal Archiving and Interchange Tag Suite
- Nancy Ide
- Greg Priest-Dorman
The Corpus Encoding Standard (CES) is an encoding standard for corpus annotation, developed within the EAGLES (Expert Advisory Group on Language Engineering Standards) project. Its aim is to provide a uniform encoding standard for linguistic corpus annotation, so that corpora encoded with it can serve as resources for natural language processing.
The CES is founded on SGML (Standard Generalized Markup Language, ISO 8879:1986). Besides SGML, the TEI (Text Encoding Initiative) Guidelines were taken into account in its development. Like the TEI, the CES standardizes document structure (e.g. title, caption, break) and document information (metadata). In addition, it standardizes the linguistic annotation of a text (e.g. morpho-syntactic tagging, parallel text alignment, prosody, and phonetic transcription). TEI P3 and the CES are compatible with each other, so they can be used side by side. An XML-based version of the CES, XCES (Corpus Encoding Standard for XML), has also been developed.
The CES can be applied to monolingual, multilingual, and parallel corpora.
- metaLanguage: SGML
- constraintLanguage: DTD
- grammarClass: LTG
- formalModel: Tree
- notation: Standoff
- multipleHierarchies: standoff annotation
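The standoff notation listed above, and its use for multiple hierarchies, can be illustrated with a minimal Python sketch. It does not reproduce the CES format: the `Span` class, the layer names, and the sample sentence are all assumptions made for illustration. The idea shown is only that annotations live outside the primary text and point into it by character offsets, so layers that cross each other (here a line break inside a verb phrase) can coexist without breaking a single tree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """One standoff annotation (hypothetical structure, not a CES element)."""
    layer: str  # name of the annotation layer, e.g. "token" or "phrase"
    start: int  # inclusive character offset into the primary text
    end: int    # exclusive character offset into the primary text
    label: str


# The primary text stays untouched; all markup is kept separately.
PRIMARY_TEXT = "Time flies like an arrow."

ANNOTATIONS = [
    Span("token", 0, 4, "NOUN"),
    Span("token", 5, 10, "VERB"),
    Span("phrase", 0, 4, "NP"),
    Span("phrase", 5, 24, "VP"),
    Span("line", 0, 15, "l1"),   # a layout line that ends inside the VP
    Span("line", 16, 25, "l2"),
]


def resolve(text: str, span: Span) -> str:
    """Return the stretch of primary text a span points at."""
    return text[span.start:span.end]


def layer(annotations: list[Span], name: str) -> list[Span]:
    """Select all spans belonging to one annotation layer."""
    return [a for a in annotations if a.layer == name]


def crosses(a: Span, b: Span) -> bool:
    """True if the spans overlap without one containing the other."""
    return a.start < b.start < a.end < b.end or b.start < a.start < b.end < a.end


if __name__ == "__main__":
    for span in layer(ANNOTATIONS, "phrase"):
        print(span.label, repr(resolve(PRIMARY_TEXT, span)))
```

The `crosses` check identifies pairs such as line `l1` and the `VP` phrase, which could not both be expressed as nested inline elements of one document tree.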
- SGML
CES is based on SGML.
- TEI Guidelines-1994
CES is an application of the SGML-based TEI P3 using the TEI modification layer.
- XCES
CES is the SGML ancestor of the XML-based XCES.
- Nancy Ide
- Patrice Bonhomme
XCES is the XML version of the Corpus Encoding Standard (CES). It was developed by the Department of Computer Science, Vassar College, and Equipe Langue et Dialogue, LORIA/CNRS, because XML is the standard for data representation and exchange on the World Wide Web. The aims of the conversion included offering a state-of-the-art representation of corpus data and making it accessible to the language engineering community.
XCES offers DTDs and XML schemas for encoding basic document structure and linguistic annotation. The XML implementation of the CES supports not only morpho-syntactic annotation but also syntactic annotation. With the aid of XLink and XPointer, XCES provides a more expressive way to refer to standoff-annotated corpus data than the SGML-based CES.
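As a rough illustration of link-based standoff referencing, the following Python sketch resolves `xlink:href` attributes in an annotation document against element IDs in a primary document. It is a sketch under stated assumptions, not the XCES schema: the element names (`text`, `s`, `annotations`, `ann`), the attributes `type` and `value`, and the file name `primary.xml` are hypothetical. Only the simplest pointer form is handled, a bare ID after `#`, and the document part of the link is assumed to name the primary document.

```python
import xml.etree.ElementTree as ET

# Qualified name of the xlink:href attribute as ElementTree exposes it.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical primary document: text units carry plain "id" attributes.
PRIMARY_XML = """<text>
  <s id="s1">Time flies like an arrow.</s>
  <s id="s2">Fruit flies like a banana.</s>
</text>"""

# Hypothetical standoff document: annotations point at the primary text.
STANDOFF_XML = """<annotations xmlns:xlink="http://www.w3.org/1999/xlink">
  <ann type="topic" value="time" xlink:href="primary.xml#s1"/>
  <ann type="topic" value="fruit" xlink:href="primary.xml#s2"/>
</annotations>"""


def index_by_id(root: ET.Element) -> dict[str, ET.Element]:
    """Map every 'id' attribute value in the tree to its element."""
    return {el.get("id"): el for el in root.iter() if el.get("id") is not None}


def resolve_annotations(primary_xml: str, standoff_xml: str) -> list[tuple[str, str]]:
    """Pair each annotation value with the text of the element it links to."""
    targets = index_by_id(ET.fromstring(primary_xml))
    resolved = []
    for ann in ET.fromstring(standoff_xml).iter("ann"):
        # Keep only the fragment after '#'; this sketch assumes the
        # document part always refers to the primary document.
        _, _, fragment = ann.get(XLINK_HREF, "").partition("#")
        target = targets.get(fragment)
        if target is not None:
            resolved.append((ann.get("value"), target.text))
    return resolved


if __name__ == "__main__":
    for value, text in resolve_annotations(PRIMARY_XML, STANDOFF_XML):
        print(f"{value}: {text}")
```

Because the annotations only hold links, several independent annotation documents can target the same primary text without modifying it.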
Furthermore, XCES currently includes XML Schemas for validation and some XSLT scripts for transforming documents into HTML.
XCES is under continual development and is planned to become compliant with TEI P5. Currently, the gap between the stages of development of the TEI Guidelines and XCES is so large that TEI P5 cannot be used in XCES.
- metaLanguage: XML
- constraintLanguage: XSD
- grammarClass: LTG
- formalModel: Graph
- notation: Standoff
- multipleHierarchies: standoff annotation
- CES
XCES is the XML instantiation of CES.
- LAF-2012
- TEI Guidelines
The XCES specification is based on the TEI P3 Standard.
- XML
XCES is an application of the Extensible Markup Language (XML); that is, it uses XML syntax.
- XSD
XCES uses XML Schema 1.0 as a constraint language.
Legend (relation types): isBasedOn, isApplicationOf, isVersionOf, hasPart