Text Encoding Initiative
Abbreviation: TEI
Identifiers:
Type Id
SIS ID fTEI
Media type(s):
File extension(s): .tei, .xml
Format family: XML
Functional domains:
  • Audiovisual Annotation
  • Documentation
  • Text Annotation
Recommendations:
Centre | Domain | Level | Comments
ACDH-ARCHE | Documentation | recommended
ACDH-ARCHE | Text Annotation | recommended
BBAW | Documentation | recommended
BBAW | Text Annotation | recommended
CLARIN-DK-UCPH | Documentation | recommended
CLARIN-DK-UCPH | Text Annotation | recommended
CLARIN.SI | Documentation | recommended
CLARIN.SI | Text Annotation | recommended
COCOON | Documentation | recommended
COCOON | Text Annotation | recommended
EKUT | Documentation | recommended
EKUT | Text Annotation | recommended
FIN-CLARIN | Documentation | recommended
FIN-CLARIN | Text Annotation | recommended
IDS | Text Annotation | recommended | with ODD or other schema
LAC | Audiovisual Annotation | recommended
SAW | Audiovisual Annotation | recommended
SAW | Text Annotation | recommended
Sprakbanken | Documentation | recommended
Sprakbanken | Text Annotation | recommended | with ODD or other schema
ZIM | Documentation | recommended
ZIM | Text Annotation | recommended
Domain definitions:
  • Audiovisual Annotation: Annotations of audiovisual sources, usually including a basic rendering of the spoken content (transcription) and sometimes further annotation.
  • Documentation: Unstructured documentation of the resource and its parts such as corpus or annotation guidelines.
  • Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off.
Description:

In the context of format recommendations, TEI has the same status as XML: it is too general to be a meaningful recommendation on its own. The TEI Guidelines, together with the surrounding infrastructure (Stylesheets and dedicated utilities), form a toolkit for creating formats, including formats that serialize a standard.

As a rule of thumb, centres should expect that each TEI document submitted to them is either accompanied by its ODD file or conforms to one of the publicly available ODD specifications (either a named TEI out-of-the-box customization or a publicly documented extension), preferably one that bears the stamp of a standards body or encodes established best practices.

An ODD ("One Document Does it all") file defines the semantics of a TEI format by means of prose, datatype restrictions and Schematron constraints. It is the basis for creating document grammars (schemas) against which a conforming TEI document can be validated. Schemas can be derived, for example, with the help of the TEI Stylesheets, Roma, or XML editor add-ons.
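To make the role of the ODD more concrete, the following minimal sketch reads an ODD and lists which TEI modules it pulls in and which elements it changes. It is an illustration only: the sample ODD and the function name `summarize_odd` are hypothetical, and only the Python standard library is used. It does not derive a schema; that step is still left to the TEI Stylesheets or Roma.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical, heavily abbreviated ODD used only for illustration.
SAMPLE_ODD = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Example customization</title></titleStmt>
      <publicationStmt><p>Unpublished sketch</p></publicationStmt>
      <sourceDesc><p>Born digital</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <schemaSpec ident="example" start="TEI">
        <moduleRef key="tei"/>
        <moduleRef key="header"/>
        <moduleRef key="core"/>
        <moduleRef key="textstructure"/>
        <elementSpec ident="note" mode="delete"/>
      </schemaSpec>
    </body>
  </text>
</TEI>"""


def summarize_odd(odd_xml: str) -> dict:
    """Return the schema identifier, referenced modules and element changes."""
    ns = {"tei": TEI_NS}
    root = ET.fromstring(odd_xml)
    spec = root.find(".//tei:schemaSpec", ns)
    if spec is None:
        raise ValueError("no <schemaSpec> found; this does not look like an ODD")
    # <moduleRef key="..."/> names a TEI module included in the customization.
    modules = [m.get("key") for m in spec.findall("tei:moduleRef", ns)]
    # <elementSpec ident="..." mode="..."/> adds, changes or deletes an element.
    changes = {
        e.get("ident"): e.get("mode")
        for e in spec.findall("tei:elementSpec", ns)
    }
    return {
        "ident": spec.get("ident"),
        "modules": modules,
        "element_changes": changes,
    }


if __name__ == "__main__":
    print(summarize_odd(SAMPLE_ODD))
```

A centre could run a check of this kind on an accompanying ODD to get a quick overview of a submitted customization before deriving a schema and validating the documents against it.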

In the context of CLARIN and LRT applications, the following TEI-based formats are recognizable:

  • ISO/TEI Transcriptions of Spoken Language -- serializing the corresponding ISO standard
  • DTABf -- Deutsches Textarchiv Basisformat, used at BBAW
  • ISO LMF -- TEI serialization of the Lexical Markup Framework (part 4 of the 2020 LMF standard)
  • TEI Lex0 -- an OASIS standard for the encoding of, among others, retrodigitised dictionaries, maintained by DARIAH
  • I5 -- tool format of DeReKo (Deutsches Referenzkorpus), used at IDS Mannheim

The above list is not meant to be exhaustive. While the above items would generally qualify as TEI extensions, the environment of the TEI Guidelines defines a series of restrictions on the overall TEI schema, referred to as 'templates' and meant as bases for further customization, as well as three fully-defined and maintained customizations: TEI Lite, TEI simplePrint, TEI Tite.

For metadata descriptions, free-standing TEI headers are often useful.
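As an illustration of how a free-standing TEI header can serve as a metadata record, the sketch below pulls a few basic fields out of a `teiHeader` that is the root element of its own file. The sample header, the field selection and the function name `read_header` are assumptions made for this example; real headers are usually far richer.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}

# Hypothetical free-standing header used only for illustration.
SAMPLE_HEADER = """<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <fileDesc>
    <titleStmt>
      <title>Hypothetical sample corpus</title>
      <author>Jane Doe</author>
    </titleStmt>
    <publicationStmt>
      <publisher>Example Centre</publisher>
      <date when="2020-01-01">2020</date>
    </publicationStmt>
    <sourceDesc><p>Born digital</p></sourceDesc>
  </fileDesc>
</teiHeader>"""


def read_header(header_xml: str) -> dict:
    """Extract title, author, publisher and date from a free-standing header."""
    root = ET.fromstring(header_xml)
    if root.tag != f"{{{TEI_NS}}}teiHeader":
        raise ValueError("root element is not a TEI <teiHeader>")

    def text_of(path):
        # Join all text below the node so inline markup does not cut it short.
        node = root.find(path, NS)
        return "".join(node.itertext()).strip() if node is not None else None

    date = root.find("tei:fileDesc/tei:publicationStmt/tei:date", NS)
    return {
        "title": text_of("tei:fileDesc/tei:titleStmt/tei:title"),
        "author": text_of("tei:fileDesc/tei:titleStmt/tei:author"),
        "publisher": text_of("tei:fileDesc/tei:publicationStmt/tei:publisher"),
        "date": date.get("when") if date is not None else None,
    }


if __name__ == "__main__":
    print(read_header(SAMPLE_HEADER))
```

Fields that are absent from a given header come back as `None`, so the same function can be applied to sparse records without special handling.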

Keywords: data format, annotation format, format family
Related Standard(s):