eXtensible Markup Language
Abbreviation: XML
Identifiers:
Type Id
SIS ID fXML
LOC (Library of Congress) fdd000075
PRONOM (UK National Archives) fmt/101
Wikidata Q2115
Media type(s):
File extension(s): .xml
Format family: Markup.Full
Functional domains:
  • Audiovisual Annotation: Annotations of audiovisual sources, usually including a basic rendering of the spoken content (transcription) and sometimes further annotation.
  • Catalogue Metadata: Basic structured information for discoverability and general description, to be openly provided for harvesting.
  • Contextual Information: Structured information on the communicative event or text and its creators (i.e. participants or authors) relevant for analysis.
  • Documentation: Unstructured documentation of the resource and its parts such as corpus or annotation guidelines.
  • Geodata: Information on geographic locations.
  • Image Annotation: Annotations of image sources.
  • Language Description: Structured or unstructured descriptions of linguistic varieties or phenomena, typological databases etc.
  • Lexical Resource: Structured (item-based) resources for lexical and/or conceptual information on units of language (e.g. wordlists, lexicons, WordNets etc.)
  • Metadata: Comprehensive structured information including descriptive, structural and administrative metadata. See the for further hints.
  • Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off.
  • Textual Source Language Data: Written unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes.
  • Tool Support: Tool-related formats required for specific functionality of the tool or reliable reuse of resources (e.g. tagsets, annotation schemes, vocabularies, language models, parameter files, and other specifications or settings)
Recommendations:
Centre | Domain | Level | Comments
ACDH-ARCHE | Documentation | recommended |
ACDH-ARCHE | Text Annotation | recommended |
BBAW | Documentation | recommended |
BBAW | Text Annotation | recommended |
CLARIN-DK-UCPH | Documentation | recommended |
CLARIN-DK-UCPH | Text Annotation | recommended |
CLARIN.SI | Documentation | recommended |
CLARIN.SI | Text Annotation | recommended |
CLARINO_Bergen | Text Annotation | acceptable | Well-known and well-defined XML standards are preferred. When depositing non-standard, less-known formats, consider also depositing schema documents (ODD, XSD, DTD or RelaxNG), guidelines and documentation to improve usability.
DANS | Audiovisual Annotation | recommended | See more info from DANS
DANS | Catalogue Metadata | recommended | See more info from DANS
DANS | Contextual Information | recommended | See more info from DANS
DANS | Documentation | recommended | See more info from DANS
DANS | Geodata | recommended | See more info from DANS
DANS | Image Annotation | recommended | See more info from DANS
DANS | Language Description | recommended | See more info from DANS
DANS | Lexical Resource | recommended | See more info from DANS
DANS | Metadata | recommended | See more info from DANS
DANS | Text Annotation | recommended | See more info from DANS
DANS | Textual Source Language Data | recommended | See more info from DANS
DANS | Tool Support | recommended | See more info from DANS
EKUT | Documentation | recommended |
EKUT | Text Annotation | recommended |
FIN-CLARIN | Text Annotation | acceptable |
IDS | Metadata | acceptable |
ILC4CLARIN | Documentation | recommended |
ILC4CLARIN | Text Annotation | recommended |
LAC | Metadata | acceptable |
MI | Documentation | recommended |
MI | Text Annotation | recommended |
MPI-PL | Documentation | recommended |
MPI-PL | Text Annotation | recommended |
OTA | Documentation | recommended |
OTA | Metadata | acceptable |
SAW | Audiovisual Annotation | recommended |
SAW | Image Annotation | recommended |
SAW | Text Annotation | recommended |
SAW | Catalogue Metadata | acceptable |
SAW | Contextual Information | acceptable |
SAW | Metadata | acceptable |
SAW | Language Description | recommended |
Sprakbanken | Documentation | recommended |
Sprakbanken | Metadata | acceptable |
ZIM | Documentation | recommended |
ZIM | Text Annotation | recommended |
Description:

In the context of format recommendations, "XML" is too general a label to support a meaningful recommendation. In fringe cases, a centre may receive a deposit of a prose text encoded simply as a series of <paragraph> elements within a <text> element. In regular cases, however, and especially when the text has internal structure and is accompanied by annotations and a header containing metadata, it is best to adhere to one of the established formats that the tools available to CLARIN centres commonly recognize.
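
The fringe case described above can be made concrete with a minimal sketch. It assumes Python's standard `xml.etree.ElementTree` module; the sample document and the helper name `describe_generic_xml` are hypothetical and are not part of any CLARIN tooling. The point is only that a generic parser can confirm well-formedness and list element names, but cannot say what those elements mean.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical deposit: prose encoded with ad-hoc element names,
# with no schema reference and no namespace.
SAMPLE = """<text>
  <paragraph>First paragraph of the prose text.</paragraph>
  <paragraph>Second paragraph of the prose text.</paragraph>
</text>"""


def describe_generic_xml(source: str) -> dict:
    """Parse an XML string and report only what a generic parser can know."""
    # ET.fromstring raises ET.ParseError if the input is not well-formed.
    root = ET.fromstring(source)
    # Count the direct children of the root by element name.
    counts = Counter(child.tag for child in root)
    return {
        "root": root.tag,
        # ElementTree writes namespaced tags as "{namespace}localname".
        "has_namespace": root.tag.startswith("{"),
        "child_counts": dict(counts),
    }


if __name__ == "__main__":
    print(describe_generic_xml(SAMPLE))
```

Nothing in the result indicates which tools could process the file, which is why an established format is usually the safer choice for a deposit.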

The following is an emphatically non-exhaustive list of XML-based formats recognized by CLARIN:

  • TEI-based formats
  • XHTML
  • PAULA
  • XCES
  • TigerXML
  • TCF
  • EXMARaLDA
  • FOLKER/OrthoNormal
  • ALTO
  • FoLiA
  • SVG (for graphics)
  • XSD, RNG, Schematron (for document grammars)
  • many others: the SIS is going to feature a format-family browsing facility in the "near future".
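
One way to tell some of the formats listed above apart is to inspect the namespace of the root element. The following is a minimal sketch under stated assumptions: the function name `guess_format_family` and the small lookup table are hypothetical, the table covers only a handful of namespaces, and formats that use no namespace or several versioned namespaces would need other checks (for example the root element name or a schema reference).

```python
import xml.etree.ElementTree as ET

# Illustrative subset only; this is not a registry maintained by CLARIN or the SIS.
KNOWN_NAMESPACES = {
    "http://www.tei-c.org/ns/1.0": "TEI",
    "http://www.w3.org/1999/xhtml": "XHTML",
    "http://www.w3.org/2000/svg": "SVG",
    "http://www.w3.org/2001/XMLSchema": "XSD",
    "http://relaxng.org/ns/structure/1.0": "RNG",
}


def guess_format_family(source: str) -> str:
    """Guess an XML-based format from the root element's namespace."""
    root = ET.fromstring(source)  # raises ET.ParseError if not well-formed
    # ElementTree writes namespaced tags as "{namespace}localname".
    if root.tag.startswith("{"):
        namespace = root.tag[1:].split("}", 1)[0]
        return KNOWN_NAMESPACES.get(namespace, "unknown XML vocabulary")
    return "generic XML (no namespace)"


if __name__ == "__main__":
    print(guess_format_family('<TEI xmlns="http://www.tei-c.org/ns/1.0"/>'))
    print(guess_format_family("<text><paragraph/></text>"))
```

A namespace match is only a hint: it does not show that the document is valid against the corresponding schema.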
Keywords: data format, annotation format, format family
Related Standard(s):