eXtensible Markup Language
Abbreviation: XML
Identifiers:
Type | Id |
---|---|
SIS ID | fXML |
LOC (Library of Congress) | fdd000075 |
PRONOM (UK National Archives) | fmt/101 |
Wikidata | Q2115 |
Media type(s):
- application/xml
File extension(s): .xml
Format family: Markup.Full
Functional domains:
- Audiovisual Annotation
- Catalogue Metadata
- Contextual Information
- Documentation
- Geodata
- Image Annotation
- Language Description
- Lexical Resource
- Metadata
- Text Annotation
- Textual Source Language Data
- Tool Support
Recommendations:
Centre | Domain | Level | Comments |
---|---|---|---|
ACDH-ARCHE | Documentation: Unstructured documentation of the resource and its parts such as corpus or annotation guidelines. | recommended | |
ACDH-ARCHE | Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off. | recommended | |
BBAW | Documentation: Unstructured documentation of the resource and its parts such as corpus or annotation guidelines. | recommended | |
BBAW | Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off. | recommended | |
CLARIN-CH | Textual Source Language Data: Written unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. | recommended | |
CLARIN-CH | Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off. | recommended | |
CLARIN-CH | Documentation: Unstructured documentation of the resource and its parts such as corpus or annotation guidelines. | acceptable | |
CLARIN-CH | Metadata: Comprehensive structured information including descriptive, structural and administrative metadata. See the for further hints. | acceptable | |
CLARIN-CH | Audiovisual Annotation: Annotations of audiovisual sources, usually including a basic rendering of the spoken content (transcription) and sometimes further annotation. | recommended | |
CLARIN-CH | Language Description: Structured or unstructured descriptions of linguistic varieties or phenomena, typological databases etc. | recommended | |
CLARIN-CH | Lexical Resource: Structured (item-based) resources for lexical and/or conceptual information on units of language (e.g. wordlists, lexicons, WordNets etc.) | recommended | |
CLARIN-DK-UCPH | Documentation: Unstructured documentation of the resource and its parts such as corpus or annotation guidelines. | recommended | |
CLARIN-DK-UCPH | Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off. | recommended | |
CLARIN.SI | Documentation: Unstructured documentation of the resource and its parts such as corpus or annotation guidelines. | recommended | |
CLARIN.SI | Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off. | recommended | |
CLARINO_Bergen | Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off. | acceptable | |
DANS | Audiovisual Annotation: Annotations of audiovisual sources, usually including a basic rendering of the spoken content (transcription) and sometimes further annotation. | recommended | |
DANS | Catalogue Metadata: Basic structured information for discoverability and general description, to be openly provided for harvesting. | recommended | |
DANS | Contextual Information: Structured information on the communicative event or text and its creators (i.e. participants or authors) relevant for analysis. | recommended | |
DANS | Documentation: Unstructured documentation of the resource and its parts such as corpus or annotation guidelines. | recommended | |
DANS | Geodata: Information on geographic locations. | recommended | |
DANS | Image Annotation: Annotations of image sources. | recommended | |
DANS | Language Description: Structured or unstructured descriptions of linguistic varieties or phenomena, typological databases etc. | recommended | |
DANS | Lexical Resource: Structured (item-based) resources for lexical and/or conceptual information on units of language (e.g. wordlists, lexicons, WordNets etc.) | recommended | |
DANS | Metadata: Comprehensive structured information including descriptive, structural and administrative metadata. See the for further hints. | recommended | |
DANS | Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off. | recommended | |
DANS | Textual Source Language Data: Written unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. | recommended | |
DANS | Tool Support: Tool-related formats required for specific functionality of the tool or reliable reuse of resources (e.g. tagsets, annotation schemes, vocabularies, language models, parameter files, and other specifications or settings) | recommended | |
EKUT | Documentation: Unstructured documentation of the resource and its parts such as corpus or annotation guidelines. | recommended | |
EKUT | Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off. | recommended | |
FIN-CLARIN | Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off. | acceptable | |
IDS | Metadata: Comprehensive structured information including descriptive, structural and administrative metadata. See the for further hints. | acceptable | |
ILC4CLARIN | Documentation: Unstructured documentation of the resource and its parts such as corpus or annotation guidelines. | recommended | |
ILC4CLARIN | Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off. | recommended | |
LAC | Metadata: Comprehensive structured information including descriptive, structural and administrative metadata. See the for further hints. | acceptable | |
MI | Documentation: Unstructured documentation of the resource and its parts such as corpus or annotation guidelines. | recommended | |
MI | Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off. | recommended | |
MPI-PL | Documentation: Unstructured documentation of the resource and its parts such as corpus or annotation guidelines. | recommended | |
MPI-PL | Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off. | recommended | |
OTA | Documentation: Unstructured documentation of the resource and its parts such as corpus or annotation guidelines. | recommended | |
OTA | Metadata: Comprehensive structured information including descriptive, structural and administrative metadata. See the for further hints. | acceptable | |
PORTULAN-CLARIN | Lexical Resource: Structured (item-based) resources for lexical and/or conceptual information on units of language (e.g. wordlists, lexicons, WordNets etc.) | acceptable | |
PORTULAN-CLARIN | Metadata: Comprehensive structured information including descriptive, structural and administrative metadata. See the for further hints. | acceptable | |
SAW | Audiovisual Annotation: Annotations of audiovisual sources, usually including a basic rendering of the spoken content (transcription) and sometimes further annotation. | recommended | |
SAW | Image Annotation: Annotations of image sources. | recommended | |
SAW | Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off. | recommended | |
SAW | Catalogue Metadata: Basic structured information for discoverability and general description, to be openly provided for harvesting. | acceptable | |
SAW | Contextual Information: Structured information on the communicative event or text and its creators (i.e. participants or authors) relevant for analysis. | acceptable | |
SAW | Metadata: Comprehensive structured information including descriptive, structural and administrative metadata. See the for further hints. | acceptable | |
SAW | Language Description: Structured or unstructured descriptions of linguistic varieties or phenomena, typological databases etc. | recommended | |
Sprakbanken | Documentation: Unstructured documentation of the resource and its parts such as corpus or annotation guidelines. | recommended | |
Sprakbanken | Metadata: Comprehensive structured information including descriptive, structural and administrative metadata. See the for further hints. | acceptable | |
ZIM | Documentation: Unstructured documentation of the resource and its parts such as corpus or annotation guidelines. | recommended | |
ZIM | Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off. | recommended | |
Description:
In the context of format recommendations, "XML" is too general a pointer to support a meaningful recommendation. In fringe cases, a centre may receive a deposit of prose text encoded simply as a series of <paragraph> elements within a <text> element. In regular cases, however, and especially if the text has internal structure and is accompanied by annotations and a header containing metadata, it is best to adhere to one of the established formats that the tools at the disposal of CLARIN centres generally recognize.
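To make the fringe case concrete, the following is a minimal sketch of such a generic deposit and of how little a tool can rely on when reading it. The sample content and the helper name `paragraphs` are assumptions made for illustration; only the <text> and <paragraph> element names come from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical generic deposit: the element names follow the example in the
# description; they do not belong to any established format.
GENERIC_XML = """<?xml version="1.0" encoding="UTF-8"?>
<text>
  <paragraph>First paragraph of the prose text.</paragraph>
  <paragraph>Second paragraph of the prose text.</paragraph>
</text>
"""


def paragraphs(xml_source: str) -> list[str]:
    """Parse the document and return the text content of each <paragraph>."""
    # Parsing fails with ET.ParseError if the input is not well-formed XML.
    root = ET.fromstring(xml_source.encode("utf-8"))
    if root.tag != "text":
        raise ValueError(f"unexpected root element: {root.tag}")
    # Only direct <paragraph> children are read; no header, metadata, or
    # annotation layer is defined for this ad hoc vocabulary.
    return [p.text or "" for p in root.findall("paragraph")]


print(paragraphs(GENERIC_XML))
```

Well-formedness is all that can be checked here. Without a shared schema, every consumer has to guess at the meaning of the element names, which is the practical argument for an established format.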
The following is an emphatically non-exhaustive list of XML-based formats recognized by CLARIN:
- TEI-based formats
- XHTML
- PAULA
- XCES
- TigerXML
- TCF
- EXMARaLDA
- FOLKER/OrthoNormal
- ALTO
- FoLiA
- SVG (for graphics)
- XSD, RNG, Schematron (for document grammars)
- ...
- many others: the SIS is going to feature a format-family browsing facility in the "near future".
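Several of the formats listed above declare a namespace on their root element, so a first rough guess at the format of an incoming XML file can be made by inspecting it. The sketch below assumes this heuristic is good enough for triage; the function name `guess_format` and the deliberately incomplete mapping are illustrative and not part of the SIS.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for a few of the listed formats. The mapping is illustrative
# and incomplete; formats without a fixed namespace are not covered.
KNOWN_NAMESPACES = {
    "http://www.tei-c.org/ns/1.0": "TEI",
    "http://www.w3.org/1999/xhtml": "XHTML",
    "http://www.w3.org/2000/svg": "SVG",
    "http://www.w3.org/2001/XMLSchema": "XSD",
    "http://relaxng.org/ns/structure/1.0": "RNG",
    "http://purl.oclc.org/dsdl/schematron": "Schematron",
}


def guess_format(xml_source: bytes) -> str:
    """Return a rough format label based on the root element's namespace."""
    root = ET.fromstring(xml_source)
    # ElementTree represents a namespaced tag as "{namespace-uri}localname".
    if root.tag.startswith("{"):
        uri = root.tag[1:].split("}", 1)[0]
        return KNOWN_NAMESPACES.get(uri, "unknown XML vocabulary")
    return "generic XML (no namespace)"


print(guess_format(b'<TEI xmlns="http://www.tei-c.org/ns/1.0"/>'))
print(guess_format(b"<text><paragraph>Plain prose.</paragraph></text>"))
```

A namespace match only suggests a format family. Confirming conformance would still require validating the file against the corresponding schema.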
Keywords: data format, annotation format, format family
Related Standard(s):