CLARINO Bergen Center
Abbreviation: CLARINO_Bergen
Registry: CLARIN:
Research infrastructure:
- CLARIN (B-centre,K-centre)
- Øyvind Gjesdal (June 5, 2024)
The CLARINO Bergen Centre recommends the file formats listed for it in the CLARIN Standards Information System.
However, we accept most file formats (paired with documentation and supplemental files) even if they are not listed, in order to provide an archive for research output and activities in language technology.
We have a more restrictive policy for accepting and importing files for use in our tools. See, for instance, Valid treebank formats for INESS.
Data functions covered by the recommendations: ...
Format recommendations:
Format | Domain | Level | Comments |
Tiger | Text Annotation | acceptable | |
 | Documentation | acceptable | |
TEI | Text Annotation | acceptable | |
EAF | Audiovisual Annotation | acceptable | |
CSV | Lexical Resource | acceptable | |
WAVE | Audiovisual Source Language Data | acceptable | |
MP3 | Audiovisual Source Language Data | acceptable | |
MP4 | Audiovisual Source Language Data | acceptable | |
LMF | Lexical Resource | acceptable | |
XML | Text Annotation | acceptable | |
CoNLL-X | Text Annotation | acceptable | |
PDF/A | Documentation | recommended | |
TSV | Lexical Resource | recommended | |
WAVE | Audiovisual Source Language Data | recommended | |
CoNLL-U | Text Annotation | recommended | |
Menota | Text Annotation | recommended | |
FST | Tool Support | recommended | |
Domain definitions:
- Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off.
- Documentation: Unstructured documentation of the resource and its parts, such as corpus or annotation guidelines.
- Audiovisual Annotation: Annotations of audiovisual sources, usually including a basic rendering of the spoken content (transcription) and sometimes further annotation.
- Lexical Resource: Structured (item-based) resources for lexical and/or conceptual information on units of language (e.g. wordlists, lexicons, WordNets etc.).
- Audiovisual Source Language Data: Audio or video recordings providing spoken/multimodal or signed language data for research purposes.
- Tool Support: Tool-related formats required for specific functionality of the tool or reliable reuse of resources (e.g. tagsets, annotation schemes, vocabularies, language models, parameter files, and other specifications or settings).
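As an illustration only, a depositor could encode part of the table above as a lookup to check files before submission. The sketch below is not a tool provided by the centre: the function name `recommendation_level` and the extension-to-format mapping are assumptions made for this example, and only formats with an unambiguous level in the table are included (WAVE is left out because it is listed with both levels).

```python
from pathlib import Path

# Levels copied from the table above for a subset of formats.
# WAVE is omitted because the table lists it as both acceptable and recommended.
LEVELS = {
    "PDF/A": "recommended",
    "TSV": "recommended",
    "CoNLL-U": "recommended",
    "CSV": "acceptable",
    "EAF": "acceptable",
    "MP3": "acceptable",
    "MP4": "acceptable",
}

# Hypothetical mapping from file extension to format name (an assumption;
# the centre does not publish such a mapping on this page).
EXTENSIONS = {
    ".tsv": "TSV",
    ".conllu": "CoNLL-U",
    ".csv": "CSV",
    ".eaf": "EAF",
    ".mp3": "MP3",
    ".mp4": "MP4",
}


def recommendation_level(filename: str) -> str:
    """Return 'recommended', 'acceptable', or 'unlisted' for a file name."""
    fmt = EXTENSIONS.get(Path(filename).suffix.lower())
    if fmt is None:
        # Unlisted formats may still be accepted when paired with
        # documentation and supplemental files, per the policy above.
        return "unlisted"
    return LEVELS[fmt]


if __name__ == "__main__":
    for name in ("lexicon.tsv", "session.eaf", "notes.docx"):
        print(name, recommendation_level(name))
```

A result of "unlisted" does not mean a file would be rejected for archiving; it only means the format does not appear in this sketch's subset of the table.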
Last update commit-id: 3a6bae8e