CLARINO Bergen Center
Abbreviation: CLARINO_Bergen
Registry: CLARIN: https://centres.clarin.eu/centre/29
Research infrastructure:
  • CLARIN (B-centre, K-centre)
Curation:
Description:

The CLARINO Bergen Centre recommends the listed file formats in the CLARIN Standards Information System.

However, we accept most file formats (accompanied by documentation and supplemental files) even if they are not listed, in order to provide an archive for research output and activities in Language Technologies.

We have a more restrictive policy for accepting and importing files for use in our tools. See, for instance, "Valid treebank formats for INESS".

Data functions covered by the recommendations: ...
Format recommendations:
Format | Domain | Level | Comments
Tiger | Text Annotation | acceptable | Used internally for our tools and services.
PDF | Documentation | acceptable |
TEI | Text Annotation | acceptable | More specific dialects/customizations using ODD-documents to specify/extend. Consider reusing existing dialects (e.g. Menota) over creating your own.
EAF | Audiovisual Annotation | acceptable |
CSV | Lexical Resource | acceptable |
WAVE | Audiovisual Source Language Data | acceptable | PCM-WAV above 22 kHz/16 bit
MP3 | Audiovisual Source Language Data | acceptable |
MP4 | Audiovisual Source Language Data | acceptable |
LMF | Lexical Resource | acceptable |
XML | Text Annotation | acceptable | Well known and defined standards of XML-formats are preferred. When depositing non-standard, less known formats consider depositing also schema documents (ODD, XSD, DTD or RelaxNG), guidelines and documentation to improve usability.
CoNLL-X | Text Annotation | acceptable | Consider using CoNLL-U instead.
PDF/A | Documentation | recommended |
TSV | Lexical Resource | recommended |
WAVE | Audiovisual Source Language Data | recommended | PCM-WAV, 48 kHz, 16 bit
CoNLL-U | Text Annotation | recommended |
Menota | Text Annotation | recommended | TEI extensions for Medieval Nordic texts
FST | Tool Support | recommended | Source code format for both XFST and HFST

Domains:
  • Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off.
  • Documentation: Unstructured documentation of the resource and its parts such as corpus or annotation guidelines.
  • Audiovisual Annotation: Annotations of audiovisual sources, usually including a basic rendering of the spoken content (transcription) and sometimes further annotation.
  • Audiovisual Source Language Data: Audio or video recordings providing spoken/multimodal or signed language data for research purposes.
  • Lexical Resource: Structured (item-based) resources for lexical and/or conceptual information on units of language (e.g. wordlists, lexicons, WordNets etc.)
  • Tool Support: Tool-related formats required for specific functionality of the tool or reliable reuse of resources (e.g. tagsets, annotation schemes, vocabularies, language models, parameter files, and other specifications or settings)
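
Depositors may want to check audio files against the two WAVE entries above before submission. The following is a minimal sketch, not a tool provided by the centre: the function name `classify_wav` and the returned labels are hypothetical, and the "acceptable" rule is one possible reading of "PCM-WAV above 22 kHz/16 bit" (sample rate above 22 kHz with at least 16-bit samples). It relies only on Python's standard `wave` module, which reads uncompressed PCM files.

```python
import wave


def classify_wav(path):
    """Classify a WAV file against the WAVE rows of the table.

    Returns "recommended", "acceptable", or "not listed".
    The thresholds are an interpretation of the table comments,
    not an official validation rule.
    """
    try:
        with wave.open(path, "rb") as wav:
            rate = wav.getframerate()        # samples per second (Hz)
            bits = wav.getsampwidth() * 8    # bytes per sample -> bits
    except wave.Error:
        # The wave module rejects files that are not plain PCM WAV.
        return "not listed"

    # "PCM-WAV, 48 kHz, 16 bit" (recommended row)
    if rate == 48000 and bits == 16:
        return "recommended"
    # Assumed reading of "PCM-WAV above 22 kHz/16 bit" (acceptable row)
    if rate > 22000 and bits >= 16:
        return "acceptable"
    return "not listed"
```

Calling `classify_wav("recording.wav")` (a hypothetical file name) returns one of the three labels. A file labelled "not listed" may still be accepted for archiving, since the centre states that it accepts most formats when they are accompanied by documentation.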
Last update commit-id: 3a6bae8e