Format Recommendations

This page lists the data-deposit formats that various CLARIN centres are ready to accept. For each centre, a format can be "recommended", "acceptable" or "discouraged" within each of several domains, which represent the functions that deposited data can serve. The level of recommendation should always be read relative to the profile of the given centre.

  • "recommended" should be interpreted as meaning that the centre in question will in most cases be able to process the data without much manipulation and that it is likely that the data will be preserved long-term in that format (the specifics are up to that centre);
  • "acceptable" should be interpreted as meaning that the centre may need to spend some time and resources on the up-conversion of the data, and that the data may be preserved in one of the recommended formats instead;
  • "discouraged" should be understood as indicating that the centre may find it problematic to up-convert the data.
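The three levels form a natural ordering from most to least favourable for the depositor. As a minimal sketch, assuming the labels are ranked discouraged < acceptable < recommended (the page defines the levels but no numeric scale), they can be modelled as an ordered enumeration; the class name `Recommendation` is hypothetical.

```python
from enum import IntEnum


class Recommendation(IntEnum):
    """The three levels of recommendation, ordered from least to most favourable.

    The numeric values are an assumption for this sketch; the page only
    defines the levels, not a ranking scale.
    """
    DISCOURAGED = 0  # the centre may find it problematic to up-convert the data
    ACCEPTABLE = 1   # may need up-conversion; may be preserved in a recommended format
    RECOMMENDED = 2  # processable without much manipulation; likely preserved as-is

    @classmethod
    def parse(cls, label: str) -> "Recommendation":
        # Table cells use lowercase labels such as "acceptable".
        return cls[label.strip().upper()]


levels = [Recommendation.parse(s) for s in ("acceptable", "recommended", "discouraged")]
print(max(levels).name)  # RECOMMENDED
print([lvl.name for lvl in sorted(levels, reverse=True)])  # ['RECOMMENDED', 'ACCEPTABLE', 'DISCOURAGED']
```

Because `IntEnum` members compare as integers, the most favourable level a centre offers for a format is simply the `max` of its labels.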

Use the drop-down menus to select a particular domain, centre and/or level of recommendation. Columns can be sorted, and the results can be downloaded as XML.
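The filtering, sorting and export described above can be sketched in a few lines of Python. This is an illustration only, using a handful of rows copied from the table on this page. The element and attribute names in the XML output (`formatRecommendations`, `recommendation`, `format`, `centre`, `domain`, `level`) are hypothetical; the page does not describe the schema that the SIS actually exports.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    fmt: str
    centre: str
    domain: str
    level: str


# A few rows copied from the table on this page.
ROWS = [
    Row("CMDI", "BAS", "Metadata", "acceptable"),
    Row("CMDI", "IDS", "Metadata", "recommended"),
    Row("CSV", "IDS", "Metadata", "acceptable"),
    Row("CoNLL-U", "IDS", "Text Annotation", "recommended"),
]


def select(rows: List[Row], domain: Optional[str] = None,
           centre: Optional[str] = None, level: Optional[str] = None) -> List[Row]:
    """Mimic the drop-down filters; None stands for 'any value'."""
    return [r for r in rows
            if (domain is None or r.domain == domain)
            and (centre is None or r.centre == centre)
            and (level is None or r.level == level)]


def to_xml(rows: List[Row]) -> str:
    """Serialise rows as XML (hypothetical element and attribute names)."""
    root = ET.Element("formatRecommendations")
    for r in rows:
        ET.SubElement(root, "recommendation", format=r.fmt, centre=r.centre,
                      domain=r.domain, level=r.level)
    return ET.tostring(root, encoding="unicode")


# Filter by centre, then sort on the Format column (case-insensitively).
ids_rows = sorted(select(ROWS, centre="IDS"), key=lambda r: r.fmt.lower())
print(to_xml(ids_rows))
```

Leaving a filter as `None` corresponds to not choosing anything in that drop-down menu.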

An authorised person can use the exported XML file for a given centre to extend or modify that centre's recommendations. To aid in this process, please consult the separate lists of all available file formats and of the functional groupings of formats (functional domains).
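Extending or modifying an exported file amounts to adding a new (format, domain, level) entry or changing the level of an existing one. The sketch below assumes a hypothetical XML layout and a placeholder centre name `EXAMPLE`; it is not the SIS schema, and the change it makes is purely illustrative. Real changes still go through an authorised person, as described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical export for a placeholder centre; the layout is an assumption
# made for this sketch, not the SIS export schema.
EXPORT = """<formatRecommendations centre="EXAMPLE">
  <recommendation format="CSV" domain="Metadata" level="acceptable"/>
</formatRecommendations>"""

VALID_LEVELS = {"recommended", "acceptable", "discouraged"}


def upsert(root: ET.Element, fmt: str, domain: str, level: str) -> None:
    """Change the level of an existing (format, domain) entry, or add a new entry."""
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown level of recommendation: {level!r}")
    for rec in root.iter("recommendation"):
        if rec.get("format") == fmt and rec.get("domain") == domain:
            rec.set("level", level)  # modify an existing recommendation
            return
    # No matching entry: extend the file with a new recommendation.
    ET.SubElement(root, "recommendation", format=fmt, domain=domain, level=level)


root = ET.fromstring(EXPORT)
upsert(root, "CSV", "Metadata", "discouraged")   # modify
upsert(root, "CMDI", "Metadata", "recommended")  # extend
print(ET.tostring(root, encoding="unicode"))
```

Rejecting unknown levels up front keeps the edited file limited to the three levels defined on this page.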

As of mid-2022, not every centre offering depositing services has submitted its information to the SIS. In some cases, the information had to be mapped, not always reliably, from lists on centre homepages onto the feature matrix offered by the SIS (which is built from the SIS functional domains and levels of recommendation). If you spot an error, please help us correct it.
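To see why mapping homepage lists onto the feature matrix can be unreliable, consider a centre whose homepage uses its own vocabulary. In this sketch, the homepage labels and their mapping onto SIS levels are hypothetical, and the domain set covers only the domains that appear on this page (the full SIS list may differ). Entries without a clear counterpart are flagged for review rather than guessed.

```python
from typing import Optional, Tuple

# Hypothetical labels a centre homepage might use, and an assumed mapping
# onto the three SIS levels of recommendation.
LABEL_TO_LEVEL = {
    "preferred": "recommended",
    "accepted": "acceptable",
    "not preferred": "discouraged",
}

# Domains that appear in the table on this page; the full SIS list may differ.
DOMAINS = {
    "Catalogue Metadata", "Contextual Information", "Lexical Resource",
    "Metadata", "Other", "Text Annotation", "Textual Source Language Data",
    "Tool Support",
}


def map_entry(fmt: str, homepage_label: str, domain: str) -> Optional[Tuple[str, str, str]]:
    """Map one homepage entry onto a (format, domain, level) cell of the matrix.

    Returns None when the label or the domain has no clear counterpart,
    so the entry can be reviewed manually instead of being guessed.
    """
    level = LABEL_TO_LEVEL.get(homepage_label.strip().lower())
    if level is None or domain not in DOMAINS:
        return None
    return (fmt, domain, level)


# Hypothetical homepage entries: (format, label, domain).
homepage = [("CSV", "Accepted", "Metadata"), ("XLSX", "fine for now", "Lexical Resource")]
mapped = [map_entry(*entry) for entry in homepage]
needs_review = [entry for entry, cell in zip(homepage, mapped) if cell is None]
print(needs_review)  # [('XLSX', 'fine for now', 'Lexical Resource')]
```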

Format   Centre          Domain                        Recommendation  Comments
CMDI     BAS             Metadata                      acceptable
CMDI     FIN-CLARIN      Catalogue Metadata            recommended
CMDI     IDS             Metadata                      recommended
CMDI     ZIM             Catalogue Metadata            recommended
CMDI     LAC             Metadata                      recommended     Profile BLAM-bundle-repository-v0.14
CMDI     LAC             Metadata                      acceptable
CMDI     BBAW            Catalogue Metadata            recommended
CMDI     SAW             Catalogue Metadata            recommended
CMDI     SAW             Contextual Information        recommended
CMDI     SAW             Metadata                      recommended
Coma     HZSK            Contextual Information        recommended
Coma     IDS             Metadata                      recommended     For transcriptions of speech. Coma is the EXMARaLDA Corpus-Manager.
CoNLL-U  EKUT            Text Annotation               recommended
CoNLL-U  Sprakbanken     Text Annotation               recommended
CoNLL-U  Sprakbanken     Textual Source Language Data  recommended
CoNLL-U  CLARIN.SI       Text Annotation               recommended
CoNLL-U  FIN-CLARIN      Text Annotation               recommended
CoNLL-U  IDS             Text Annotation               recommended
CoNLL-U  IDS             Textual Source Language Data  recommended
CoNLL-U  ZIM             Text Annotation               recommended
CoNLL-U  SAW             Text Annotation               recommended
CoNLL-U  SAW             Textual Source Language Data  recommended
CoNLL-X  SAW             Text Annotation               acceptable      Consider using CoNLL-U instead.
CoNLL-X  SAW             Textual Source Language Data  acceptable      Consider using CoNLL-U instead.
CSS      EKUT            Tool Support                  recommended
CSS      ILC4CLARIN      Tool Support                  recommended
CSS      CLARIN.SI       Tool Support                  recommended
CSS      DANS            Tool Support                  recommended     See more info from DANS.
CSS      MI              Tool Support                  recommended
CSS      ZIM             Tool Support                  recommended
CSV      EKUT            Lexical Resource              recommended
CSV      Sprakbanken     Metadata                      acceptable
CSV      CLARIN.SI       Lexical Resource              recommended
CSV      FIN-CLARIN      Lexical Resource              recommended
CSV      FIN-CLARIN      Metadata                      acceptable
CSV      CLARIN-DK-UCPH  Lexical Resource              recommended
CSV      IDS             Metadata                      acceptable
CSV      ACDH-ARCHE      Lexical Resource              recommended
CSV      DANS            Other                         recommended     See more info from DANS.
CSV      MI              Lexical Resource              recommended
CSV      ZIM             Lexical Resource              recommended
CSV      LAC             Metadata                      acceptable      Preferably with W3C Metadata for Tabular Data annotations.
CSV      SAW             Metadata                      acceptable
CSV      SAW             Lexical Resource              recommended
Cut      MPI-PL          Lexical Resource              recommended
CWB-VRT  FIN-CLARIN      Text Annotation               recommended
DBASE    DANS            Other                         acceptable      See more info from DANS.
DC XML   Sprakbanken     Metadata                      recommended
DC XML   BAS             Metadata                      acceptable
DC XML   IDS             Metadata                      recommended
DC XML   SAW             Metadata                      recommended

Domains used in this table:

  • Catalogue Metadata: basic structured information for discoverability and general description, to be openly provided for harvesting.
  • Contextual Information: structured information on the communicative event or text and its creators (i.e. participants or authors) relevant for analysis.
  • Lexical Resource: structured (item-based) resources for lexical and/or conceptual information on units of language (e.g. wordlists, lexicons, WordNets etc.).
  • Metadata: comprehensive structured information including descriptive, structural and administrative metadata.
  • Other: any other function that cannot be included in an existing domain. The content of this domain will be periodically examined for potential patterns that may give rise to new domains.
  • Text Annotation: annotations of textual sources/written text, with the original text included or as stand-off.
  • Textual Source Language Data: written unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes.
  • Tool Support: tool-related formats required for specific functionality of the tool or reliable reuse of resources (e.g. tagsets, annotation schemes, vocabularies, language models, parameter files, and other specifications or settings).
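The table can also be queried programmatically. As an illustration, the Text Annotation rows from the table above (this excerpt only, not the complete SIS data) can be used to count how many centres recommend each format. The variable and function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# (format, centre, level) for the Text Annotation rows in the table above.
TEXT_ANNOTATION: List[Tuple[str, str, str]] = [
    ("CoNLL-U", "EKUT", "recommended"),
    ("CoNLL-U", "Sprakbanken", "recommended"),
    ("CoNLL-U", "CLARIN.SI", "recommended"),
    ("CoNLL-U", "FIN-CLARIN", "recommended"),
    ("CoNLL-U", "IDS", "recommended"),
    ("CoNLL-U", "ZIM", "recommended"),
    ("CoNLL-U", "SAW", "recommended"),
    ("CoNLL-X", "SAW", "acceptable"),
    ("CWB-VRT", "FIN-CLARIN", "recommended"),
]


def count_recommending_centres(rows: List[Tuple[str, str, str]]) -> Counter:
    """Count, per format, the centres that list it as 'recommended'."""
    return Counter(fmt for fmt, _centre, level in rows if level == "recommended")


counts = count_recommending_centres(TEXT_ANNOTATION)
print(counts.most_common())  # [('CoNLL-U', 7), ('CWB-VRT', 1)]
```

In this excerpt CoNLL-U is recommended by seven centres for Text Annotation, while CoNLL-X appears only as acceptable at SAW, consistent with SAW's comment to consider using CoNLL-U instead.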