Format Recommendations

This page presents the data deposit formats that various CLARIN centres are ready to accept. Each format, for each centre, can be "recommended", "acceptable" or "discouraged" in the context of several domains, which represent the functions that the deposited data can serve. The level of recommendation should always be viewed as relative to the profile of the given centre.

  • "recommended" should be interpreted as meaning that the centre in question will in most cases be able to process the data without much manipulation and that it is likely that the data will be preserved long-term in that format (the specifics are up to that centre);
  • "acceptable" should be interpreted as meaning that the centre may need to spend some time and resources on the up-conversion of the data, and that the data may be preserved in one of the recommended formats instead;
  • "discouraged" should be understood as indicating that the centre may find it problematic to up-convert the data.

Use the drop-down lists to select a particular domain, centre, and/or level of recommendation. Columns can be sorted, and the results can be downloaded as XML.
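The selection and sorting described above can be pictured with a small sketch. This is only an illustration and not the SIS implementation: the `Recommendation` class and the `select` function are hypothetical names, and the sample rows are a handful of entries copied from the table on this page.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Recommendation:
    """One table row (hypothetical model, not the SIS data structure)."""
    format: str
    centre: str
    domain: str
    level: str  # "recommended", "acceptable" or "discouraged"


# A few rows taken from the table on this page.
ROWS = [
    Recommendation("FLAC", "CLARIN.SI", "Audiovisual Source Language Data", "recommended"),
    Recommendation("FLAC", "FIN-CLARIN", "Audiovisual Source Language Data", "acceptable"),
    Recommendation("GeoJSON", "DANS", "Geodata", "recommended"),
    Recommendation("GZIP", "FIN-CLARIN", "Tool Support", "acceptable"),
    Recommendation("HTML", "IDS", "Textual Source Language Data", "acceptable"),
]


def select(rows: Iterable[Recommendation],
           domain: Optional[str] = None,
           centre: Optional[str] = None,
           level: Optional[str] = None) -> List[Recommendation]:
    """Keep rows matching every criterion that is given, sorted by format, then centre."""
    hits = [
        r for r in rows
        if (domain is None or r.domain == domain)
        and (centre is None or r.centre == centre)
        and (level is None or r.level == level)
    ]
    return sorted(hits, key=lambda r: (r.format, r.centre))


# Example: everything that some centre marks as merely "acceptable".
for row in select(ROWS, level="acceptable"):
    print(row.format, row.centre, row.domain)
```

Leaving a criterion as `None` corresponds to leaving the matching drop-down list unset, so the three filters can be combined freely.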

An authorised person can use the XML file exported for a given centre to extend or modify that centre's recommendations. To help with this, consult the separate lists of all available file formats and of the functional groupings of formats (functional domains).
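As a rough sketch of what editing an exported file might involve, the snippet below changes one entry and adds another. The export schema is not shown on this page, so the element and attribute names (`recommendations`, `entry`, `format`, `domain`, `level`) are invented for illustration, as is the `set_level` helper; the changes themselves are arbitrary examples, not actual centre policy.

```python
import xml.etree.ElementTree as ET

# Hypothetical export layout; the real SIS schema may differ entirely.
EXPORT = """<recommendations centre="FIN-CLARIN">
  <entry format="FLAC" domain="Audiovisual Source Language Data" level="acceptable"/>
  <entry format="GZIP" domain="Tool Support" level="acceptable"/>
</recommendations>"""

LEVELS = {"recommended", "acceptable", "discouraged"}


def set_level(root: ET.Element, fmt: str, domain: str, level: str) -> ET.Element:
    """Update the matching entry, or append a new one if none exists."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    for entry in root.iter("entry"):
        if entry.get("format") == fmt and entry.get("domain") == domain:
            entry.set("level", level)  # modify an existing recommendation
            return entry
    # No match found: extend the list with a new recommendation.
    return ET.SubElement(root, "entry",
                         {"format": fmt, "domain": domain, "level": level})


root = ET.fromstring(EXPORT)
set_level(root, "FLAC", "Audiovisual Source Language Data", "recommended")  # modify
set_level(root, "HTML", "Documentation", "recommended")                     # extend
print(ET.tostring(root, encoding="unicode"))
```

Validating the level against the three allowed values before writing keeps a typo from producing an entry the system cannot interpret.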

As of mid-2022, not every centre with depositing services has submitted its information to the SIS. In some cases, the information had to be mapped, not always reliably, from lists provided on centre homepages onto the feature matrix offered by the SIS (created on the basis of the SIS functional domains and levels of recommendation). If you think you see an error, please help us get it right.

Domains used in the table:

  • Audiovisual Source Language Data: Audio or video recordings providing spoken/multimodal or signed language data for research purposes.
  • Audiovisual Annotation: Annotations of audiovisual sources, usually including a basic rendering of the spoken content (transcription) and sometimes further annotation.
  • Documentation: Unstructured documentation of the resource and its parts such as corpus or annotation guidelines.
  • Geodata: Information on geographic locations.
  • Image Source Language Data: Digitized images of analogue sources of written language data for research purposes (e.g. facsimiles, scans of handwriting, photos of inscriptions).
  • Lexical Resource: Structured (item-based) resources for lexical and/or conceptual information on units of language (e.g. wordlists, lexicons, WordNets etc.)
  • Textual Source Language Data: Written unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes.
  • Tool Support: Tool-related formats required for specific functionality of the tool or reliable reuse of resources (e.g. tagsets, annotation schemes, vocabularies, language models, parameter files, and other specifications or settings).
  • Other: Any other function that cannot be included in an existing domain. The content of this domain will be periodically examined for potential patterns that may give rise to new domains.

| Format | Centre | Domain | Recommendation | Comment |
|---|---|---|---|---|
| FLAC | CLARIN.SI | Audiovisual Source Language Data | recommended | |
| FLAC | FIN-CLARIN | Audiovisual Source Language Data | acceptable | |
| FLAC | CLARIN-DK-UCPH | Audiovisual Source Language Data | recommended | |
| FLAC | IDS | Audiovisual Source Language Data | acceptable | |
| FLAC | ACDH-ARCHE | Audiovisual Source Language Data | recommended | |
| FLAC | DANS | Audiovisual Source Language Data | recommended | See more info from DANS |
| FLAC | MI | Audiovisual Source Language Data | recommended | |
| FLAC | COCOON | Audiovisual Source Language Data | recommended | |
| FLEx | HZSK | Lexical Resource | recommended | |
| FLEx | LAC | Audiovisual Annotation | acceptable | |
| FLN | HZSK | Audiovisual Annotation | recommended | |
| FLN | CLARIN.SI | Audiovisual Annotation | recommended | |
| FLN | IDS | Audiovisual Annotation | recommended | |
| FoLiA | CLST | Textual Source Language Data | recommended | |
| FoLiA | CLARIN.SI | Textual Source Language Data | recommended | |
| GeoJSON | Sprakbanken | Geodata | recommended | |
| GeoJSON | DANS | Geodata | recommended | |
| GeoTIFF | Sprakbanken | Geodata | recommended | |
| GeoTIFF | DANS | Geodata | recommended | See more info from DANS |
| GeoTIFF | MI | Geodata | recommended | |
| GIF | EKUT | Image Source Language Data | recommended | |
| GIF | ILC4CLARIN | Image Source Language Data | recommended | |
| GIF | UdS | Image Source Language Data | recommended | |
| GIF | CLARIN.SI | Image Source Language Data | recommended | |
| GIF | ZIM | Image Source Language Data | recommended | |
| GML | Sprakbanken | Geodata | recommended | |
| GML | DANS | Geodata | recommended | See more info from DANS |
| GML | MI | Geodata | recommended | |
| GML | ZIM | Geodata | recommended | |
| GZIP | EKUT | Tool Support | recommended | |
| GZIP | ILC4CLARIN | Tool Support | recommended | |
| GZIP | UdS | Tool Support | recommended | |
| GZIP | CLARIN.SI | Tool Support | recommended | |
| GZIP | FIN-CLARIN | Tool Support | acceptable | |
| GZIP | ZIM | Tool Support | recommended | |
| HDF5 | DANS | Other | acceptable | See more info from DANS |
| HDF5 | ZIM | Lexical Resource | recommended | |
| HTML | EKUT | Documentation | recommended | |
| HTML | MPI-PL | Documentation | recommended | |
| HTML | Sprakbanken | Textual Source Language Data | acceptable | without js etc. and with generic markup |
| HTML | CMU | Documentation | recommended | |
| HTML | ILC4CLARIN | Documentation | recommended | |
| HTML | UdS | Documentation | recommended | |
| HTML | CLARIN.SI | Documentation | recommended | |
| HTML | IDS | Textual Source Language Data | acceptable | without js etc. and with generic markup |
| HTML | ACDH-ARCHE | Documentation | recommended | |
| HTML | DANS | Documentation | recommended | See more info from DANS |
| HTML | DANS | Textual Source Language Data | recommended | See more info from DANS |
| HTML | MI | Documentation | recommended | |
| HTML | ZIM | Documentation | recommended | |
| HTML | BBAW | Documentation | recommended | |