Format Recommendations

This page lists the formats in which various CLARIN centres are ready to accept data deposits. For each centre, a format can be "recommended", "acceptable" or "discouraged" in the context of several domains, which represent the functions that the deposited data can serve. The level of recommendation should always be read relative to the profile of the given centre.

  • "recommended" means that the centre in question will in most cases be able to process the data without much manipulation, and that the data will likely be preserved long-term in that format (the specifics are up to that centre);
  • "acceptable" means that the centre may need to spend some time and resources on up-converting the data, and that the data may instead be preserved in one of the recommended formats;
  • "discouraged" indicates that the centre may find it problematic to up-convert the data.
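
Because the three levels form an ordering within a single centre, they can be modelled as comparable values. The following is a minimal Python sketch, not part of any SIS tooling: `Level` and `best_format` are hypothetical names, and the example ratings are the IDS entries for the Audiovisual Annotation domain taken from the table on this page. In line with the caveat above, it compares formats only within one centre's ratings, never across centres.

```python
from __future__ import annotations

from enum import IntEnum


class Level(IntEnum):
    """Recommendation levels, ordered so that a higher value is preferable.

    Comparisons are only meaningful within one centre, since each level is
    relative to that centre's profile.
    """
    DISCOURAGED = 0
    ACCEPTABLE = 1
    RECOMMENDED = 2


def best_format(ratings: dict[str, Level], candidates: list[str]) -> str | None:
    """Return the candidate format that a single centre rates highest.

    `ratings` maps format names to one centre's level for one domain.
    Candidates the centre has not rated are skipped; None means no match.
    """
    rated = [f for f in candidates if f in ratings]
    if not rated:
        return None
    return max(rated, key=lambda f: ratings[f])


# IDS ratings for the Audiovisual Annotation domain (from the table on this page).
ids_av_annotation = {"TRS": Level.ACCEPTABLE, "Transana": Level.DISCOURAGED}
print(best_format(ids_av_annotation, ["Transana", "TRS"]))  # TRS
```

If your annotations exist in both formats, this picks TRS for IDS, since IDS rates it "acceptable" and Transana "discouraged".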

Use the drop-down menus to filter by domain, centre and/or level of recommendation. Columns can be sorted, and the results can be downloaded as XML.
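
The filtering and sorting that the page offers can be sketched in Python as follows. This is an illustration, not the SIS implementation: `Row` and `select` are hypothetical names, the fields mirror the table columns, and the sample rows are copied from the table on this page. Sorting is case-insensitive here, which appears to match the order in which formats are listed on this page.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One row of the recommendation table (a hypothetical in-memory model)."""
    format: str
    centre: str
    domain: str
    level: str  # "recommended", "acceptable" or "discouraged"
    comment: str = ""


def select(rows: list[Row], domain: str | None = None, centre: str | None = None,
           level: str | None = None, sort_by: str = "format") -> list[Row]:
    """Apply the three filters (None means "any"), then sort case-insensitively by one column."""
    hits = [
        r for r in rows
        if (domain is None or r.domain == domain)
        and (centre is None or r.centre == centre)
        and (level is None or r.level == level)
    ]
    return sorted(hits, key=lambda r: getattr(r, sort_by).casefold())


# A few rows copied from the table on this page.
ROWS = [
    Row("TSV", "EKUT", "Lexical Resource", "recommended"),
    Row("TSV", "SAW", "Lexical Resource", "acceptable"),
    Row("TRS", "IDS", "Audiovisual Annotation", "acceptable"),
    Row("Transana", "IDS", "Audiovisual Annotation", "discouraged"),
    Row("WAVE", "IDS", "Audiovisual Source Language Data", "recommended",
        "PCM-WAV, 48 kHz, 16 bit"),
]

for r in select(ROWS, centre="IDS"):
    print(r.format, r.domain, r.level)
```

Leaving a filter as None corresponds to not choosing a value in the matching drop-down menu.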

An authorised person can use the exported XML file for a given centre to extend or modify that centre's recommendations. To help with this, please consult the separate lists of all available file formats and of the functional groupings of formats (functional domains).
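
An editor might want to inspect or adjust an exported file before submitting it. The sketch below uses Python's standard `xml.etree.ElementTree`. The export format is not documented on this page, so the `recommendations`/`recommendation` elements and the `format`, `domain` and `level` attributes are hypothetical placeholders, not the real SIS schema; `load` and `set_level` are likewise hypothetical helper names. Changes made this way still have to be submitted by an authorised person.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# HYPOTHETICAL export layout: the actual SIS export schema is not described on
# this page, so these element and attribute names are placeholders to adapt.
SAMPLE = """<recommendations centre="IDS">
  <recommendation format="TRS" domain="Audiovisual Annotation" level="acceptable"/>
  <recommendation format="Transana" domain="Audiovisual Annotation" level="discouraged"/>
</recommendations>"""


def load(xml_text: str) -> list[tuple[str, str, str]]:
    """Return (format, domain, level) triples from an export-like document."""
    root = ET.fromstring(xml_text)
    return [(el.get("format"), el.get("domain"), el.get("level"))
            for el in root.iter("recommendation")]


def set_level(xml_text: str, fmt: str, domain: str, level: str) -> str:
    """Update the level of an existing (format, domain) entry, or append a new one."""
    root = ET.fromstring(xml_text)
    for el in root.iter("recommendation"):
        if el.get("format") == fmt and el.get("domain") == domain:
            el.set("level", level)
            break
    else:  # no matching entry: add it
        ET.SubElement(root, "recommendation",
                      {"format": fmt, "domain": domain, "level": level})
    return ET.tostring(root, encoding="unicode")


print(load(set_level(SAMPLE, "TRS", "Audiovisual Annotation", "recommended")))
```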

As of mid-2022, not every centre with depositing services has submitted its information to the SIS. In some cases, the information had to be mapped from lists on centre homepages onto the feature matrix offered by the SIS (built from the SIS functional domains and levels of recommendation), a process that is not fully reliable. If you think you see an error, please help us correct it.

Functional domains appearing in the table:
  • Image Source Language Data: digitized images of analogue sources of written language data for research purposes (e.g. facsimiles, scans of handwriting, photos of inscriptions).
  • Contextual Data: images (photos or drawings) or documents relevant to the communicative event or text but not part of the source language data.
  • Text Annotation: annotations of textual sources/written text, with the original text included or as stand-off.
  • Audiovisual Annotation: annotations of audiovisual sources, usually including a basic rendering of the spoken content (transcription) and sometimes further annotation.
  • Lexical Resource: structured (item-based) resources for lexical and/or conceptual information on units of language (e.g. wordlists, lexicons, WordNets, etc.).
  • Language Description: structured or unstructured descriptions of linguistic varieties or phenomena, typological databases, etc.
  • Audiovisual Source Language Data: audio or video recordings providing spoken/multimodal or signed language data for research purposes.
  • Geodata: information on geographic locations.

Format | Centre | Domain | Recommendation | Comments
TIFF | CLARIN-DK-UCPH | Image Source Language Data | recommended
TIFF | ACDH-ARCHE | Image Source Language Data | recommended
TIFF | DANS | Image Source Language Data | recommended | See DANS for more information
TIFF | MI | Image Source Language Data | recommended
TIFF | ZIM | Image Source Language Data | recommended
TIFF | LAC | Contextual Data | recommended
Tiger | UdS | Text Annotation | recommended
Tiger | CLARIN.SI | Text Annotation | recommended
Tiger | CLARINO_Bergen | Text Annotation | recommended
Toolbox | LAC | Audiovisual Annotation | acceptable | Toolbox files are still acceptable, but the format has proven problematic and should be avoided if possible.
Transana | IDS | Audiovisual Annotation | discouraged
Trig | MI | Text Annotation | recommended
TRS | HZSK | Audiovisual Annotation | recommended
TRS | CLARIN.SI | Audiovisual Annotation | recommended
TRS | IDS | Audiovisual Annotation | acceptable
TRS | COCOON | Audiovisual Annotation | recommended
TSV | EKUT | Lexical Resource | recommended
TSV | CLARIN.SI | Lexical Resource | recommended
TSV | FIN-CLARIN | Lexical Resource | recommended
TSV | SAW | Lexical Resource | acceptable
Turtle | MI | Text Annotation | recommended
Turtle | ZIM | Text Annotation | recommended
VRT | UdS | Language Description | recommended
VRT | CLARIN.SI | Language Description | recommended
VRT | CLARIN-DK-UCPH | Language Description | recommended
WAVE | EKUT | Audiovisual Source Language Data | recommended
WAVE | MPI-PL | Audiovisual Source Language Data | recommended
WAVE | HZSK | Audiovisual Source Language Data | recommended | (L)PCM-WAV, 48 kHz, 16 bit
WAVE | Sprakbanken | Audiovisual Source Language Data | recommended | PCM-WAV, 48 kHz, 16 bit
WAVE | Sprakbanken | Audiovisual Source Language Data | acceptable | PCM-WAV with non-recommended parameters (not 48 kHz, 16 bit)
WAVE | CMU | Audiovisual Source Language Data | recommended
WAVE | CLST | Audiovisual Source Language Data | recommended
WAVE | ILC4CLARIN | Audiovisual Source Language Data | recommended
WAVE | ORTOLANG | Audiovisual Source Language Data | recommended
WAVE | UdS | Audiovisual Source Language Data | recommended
WAVE | CLARIN.SI | Audiovisual Source Language Data | recommended
WAVE | BAS | Audiovisual Source Language Data | recommended
WAVE | FIN-CLARIN | Audiovisual Source Language Data | recommended | PCM-WAV, 48 kHz, 16 bit
WAVE | FIN-CLARIN | Audiovisual Source Language Data | acceptable | PCM-WAV above 22 kHz/16 bit
WAVE | IDS | Audiovisual Source Language Data | recommended | PCM-WAV, 48 kHz, 16 bit
WAVE | IDS | Audiovisual Source Language Data | acceptable | PCM-WAV with non-recommended parameters (not 48 kHz, 16 bit)
WAVE | ACDH-ARCHE | Audiovisual Source Language Data | recommended
WAVE | DANS | Audiovisual Source Language Data | acceptable | See DANS for more information
WAVE | MI | Audiovisual Source Language Data | recommended
WAVE | ZIM | Audiovisual Source Language Data | recommended
WAVE | LAC | Audiovisual Source Language Data | recommended | LPCM audio (preferred sampling rate 48 kHz, bit depth 16 bit)
WAVE | COCOON | Audiovisual Source Language Data | recommended
WFObj | MI | Image Source Language Data | recommended
WFObj | ZIM | Image Source Language Data | recommended
WMF | DANS | Image Source Language Data | acceptable | See DANS for more information
Worldfile.jpgw | DANS | Geodata | acceptable | See DANS for more information
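
Several centres tie their WAVE ratings to specific encoding parameters (PCM, 48 kHz, 16 bit). Before depositing, you could check a file's parameters with Python's standard-library `wave` module, as in this minimal sketch. `wav_params` and `is_48k_16bit` are hypothetical helper names, and the check covers only sample rate and bit depth, not each centre's full requirements (e.g. FIN-CLARIN's "above 22 kHz/16 bit" acceptable range or DANS's own guidance).

```python
from __future__ import annotations

import os
import tempfile
import wave


def wav_params(path: str) -> tuple[int, int, int]:
    """Return (sample_rate_hz, bits_per_sample, channels) for a WAV file.

    The standard-library wave module only reads uncompressed PCM data and
    raises wave.Error for other encodings.
    """
    with wave.open(path, "rb") as w:
        return w.getframerate(), 8 * w.getsampwidth(), w.getnchannels()


def is_48k_16bit(path: str) -> bool:
    """True if the file matches the "PCM-WAV, 48 kHz, 16 bit" parameters cited in the table."""
    rate, bits, _ = wav_params(path)
    return rate == 48_000 and bits == 16


# Demo: write one second of 16-bit mono silence at 48 kHz, then inspect it.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "silence.wav")
    with wave.open(demo, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16 bit
        w.setframerate(48_000)
        w.writeframes(b"\x00\x00" * 48_000)
    print(wav_params(demo), is_48k_16bit(demo))  # (48000, 16, 1) True
```

A file that fails the check is not necessarily rejected: several centres list other PCM-WAV parameters as "acceptable".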