Format Recommendations

This page presents the data deposition formats that various CLARIN centres are ready to accept. For each centre, each format can be "recommended", "acceptable" or "discouraged" in the context of several domains, which represent the functions that the deposited data can serve. The level of recommendation should always be viewed as relative to the profile of the given centre.

  • "recommended" should be interpreted as meaning that the centre in question will in most cases be able to process the data without much manipulation and that it is likely that the data will be preserved long-term in that format (the specifics are up to that centre);
  • "acceptable" should be interpreted as meaning that the centre may need to spend some time and resources on the up-conversion of the data, and that the data may be preserved in one of the recommended formats instead;
  • "discouraged" should be understood as indicating that the centre may find it problematic to up-convert the data.

Use the drop-down menus to filter by domain, centre, and/or level of recommendation. Columns can be sorted, and the results can be downloaded as XML.
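
The filtering and sorting described above can be illustrated offline with a minimal Python sketch. It is not the SIS implementation: the `Recommendation` class and the `select` and `sort_by` helpers are hypothetical names introduced here for illustration, and the sample rows are copied from the table on this page.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Recommendation:
    """One table row (hypothetical structure, not the SIS data model)."""
    format: str
    centre: str
    domain: str
    level: str  # "recommended", "acceptable" or "discouraged"


# A few rows copied from the table on this page.
ROWS = [
    Recommendation("AIFF", "BAS", "Audiovisual Source Language Data", "recommended"),
    Recommendation("AIFF", "IDS", "Audiovisual Source Language Data", "acceptable"),
    Recommendation("ALTO", "IDS", "Text Annotation", "acceptable"),
    Recommendation("CHAT", "MPI-PL", "Audiovisual Annotation", "recommended"),
    Recommendation("CHAT", "IDS", "Audiovisual Annotation", "discouraged"),
]


def select(rows: Iterable[Recommendation],
           domain: Optional[str] = None,
           centre: Optional[str] = None,
           level: Optional[str] = None) -> List[Recommendation]:
    """Keep rows matching every criterion that is not None."""
    return [
        r for r in rows
        if (domain is None or r.domain == domain)
        and (centre is None or r.centre == centre)
        and (level is None or r.level == level)
    ]


def sort_by(rows: Iterable[Recommendation], column: str) -> List[Recommendation]:
    """Sort rows alphabetically by one column (an attribute name)."""
    return sorted(rows, key=lambda r: getattr(r, column))


if __name__ == "__main__":
    # Everything IDS discourages, then all annotation rows sorted by centre.
    for row in select(ROWS, centre="IDS", level="discouraged"):
        print(row)
    for row in sort_by(select(ROWS, domain="Audiovisual Annotation"), "centre"):
        print(row)
```

Leaving a criterion as `None` corresponds to leaving the matching menu unset, so the three filters can be combined freely.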

An authorised person can use the XML file exported for a given centre to extend or modify the recommendations for that centre. To aid in this process, please consult the separate lists of all available file formats and of the functional groupings of formats (functional domains).

As of mid-2022, not every centre with depositing services has submitted its information to the Standards Information System (SIS). In some cases, the information had to be mapped, not always reliably, from lists provided on centre homepages onto the feature matrix offered by the SIS (created on the basis of the SIS functional domains and levels of recommendation). If you think you see an error, please help us get it right.

The domains used in the table are:

  • Audiovisual Source Language Data: Audio or video recordings providing spoken/multimodal or signed language data for research purposes.
  • Audiovisual Annotation: Annotations of audiovisual sources, usually including a basic rendering of the spoken content (transcription) and sometimes further annotation.
  • Image Source Language Data: Digitized images of analogue sources of written language data for research purposes (e.g. facsimiles, scans of handwriting, photos of inscriptions).
  • Textual Source Language Data: Written unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes.
  • Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off.
  • Geodata: Information on geographic locations.
  • Metadata: Comprehensive structured information including descriptive, structural and administrative metadata.
  • Catalogue Metadata: Basic structured information for discoverability and general description, to be openly provided for harvesting.
  • Contextual Information: Structured information on the communicative event or text and its creators (i.e. participants or authors) relevant for analysis.

Format | Centre | Domain | Recommendation | Comment
AAC | DANS | Audiovisual Source Language Data | acceptable | See more info from DANS
AG XML | BAS | Audiovisual Annotation | recommended |
AI | DANS | Image Source Language Data | acceptable | See more info from DANS
AIFF | Sprakbanken | Audiovisual Source Language Data | acceptable |
AIFF | CLARIN.SI | Audiovisual Source Language Data | recommended |
AIFF | BAS | Audiovisual Source Language Data | recommended |
AIFF | IDS | Audiovisual Source Language Data | acceptable |
AIFF | DANS | Audiovisual Source Language Data | acceptable | See more info from DANS
ALTO | Sprakbanken | Text Annotation | acceptable | Conversion to a suitable TEI-based format is expected.
ALTO | FIN-CLARIN | Text Annotation | acceptable |
ALTO | IDS | Text Annotation | acceptable | Conversion to a suitable TEI-based format is expected, per Empfehlungen des DFG-Fachkollegiums 104 "Sprachwissenschaften" (Oct. 2019)
ANVIL | HZSK | Audiovisual Annotation | recommended |
ANVIL | IDS | Audiovisual Annotation | acceptable |
ArcGIS.gdb | DANS | Geodata | acceptable | See more info from DANS
ArcGIS.mxd | DANS | Geodata | acceptable | See more info from DANS
ASCII Grid | DANS | Geodata | recommended | See more info from DANS
ASCII Grid | MI | Geodata | recommended |
AutoCAD DXF-R12 | DANS | Geodata | recommended | See more info from DANS
AutoCAD DXF-R12 | MI | Geodata | recommended |
AVI | CLARIN.SI | Audiovisual Source Language Data | recommended |
AVI | BAS | Audiovisual Source Language Data | recommended |
AVI | DANS | Audiovisual Source Language Data | acceptable | See more info from DANS
AVI | MI | Audiovisual Source Language Data | recommended |
BiBTeX | EKUT | Metadata | recommended |
BiBTeX | ZIM | Metadata | recommended |
BMP | EKUT | Image Source Language Data | recommended |
BPF | CLARIN.SI | Textual Source Language Data | recommended |
BPF | BAS | Audiovisual Annotation | recommended |
BWF | DANS | Audiovisual Source Language Data | recommended | See more info from DANS
BWF | MI | Audiovisual Source Language Data | recommended |
CDR | DANS | Image Source Language Data | acceptable | See more info from DANS
CHAT | MPI-PL | Audiovisual Annotation | recommended |
CHAT | Sprakbanken | Audiovisual Annotation | discouraged | Consider using TEISpoken instead.
CHAT | CMU | Audiovisual Annotation | recommended |
CHAT | ORTOLANG | Audiovisual Annotation | recommended |
CHAT | FIN-CLARIN | Audiovisual Annotation | discouraged | Consider using TEISpoken instead.
CHAT | IDS | Audiovisual Annotation | discouraged | Consider using TEISpoken instead.
CHAT | COCOON | Audiovisual Annotation | recommended |
CHAT-XML | Sprakbanken | Audiovisual Annotation | discouraged | Consider using TEISpoken instead.
CHAT-XML | CMU | Audiovisual Annotation | recommended |
CHAT-XML | FIN-CLARIN | Audiovisual Annotation | discouraged | Consider using TEISpoken instead.
CHAT-XML | IDS | Audiovisual Annotation | discouraged | Consider using TEISpoken instead.
CMDI | EKUT | Catalogue Metadata | recommended |
CMDI | HZSK | Metadata | recommended |
CMDI | Sprakbanken | Metadata | recommended |
CMDI | CLST | Catalogue Metadata | recommended |
CMDI | CLARIN.SI | Catalogue Metadata | recommended |
CMDI | BAS | Catalogue Metadata | recommended | Profile media-corpus
CMDI | BAS | Contextual Information | recommended | Profile media-session
CMDI | BAS | Metadata | acceptable |