Format Recommendations

This page presents the data formats that various CLARIN centres are ready to accept for deposit. Each format, for each centre, can be "recommended", "acceptable" or "discouraged" in the context of several domains that represent the functions that the deposited data can serve. The level of recommendation should always be viewed as relative to the profile of the given centre.

  • "recommended" should be interpreted as meaning that the centre in question will in most cases be able to process the data without much manipulation and that it is likely that the data will be preserved long-term in that format (the specifics are up to that centre);
  • "acceptable" should be interpreted as meaning that the centre may need to spend some time and resources on the up-conversion of the data, and that the data may be preserved in one of the recommended formats instead;
  • "discouraged" should be understood as indicating that the centre may find it problematic to up-convert the data.
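
The three levels form a natural ordering, from least to most favourable. As a minimal sketch (the names `Level`, `parse_level` and `most_favourable` are illustrative assumptions and not part of the SIS), the ordering can be modelled so that levels can be compared:

```python
from enum import IntEnum


class Level(IntEnum):
    """Recommendation levels, ordered from least to most favourable."""
    DISCOURAGED = 0
    ACCEPTABLE = 1
    RECOMMENDED = 2


def parse_level(text: str) -> Level:
    """Map a label such as "recommended" to its Level (case-insensitive)."""
    return Level[text.strip().upper()]


def most_favourable(labels: list[str]) -> Level:
    """Return the most favourable level among the given labels."""
    return max(parse_level(label) for label in labels)


# Levels that IDS assigns to plainText across its four listed domains.
ids_plain_text = ["discouraged", "recommended", "discouraged", "acceptable"]
print(most_favourable(ids_plain_text).name.lower())
```

Because a level is always relative to a domain and to the profile of the centre, a comparison like this is only meaningful within one centre's own recommendations.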

Use the drop-down menus to select a particular domain, centre, and/or level of recommendation. Columns can be sorted, and your results can be downloaded as XML.
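
The same selection and sorting can be reproduced offline on rows copied from the table. The following is a rough sketch only: the `Row` type and the `select` helper are hypothetical and do not reflect the SIS implementation or its XML export.

```python
from typing import NamedTuple


class Row(NamedTuple):
    format: str
    centre: str
    domain: str
    level: str


# A few rows copied from the table on this page.
ROWS = [
    Row("plainText", "FIN-CLARIN", "Documentation", "recommended"),
    Row("plainText", "IDS", "Text Annotation", "discouraged"),
    Row("PNG", "LAC", "Contextual Data", "acceptable"),
    Row("Praat", "IDS", "Audiovisual Annotation", "acceptable"),
    Row("Praat", "BAS", "Audiovisual Annotation", "recommended"),
    Row("QuickTime", "LAC", "Audiovisual Source Language Data", "recommended"),
]


def select(rows, domain=None, centre=None, level=None):
    """Keep the rows that match every criterion that is not None."""
    return [
        r for r in rows
        if (domain is None or r.domain == domain)
        and (centre is None or r.centre == centre)
        and (level is None or r.level == level)
    ]


# Audiovisual annotation formats, sorted by centre name.
hits = sorted(select(ROWS, domain="Audiovisual Annotation"), key=lambda r: r.centre)
for r in hits:
    print(r.format, r.centre, r.level)
```

Leaving a criterion as `None` mirrors leaving the corresponding menu unset, so `select(ROWS)` returns every row.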

An authorised person can use the XML file exported for a given centre to extend or modify the recommendations for that centre. To aid in this process, please consult the separate lists of all available file formats and of the functional groupings of formats (functional domains).

As of mid-2022, not every centre with depositing services has submitted its information to the SIS; in some cases, the information had to be mapped, with limited reliability, from lists provided on centre homepages onto the feature matrix offered by the SIS (created on the basis of the SIS functional domains and levels of recommendation). If you think you see an error, please help us get it right.

Domains used in the table:

  • Documentation: Unstructured documentation of the resource and its parts such as corpus or annotation guidelines.
  • Textual Source Language Data: Written unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes.
  • Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off.
  • Audiovisual Annotation: Annotations of audiovisual sources, usually including a basic rendering of the spoken content (transcription) and sometimes further annotation.
  • Audiovisual Source Language Data: Audio or video recordings providing spoken/multimodal or signed language data for research purposes.
  • Image Source Language Data: Digitized images of analogue sources of written language data for research purposes (e.g. facsimiles, scans of handwriting, photos of inscriptions).
  • Contextual Data: Images (photos or drawings) or documents relevant to the communicative event or text but not part of the source language data.
  • Geodata: Information on geographic locations.
  • Statistical Data: Data from surveys and tests in numeric formats.

Format | Centre | Domain | Recommendation | Comments
plainText | FIN-CLARIN | Documentation | recommended | e.g. as README.txt
plainText | CLARIN-DK-UCPH | Documentation | acceptable |
plainText | CLARIN-DK-UCPH | Textual Source Language Data | recommended |
plainText | IDS | Audiovisual Annotation | discouraged |
plainText | IDS | Documentation | recommended |
plainText | IDS | Text Annotation | discouraged |
plainText | IDS | Textual Source Language Data | acceptable | without markup
plainText | ACDH-ARCHE | Documentation | recommended |
plainText | ACDH-ARCHE | Textual Source Language Data | acceptable |
plainText | DANS | Documentation | recommended | Encoded as UTF-8/16/32, see more info from DANS
plainText | DANS | Textual Source Language Data | recommended | Encoded as UTF-8/16/32, see more info from DANS
plainText | MI | Documentation | recommended |
plainText | MI | Textual Source Language Data | acceptable |
plainText | ZIM | Textual Source Language Data | recommended |
plainText | LAC | Contextual Data | recommended | UTF-8 encoding
plainText | BBAW | Documentation | recommended |
plainText | BBAW | Textual Source Language Data | acceptable |
plainText | SAW | Audiovisual Annotation | discouraged |
plainText | SAW | Text Annotation | discouraged |
plainText | SAW | Documentation | recommended |
plainText | SAW | Textual Source Language Data | recommended |
Ply | ZIM | Image Source Language Data | recommended |
PNG | EKUT | Image Source Language Data | recommended |
PNG | MPI-PL | Image Source Language Data | recommended |
PNG | UdS | Image Source Language Data | recommended |
PNG | CLARIN.SI | Image Source Language Data | recommended |
PNG | FIN-CLARIN | Image Source Language Data | recommended |
PNG | DANS | Image Source Language Data | recommended | See more info from DANS
PNG | MI | Image Source Language Data | recommended |
PNG | ZIM | Image Source Language Data | recommended |
PNG | LAC | Contextual Data | acceptable |
Praat | MPI-PL | Textual Source Language Data | recommended |
Praat | HZSK | Audiovisual Annotation | recommended |
Praat | Sprakbanken | Audiovisual Annotation | acceptable |
Praat | CLARIN.SI | Textual Source Language Data | recommended |
Praat | BAS | Audiovisual Annotation | recommended |
Praat | FIN-CLARIN | Audiovisual Annotation | acceptable |
Praat | FIN-CLARIN | Textual Source Language Data | recommended |
Praat | IDS | Audiovisual Annotation | acceptable |
Praat | LAC | Audiovisual Annotation | acceptable |
Praat | COCOON | Textual Source Language Data | recommended |
QGIS.qgs | Sprakbanken | Geodata | acceptable |
QGIS.qgs | DANS | Geodata | acceptable | See more info from DANS
QT | DANS | Audiovisual Source Language Data | acceptable | See more info from DANS
QuickTime | ILC4CLARIN | Audiovisual Source Language Data | recommended |
QuickTime | BAS | Audiovisual Source Language Data | recommended |
QuickTime | MI | Audiovisual Source Language Data | recommended |
QuickTime | ZIM | Audiovisual Source Language Data | recommended |
QuickTime | LAC | Audiovisual Source Language Data | recommended | Video codec h.264 (preferred profile: main, level: 4.0, 1080p, 30fps), Audio encoding LPCM (preferred sampling rate 48 kHz and bit depth 16 bit)
R | EKUT | Statistical Data | recommended |
R | CLARIN.SI | Statistical Data | recommended |