Show all info regardless research infrastructures. Switch to CLARIN environment and show only relevant info to CLARIN, e.g. format recommendations by CLARIN centres. Switch to Text+ environment and show only relevant info to Text+, e.g. format recommendations by Text+ centres. Switch to DARIAH environment and show only relevant info to DARIAH, e.g. format recommendations by DARIAH centres.
Format Recommendations

This page presents formats of data depositions that various CLARIN centres are ready to accept. Each format, for each centre, can be "recommended", "acceptable" or "discouraged" in the context of several domains that represent the functions that the deposited data can play. The level of recommendation should always be viewed as relative to the profile of the given centre.

  • "recommended" should be interpreted as meaning that the centre in question will in most cases be able to process the data without much manipulation and that it is likely that the data will be preserved long-term in that format (the specifics are up to that centre);
  • "acceptable" should be interpreted as meaning that the centre may need to spend some time and resources on the up-conversion of the data, and that the data may be preserved in one of the recommended formats instead;
  • "discouraged" should be understood as indicating that the centre may find it problematic to up-convert the data.

Use the dropboxes to select the particular domain, centre, and/or level of recommendation. Columns can be sorted, and your results can be downloaded as XML.

The exported XML files for a specified centre can be used to extend or modify the recommendations for that centre, by an authorised person. In order to aid in the process, please consult the separate lists of all available file formats and of the functional groupings of formats (functional domains).

As of mid-2022, not every centre with depositing services has submitted the information to the SIS; in some cases, the information had to be unreliably mapped from lists provided on centre homepages onto the feature matrix offered by the SIS (created on the basis of the SIS functional domains and levels of recommendation). If you think you see an error, please kindly help us get it right.

Format Centre Domain Recommendation
EAF CLARIN.SI Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. recommended
FoLiA CLARIN.SI Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. recommended
BPF CLARIN.SI Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. recommended
PDF CLARIN.SI Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. discouraged
Praat CLARIN.SI Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. recommended
TCFClick to add or suggest missing format information CLARIN.SI Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. recommended
plainText CLARIN.SI Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. recommended
DOCX PORTULAN-CLARIN Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. discouraged
HTML PORTULAN-CLARIN Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. acceptable ideally, with minimal markup and no Javascript
JSON PORTULAN-CLARIN Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. acceptable
PDF PORTULAN-CLARIN Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. discouraged
plainText PORTULAN-CLARIN Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. recommended ideally, UTF-8 encoded
EAF COCOON Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. recommended
Praat COCOON Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. recommended
PDF ORTOLANG Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. acceptable
PDF/A-2 ORTOLANG Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. acceptable
PDF/A-3 ORTOLANG Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. acceptable
plainText ORTOLANG Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. recommended
FoLiA CLST Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. recommended
plainText CLST Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. recommended
KorAPXMLClick to add or suggest missing format information IDS Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. recommended
JSON IDS Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. acceptable regular and structured; consider using JSONLD with a schema
CoNLL-U IDS Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. recommended
I5 IDS Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. recommended See the format description.
HTML IDS Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. acceptable without js etc. and with generic markup
plainText IDS Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. acceptable without markup
JSON Sprakbanken Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. acceptable regular and structured; consider using JSONLD with a schema
CoNLL-U Sprakbanken Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. recommended
HTML Sprakbanken Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. acceptable without js etc. and with generic markup
plainText Sprakbanken Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. acceptable without markup
CoNLL-U SAW Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. recommended
CoNLL-X SAW Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. acceptable consider using CoNLL-U instead
DOCX SAW Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. acceptable
HTML SAW Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. acceptable
JSON SAW Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. acceptable
Markdown SAW Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. acceptable
PDF SAW Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. acceptable
plainText SAW Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. recommended
EAF MPI-PL Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. recommended
ODTClick to add or suggest missing format information MPI-PL Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. recommended
PDF MPI-PL Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. recommended
Praat MPI-PL Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. recommended
plainText MPI-PL Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. recommended
plainText ACDH-ARCHE Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. acceptable
PDF MI Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. acceptable
PDF/A MI Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. recommended
plainText MI Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. acceptable
plainText CLARIN-DK-UCPH Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. recommended
PDF/A EKUT Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. recommended
TCFClick to add or suggest missing format information EKUT Textual Source Language DataWritten unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. recommended
1/2 >>