Format Recommendations

This page presents the formats of data depositions that various CLARIN centres are ready to accept. Each format, for each centre, can be "recommended", "acceptable" or "discouraged" in the context of several domains that represent the functions that the deposited data can serve. The level of recommendation should always be viewed as relative to the profile of the given centre.

  • "recommended" should be interpreted as meaning that the centre in question will in most cases be able to process the data without much manipulation and that it is likely that the data will be preserved long-term in that format (the specifics are up to that centre);
  • "acceptable" should be interpreted as meaning that the centre may need to spend some time and resources on the up-conversion of the data, and that the data may be preserved in one of the recommended formats instead;
  • "discouraged" should be understood as indicating that the centre may find it problematic to up-convert the data.

Use the drop-down lists to select a particular domain, centre, and/or level of recommendation. Columns can be sorted, and your results can be downloaded as XML.
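The selection described above amounts to filtering rows on up to three optional criteria and then sorting by a column. A minimal sketch of that logic follows; the `Recommendation` class and the `select` function are hypothetical names introduced here for illustration, not part of the SIS, and the sample rows are a handful copied from the table on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    """One table row (hypothetical structure, for illustration only)."""
    format: str
    centre: str
    domain: str
    level: str  # "recommended", "acceptable" or "discouraged"


# A few rows copied from the table on this page.
ROWS = [
    Recommendation("HTML", "BBAW", "Documentation", "recommended"),
    Recommendation("HTML", "SAW", "Textual Source Language Data", "acceptable"),
    Recommendation("JPEG", "LAC", "Contextual Data", "acceptable"),
    Recommendation("KML", "DANS", "Geodata", "acceptable"),
    Recommendation("KML", "MPI-PL", "Geodata", "recommended"),
]


def select(rows, domain=None, centre=None, level=None):
    """Keep the rows that match every criterion that is not None."""
    return [
        r for r in rows
        if (domain is None or r.domain == domain)
        and (centre is None or r.centre == centre)
        and (level is None or r.level == level)
    ]


# Example: all Geodata rows, sorted by centre name.
for r in sorted(select(ROWS, domain="Geodata"), key=lambda r: r.centre):
    print(r.format, r.centre, r.level)
```

Leaving a criterion unset corresponds to not choosing a value in the matching selector, so `select(ROWS)` returns every row.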

An authorised person can use the XML file exported for a specified centre to extend or modify the recommendations for that centre. To aid in the process, please consult the separate lists of all available file formats and of the functional groupings of formats (functional domains).

As of mid-2022, not every centre with depositing services has submitted its information to the SIS. In some cases, the information had to be mapped, without any guarantee of accuracy, from lists provided on centre homepages onto the feature matrix offered by the SIS (created on the basis of the SIS functional domains and levels of recommendation). If you think you see an error, please help us get it right.

The domains used in the table are defined as follows:

  • Contextual Data: Images (photos or drawings) or documents relevant to the communicative event or text but not part of the source language data.
  • Documentation: Unstructured documentation of the resource and its parts such as corpus or annotation guidelines.
  • Geodata: Information on geographic locations.
  • Image Source Language Data: Digitized images of analogue sources of written language data for research purposes (e.g. facsimiles, scans of handwriting, photos of inscriptions).
  • Language Description: Structured or unstructured descriptions of linguistic varieties or phenomena, typological databases etc.
  • Metadata: Comprehensive structured information including descriptive, structural and administrative metadata.
  • Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off.
  • Textual Source Language Data: Written unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes.
  • Tool Support: Tool-related formats required for specific functionality of the tool or reliable reuse of resources (e.g. tagsets, annotation schemes, vocabularies, language models, parameter files, and other specifications or settings).

| Format | Centre | Domain | Recommendation | Comments |
|---|---|---|---|---|
| HTML | BBAW | Documentation | recommended | |
| HTML | SAW | Documentation | recommended | |
| HTML | SAW | Textual Source Language Data | acceptable | |
| I5 | IDS | Text Annotation | recommended | See the format description. |
| I5 | IDS | Textual Source Language Data | recommended | See the format description. |
| I5 | ZIM | Text Annotation | recommended | |
| JP2 | DANS | Image Source Language Data | recommended | See more info from DANS |
| JP2 | MI | Image Source Language Data | recommended | |
| JP2 | LAC | Contextual Data | recommended | |
| JPEG | EKUT | Image Source Language Data | recommended | |
| JPEG | MPI-PL | Image Source Language Data | recommended | |
| JPEG | ILC4CLARIN | Image Source Language Data | recommended | |
| JPEG | UdS | Image Source Language Data | recommended | |
| JPEG | CLARIN.SI | Image Source Language Data | recommended | |
| JPEG | FIN-CLARIN | Image Source Language Data | recommended | |
| JPEG | CLARIN-DK-UCPH | Image Source Language Data | recommended | |
| JPEG | DANS | Image Source Language Data | recommended | See more info from DANS |
| JPEG | MI | Image Source Language Data | recommended | |
| JPEG | ZIM | Image Source Language Data | recommended | |
| JPEG | LAC | Contextual Data | acceptable | |
| JS | CLARIN.SI | Tool Support | recommended | |
| JS | DANS | Tool Support | recommended | See more info from DANS |
| JS | MI | Tool Support | recommended | |
| JS | ZIM | Tool Support | recommended | |
| JSON | Sprakbanken | Metadata | acceptable | regular and structured; consider using JSONLD with a schema |
| JSON | Sprakbanken | Text Annotation | acceptable | regular and structured; consider using JSONLD with a schema |
| JSON | Sprakbanken | Textual Source Language Data | acceptable | regular and structured; consider using JSONLD with a schema |
| JSON | FIN-CLARIN | Metadata | acceptable | regular and structured; consider using JSONLD with a schema |
| JSON | IDS | Metadata | acceptable | regular and structured; consider using JSONLD with a schema |
| JSON | IDS | Text Annotation | acceptable | regular and structured; consider using JSONLD with a schema |
| JSON | IDS | Textual Source Language Data | acceptable | regular and structured; consider using JSONLD with a schema |
| JSON | SAW | Text Annotation | acceptable | |
| JSON | SAW | Metadata | acceptable | regular and structured; consider using JSONLD with a schema |
| JSON | SAW | Textual Source Language Data | acceptable | |
| JSON-LD | MI | Text Annotation | recommended | |
| JSON-LD | ZIM | Text Annotation | recommended | |
| JSON-LD | SAW | Text Annotation | recommended | |
| JSON-LD | SAW | Metadata | recommended | |
| KML | MPI-PL | Geodata | recommended | |
| KML | Sprakbanken | Geodata | acceptable | |
| KML | CLARIN.SI | Geodata | recommended | |
| KML | DANS | Geodata | acceptable | See more info from DANS |
| KML | ZIM | Geodata | recommended | |
| KorAPXML | IDS | Metadata | recommended | |
| KorAPXML | IDS | Text Annotation | recommended | |
| KorAPXML | IDS | Textual Source Language Data | recommended | |
| LaTeX | ILC4CLARIN | Documentation | recommended | |
| LaTeX | UdS | Documentation | recommended | |
| LaTeX | ZIM | Documentation | recommended | |
| Lisp | EKUT | Language Description | recommended | |
| Lisp | CLARIN.SI | Language Description | recommended | |