Data Deposition Formats

The SIS does not restrict the content of recommendations to the formats described in the system: centres can mention any format that they are prepared to support, by creating and consistently using a format ID, which generally consists of the character "f" followed by a potentially mnemonic name. As a result, two major classes of formats, broadly understood, must be distinguished:

  • formats that are part of the SIS inventory, equipped with descriptions, keywords, potentially references to standards that define or use them, etc.
  • formats that are referenced inside centre recommendations, by means of an ID.

These two classes overlap, resulting in a tripartite division:

  1. formats that are mentioned in recommendations and are at the same time described in the SIS (115 'described formats'); these are listed at the bottom of this page;
  2. formats that are mentioned in recommendations and are not (yet) described in the SIS (57 'missing formats'); they are the ones that have a "+" symbol in recommendation lists and that link to predefined GitHub issues;
  3. formats that are described in the SIS but are not mentioned by any recommendation (9 'orphaned formats'); these are mostly either "hub" format categories, or formats once supported by centres but at least temporarily not in the scope of interest.
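
The three-way division above follows from comparing two sets of format IDs. The sketch below is only an illustration, not the SIS implementation: the function name and the sample IDs are hypothetical, and it assumes that both the inventory and the recommendations can be reduced to plain collections of ID strings.

```python
def classify_formats(inventory_ids, recommended_ids):
    """Split format IDs into described, missing and orphaned groups.

    inventory_ids:   IDs of formats that have a description in the inventory
    recommended_ids: IDs of formats mentioned in at least one recommendation
    (Both inputs are assumed to be iterables of ID strings.)
    """
    inventory = set(inventory_ids)
    recommended = set(recommended_ids)
    return {
        # mentioned in recommendations and described in the inventory
        "described": sorted(inventory & recommended),
        # mentioned in recommendations but not (yet) described
        "missing": sorted(recommended - inventory),
        # described but not mentioned by any recommendation
        "orphaned": sorted(inventory - recommended),
    }


# Hypothetical IDs, following the "f" + mnemonic name convention.
example_inventory = ["fExampleA", "fExampleB", "fExampleHub"]
example_recommended = ["fExampleA", "fExampleB", "fExampleNew"]

groups = classify_formats(example_inventory, example_recommended)
for name, ids in groups.items():
    print(f"{name}: {len(ids)} format(s): {', '.join(ids)}")
```

With the sample data, "fExampleNew" ends up in the missing group and "fExampleHub" in the orphaned group, while the two shared IDs count as described.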

The present page lists the first category of formats, together with some of the properties that are identified in their descriptions. The other two categories have been delegated to the sanity checker page.

Formats described in the SIS (115)

The name of each format links to its description, which in some cases is still only a stub. You are welcome to help us extend the list or the descriptions, either by submitting a GitHub issue with suggested text or corrections, or by editing or adding the relevant format file and submitting a pull request.

By clicking on the icon next to the format name, you can copy the format ID, which may be useful for editing or adding centre recommendations.

Format    MIME types             File extensions
Lex0      application/tei+xml    .tei, .xml