Data Deposition Formats

The SIS does not restrict the content of recommendations to the formats actually described in the system: centres can mention any format that they are prepared to support by creating and consistently using a format ID, which generally consists of the character "f" followed by a (potentially mnemonic) name. As a result, two major classes of formats, broadly understood, must be distinguished:

  • formats that are part of the SIS inventory, equipped with descriptions, keywords, potentially references to standards that define or use them, etc.
  • formats that are referenced inside centre recommendations, by means of an ID.

These two classes overlap, resulting in a tripartite division:

  1. formats that are mentioned in recommendations and are at the same time described in the SIS (113 'described formats'); these are listed at the bottom of this page;
  2. formats that are mentioned in recommendations and are not (yet) described in the SIS (56 'missing formats'); they are the ones that have a "+" symbol in recommendation lists and that link to predefined GitHub issues;
  3. formats that are described in the SIS but are not mentioned by any recommendation (10 'orphaned formats'); these are mostly either "hub" format categories, or formats that centres once supported but that are, at least temporarily, outside their scope of interest.
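
The tripartite division above amounts to plain set arithmetic over format IDs. The following is a minimal sketch, assuming the two classes are available as sets of IDs; the function name and the IDs used here (fTEI, fCMDI, fHub, fNew) are hypothetical placeholders that merely follow the "f" + name convention and are not taken from the SIS.

```python
def partition_formats(described, referenced):
    """Split format IDs into the three groups named in the text."""
    described, referenced = set(described), set(referenced)
    return {
        "described": described & referenced,  # in recommendations and in the inventory
        "missing": referenced - described,    # in recommendations only
        "orphaned": described - referenced,   # in the inventory only
    }


# Hypothetical IDs, for illustration only.
inventory = {"fTEI", "fCMDI", "fHub"}
recommended = {"fTEI", "fCMDI", "fNew"}

groups = partition_formats(inventory, recommended)
for name in ("described", "missing", "orphaned"):
    print(name, sorted(groups[name]))
```

With real data, the sizes of the three groups would correspond to the counts quoted in the list above.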

The present page lists the first category of formats, together with some of the properties that are identified in their descriptions. The other two categories have been delegated to the sanity checker page.

Formats described in the SIS (113)

Each format name links to its description, which is sometimes only a stub. You are welcome to help us extend the list or the descriptions, either by submitting a GitHub issue with suggested text or corrections, or by editing or adding the relevant format file and submitting a pull request.

By clicking on the icon next to the format name, you can copy the format ID, which may be useful for editing or adding centre recommendations.
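
When adding a recommendation by hand, it may help to check that an ID has the usual shape before using it. The sketch below assumes a deliberately loose rule (the letter "f" followed by at least one non-whitespace character). The page only states that IDs generally look like this, so the actual rules applied by the SIS may differ; the helper name and the sample IDs are hypothetical.

```python
import re

# Assumed pattern: "f" followed by one or more non-whitespace characters.
_FORMAT_ID = re.compile(r"f\S+")


def looks_like_format_id(candidate: str) -> bool:
    """Return True if the string has the assumed 'f' + name shape."""
    return _FORMAT_ID.fullmatch(candidate) is not None


print(looks_like_format_id("fTEI"))  # True
print(looks_like_format_id("TEI"))   # False
```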

Format | MIME types | File extensions
AG XML (Annotation Graphs XML Format) | text/xml | .xml
AIFF (Audio Interchange File Format) | audio/aiff, audio/x-aiff | .aif, .aiff
ALTO (Analyzed Layout and Text Object) | application/xml | .xml
ANVIL (Anvil annotation file) | text/xml | .anvil
ArcGIS.gdb (Esri File Geodatabase) | application/x-filegdb | .gdb
ArcGIS.mxd (ArcGIS project file format) | application/octet-stream | .mxd
ASCII Grid (ArcGIS ASCII grid format) | text/plain | .asc, .txt
AutoCAD DXF-R12 (AutoCAD Drawing Interchange Format, v. R12 (ASCII)) | image/vnd.dxf | .dxf
AVI (Audio Video Interleaved) | video/avi | .avi
BMP (Device-independent bitmap) | image/bmp | .bmp, .dib
BPF (BAS Partitur Format) | text/plain-bas | .par
CHAT (Codes for the Human Analysis of Transcripts) | text/plain;format-variant=clan-cha, text/x-chat | .cha
CHAT-XML (XML serialization of CHAT) | application/xml;format-variant=x-chat, text/xml | .xml
CMDI (Component Metadata) | application/x-cmdi+xml | .cmdi, .xml
Coma (EXMARaLDA Corpus Manager) | text/xml | .coma
CoNLL (CoNLL unqualified) | |
CoNLL-U (CoNLL-U (Universal Dependencies)) | |
CoNLL-U Plus (CoNLL-U Plus (extended Universal Dependencies)) | |
CoNLL-X (CoNLL-X) | |
CSV (Comma-separated values) | text/csv | .csv
CWB-VRT (Corpus Workbench Verticalized Text) | | .vrt
DC XML (Dublin Core XML Metadata) | application/xml | .xml
DGD-XML (DGD XML Metadata) | text/xml | .xml
DICOM (Digital Imaging and Communications in Medicine) | application/dicom | .dcm, .dic, .dicom
DOCX (Microsoft Word/Office Open XML) | application/vnd.openxmlformats-officedocument.wordprocessingml.document | .docx
DTABf (Deutsches Textarchiv Basisformat) | application/tei+xml;format-variant=dta, application/tei+xml;format-variant=dta;tokenized=[0,1] | .xml
DXF (AutoCAD Drawing Interchange Format) | image/vnd.dxf | .dxf
EAF (ELAN EAF) | text/x-eaf+xml, text/xml | .eaf
EMU (Emu Speech Database) | |
Erdas.img (ERDAS IMAGINE File Format) | application/octet-stream | .ige, .img
EXB (EXMARaLDA Basic Transcription Format) | text/xml | .exb
EXS (EXMARaLDA Segmented Transcription Format) | text/xml | .exs
F4 (f4transkript file format) | text/plain | .txt
FLAC (Free Lossless Audio Codec) | audio/flac | .flac
FLEx (SIL FieldWorks Language Explorer (FLEx)) | text/xml | .xml
FLExText (SIL FieldWorks Language Explorer Interlinear Text) | text/xml | .flextext
FLN (FOLKER) | text/xml | .fln
FoLiA (FoLiA: Format for Linguistic Annotation) | application/xml | .folia.xml, .xml
GeoJSON (Geographic JSON) | application/geo+json | .geojson, .json
GeoTIFF (Geographic Tagged Image File Format) | image/tiff | .gtif, .tif, .tiff
GIF (Graphics Interchange Format) | image/gif | .gif
GML (Geography Markup Language) | application/gml+xml, application/x-gmz | .gml, .xml
GrAF (Graph Annotation Format) | application/xml | .xml
GZIP (GZIP File Format) | application/gzip | .gz
HDF5 (Hierarchical Data Format, Version 5) | application/x-hdf5 | .h5
HTML (Hypertext Markup Language) | text/html | .htm, .html
I5 (DeReKo archiving format) | application/tei+xml | .i5, .xml
JP2 (Joint Photographic Experts Group 2000) | image/jp2, image/jpx | .jp2, .jpx
JPEG (Joint Photographic Experts Group) | image/jpeg | .jpeg, .jpg
JS (JavaScript) | application/ecmascript, application/javascript | .cjs, .es, .js, .mjs
JSON (JavaScript Object Notation) | application/json | .json
JSON-LD (JavaScript Object Notation for Linked Data) | application/ld+json | .jsonld
KML (Keyhole Markup Language) | application/vnd.google-earth.kml+xml, application/vnd.google-earth.kmz | .kml, .kmz
Lex0 (TEI Lex0 (dictionary encoding)) | application/tei+xml | .tei, .xml
LMF (Lexical Markup Framework format) | application/tei+xml | .tei, .xml
LMF:2008 (Lexical Markup Framework 2008 format) | text/x-lmf+xml | .lmf
MapInfo.mif (MapInfo interchange format) | | .mid, .mif
MapInfo.tab (MapInfo native format) | | .tab
MapInfo.wor (MapInfo workspace file) | | .wor
Markdown (Markdown) | text/markdown | .markdown, .md, .mdown, .mkd
MP3 (MPEG Audio Layer III) | audio/mpeg | .mp3
MP4 (MPEG 4 video) | video/mp4 | .mp4
MPEG-1 (MPEG-1 Video Coding (H.261)) | video/mpeg | .mpeg, .mpg
MPEG-2 (MPEG-2 Video Encoding (H.262)) | video/mpeg | .mpeg, .mpg
MPEG-4 AVC (MPEG-4, Advanced Video Coding (Part 10) (H.264)) | video/mp4 | .mp4
NIST SPHERE (NIST SPHERE) | audio/x-nist | .nist
OCFL (Oxford Common File Layout) | |
PAULA (Potsdamer AUstauschformat Linguistischer Annotationen) | application/xml | .xml
PDF (Portable Document Format) | application/pdf | .pdf
PDF/A (PDF for archival preservation) | application/pdf | .pdf
PDF/A-1 (PDF for archival preservation, 2005) | application/pdf | .pdf
PDF/A-2 (PDF for archival preservation, 2011) | application/pdf | .pdf
PDF/A-3 (PDF for archival preservation with support for embedded files, 2012) | application/pdf | .pdf
PDF/A-4 (PDF for archival preservation, 2020) | application/pdf | .pdf
PhonDat1 (PhonDat Data Format #1) | |
PhonDat2 (PhonDat Data Format #2) | |
plainText (Plain text) | text/plain | .txt
PNG (Portable Network Graphics) | image/png | .png
Praat (Praat TextGrid) | text/plain, text/praat-textgrid | .TextGrid
QGIS.qgs (QGIS project file format) | | .qgd, .qgs, .qgz, .qlr, .qml
QuickTime (QuickTime File Format) | video/quicktime, video/x-quicktime | .mov, .qt
RAW (Raw Audio Format) | audio/raw | .raw
SAM (SAM Format) | text/plain | .txt
SAS.sas (Statistical Analysis System (SAS) standard save file) | application/x-sas | .sas
SAS.sd2 (Statistical Analysis System (SAS) Dataset File Format) | | .sas7bdat, .sd2
SAS.xpt (Statistical Analysis System (SAS) Transport File Format) | application/x-sas-xport | .xport, .xpt
Shapefile (ESRI Arc/View ShapeFile) | application/vnd.shp, x-gis/x-shapefile | .shp
SPSS (Statistical Product and Service Solutions - uncategorized formats) | | .dat, .por, .sav, .sps, .spv
SPSS.data+setup (Statistical Product and Service Solutions (data and setup)) | text/plain | .dat, .sps
SPSS.por (Statistical Product and Service Solutions (portable format)) | application/x-spss-por | .por
SPSS.sav (Statistical Product and Service Solutions (standard dataset save format)) | application/x-spss-sav | .sav, .spv, .zsav
SPSS.spv (Statistical Product and Service Solutions (statistics output files)) | | .spo, .spv
STATA (STATA - uncategorized formats) | | .DO, .dat, .dta
STATA.data+setup (STATA (data and setup)) | text/plain | .DO, .dat
STATA.dta (STATA (standard save format)) | | .dta
SVG (Scalable Vector Graphics) | image/svg+xml | .svg, .svgz
TAR (Tape Archive File Format) | application/x-tar | .tar
TEI (Text Encoding Initiative) | application/tei+xml | .tei, .xml
TEIHeader (TEI Header elements) | application/tei+xml | .tei, .xml
TEISpoken (ISO/TEI Transcriptions of Spoken Language) | application/tei+xml;format-variant=tei-iso-spoken, application/tei+xml;format-variant=tei-iso-spoken;tokenized=[0,1] | .tei
TIFF (Tagged Image File Format) | image/tiff | .tif, .tiff
Toolbox (SIL Toolbox) | text/plain | .tbt
Transana (Transana XML format) | text/xml | .xml
TRS (Transcriber) | text/xml | .trs
TSV (Tab Separated Values) | text/tab-separated-values | .tsv
WAVE (Waveform Audio File Format) | audio/vnd.wave, audio/wav, audio/wave, audio/x-wav | .wav, .wave
Worldfile (Esri World File) | text/plain | .wld
Worldfile.jpgw (JPEG World File) | text/plain | .jgw, .jpgw
Worldfile.tifw (TIFF World File) | text/plain | .tfw, .tifw
XHTML (EXtensible HyperText Markup Language) | application/xhtml+xml | .html
XLSX (Microsoft Excel/Office Open XML) | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | .xlsx
XML (eXtensible Markup Language) | application/xml | .xml
ZIP (ZIP File Format) | application/zip | .zip
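
One practical consequence of the listing is that a file extension alone often does not identify a format: many XML-based formats share .xml. The sketch below inverts a handful of rows copied from the listing into an extension-to-formats index to make that visible. The data structure and function name are illustrative assumptions, not part of the SIS, and lowercasing extensions is a design choice of this example.

```python
from collections import defaultdict

# A few rows copied from the listing: (format name, file extensions).
ROWS = [
    ("ALTO", [".xml"]),
    ("CMDI", [".cmdi", ".xml"]),
    ("EAF", [".eaf"]),
    ("TEI", [".tei", ".xml"]),
    ("XML", [".xml"]),
]


def index_by_extension(rows):
    """Map each (lowercased) extension to the formats that list it."""
    index = defaultdict(list)
    for name, extensions in rows:
        for ext in extensions:
            index[ext.lower()].append(name)
    return dict(index)


by_ext = index_by_extension(ROWS)
print(by_ext[".xml"])  # ['ALTO', 'CMDI', 'TEI', 'XML']
print(by_ext[".eaf"])  # ['EAF']
```

An extension that maps to several formats, such as .xml here, needs additional evidence (for example the MIME type or the document's root element) before a format can be assigned.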