- CLARIN (C-centre)
Preferred formats are file formats that DANS is confident will offer the best long-term guarantees of usability, accessibility and sustainability. DANS will always accept deposits of research data in preferred formats.
Non-preferred formats are file formats that are widely used in addition to the preferred formats, and which will be moderately to reasonably usable, accessible and robust in the long term.
As a general guideline, DANS believes that the file formats best suited for long-term sustainability and accessibility:
- Are frequently used
- Have open specifications
- Are independent of specific software, developers or vendors
In practice, it is not always possible to use formats which satisfy all of these criteria.
It may be desirable to make certain original data available in non-preferred formats because these are the formats in current use. Examples include Esri Shapefiles, Microsoft Access databases and SPSS .sav files. In that case, DANS asks you to deposit your data in these original formats as well as in preferred formats aimed at long-term sustainability.
If your data are stored in formats other than those mentioned in the recommendations, please contact DANS.
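The deposit guidance above can be read as a simple lookup. The following is a minimal sketch, not a DANS tool: the `LEVELS` dictionary and the `deposit_advice` function are hypothetical names, the dictionary copies only a handful of rows from the table, and it assumes that the table's "recommended" level corresponds to preferred formats and "acceptable" to non-preferred formats.

```python
# Illustrative subset of the format table; deliberately incomplete.
# Assumption: "recommended" = preferred, "acceptable" = non-preferred.
LEVELS = {
    "FLAC": "recommended",
    "MP3": "acceptable",
    "GeoJSON": "recommended",
    "Shapefile": "acceptable",
    "SPSS.data+setup": "recommended",
    "SPSS.sav": "acceptable",
    "CSV": "recommended",
    "MSAccess": "acceptable",
}


def deposit_advice(fmt: str) -> str:
    """Return the guidance that applies to a format name (hypothetical helper)."""
    level = LEVELS.get(fmt)
    if level == "recommended":
        # Preferred formats are always accepted.
        return "deposit as is"
    if level == "acceptable":
        # Non-preferred formats: keep the original and add a preferred copy.
        return "deposit the original and add a copy in a preferred format"
    # Formats not mentioned in the recommendations.
    return "contact DANS"


if __name__ == "__main__":
    for name in ("FLAC", "SPSS.sav", "Parquet"):
        print(name, "->", deposit_advice(name))
```

A format that is missing from the dictionary, such as the `"Parquet"` placeholder in the usage loop, falls through to the "contact DANS" branch, which mirrors the last sentence of the guidance.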
Format | Domain | Level | Comments |
---|---|---|---|
AAC | Audiovisual Source Language Data | acceptable | |
AI | Image Source Language Data | acceptable | |
AIFF | Audiovisual Source Language Data | acceptable | |
ArcGIS.gdb | Geodata | acceptable | |
ArcGIS.mxd | Geodata | acceptable | |
ASCII Grid | Geodata | recommended | |
AutoCAD DXF-R12 | Geodata | recommended | |
AVI | Audiovisual Source Language Data | acceptable | |
BWF | Audiovisual Source Language Data | recommended | |
CDR | Image Source Language Data | acceptable | |
CSS | Tool Support | recommended | |
CSV | Other | recommended | |
DBASE | Other | acceptable | |
DGN | Geodata | acceptable | |
DICOM | Image Source Language Data | recommended | |
DOC | Documentation | acceptable | |
DOC | Textual Source Language Data | acceptable | |
DOCX | Documentation | acceptable | |
DOCX | Textual Source Language Data | acceptable | |
DWG | Geodata | acceptable | |
DXF | Geodata | acceptable | |
EPS | Image Source Language Data | acceptable | |
Erdas.img | Geodata | acceptable | |
FLAC | Audiovisual Source Language Data | recommended | |
GeoJSON | Geodata | recommended | |
GeoTIFF | Geodata | recommended | |
GML | Geodata | recommended | |
HDF5 | Packaging | acceptable | |
HTML | Documentation | recommended | |
HTML | Textual Source Language Data | recommended | |
JP2 | Image Source Language Data | recommended | |
JPEG | Image Source Language Data | recommended | |
JS | Tool Support | recommended | |
KML | Geodata | acceptable | |
MapInfo.mif | Geodata | recommended | |
MapInfo.tab | Geodata | acceptable | |
MapInfo.wor | Geodata | acceptable | |
Markdown | Documentation | acceptable | |
Markdown | Textual Source Language Data | acceptable | |
MATLAB | Tool Support | recommended | |
MKV | Audiovisual Source Language Data | recommended | |
MP3 | Audiovisual Source Language Data | acceptable | |
MP4 | Audiovisual Source Language Data | acceptable | |
MPEG-2 | Audiovisual Source Language Data | acceptable | |
MSAccess | Other | acceptable | |
MXF | Audiovisual Source Language Data | recommended | |
NetCDF | Tool Support | recommended | |
ODS | Other | recommended | |
ODT | Documentation | recommended | |
ODT | Textual Source Language Data | recommended | |
OGG | Audiovisual Source Language Data | acceptable | |
OPUS | Audiovisual Source Language Data | recommended | |
| | Documentation | acceptable | |
| | Textual Source Language Data | acceptable | |
PDF/A | Documentation | recommended | |
PDF/A | Other | acceptable | |
PDF/A | Textual Source Language Data | recommended | |
plainText | Documentation | recommended | |
plainText | Textual Source Language Data | recommended | |
PNG | Image Source Language Data | recommended | |
QGIS.qgs | Geodata | acceptable | |
QT | Audiovisual Source Language Data | acceptable | |
R | Statistical Data | recommended | |
RTF | Documentation | acceptable | |
RTF | Textual Source Language Data | acceptable | |
SAS.sd2 | Statistical Data | acceptable | |
SGML | Documentation | acceptable | |
SGML | Textual Source Language Data | acceptable | |
Shapefile | Geodata | acceptable | |
SIARD | Other | recommended | |
SPSS.data+setup | Statistical Data | recommended | |
SPSS.por | Statistical Data | acceptable | |
SPSS.sav | Statistical Data | acceptable | |
SQL | Other | recommended | |
STATA.data+setup | Statistical Data | recommended | |
STATA.dta | Statistical Data | acceptable | |
SVG | Image Source Language Data | recommended | |
SVG | Other | recommended | |
TextFabric | Tool Support | recommended | |
TIFF | Image Source Language Data | recommended | |
WAVE | Audiovisual Source Language Data | acceptable | |
WMF | Image Source Language Data | acceptable | |
Worldfile.jpgw | Geodata | acceptable | |
Worldfile.tifw | Geodata | acceptable | |
XLS | Other | acceptable | |
XLSX | Other | acceptable | |
XML | Audiovisual Annotation | recommended | |
XML | Catalogue Metadata | recommended | |
XML | Contextual Information | recommended | |
XML | Documentation | recommended | |
XML | Geodata | recommended | |
XML | Image Annotation | recommended | |
XML | Language Description | recommended | |
XML | Lexical Resource | recommended | |
XML | Metadata | recommended | |
XML | Text Annotation | recommended | |
XML | Textual Source Language Data | recommended | |
XML | Tool Support | recommended | |
XSLT | Tool Support | recommended | |

The domains used in the table are defined as follows:
- Audiovisual Annotation: Annotations of audiovisual sources, usually including a basic rendering of the spoken content (transcription) and sometimes further annotation.
- Audiovisual Source Language Data: Audio or video recordings providing spoken/multimodal or signed language data for research purposes.
- Catalogue Metadata: Basic structured information for discoverability and general description, to be openly provided for harvesting.
- Contextual Information: Structured information on the communicative event or text and its creators (i.e. participants or authors) relevant for analysis.
- Documentation: Unstructured documentation of the resource and its parts such as corpus or annotation guidelines.
- Geodata: Information on geographic locations.
- Image Annotation: Annotations of image sources.
- Image Source Language Data: Digitized images of analogue sources of written language data for research purposes (e.g. facsimiles, scans of handwriting, photos of inscriptions).
- Language Description: Structured or unstructured descriptions of linguistic varieties or phenomena, typological databases etc.
- Lexical Resource: Structured (item-based) resources for lexical and/or conceptual information on units of language (e.g. wordlists, lexicons, WordNets etc.).
- Metadata: Comprehensive structured information including descriptive, structural and administrative metadata. See the for further hints.
- Other: Any other function that cannot be included in an existing domain. The content of this domain will be periodically examined for potential patterns that may give rise to new domains.
- Packaging: Packaging formats of various nature (archiving, compression, library) if no more specific domain is suitable.
- Statistical Data: Data from surveys and tests in numeric formats.
- Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off.
- Textual Source Language Data: Written unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes.
- Tool Support: Tool-related formats required for specific functionality of the tool or reliable reuse of resources (e.g. tagsets, annotation schemes, vocabularies, language models, parameter files, and other specifications or settings).