SAW Leipzig
Abbreviation: SAW
Research infrastructure:
  • CLARIN (B-centre)
  • Text+ (Lexical Resources, Operations)

The repository of the Sächsische Akademie der Wissenschaften zu Leipzig (Saxon Academy of Sciences and Humanities in Leipzig) provides long-term preservation of digital resources and their metadata. The repository's mission is to ensure the availability and long-term preservation of research data, to safeguard research results, to facilitate knowledge transfer into new disciplines, and to integrate novel methods and resources into the university curriculum. A particular focus is on lexical resources and language resources for so-called "underrepresented" languages.

If no recommended, standardized, and documented format is used, comprehensive documentation of the syntax and semantics of the data must be provided (e.g. for database dumps: names of tables and columns; specifications and examples of the content of each column; examples of how to retrieve different kinds of data). This documentation (in English, as PDF) is stored in the repository together with the data and metadata and is made available to everyone who wishes to download or access the resource.
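
As an illustration of the documentation requirement for database dumps, the following is a minimal sketch that collects table names, column names, declared types, and a few sample values into a plain-text outline. It assumes the dump can be loaded into SQLite; the function names `describe_database` and `build_demo_connection` and the demo table `entries` are hypothetical and not part of any repository tooling. The outline would still need written specifications for each column and conversion to an English PDF before deposit.

```python
import sqlite3


def describe_database(conn: sqlite3.Connection, samples: int = 2) -> str:
    """Return a plain-text outline of tables, columns, types and sample values."""
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        lines.append(f"Table: {table}")
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        for _cid, name, col_type, *_rest in columns:
            rows = conn.execute(
                f'SELECT DISTINCT "{name}" FROM "{table}" '
                f'WHERE "{name}" IS NOT NULL LIMIT ?',
                (samples,),
            ).fetchall()
            examples = ", ".join(repr(row[0]) for row in rows)
            lines.append(
                f"  Column: {name} ({col_type or 'untyped'}); examples: {examples}"
            )
        # A starting point for the "how to retrieve data" part of the documentation.
        lines.append(f'  Example query: SELECT * FROM "{table}" LIMIT 10;')
    return "\n".join(lines)


def build_demo_connection() -> sqlite3.Connection:
    """Create a tiny in-memory database (hypothetical content) for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (lemma TEXT, pos TEXT)")
    conn.executemany(
        "INSERT INTO entries VALUES (?, ?)",
        [("Haus", "NOUN"), ("gehen", "VERB")],
    )
    return conn


if __name__ == "__main__":
    print(describe_database(build_demo_connection()))
```

The generated outline only covers the syntax side (names, types, examples); the semantics of each column still have to be described by hand.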

Further information on data hosting and metadata requirements can be found on the repository's web portal.

Data functions covered by the recommendations: ...
Format recommendations:
Format | Domain | Domain description | Level | Comments
CMDI | Catalogue Metadata | Basic structured information for discoverability and general description, to be openly provided for harvesting. | recommended |
CMDI | Contextual Information | Structured information on the communicative event or text and its creators (i.e. participants or authors) relevant for analysis. | recommended |
CMDI | Metadata | Comprehensive structured information including descriptive, structural and administrative metadata. See the National Information Standards Organization primer on metadata for further hints. | recommended |
CoNLL-U | Text Annotation | Annotations of textual sources/written text, with the original text included or as stand-off. | recommended |
CoNLL-U | Textual Source Language Data | Written unstructured/plain text or originally structured text (e.g. HTML), without linguistic or other mark-up added for research purposes. | recommended |
CoNLL-X | Text Annotation | Annotations of textual sources/written text, with the original text included or as stand-off. | acceptable | consider using CoNLL-U instead
CoNLL-X | Textual Source Language Data | Written unstructured/plain text or originally structured text (e.g. HTML), without linguistic or other mark-up added for research purposes. | acceptable | consider using CoNLL-U instead
CSV | Metadata | Comprehensive structured information including descriptive, structural and administrative metadata. See the National Information Standards Organization primer on metadata for further hints. | acceptable |
CSV | Lexical Resource | Structured (item-based) resources for lexical and/or conceptual information on units of language (e.g., wordlists, lexicons, WordNets, etc.) | recommended |
DC XML | Metadata | Comprehensive structured information including descriptive, structural and administrative metadata. See the National Information Standards Organization primer on metadata for further hints. | recommended |
DOCX | Audiovisual Annotation | Annotations of audiovisual sources, usually including a basic rendering of the spoken content (transcription) and, sometimes, further annotation. | discouraged |
DOCX | Documentation | Unstructured documentation of the resource and its parts such as corpus or annotation guidelines. | discouraged |
DOCX | Metadata | Comprehensive structured information including descriptive, structural and administrative metadata. See the National Information Standards Organization primer on metadata for further hints. | discouraged |
DOCX | Textual Source Language Data | Written unstructured/plain text or originally structured text (e.g. HTML), without linguistic or other mark-up added for research purposes. | acceptable |
HTML | Documentation | Unstructured documentation of the resource and its parts such as corpus or annotation guidelines. | recommended |
HTML | Textual Source Language Data | Written unstructured/plain text or originally structured text (e.g. HTML), without linguistic or other mark-up added for research purposes. | acceptable |
JSON | Text Annotation | Annotations of textual sources/written text, with the original text included or as stand-off. | acceptable |
JSON | Metadata | Comprehensive structured information including descriptive, structural and administrative metadata. See the National Information Standards Organization primer on metadata for further hints. | acceptable | regular and structured; consider using JSONLD with a schema
JSON | Textual Source Language Data | Written unstructured/plain text or originally structured text (e.g. HTML), without linguistic or other mark-up added for research purposes. | acceptable |
JSON-LD | Text Annotation | Annotations of textual sources/written text, with the original text included or as stand-off. | recommended |
JSON-LD | Metadata | Comprehensive structured information including descriptive, structural and administrative metadata. See the National Information Standards Organization primer on metadata for further hints. | recommended |
LMF | Lexical Resource | Structured (item-based) resources for lexical and/or conceptual information on units of language (e.g., wordlists, lexicons, WordNets, etc.) | recommended |
Markdown | Documentation | Unstructured documentation of the resource and its parts such as corpus or annotation guidelines. | recommended |
Markdown | Textual Source Language Data | Written unstructured/plain text or originally structured text (e.g. HTML), without linguistic or other mark-up added for research purposes. | acceptable |
PDF | Text Annotation | Annotations of textual sources/written text, with the original text included or as stand-off. | discouraged |
PDF | Documentation | Unstructured documentation of the resource and its parts such as corpus or annotation guidelines. | acceptable | consider using PDFA instead
PDF | Textual Source Language Data | Written unstructured/plain text or originally structured text (e.g. HTML), without linguistic or other mark-up added for research purposes. | acceptable |
PDF/A | Documentation | Unstructured documentation of the resource and its parts such as corpus or annotation guidelines. | recommended |
plainText | Audiovisual Annotation | Annotations of audiovisual sources, usually including a basic rendering of the spoken content (transcription) and, sometimes, further annotation. | discouraged |
plainText | Text Annotation | Annotations of textual sources/written text, with the original text included or as stand-off. | discouraged |
plainText | Documentation | Unstructured documentation of the resource and its parts such as corpus or annotation guidelines. | recommended |
plainText | Textual Source Language Data | Written unstructured/plain text or originally structured text (e.g. HTML), without linguistic or other mark-up added for research purposes. | recommended |
RDFXML | Text Annotation | Annotations of textual sources/written text, with the original text included or as stand-off. | recommended |
TEI | Audiovisual Annotation | Annotations of audiovisual sources, usually including a basic rendering of the spoken content (transcription) and, sometimes, further annotation. | recommended |
TEI | Text Annotation | Annotations of textual sources/written text, with the original text included or as stand-off. | recommended |
TEIHeader | Metadata | Comprehensive structured information including descriptive, structural and administrative metadata. See the National Information Standards Organization primer on metadata for further hints. | recommended |
TEISpoken | Audiovisual Annotation | Annotations of audiovisual sources, usually including a basic rendering of the spoken content (transcription) and, sometimes, further annotation. | recommended |
TSV | Lexical Resource | Structured (item-based) resources for lexical and/or conceptual information on units of language (e.g., wordlists, lexicons, WordNets, etc.) | acceptable |
XML | Audiovisual Annotation | Annotations of audiovisual sources, usually including a basic rendering of the spoken content (transcription) and, sometimes, further annotation. | recommended |
XML | Image Annotation | Annotations of image sources. | recommended |
XML | Text Annotation | Annotations of textual sources/written text, with the original text included or as stand-off. | recommended |
XML | Catalogue Metadata | Basic structured information for discoverability and general description, to be openly provided for harvesting. | acceptable |
XML | Contextual Information | Structured information on the communicative event or text and its creators (i.e. participants or authors) relevant for analysis. | acceptable |
XML | Metadata | Comprehensive structured information including descriptive, structural and administrative metadata. See the National Information Standards Organization primer on metadata for further hints. | acceptable |
XML | Language Description | Structured or unstructured descriptions of linguistic varieties or phenomena, typological databases, etc. | recommended |
ZIP | Packaging | Packaging formats of various nature (archiving, compression, library) if no more specific domain is suitable. | recommended |
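
The recommendations above depend on the combination of format and domain, not on the format alone (DOCX, for instance, is discouraged for documentation but acceptable for textual source language data). A minimal sketch of how a depositor might check planned files against the list follows. The names `Recommendation`, `RECOMMENDATIONS`, and `review_deposit` are hypothetical, only a small excerpt of the rows is encoded, and pairs missing from the excerpt are reported as unknown rather than guessed.

```python
from typing import NamedTuple


class Recommendation(NamedTuple):
    level: str
    comment: str = ""


# Excerpt of the recommendations listed above, keyed by (format, domain).
# The remaining rows would be added in the same way.
RECOMMENDATIONS = {
    ("CoNLL-U", "Text Annotation"): Recommendation("recommended"),
    ("CoNLL-X", "Text Annotation"): Recommendation(
        "acceptable", "consider using CoNLL-U instead"
    ),
    ("PDF", "Documentation"): Recommendation(
        "acceptable", "consider using PDFA instead"
    ),
    ("PDF/A", "Documentation"): Recommendation("recommended"),
    ("DOCX", "Documentation"): Recommendation("discouraged"),
    ("DOCX", "Textual Source Language Data"): Recommendation("acceptable"),
}


def review_deposit(items):
    """Return one report line per (format, domain) pair in a planned deposit."""
    report = []
    for fmt, domain in items:
        rec = RECOMMENDATIONS.get((fmt, domain))
        if rec is None:
            # Not encoded in this excerpt: defer to the full list.
            report.append(f"{fmt} / {domain}: not in this excerpt, check the full list")
        elif rec.comment:
            report.append(f"{fmt} / {domain}: {rec.level} ({rec.comment})")
        else:
            report.append(f"{fmt} / {domain}: {rec.level}")
    return report


if __name__ == "__main__":
    planned = [
        ("CoNLL-X", "Text Annotation"),
        ("DOCX", "Documentation"),
        ("ODT", "Documentation"),
    ]
    for line in review_deposit(planned):
        print(line)
```

Keying the lookup on the (format, domain) pair mirrors the structure of the list and avoids treating a format as uniformly good or bad.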
Last update commit-id: 8468186f