- CLARIN (B-centre)
- Text+ (Lexical Resources, Operations)
- Felix Helfer (April 23, 2024)
The repository of the Sächsische Akademie der Wissenschaften zu Leipzig (Saxon Academy of Sciences and Humanities in Leipzig) provides long-term preservation of digital resources and their metadata. Its mission is to ensure the availability and long-term preservation of research data, to safeguard research results, to ease the transfer of knowledge into new disciplines, and to integrate novel methods and resources into university curricula. Its content focuses in particular on lexical resources and on language resources for so-called "under-represented" languages.
If the data do not use a recommended, standardised and documented format, comprehensive documentation of their syntax and semantics must be provided (e.g. for database dumps: the names of tables and columns; specifications and examples of the content of each column; examples of how to retrieve different kinds of data). This documentation (in English, as PDF) is stored in the repository together with the data and metadata and is made available to everyone who wants to download or access the resource.
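As a rough illustration of the database-dump case above, the following Python sketch lists every table and column of a SQLite database together with one example row, which could serve as a starting point for such documentation. SQLite, the function name `describe_database`, and the `lemma` table are assumptions made for this example; the repository does not prescribe a tool or a layout.

```python
import sqlite3


def describe_database(conn):
    """Return plain text listing each table, its columns, and one example row.

    Hypothetical helper: the output layout is an assumption, not a
    format required by the repository.
    """
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        lines.append(f"Table: {table}")
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        for _cid, name, col_type, _notnull, _default, _pk in columns:
            lines.append(f"  Column: {name} ({col_type or 'untyped'})")
        # One example row shows what the column contents look like.
        sample = conn.execute(f'SELECT * FROM "{table}" LIMIT 1').fetchone()
        lines.append(f"  Example row: {sample!r}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical in-memory database standing in for a real dump.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE lemma (id INTEGER PRIMARY KEY, form TEXT, pos TEXT)")
    demo.execute("INSERT INTO lemma VALUES (1, 'Haus', 'NOUN')")
    print(describe_database(demo))
```

The generated listing only covers names and examples; the meaning of each column and example queries for retrieving data would still have to be written by hand.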
Further information on data hosting and metadata requirements is available on the repository's web portal.
- Audiovisual Annotation
- Image Annotation
- Text Annotation
- Catalogue Metadata
- Contextual Information
- Documentation
- Metadata
- Language Description
- Lexical Resource
- Textual Source Language Data
- Tool Support
Format | Domain | Level | Comments |
---|---|---|---|
CMDI | Catalogue Metadata: Basic structured information for discoverability and general description, to be openly provided for harvesting. | recommended | |
CMDI | Contextual Information: Structured information on the communicative event or text and its creators (i.e. participants or authors) relevant for analysis. | recommended | |
CMDI | Metadata: Comprehensive structured information including descriptive, structural and administrative metadata. See the for further hints. | recommended | |
CoNLL-U | Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off. | recommended | |
CoNLL-U | Textual Source Language Data: Written unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. | recommended | |
CoNLL-X | Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off. | acceptable | |
CoNLL-X | Textual Source Language Data: Written unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. | acceptable | |
CSV | Metadata: Comprehensive structured information including descriptive, structural and administrative metadata. See the for further hints. | acceptable | |
CSV | Lexical Resource: Structured (item-based) resources for lexical and/or conceptual information on units of language (e.g. wordlists, lexicons, WordNets etc.). | recommended | |
DC XML | Metadata: Comprehensive structured information including descriptive, structural and administrative metadata. See the for further hints. | recommended | |
DOCX | Audiovisual Annotation: Annotations of audiovisual sources, usually including a basic rendering of the spoken content (transcription) and sometimes further annotation. | discouraged | |
DOCX | Documentation: Unstructured documentation of the resource and its parts such as corpus or annotation guidelines. | discouraged | |
DOCX | Metadata: Comprehensive structured information including descriptive, structural and administrative metadata. See the for further hints. | discouraged | |
DOCX | Textual Source Language Data: Written unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. | acceptable | |
HTML | Documentation: Unstructured documentation of the resource and its parts such as corpus or annotation guidelines. | recommended | |
HTML | Textual Source Language Data: Written unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. | acceptable | |
JSON | Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off. | acceptable | |
JSON | Metadata: Comprehensive structured information including descriptive, structural and administrative metadata. See the for further hints. | acceptable | |
JSON | Textual Source Language Data: Written unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. | acceptable | |
JSON-LD | Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off. | recommended | |
JSON-LD | Metadata: Comprehensive structured information including descriptive, structural and administrative metadata. See the for further hints. | recommended | |
LMF | Lexical Resource: Structured (item-based) resources for lexical and/or conceptual information on units of language (e.g. wordlists, lexicons, WordNets etc.). | recommended | |
Markdown | Documentation: Unstructured documentation of the resource and its parts such as corpus or annotation guidelines. | recommended | |
Markdown | Textual Source Language Data: Written unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. | acceptable | |
| | Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off. | discouraged | |
| | Documentation: Unstructured documentation of the resource and its parts such as corpus or annotation guidelines. | acceptable | |
| | Textual Source Language Data: Written unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. | acceptable | |
PDF/A | Documentation: Unstructured documentation of the resource and its parts such as corpus or annotation guidelines. | recommended | |
plainText | Audiovisual Annotation: Annotations of audiovisual sources, usually including a basic rendering of the spoken content (transcription) and sometimes further annotation. | discouraged | |
plainText | Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off. | discouraged | |
plainText | Documentation: Unstructured documentation of the resource and its parts such as corpus or annotation guidelines. | recommended | |
plainText | Textual Source Language Data: Written unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. | recommended | |
RDFXML | Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off. | recommended | |
TEI | Audiovisual Annotation: Annotations of audiovisual sources, usually including a basic rendering of the spoken content (transcription) and sometimes further annotation. | recommended | |
TEI | Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off. | recommended | |
TEIHeader | Metadata: Comprehensive structured information including descriptive, structural and administrative metadata. See the for further hints. | recommended | |
TEISpoken | Audiovisual Annotation: Annotations of audiovisual sources, usually including a basic rendering of the spoken content (transcription) and sometimes further annotation. | recommended | |
TSV | Lexical Resource: Structured (item-based) resources for lexical and/or conceptual information on units of language (e.g. wordlists, lexicons, WordNets etc.). | acceptable | |
XML | Audiovisual Annotation: Annotations of audiovisual sources, usually including a basic rendering of the spoken content (transcription) and sometimes further annotation. | recommended | |
XML | Image Annotation: Annotations of image sources. | recommended | |
XML | Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off. | recommended | |
XML | Catalogue Metadata: Basic structured information for discoverability and general description, to be openly provided for harvesting. | acceptable | |
XML | Contextual Information: Structured information on the communicative event or text and its creators (i.e. participants or authors) relevant for analysis. | acceptable | |
XML | Metadata: Comprehensive structured information including descriptive, structural and administrative metadata. See the for further hints. | acceptable | |
XML | Language Description: Structured or unstructured descriptions of linguistic varieties or phenomena, typological databases etc. | recommended | |
ZIP | Tool Support: Tool-related formats required for specific functionality of the tool or reliable reuse of resources (e.g. tagsets, annotation schemes, vocabularies, language models, parameter files, and other specifications or settings). | recommended | |