FoLiA: Format for Linguistic Annotation
Abbreviation: FoLiA
Identifiers:
Type | Id |
---|---|
SIS ID | fFoLiA |
Media type(s):
- application/xml
File extension(s): .xml, .folia.xml
Format family: XML
Functional domains:
- Textual Source Language Data
Recommendations:
Centre | Domain | Level | Comments |
---|---|---|---|
CLARIN.SI | Textual Source Language Data: written unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. | recommended | |
CLST | Textual Source Language Data: written unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes. | recommended | |
Description:
FoLiA, an acronym for Format for Linguistic Annotation, is a data model and file format for representing digitised language resources enriched with linguistic annotation, e.g. linguistically enriched textual documents or transcriptions of speech. The format is intended to provide a standard for the storage and exchange of such language resources, including corpora, and to promote interoperability between Natural Language Processing tools that use the format.
See https://folia.readthedocs.io/en/latest/index.html for details.
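To give a rough feel for what a linguistically enriched document looks like in practice, the following is a minimal sketch in Python that reads tokens with part-of-speech and lemma annotation from a FoLiA-style XML fragment using only the standard library. The embedded sample is a deliberately stripped-down illustration, not a valid FoLiA document: it omits the metadata and annotation declarations that a real file carries, and the identifiers, class values and the helper name `read_tokens` are assumptions made up for this example. The documentation linked above remains the authoritative description of the format.

```python
import xml.etree.ElementTree as ET

# XML namespace used by FoLiA documents.
FOLIA_NS = "http://ilk.uvt.nl/folia"
NS = {"folia": FOLIA_NS}
# Fully qualified name of the xml:id attribute as ElementTree exposes it.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Illustrative fragment only: real FoLiA files also contain a <metadata>
# block with annotation declarations, which is omitted here.
SAMPLE = """
<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="example">
  <text xml:id="example.text">
    <s xml:id="example.s.1">
      <w xml:id="example.s.1.w.1">
        <t>Hello</t>
        <pos class="INTJ"/>
        <lemma class="hello"/>
      </w>
      <w xml:id="example.s.1.w.2">
        <t>world</t>
        <pos class="NOUN"/>
        <lemma class="world"/>
      </w>
    </s>
  </text>
</FoLiA>
"""


def read_tokens(xml_text):
    """Return one dict per <w> element with its id, text, PoS and lemma."""
    root = ET.fromstring(xml_text)
    tokens = []
    for word in root.iter(f"{{{FOLIA_NS}}}w"):
        text = word.find("folia:t", NS)
        pos = word.find("folia:pos", NS)
        lemma = word.find("folia:lemma", NS)
        tokens.append({
            "id": word.get(XML_ID),
            "text": text.text if text is not None else None,
            "pos": pos.get("class") if pos is not None else None,
            "lemma": lemma.get("class") if lemma is not None else None,
        })
    return tokens


if __name__ == "__main__":
    for token in read_tokens(SAMPLE):
        print(token["id"], token["text"], token["pos"], token["lemma"])
```

Generic XML parsing like this is only suitable for quick inspection; for anything beyond that, tooling that understands the full FoLiA data model is the safer choice.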
Keywords: annotation format, corpus encoding