Annotation of Multilingual Data

Multilingual data are identical or similar data in two or more different languages. Multilingual data with different levels of annotation have been used in many kinds of language-specific and cross-linguistic research and in various natural language processing tasks. They support a wide range of applications, such as machine translation, speech recognition, and information retrieval. Examples include multilingual corpora (e.g. Europarl), international vocabularies (e.g. Agrovoc), multilingual datasets (e.g. DBpedia, YAGO), multilingual encyclopaedias and ontologies (e.g. BabelNet), and multilingual dictionaries (e.g. OmegaWiki).

The main complexity factors in building multilingual annotated data are the size of the data, the number of languages, and the types of annotation. Common problems in multilingual annotation include encoding characters; recognizing letters, numerals and symbols in the data; linking different resources that refer to the same entity in different languages; and filling lexical gaps. Lexical mismatches or overlaps in a target language are also common.
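
As a rough illustration of the character encoding and character recognition problems mentioned above, the following minimal Python sketch normalizes text to a single Unicode form and then sorts each character into a coarse class using its Unicode general category. The function names `normalize`, `classify` and `profile`, and the choice of NFC as the normalization form, are assumptions made for this example and are not prescribed by any of the resources or standards named on this page.

```python
import unicodedata
from collections import Counter


def normalize(text: str) -> str:
    """Return text in Unicode NFC form (assumed here as the common form)."""
    return unicodedata.normalize("NFC", text)


def classify(ch: str) -> str:
    """Map one character to a coarse class via its Unicode general category."""
    category = unicodedata.category(ch)
    if category.startswith("L"):
        return "letter"
    if category.startswith("N"):
        return "numeral"
    if category.startswith("S") or category.startswith("P"):
        return "symbol"  # symbols and punctuation are grouped together here
    if category.startswith("Z"):
        return "space"
    if category.startswith("M"):
        return "mark"  # combining marks left over after normalization
    return "other"


def profile(text: str) -> dict:
    """Count the coarse character classes in the normalized text."""
    return dict(Counter(classify(ch) for ch in normalize(text)))


if __name__ == "__main__":
    # "e" followed by a combining acute accent becomes one letter after NFC.
    print(profile("Cafe\u0301 3\u20ac"))
    # Digits and letters from other scripts fall into the same classes.
    print(classify("\u0663"), classify("\u4e2d"), classify("\u044f"))
```

Normalizing before classification means that visually identical strings from different sources compare equal, which is one precondition for linking entries across resources in different languages.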

Standards dealing with this topic:
  1. Language Resources Management — Multilingual Information Framework