Analyzed Layout and Text Object
Abbreviation: ALTO
SIS ID: fALTO
Versions: 4.1
Media type(s):
File extension(s): .xml
Format family: ALTO
Schema location:
Functional domains:
  • Text Annotation
Centre | Domain | Level | Comments
IDS | Text Annotation | acceptable | Conversion to a suitable TEI-based format is expected, per Empfehlungen des DFG-Fachkollegiums 104 “Sprachwissenschaften” (Oct. 2019).
Sprakbanken | Text Annotation | acceptable | Conversion to a suitable TEI-based format is expected.
FIN-CLARIN | Text Annotation | acceptable |
OTA | Text Annotation | acceptable | Conversion to a suitable TEI-based format is expected.
Text Annotation: annotations of textual sources/written text, with the original text included or as stand-off.

ALTO (Analyzed Layout and Text Object) is an open XML Schema developed by the EU-funded METAe project.

The standard was initially developed to describe the OCR text and layout information of pages of digitized material. The goal was to describe layout and text in enough detail to reconstruct the original appearance of a page from the digitized information, similar to the way a lossless image format preserves its source.
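
To make the pairing of text and layout concrete, the following is a minimal sketch in Python that reads recognized words together with their page coordinates from an ALTO fragment. The sample document is hand-written for illustration and is not validated against the schema; the IDs, coordinates, and the helper names `read_lines` and `line_text` are assumptions. The namespace shown is the one used by ALTO version 4.

```python
import xml.etree.ElementTree as ET

# Namespace of ALTO version 4 documents.
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v4#"}

# Hand-written illustrative fragment (hypothetical IDs and coordinates).
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page ID="P1" WIDTH="2480" HEIGHT="3508">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine ID="TL1" HPOS="200" VPOS="300" WIDTH="900" HEIGHT="60">
            <String CONTENT="Analyzed" HPOS="200" VPOS="300" WIDTH="320" HEIGHT="60"/>
            <SP/>
            <String CONTENT="Layout" HPOS="540" VPOS="300" WIDTH="240" HEIGHT="60"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def read_lines(alto_xml):
    """Return one list per TextLine, holding (word, hpos, vpos) tuples."""
    root = ET.fromstring(alto_xml)
    lines = []
    for line in root.iterfind(".//alto:TextLine", ALTO_NS):
        words = [
            (s.get("CONTENT"), float(s.get("HPOS")), float(s.get("VPOS")))
            for s in line.iterfind("alto:String", ALTO_NS)
        ]
        lines.append(words)
    return lines


def line_text(words):
    """Join the words of one line with single spaces, dropping coordinates."""
    return " ".join(word for word, _, _ in words)


if __name__ == "__main__":
    for words in read_lines(SAMPLE):
        print(line_text(words), words)
```

Each `String` element carries the recognized word in `CONTENT` and its position on the page in `HPOS` and `VPOS`, which is what allows the page appearance to be rebuilt from the file.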

ALTO is often used in combination with Metadata Encoding and Transmission Standard (METS) for the description of the whole digitized object and creation of references across the ALTO files, e.g. reading sequence description.
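
As a rough sketch of how such cross-file references can be followed, the Python example below resolves the page order of ALTO files from a METS document. The METS sample is hand-written for illustration: the file IDs, paths, and the `USE="FULLTEXT"` grouping are assumptions, and real METS profiles differ in how they organize files and structural maps. The function name `alto_files_in_reading_order` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Hand-written illustrative METS fragment (hypothetical IDs and paths).
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                      xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="FULLTEXT">
      <mets:file ID="ALTO_0002" MIMETYPE="text/xml">
        <mets:FLocat LOCTYPE="URL" xlink:href="alto/page_0002.xml"/>
      </mets:file>
      <mets:file ID="ALTO_0001" MIMETYPE="text/xml">
        <mets:FLocat LOCTYPE="URL" xlink:href="alto/page_0001.xml"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap TYPE="PHYSICAL">
    <mets:div TYPE="physSequence">
      <mets:div TYPE="page" ORDER="1"><mets:fptr FILEID="ALTO_0001"/></mets:div>
      <mets:div TYPE="page" ORDER="2"><mets:fptr FILEID="ALTO_0002"/></mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def alto_files_in_reading_order(mets_xml):
    """Return ALTO file locations sorted by the ORDER of their page divs."""
    root = ET.fromstring(mets_xml)
    href_attr = "{%s}href" % NS["xlink"]

    # Map each file ID in the file section to its location.
    locations = {}
    for file_el in root.iterfind(".//mets:file", NS):
        flocat = file_el.find("mets:FLocat", NS)
        if flocat is not None:
            locations[file_el.get("ID")] = flocat.get(href_attr)

    # Walk the physical structural map and collect (order, location) pairs.
    pages = []
    for struct_map in root.iterfind("mets:structMap[@TYPE='PHYSICAL']", NS):
        for div in struct_map.iterfind(".//mets:div[@ORDER]", NS):
            for fptr in div.iterfind("mets:fptr", NS):
                file_id = fptr.get("FILEID")
                if file_id in locations:
                    pages.append((int(div.get("ORDER")), locations[file_id]))
    return [href for _, href in sorted(pages)]


if __name__ == "__main__":
    print(alto_files_in_reading_order(SAMPLE_METS))
```

The order comes from the structural map rather than from the order of entries in the file section, which is why the sample lists the files out of sequence.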

The standard has been hosted by the Library of Congress since 2010 and is maintained by an Editorial Board established at the same time.

From the final version 1.0 of the ALTO standard in June 2004 up to version 1.4, ALTO was maintained by CCS Content Conversion Specialists GmbH, Hamburg. See: Wikipedia article for ALTO.

Keywords: XML, OCR, digitization
Related Standard(s):
