PDF for archival preservation
Abbreviation: PDF/A
Identifiers:
Type Id
SIS ID fPDFA
LOC (Library of Congress) fdd000318
Media type(s):
File extension(s): .pdf
Format family: PDF
Functional domains extracted from the recommendations:
Recommendations:
Centre | Domain | Level | Comments
CLARIN:EL | Image Source Language Data | recommended | scanned images
CLARIN:EL | Textual Source Language Data | recommended | Formatted/Encoded
CLARINO_Bergen | Documentation | recommended |
DANS | Documentation | recommended | See more info from DANS
DANS | Other | acceptable | See more info from DANS
DANS | Textual Source Language Data | recommended | See more info from DANS
EKUT | Image Source Language Data | recommended |
EKUT | Textual Source Language Data | recommended |
FIN-CLARIN | Documentation | recommended |
IDS | Documentation | recommended |
IDS | Image Source Language Data | recommended |
LAC | Contextual Data | recommended |
MI | Image Source Language Data | recommended |
MI | Textual Source Language Data | recommended |
OTA | Image Source Language Data | recommended |
PORTULAN-CLARIN | Documentation | acceptable |
SAW | Documentation | recommended |
Sprakbanken | Image Source Language Data | recommended |
UdS | Documentation | recommended |
UdS | Textual Source Language Data | acceptable |
ZIM | Image Source Language Data | recommended |
ZIM | Textual Source Language Data | recommended |

Domain definitions:

  • Image Source Language Data: Digitized images of analogue sources of written language data for research purposes (e.g. facsimiles, scans of handwriting, photos of inscriptions).
  • Textual Source Language Data: Written unstructured/plain text or originally structured text (e.g. HTML), without linguistic or other mark-up added for research purposes.
  • Documentation: Unstructured documentation of the resource and its parts such as corpus or annotation guidelines.
  • Contextual Data: Images (photos or drawings) or documents relevant to the communicative event or text, but not part of the source language data.
  • Other: Any other function that cannot be included in an existing domain. The content of this domain will be periodically examined for potential patterns that may give rise to new domains.
Description:

PDF/A differs from PDF by prohibiting features unsuitable for long-term archiving, such as font linking (as opposed to font embedding) and encryption. Note that "PDF/A" is not a single format but a family of formats, each defined by one part of ISO 19005:

  • PDF/A-1: "Part 1: Use of PDF 1.4" (2005-09-28)
  • PDF/A-2: "Part 2: Use of ISO 32000-1" (2011-06-20)
  • PDF/A-3: "Part 3: Use of ISO 32000-1 with support for embedded files" (2012-10-15)
  • PDF/A-4: "Part 4: Use of ISO 32000-2" (2020-11)

Centres should note that Part 1 references an obsolete version of PDF (PDF 1.4), while Parts 2 and 3 reference the fully open PDF 1.7, which was standardized as ISO 32000-1. Part 4 references PDF 2.0 (ISO 32000-2).

veraPDF is an open-source validator for the PDF/A formats.
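
Before running a full validator, it can help to see which PDF/A part a file claims to follow. PDF/A files declare this in their XMP metadata through the properties pdfaid:part and pdfaid:conformance. The following is a minimal sketch in Python, not a validator: it only reads the declared claim and says nothing about whether the file actually conforms. The function names (declared_pdfa, describe) and the file name in the usage comment are hypothetical, and the sketch assumes the XMP packet is stored uncompressed in the file, which may not hold for every PDF.

```python
import re
from typing import Optional, Tuple

# Base specification referenced by each PDF/A part (taken from the list above).
BASE_SPEC = {
    1: "PDF 1.4",
    2: "ISO 32000-1",
    3: "ISO 32000-1",
    4: "ISO 32000-2",
}

# XMP may serialize a property either as an element
# (<pdfaid:part>2</pdfaid:part>) or as an attribute (pdfaid:part="2").
_PART = re.compile(rb'pdfaid:part\s*(?:>|=\s*["\'])\s*(\d+)')
_CONF = re.compile(rb'pdfaid:conformance\s*(?:>|=\s*["\'])\s*([A-Za-z]+)')


def declared_pdfa(data: bytes) -> Optional[Tuple[int, Optional[str]]]:
    """Return (part, conformance) as declared in the XMP, or None if absent.

    Assumption: the XMP packet is present as plain (unfiltered) bytes.
    This reads a claim only; it does not check conformance.
    """
    part_match = _PART.search(data)
    if part_match is None:
        return None
    conf_match = _CONF.search(data)
    conformance = conf_match.group(1).decode("ascii").upper() if conf_match else None
    return int(part_match.group(1)), conformance


def describe(data: bytes) -> str:
    """Human-readable summary of the declared PDF/A part and its base spec."""
    declared = declared_pdfa(data)
    if declared is None:
        return "no PDF/A identification found"
    part, conformance = declared
    label = f"PDF/A-{part}{(conformance or '').lower()}"
    base = BASE_SPEC.get(part, "unknown base specification")
    return f"{label} (based on {base})"


# Usage (hypothetical file name):
#   from pathlib import Path
#   print(describe(Path("example.pdf").read_bytes()))
```

A file that declares a PDF/A part can still violate it (for example through a non-embedded font), so the result should be treated as a hint for choosing the validation profile, not as evidence of conformance.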

Keywords: document format, binarized TextualData
Related Standard(s):
Relations