Portable Document Format
Abbreviation: PDF
Identifiers:
Type Id
SIS ID fPDF
Media type(s):
File extension(s): .pdf
Format family: Binary
Functional domains:
  • Documentation: Unstructured documentation of the resource and its parts such as corpus or annotation guidelines.
  • Image Source Language Data: Digitized images of analogue sources of written language data for research purposes (e.g. facsimiles, scans of handwriting, photos of inscriptions).
  • Other: Any other function that cannot be included in an existing domain. The content of this domain will be periodically examined for potential patterns that may give rise to new domains.
  • Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off.
  • Textual Source Language Data: Written unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes.
Recommendations:
Centre | Domain | Level | Comments
ACDH-ARCHE | Documentation | acceptable |
CLARIN.SI | Documentation | recommended |
CLARIN.SI | Image Source Language Data | recommended |
CLARIN.SI | Text Annotation | discouraged |
CLARIN.SI | Textual Source Language Data | discouraged |
DANS | Documentation | acceptable | See more info from DANS
DANS | Textual Source Language Data | acceptable | See more info from DANS
FIN-CLARIN | Documentation | acceptable |
ILC4CLARIN | Documentation | acceptable |
ILC4CLARIN | Textual Source Language Data | acceptable |
MI | Other | acceptable |
MI | Textual Source Language Data | acceptable |
MPI-PL | Textual Source Language Data | recommended |
ORTOLANG | Documentation | acceptable |
ORTOLANG | Textual Source Language Data | acceptable |
SAW | Text Annotation | discouraged |
SAW | Documentation | acceptable | consider using PDFA instead
SAW | Textual Source Language Data | acceptable |
Sprakbanken | Documentation | acceptable |
UdS | Documentation | acceptable |
UdS | Textual Source Language Data | acceptable |
ZIM | Other | acceptable |
ZIM | Textual Source Language Data | acceptable |
Description:

For the purpose of CLARIN format recommendations, we treat PDF as a format collection that is divided along two lines. The first is the development line from version 1.0 through 1.7 to 2.0 (the latter two are defined by ISO standards, ISO 32000-1 and ISO 32000-2 respectively). We attempt to treat all of these, and future versions, as "PDF", on the understanding that centres expect relatively recent versions.
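
To make the version line concrete, the following is a minimal sketch of how a depositor or centre might read the declared version from a file. It relies on the "%PDF-x.y" header that PDF files carry near their start. The function names and the 1024-byte search window are assumptions made for illustration, not part of any CLARIN tooling. Note also that a document catalog may declare a later version than the header, which this sketch does not inspect.

```python
import re

# Matches the version marker in a PDF header, e.g. b"%PDF-1.7".
_HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")


def pdf_header_version(data: bytes):
    """Return (major, minor) from the %PDF-x.y header, or None if absent.

    Only the first 1024 bytes are searched (an assumed tolerance window
    for files that carry a few stray bytes before the header).
    """
    match = _HEADER_RE.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_iso_standardised(version) -> bool:
    """True for the two versions defined by ISO standards (1.7 and 2.0)."""
    return version in {(1, 7), (2, 0)}
```

In practice the bytes would come from something like `open(path, "rb").read(1024)`. A result of `None` suggests the file is not a PDF at all, or that its header is damaged.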

The other line of division concerns the published subsets of PDF, namely (after Wikipedia):

  • PDF/X ("PDF for Exchange")
  • PDF/A ("PDF for Archive")
  • PDF/E ("PDF for Engineering")
  • PDF/VT ("PDF for exchange of variable data and transactional (VT) printing")
  • PDF/UA ("PDF for Universal Accessibility")
  • PDF/raster 1.0 (for storing, transporting and exchanging multi-page raster-image documents, especially scanned documents)

Out of these, PDF/A (which itself is another format collection) is of special interest to CLARIN, because it prohibits features that are unsuitable for long-term archiving, such as font linking (as opposed to font embedding) and encryption.
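
Since several centres distinguish plain PDF from PDF/A, it can help to check what a file claims about itself. PDF/A files declare their part and conformance level in XMP metadata through the `pdfaid:part` and `pdfaid:conformance` properties. The sketch below scans the raw bytes for those properties. It is a heuristic under stated assumptions: the function name is hypothetical, the metadata is assumed to be stored uncompressed, and a declared level is only a claim. Actual conformance has to be checked with a dedicated validator such as veraPDF.

```python
import re

# The properties may appear as XML elements (<pdfaid:part>1</pdfaid:part>)
# or as attributes (pdfaid:part="1"); both forms are matched here.
_PART_RE = re.compile(rb"pdfaid:part(?:>|=[\"'])\s*(\d)")
_CONF_RE = re.compile(rb"pdfaid:conformance(?:>|=[\"'])\s*([A-Za-z])")


def claimed_pdfa_level(data: bytes):
    """Return the PDF/A level a file claims (e.g. 'PDF/A-2b'), or None.

    This reads the self-declaration only; it does not validate the file.
    """
    part = _PART_RE.search(data)
    if part is None:
        return None
    level = part.group(1).decode("ascii")
    conformance = _CONF_RE.search(data)
    if conformance is not None:
        level += conformance.group(1).decode("ascii").lower()
    return "PDF/A-" + level
```

A `None` result means no PDF/A declaration was found in readable form, which is what one would expect for an ordinary PDF.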

Keywords: document format, binarized TextualData
Related Standard(s):