Portable Document Format
Abbreviation: PDF
Identifiers:
Type | Id |
---|---|
SIS ID | fPDF |
Media type(s):
- application/pdf
File extension(s): .pdf
Format family: Binary
Functional domains:
- Documentation: Unstructured documentation of the resource and its parts such as corpus or annotation guidelines.
- Image Source Language Data: Digitized images of analogue sources of written language data for research purposes (e.g. facsimiles, scans of handwriting, photos of inscriptions).
- Other: Any other function that cannot be included in an existing domain. The content of this domain will be periodically examined for potential patterns that may give rise to new domains.
- Text Annotation: Annotations of textual sources/written text, with the original text included or as stand-off.
- Textual Source Language Data: Written unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes.
Recommendations:
Centre | Domain | Level | Comments |
---|---|---|---|
ACDH-ARCHE | Documentation | acceptable | |
CLARIN-CH | Text Annotation | discouraged | |
CLARIN-CH | Textual Source Language Data | discouraged | |
CLARIN.SI | Documentation | recommended | |
CLARIN.SI | Image Source Language Data | recommended | |
CLARIN.SI | Text Annotation | discouraged | |
CLARIN.SI | Textual Source Language Data | discouraged | |
CLARINO_Bergen | Documentation | acceptable | |
DANS | Documentation | acceptable | |
DANS | Textual Source Language Data | acceptable | |
FIN-CLARIN | Documentation | acceptable | |
ILC4CLARIN | Documentation | acceptable | |
ILC4CLARIN | Textual Source Language Data | acceptable | |
MI | Other | acceptable | |
MI | Textual Source Language Data | acceptable | |
MPI-PL | Textual Source Language Data | recommended | |
ORTOLANG | Documentation | acceptable | |
ORTOLANG | Textual Source Language Data | acceptable | |
OTA | Documentation | acceptable | |
PORTULAN-CLARIN | Textual Source Language Data | discouraged | |
PORTULAN-CLARIN | Text Annotation | discouraged | |
SAW | Text Annotation | discouraged | |
SAW | Documentation | acceptable | |
SAW | Textual Source Language Data | acceptable | |
Sprakbanken | Documentation | acceptable | |
UdS | Documentation | acceptable | |
UdS | Textual Source Language Data | acceptable | |
ZIM | Other | acceptable | |
ZIM | Textual Source Language Data | acceptable | |
Description:
For the purpose of CLARIN format recommendations, we treat PDF as a collection of formats that can be divided along two lines. The first is the version history, from 1.0 through 1.7 to 2.0 (the latter two are defined by ISO standards, ISO 32000-1 and ISO 32000-2 respectively). We treat all of these, and any future versions, as "PDF", on the understanding that centres expect relatively recent versions.
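As an illustration of the version line, the declared version of a file can be read from the `%PDF-M.N` header at the start of the file. The following is a minimal sketch, not a validator: the function names, the 1024-byte search window, and the `minimum=(1, 4)` threshold are assumptions made for this example, since no concrete minimum version is stated here. Note also that, from PDF 1.4 onwards, a `/Version` entry in the document catalog may override the header, which this sketch does not inspect.

```python
import re
from typing import Optional, Tuple

# A PDF file starts with a header such as "%PDF-1.7" or "%PDF-2.0".
# Searching the first 1024 bytes (rather than byte 0 only) is an
# assumption of this sketch, to tolerate leading junk in some files.
_HEADER = re.compile(rb"%PDF-(\d)\.(\d)")


def pdf_header_version(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from the %PDF-M.N header, or None if absent."""
    match = _HEADER.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def pdf_file_version(path: str) -> Optional[Tuple[int, int]]:
    """Read only the start of the file at `path` and parse its header."""
    with open(path, "rb") as handle:
        return pdf_header_version(handle.read(1024))


def meets_minimum(version: Tuple[int, int],
                  minimum: Tuple[int, int] = (1, 4)) -> bool:
    """Compare against a HYPOTHETICAL minimum; (1, 4) is not a CLARIN rule."""
    return version >= minimum
```

A centre that wants to enforce its own notion of "relatively recent" could pass a different `minimum` tuple; tuples compare element by element, so `(2, 0)` ranks above `(1, 7)`.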
The second line of division concerns the published subsets of PDF, namely (after Wikipedia):
- PDF/X ("PDF for Exchange")
- PDF/A ("PDF for Archive")
- PDF/E ("PDF for Engineering")
- PDF/VT ("PDF for exchange of variable data and transactional (VT) printing")
- PDF/UA ("PDF for Universal Accessibility")
- PDF/raster 1.0 (for storing, transporting and exchanging multi-page raster-image documents, especially scanned documents)
Of these, PDF/A (which is itself a collection of formats) is of special interest to CLARIN, because it is designed for long-term archiving: it prohibits features that are unsuitable for that purpose, such as font linking (as opposed to font embedding) and encryption.
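A PDF/A file declares its part and conformance level in its XMP metadata, through the `pdfaid:part` and `pdfaid:conformance` properties. The sketch below looks for that declaration with a plain byte search. It rests on several assumptions: the function name is made up for this example, the XMP packet is assumed to be stored uncompressed and in UTF-8, and the result is only what the file claims about itself. Confirming actual conformance requires a dedicated validator.

```python
import re
from typing import Optional

# XMP may serialise the properties as elements
#   <pdfaid:part>1</pdfaid:part>
# or as attributes
#   pdfaid:part="1"
# so both shapes are accepted here.
_PART = re.compile(rb"pdfaid:part\s*(?:=\s*[\"']|>)\s*(\d)")
_CONFORMANCE = re.compile(rb"pdfaid:conformance\s*(?:=\s*[\"']|>)\s*([A-Za-z])")


def claimed_pdfa_level(data: bytes) -> Optional[str]:
    """Return the self-declared level (e.g. "PDF/A-1b"), or None.

    This is a heuristic: it reports the claim found in the metadata
    and does not check fonts, encryption, or any other requirement.
    """
    part = _PART.search(data)
    if part is None:
        return None
    level = part.group(1).decode("ascii")
    conformance = _CONFORMANCE.search(data)
    if conformance is not None:
        level += conformance.group(1).decode("ascii").lower()
    return "PDF/A-" + level
```

For a file on disk, the bytes can be obtained with `open(path, "rb").read()`; a return value of `None` means only that no declaration was found by this search.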
Keywords: document format, binarized TextualData
Relations
Legend: isDefinedBy, isRelatedTo