Plain text

Abbreviation: plainText

Identifiers:

Type	Id
SIS ID	fTextPlain

Media type(s):

text/plain

File extension(s): .txt

Format family: Plain.Running

Functional domains:

Audiovisual Annotation
Contextual Data
Documentation
Lexical Resource
Text Annotation
Textual Source Language Data

Recommendations:

Centre	Domain	Level	Comments
ACDH-ARCHE	Documentation	recommended
ACDH-ARCHE	Textual Source Language Data	acceptable
BBAW	Documentation	recommended
BBAW	Textual Source Language Data	acceptable
CLARIN-CH	Audiovisual Annotation	discouraged
CLARIN-CH	Textual Source Language Data	recommended
CLARIN-DK-UCPH	Documentation	acceptable
CLARIN-DK-UCPH	Textual Source Language Data	recommended
CLARIN.SI	Textual Source Language Data	recommended
CLST	Textual Source Language Data	recommended
DANS	Documentation	recommended	Encoded as UTF-8/16/32, see more info from DANS
DANS	Textual Source Language Data	recommended	Encoded as UTF-8/16/32, see more info from DANS
EKUT	Textual Source Language Data	recommended
FIN-CLARIN	Audiovisual Annotation	discouraged
FIN-CLARIN	Textual Source Language Data	recommended	UTF-8 encoded
FIN-CLARIN	Documentation	recommended	e.g. as README.txt
IDS	Audiovisual Annotation	discouraged
IDS	Documentation	recommended
IDS	Text Annotation	discouraged
IDS	Textual Source Language Data	acceptable	without markup
ILC4CLARIN	Textual Source Language Data	recommended
LAC	Contextual Data	recommended	UTF-8 encoding
MI	Documentation	recommended
MI	Textual Source Language Data	acceptable
MPI-PL	Textual Source Language Data	recommended
ORTOLANG	Textual Source Language Data	recommended
OTA	Audiovisual Annotation	discouraged
OTA	Documentation	recommended
OTA	Text Annotation	discouraged
OTA	Textual Source Language Data	acceptable	without markup
PORTULAN-CLARIN	Documentation	recommended
PORTULAN-CLARIN	Textual Source Language Data	recommended	ideally, UTF-8 encoded
PORTULAN-CLARIN	Text Annotation	acceptable
PORTULAN-CLARIN	Lexical Resource	acceptable	for simple word lists
SAW	Audiovisual Annotation	discouraged
SAW	Text Annotation	discouraged
SAW	Documentation	recommended
SAW	Textual Source Language Data	recommended
Sprakbanken	Audiovisual Annotation	discouraged
Sprakbanken	Documentation	recommended
Sprakbanken	Text Annotation	discouraged
Sprakbanken	Textual Source Language Data	acceptable	without markup
UdS	Textual Source Language Data	recommended
ZIM	Textual Source Language Data	recommended

Domain descriptions:

Domain	Description
Audiovisual Annotation	Annotations of audiovisual sources, usually including a basic rendering of the spoken content (transcription) and sometimes further annotation.
Contextual Data	Images (photos or drawings) or documents relevant to the communicative event or text but not part of the source language data.
Documentation	Unstructured documentation of the resource and its parts such as corpus or annotation guidelines.
Lexical Resource	Structured (item-based) resources for lexical and/or conceptual information on units of language (e.g. wordlists, lexicons, WordNets etc.)
Text Annotation	Annotations of textual sources/written text, with the original text included or as stand-off.
Textual Source Language Data	Written unstructured/plain text or originally structured text (e.g. HTML) without linguistic or other mark-up added for research purposes.
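
Several centres in the recommendations above ask for UTF-8 encoded text. One way a depositor might check this before submission is sketched below in Python. The function names is_valid_utf8 and check_file are illustrative assumptions, not part of any centre's tooling, and the check only confirms that the bytes decode as UTF-8, not that the text is meaningful.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes as strict UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def check_file(path: str) -> bool:
    """Read a file in binary mode and test whether its content is UTF-8.

    Hypothetical helper: reads the whole file into memory, which is
    fine for typical text files but not for very large ones.
    """
    with open(path, "rb") as handle:
        return is_valid_utf8(handle.read())
```

Pure ASCII text passes this check as well, since ASCII is a subset of UTF-8. Text in a legacy single-byte encoding such as Latin-1 usually fails as soon as it contains a non-ASCII character.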

Description:

Plain text is a pure sequence of character codes. (...) Plain text represents character content only, not its appearance. (...) Plain text must contain enough information to permit the text to be rendered legibly, and nothing more. (Unicode 6.1, section 2.2)

See Unicode 6.1 and the Wikipedia article for a broader context.

Parameters important to plain text include its character encoding and the platform-dependent end-of-line convention. Higher-level parameters include text directionality and, beyond that, the natural language that the sequences of characters are meant to represent.
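
As an illustration of the two low-level parameters, the following Python sketch looks for a byte order mark at the start of a byte string and counts the end-of-line conventions in decoded text. The names sniff_bom and count_line_endings are assumptions made for this example. A missing byte order mark does not identify the encoding, so the first function is a hint at best.

```python
import codecs

# Byte order marks, with the UTF-32 marks first: the UTF-32 LE mark
# begins with the same two bytes as the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def sniff_bom(data: bytes):
    """Return the encoding suggested by a leading BOM, or None if absent."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def count_line_endings(text: str) -> dict:
    """Count CRLF (Windows), bare LF (Unix) and bare CR (classic Mac OS)."""
    crlf = text.count("\r\n")
    return {
        "CRLF": crlf,
        "LF": text.count("\n") - crlf,
        "CR": text.count("\r") - crlf,
    }
```

When reading a file for this purpose in Python, passing newline="" to open() keeps the original line endings instead of translating them to "\n".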

Whitespace characters in plain text are often used to provide rough structural markup, e.g. two spaces after sentence-final punctuation, or a single end-of-line or several in a row to signal division into paragraphs. When plain text is treated as a data format, whitespace (typically the tab character) signals division into columns. This is how it relates to TSV and to column-based formats in general; CSV, where commas take the place of tabs, is a close relative.
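
The two uses of whitespace described above can be sketched in Python as follows. The function names split_paragraphs and split_columns are assumptions for illustration, and the paragraph rule (one or more blank lines) is only one of several conventions found in practice.

```python
import re


def split_paragraphs(text: str) -> list:
    """Split text into paragraphs at one or more blank lines.

    Line endings are normalised first, and whitespace inside each
    paragraph is collapsed to single spaces.
    """
    normalised = text.replace("\r\n", "\n").replace("\r", "\n")
    blocks = re.split(r"\n\s*\n", normalised.strip())
    return [" ".join(block.split()) for block in blocks if block.strip()]


def split_columns(line: str, delimiter: str = "\t") -> list:
    """Split one line into columns, dropping only the trailing line ending."""
    return line.rstrip("\r\n").split(delimiter)
```

This simple split is adequate for tab-separated lines without quoting. For CSV with quoted fields, Python's standard csv module is the safer choice, because a comma may occur inside a quoted value.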

Keywords: plain text format

Related Standard(s):

SpecUCS	UCS
SpecUnicode	Unicode
