
Open Data Connectivity for Text-based and Document Editing Systems

Documents and text data are created with word processing or text editing software. Examples include meeting minutes, organizational charts, personal planners, schedules, emails, reports, briefing documents, legal analyses, and administrative orders. Key attributes of...

Open Data Linkage for Text and Word Processing Applications


The National Archives and Records Administration (NARA) has unveiled its Textual and Word Processing Preservation Plan, a crucial component of its broader digital preservation efforts. This plan aims to ensure the long-term accessibility and usability of electronic textual and word processing records.

The plan categorizes electronic records by type and develops a preservation strategy for each category. Where available, it includes specifications, standards, and documentation for the relevant file formats. It outlines proposed preservation actions, including migration when necessary, to ensure continued access to the information, and it lists recommended tools for processing and preservation.
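To make "migration" concrete, here is a minimal sketch of one generic kind of migration for plain-text records: re-encoding legacy 8-bit text as UTF-8 and checking that the change is lossless. This is an illustration only, not a preservation action taken from NARA's plan; the function name `migrate_to_utf8` is hypothetical, and the `cp1252` default is an assumption (the real source code page must be known or detected).

```python
def migrate_to_utf8(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Hypothetical helper: decode legacy 8-bit text and re-encode it as UTF-8.

    source_encoding is an assumption; cp1252 is only an illustrative default.
    """
    text = data.decode(source_encoding)  # raises UnicodeDecodeError on undefined bytes
    migrated = text.encode("utf-8")
    # Check losslessness: converting back must reproduce the original bytes exactly.
    if migrated.decode("utf-8").encode(source_encoding) != data:
        raise ValueError("migration is not reversible for this input")
    return migrated


legacy = "Caf\u00e9".encode("cp1252")        # b"Caf\xe9" in the legacy code page
print(migrate_to_utf8(legacy))                # b'Caf\xc3\xa9' in UTF-8
```

The reversibility check is a simple design choice: if the new file cannot be converted back to the original bytes, something was lost and the migration should be flagged rather than accepted.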

Common textual and word processing formats in NARA's holdings include plain text (.txt), Markdown (.md), Microsoft Word (.docx, .doc), OpenDocument Text (.odt, the native format of LibreOffice Writer), and proprietary formats such as Apple Pages.
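File extensions can be wrong, so format identification usually looks at a file's internal signature. The sketch below is a rough, hypothetical `guess_format` helper, not a replacement for dedicated identification tools. It relies on well-known signatures: legacy .doc files are OLE2 compound files, while .docx and .odt are ZIP packages (an .odt carries a `mimetype` entry, a .docx a `word/document.xml` entry). Apple Pages is not specifically detected, and Markdown cannot be told apart from plain text by bytes alone.

```python
import io
import zipfile

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy Word .doc (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                        # .docx and .odt are ZIP packages


def guess_format(data: bytes) -> str:
    """Hypothetical, signature-based guess at a word processing format."""
    if data.startswith(OLE2_MAGIC):
        return "OLE2 compound file (e.g. legacy Word .doc)"
    if data.startswith(ZIP_MAGIC):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            names = zf.namelist()
            if "mimetype" in names:
                mime = zf.read("mimetype").decode("ascii", "replace").strip()
                if mime == "application/vnd.oasis.opendocument.text":
                    return "OpenDocument Text (.odt)"
                return f"OpenDocument package ({mime})"
            if "word/document.xml" in names:
                return "Office Open XML word processing (.docx)"
            return "unrecognized ZIP package"
    if data.isascii():
        # Plain text and Markdown look identical at the byte level.
        return "plain text (7-bit ASCII)"
    return "unknown binary or non-ASCII text"


# Usage: build a minimal in-memory ODT-style package and identify it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mimetype", "application/vnd.oasis.opendocument.text")
print(guess_format(buf.getvalue()))  # OpenDocument Text (.odt)
```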

NARA's Linked Open Data covers a wide range of file formats. The Digital Preservation Framework, published as Linked Open Data, includes an extensive list of the file formats in NARA's holdings, which are managed through the tools and strategies outlined in the preservation plans.

NARA's Linked Open Data is published as Resource Description Framework (RDF) data serialized in Terse RDF Triple Language (Turtle), a plain-text format that can be opened in any text editor. The data includes files for numerous format identifiers, such as NF00113, NF00114, NF00846, NF00156, NF00172, NF00686, NF00187, NF00561, and NF00162, each provided as a .ttl file.
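Because Turtle is plain text, even a simple script can pull the format identifiers out of a file. The sketch below is an assumption-laden illustration: the sample triples, the `ex:` prefix, and the predicate names are hypothetical placeholders, not copied from NARA's files, and the regular expression only assumes the five-digit "NF" pattern of the identifiers listed above. It scans text rather than parsing RDF; for real queries an RDF library (for example, rdflib's `Graph.parse(..., format="turtle")`) would be the appropriate tool.

```python
import re

# Hypothetical excerpt: prefix and predicates are placeholders, not NARA's vocabulary.
SAMPLE_TTL = """
@prefix ex: <http://example.org/> .
ex:NF00113 ex:label "Example format A" .
ex:NF00114 ex:label "Example format B" ;
           ex:related ex:NF00113 .
"""

# Assumes identifiers look like the ones above: "NF" followed by five digits.
NF_ID = re.compile(r"\bNF\d{5}\b")


def format_ids(turtle_text: str) -> list[str]:
    """Return the distinct NF identifiers mentioned in the text, in first-seen order."""
    seen: dict[str, None] = {}
    for match in NF_ID.finditer(turtle_text):
        seen.setdefault(match.group(), None)
    return list(seen)


print(format_ids(SAMPLE_TTL))  # ['NF00113', 'NF00114']
```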

The Digital Preservation Framework also includes the DEC WPS Plus format (.wpl), ASCII 7-bit text (.txt), ASCII 8-bit text (.txt), ASCII of unspecified version (.asc), Extensible Markup Language (XML) 1.0 and 1.1 (.xml), the Extensible Forms Description Language (XFDL), and Document Type Definition (DTD) and XML Schema (XSD) files, which are listed under the Web Records category.
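Several of these formats share an extension, so the extension alone cannot tell ASCII 7-bit from 8-bit text, or XML 1.0 from XML 1.1. A minimal sketch of content-based checks follows; the function name `describe_text` is hypothetical, and the checks are deliberately rough. A byte above 0x7F only shows the file is not 7-bit ASCII (it could be 8-bit text or UTF-8), and an XML file without a declaration is treated as XML 1.0 by parsers but would be reported here as text.

```python
import re

# Matches the version pseudo-attribute of an XML declaration, e.g. <?xml version="1.1"?>
XML_DECL = re.compile(rb'^<\?xml\s+version\s*=\s*["\'](1\.[01])["\']')


def describe_text(data: bytes) -> str:
    """Hypothetical, rough classification of a text-like file by its content."""
    match = XML_DECL.match(data.removeprefix(b"\xef\xbb\xbf"))  # skip a UTF-8 BOM if present
    if match:
        return "XML " + match.group(1).decode("ascii")
    if data.isascii():
        return "7-bit ASCII text"
    return "8-bit or multibyte text (encoding must be determined separately)"


print(describe_text(b'<?xml version="1.1"?><r/>'))  # XML 1.1
print(describe_text(b"caf\xe9"))
```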

The Textual and Word Processing Preservation Plan also documents the significant properties of these records, which serve as test criteria when evaluating the tools and processes used for format transformations.
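Using significant properties as test criteria can be sketched as "measure before, measure after, compare." The property set below (word count, paragraph count, and a digest of whitespace-normalized text) is hypothetical and far smaller than what a real plan would specify; the names `TextProperties`, `extract_properties`, and `transformation_preserves` are illustrative only.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TextProperties:
    """A hypothetical, minimal set of significant properties for plain text."""
    word_count: int
    paragraph_count: int
    content_digest: str  # SHA-256 of whitespace-normalized text


def extract_properties(text: str) -> TextProperties:
    # Paragraphs are separated by blank lines; empty fragments are ignored.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    normalized = " ".join(text.split())  # ignore line-ending and spacing differences
    return TextProperties(
        word_count=len(text.split()),
        paragraph_count=len(paragraphs),
        content_digest=hashlib.sha256(normalized.encode("utf-8")).hexdigest(),
    )


def transformation_preserves(source: str, migrated: str) -> list[str]:
    """Return the names of properties that changed; an empty list means the check passed."""
    before, after = extract_properties(source), extract_properties(migrated)
    return [name for name in ("word_count", "paragraph_count", "content_digest")
            if getattr(before, name) != getattr(after, name)]


original = "Meeting minutes\n\nItem 1: budget."
migrated_ok = "Meeting minutes\r\n\r\nItem 1: budget."   # only line endings changed
migrated_bad = "Meeting minutes Item 1: budget."          # paragraph break lost
print(transformation_preserves(original, migrated_ok))   # []
print(transformation_preserves(original, migrated_bad))  # ['paragraph_count']
```

Normalizing whitespace before hashing is a deliberate choice here: it lets a tool change line endings without failing the check, while still catching lost or altered words.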

For details on specific formats, consult NARA's Linked Open Data resources or the GitHub repository for the Preservation Plans, which give a fuller list of formats and their associated preservation strategies; links to both are provided on NARA's website. The Linked Open Data version of the Digital Preservation Framework contains the same elements as the Preservation Plans on GitHub, but it is neither exhaustive nor universally applicable.

The plan also draws on data and cloud computing technologies: its recommended tools for processing and preservation include migration to cloud-based formats when necessary. The formats it covers, such as Microsoft Word (.docx, .doc), LibreOffice (.odt), and Apple Pages, are all electronic records produced by word processing software.
