Open Data Linkage for Software and Coding Standards
The National Archives and Records Administration (NARA) has unveiled its Digital Preservation Framework, a comprehensive guide aimed at securing the long-term preservation of digital records. This framework outlines preferred and acceptable file formats for various digital assets, as well as strategies for software and code preservation.
At the heart of the framework lies a diverse range of file formats. For text documents, the preferred formats include PDF/A, plain text (.txt), and XML, all chosen for long-term accessibility and preservation. Images are best preserved in TIFF format, with JPEG 2000 and PNG as alternatives. Audio files should be preserved as WAV or Broadcast WAV (BWF) for lossless preservation, while video formats like MPEG-2 or AVI may be used, depending on sustainability evaluations.
Email is archived in standard formats such as MBOX or EML, while structured data is preserved in CSV, XML, or JSON. Geospatial data is preserved in Shapefile or GeoTIFF formats, and software and code are stored as plain-text source files or in standard archival formats. Virtual machine images or software containers may also be documented to support software preservation.
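The preferred/acceptable distinction described above can be sketched as a simple lookup. This is only an illustration: the format names come from this article, while the file extensions, the category keys, and the classify function are assumptions made for the example and are not taken from NARA's published lists.

```python
from pathlib import PurePath

# Format names follow the article; the extension mapping is an illustrative
# assumption, not NARA's published list.
FORMATS = {
    "image": {"preferred": {".tif", ".tiff"}, "acceptable": {".jp2", ".png"}},
    "audio": {"preferred": {".wav", ".bwf"}, "acceptable": set()},
    "email": {"preferred": {".mbox", ".eml"}, "acceptable": set()},
    "structured_data": {"preferred": {".csv", ".xml", ".json"}, "acceptable": set()},
}


def classify(filename: str, category: str) -> str:
    """Return 'preferred', 'acceptable', or 'review' for a file in a category."""
    suffix = PurePath(filename).suffix.lower()
    levels = FORMATS[category]
    for level in ("preferred", "acceptable"):
        if suffix in levels[level]:
            return level
    # Anything not listed needs a human decision (for example, migration).
    return "review"
```

For instance, classify("scan.TIF", "image") falls in the preferred group, while a file with an unlisted extension is flagged for review rather than rejected outright.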
The framework places a strong emphasis on long-term sustainability, interoperability, and accessibility. To keep archived code and software usable, it relies on strategies such as software emulation or virtualization. Metadata about the software environment, versions, and dependencies is also preserved to enable future execution or migration. Source code repositories and executable formats are treated as integral preservation objects.
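As a rough illustration of what environment metadata might look like in practice, the sketch below records the interpreter, operating system, and installed package versions for a Python program. The field names and the describe_environment function are hypothetical choices for this example; the framework itself does not prescribe this layout.

```python
import json
import platform
from importlib import metadata


def describe_environment() -> dict:
    """Collect basic facts about the running environment (illustrative fields)."""
    dependencies = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name") if dist.metadata else None
        if name:  # skip distributions with incomplete metadata
            dependencies.add(f"{name}=={dist.version}")
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "operating_system": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "dependencies": sorted(dependencies),
    }


if __name__ == "__main__":
    # Store this JSON next to the archived source code.
    print(json.dumps(describe_environment(), indent=2))
```

A record like this, kept alongside the source, gives a future archivist a starting point for rebuilding or emulating the original environment.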
Although the framework does not provide an exhaustive list of specific software titles or languages, it advocates archiving source code in open, text-based, widely accepted formats, and preserving software dependencies and environments in containerized form or on emulated platforms for sustainability.
The framework also catalogues the file formats and format versions found in NARA holdings, such as Adobe Photoshop Duotone Options files, Adobe Type 1 PostScript Printer Font Binary files, ASP.NET HTTP Handler files, and A86 assembler source code files. Processing capabilities and tools in use at NARA are also included.
The Software and Code Preservation Plan, a component of the framework, documents the significant properties of software and code records. These properties can be used as test criteria for the tools and processes used in format transformations.
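To make the idea of significant properties as test criteria more concrete, here is a minimal sketch for one kind of transformation: re-encoding a plain-text source file as UTF-8 with normalized line endings. The chosen properties (the sequence of text lines and their count) and all function names are assumptions for illustration, not properties quoted from NARA's plan.

```python
def significant_properties(data: bytes, encoding: str) -> dict:
    """Extract the properties we want a transformation to preserve."""
    lines = data.decode(encoding).splitlines()
    return {"line_count": len(lines), "lines": lines}


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode text as UTF-8 and normalize line endings to '\\n'."""
    lines = data.decode(source_encoding).splitlines()
    if not lines:
        return b""
    return ("\n".join(lines) + "\n").encode("utf-8")


def properties_preserved(original: bytes, original_encoding: str,
                         transformed: bytes) -> bool:
    """Compare significant properties before and after the transformation."""
    before = significant_properties(original, original_encoding)
    after = significant_properties(transformed, "utf-8")
    return before == after


if __name__ == "__main__":
    # A Latin-1 assembler snippet with Windows line endings (made-up sample).
    source = "; caf\xe9\r\nmov ax, 1\r\n".encode("latin-1")
    print(properties_preserved(source, "latin-1", to_utf8(source, "latin-1")))
```

The bytes of the file change during the transformation, but the check passes as long as the properties judged significant are unchanged, which is the sense in which a plan can act as a test criterion.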
It's important to note that the plans in the Digital Preservation Framework are neither exhaustive nor universally applicable. The software and code plan covers both application software and system software. Examples of application software include office suites, gaming applications, database systems, and educational software. System software, such as device drivers, operating systems, scripts, compilers, disk formatters, text editors, and utilities, is also included.
The framework is available as Resource Description Framework (RDF) data in the Terse RDF Triple Language (Turtle) format, making it machine-readable and accessible to a wide range of users. For NARA-published file format lists or software examples, consult NARA's official Digital Preservation Framework or contact NARA directly for the most authoritative and detailed documentation.
In summary, the NARA Digital Preservation Framework is a valuable resource for ensuring the long-term preservation of digital records. It outlines preferred file formats for various digital assets, emphasises long-term sustainability, interoperability, and accessibility, and provides strategies for software and code preservation.
[1] For more information on preferred file formats for transfer and preservation based on sustainability criteria, refer to guidelines from Library and Archives Canada.
Data and cloud computing technology supports the NARA Digital Preservation Framework by facilitating format preservation and the software emulation or virtualization strategies described above.