PDF/A Format and its difference from PDF Format

PDF/A (Archival)

PDF/A (Archival Portable Document Format), commonly referred to as archival PDF, is a variant of the standard PDF format that is intended to keep documents usable for much longer than an ordinary PDF. PDF/A is standardized by ISO (as ISO 19005) and is widely used to preserve electronic documents digitally over the long term.

History and Purpose

A traditional PDF document can rely on a great deal of external information when it is created. This external information can include fonts taken from external font libraries, third-party encryption algorithms, or external color schemes.

The drawback of relying on external information is that the resources and standards it depends on often change over time, so it may no longer be possible to reproduce the document exactly as it was encoded. Traditional PDF encoding is therefore not suitable for long-term storage, which matters because PDF is one of the most widely used document formats.

PDF/A was developed jointly by The Association for Suppliers of Printing, Publishing and Converting Technologies (NPES) and the Association for Information and Image Management (AIIM). They set out to create a PDF version that avoids depending on external information when a document is created, so that the resulting document can be stored for long periods of time.

Usage

Legal documents must be kept for long periods for record-keeping purposes, which makes PDF/A particularly well suited to the legal sector.
PDF/A is also suitable for digital libraries, where books need to be stored for long periods of time.
Newspapers and other print media make wide use of the PDF/A format for their archives.


Levels of PDF/A

PDF/A-1

Part 1 of the standard (ISO 19005-1) was first published on 28 September 2005 and defines two levels of conformance for PDF files.
PDF/A-1a is level A (accessible) conformance. It improves access to the content but requires more effort during creation. It allows the text and logical structure of the file to be extracted and interpreted more reliably. This level was developed to make conforming files more accessible to users with disabilities, for example those relying on assistive technologies.
PDF/A-1b is level B (basic) conformance. It is easier to create and guarantees that the visual appearance of the content can be reliably reproduced, provided the requirements of the standard are followed precisely.

PDF/A-2

Part 2 of the standard was published on 20 June 2011. It adds new features such as JPEG 2000 image compression, embedding of OpenType fonts, support for transparency and layers, support for PAdES (PDF Advanced Electronic Signatures) digital signatures, and the option of embedding PDF/A files so that sets of documents can be archived within a single file.
It defines three conformance levels: PDF/A-2a and PDF/A-2b, which correspond to levels A and B of PDF/A-1, and a new level, PDF/A-2u, which is level B with the additional requirement that all text in the document have a Unicode mapping.
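In a PDF file, the Unicode mapping that PDF/A-2u requires is commonly supplied through a font's ToUnicode CMap. The following is a minimal Python sketch, under simplifying assumptions, of how one might check whether every character code used with a font has such a mapping. It reads only `bfchar` entries (real CMaps may also use `bfrange`), it assumes the CMap text has already been extracted from the file, and `parse_bfchar` and `unmapped_codes` are hypothetical helper names, not part of any library.

```python
import re
from typing import Dict, Iterable, List

# A bfchar block lists "<source code> <UTF-16BE destination>" pairs.
_BFCHAR_BLOCK = re.compile(r"beginbfchar(.*?)endbfchar", re.S)
_PAIR = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")


def parse_bfchar(cmap_text: str) -> Dict[int, str]:
    """Map character codes to Unicode text using only bfchar entries."""
    mapping: Dict[int, str] = {}
    for block in _BFCHAR_BLOCK.findall(cmap_text):
        for src, dst in _PAIR.findall(block):
            # The destination is UTF-16BE encoded as hex (e.g. 0041 -> "A").
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def unmapped_codes(cmap_text: str, used_codes: Iterable[int]) -> List[int]:
    """Return the used character codes that have no Unicode mapping."""
    mapping = parse_bfchar(cmap_text)
    return sorted({code for code in used_codes if code not in mapping})
```

An empty result from `unmapped_codes` would be consistent with the Unicode requirement for that font; a non-empty one points at text that a PDF/A-2u validator would likely reject.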

PDF/A-3

Part 3 of the standard was published on 15 October 2012. It allows files in arbitrary formats, such as CAD drawings, CSV, XML, and spreadsheets, to be embedded in PDF/A-conforming documents.
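The parts and conformance levels described above can be summarized as a small lookup table. The sketch below is illustrative only: `CONFORMANCE_LEVELS` and `parse_flavour` are hypothetical names, and it covers Parts 1 to 3 only (PDF/A-3 uses the same a, b and u levels as PDF/A-2).

```python
import re
from typing import Dict, FrozenSet, Optional, Tuple

# Conformance levels per part: a = accessible, b = basic, u = Unicode.
CONFORMANCE_LEVELS: Dict[int, FrozenSet[str]] = {
    1: frozenset({"a", "b"}),
    2: frozenset({"a", "b", "u"}),
    3: frozenset({"a", "b", "u"}),
}

_FLAVOUR = re.compile(r"PDF/A-(\d)([a-z])", re.IGNORECASE)


def parse_flavour(name: str) -> Optional[Tuple[int, str]]:
    """Parse a name like "PDF/A-2u" into (part, level), or None if invalid."""
    match = _FLAVOUR.fullmatch(name.strip())
    if match is None:
        return None
    part, level = int(match.group(1)), match.group(2).lower()
    # Reject combinations that the part does not define, e.g. PDF/A-1u.
    if level not in CONFORMANCE_LEVELS.get(part, frozenset()):
        return None
    return part, level
```

For example, `parse_flavour("PDF/A-2u")` yields `(2, "u")`, while `"PDF/A-1u"` is rejected because Part 1 defines only levels a and b.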

PDF/A-4

Part 4 of the standard (ISO 19005-4), which is based on PDF 2.0, was published in 2020.

Viewing PDF/A files

A PDF/A file can be recognized by its PDF/A-specific metadata. This metadata represents only a claim of conformance; it does not guarantee that the file actually conforms.
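As a rough illustration, the claim is recorded in the file's XMP metadata through the `pdfaid:part` and `pdfaid:conformance` properties (namespace `http://www.aiim.org/pdfa/ns/id/`). The Python sketch below simply searches the raw file bytes, which assumes the XMP packet is stored uncompressed; `pdfa_claim` is a hypothetical helper. Like the metadata itself, its result is only a claim; checking actual conformance requires a dedicated validator such as veraPDF.

```python
import re
from typing import Optional

# XMP allows attribute form (pdfaid:part="2") and element form
# (<pdfaid:part>2</pdfaid:part>); each pattern accepts both.
_PART = re.compile(
    rb"""pdfaid:part\s*=\s*["'](\d)["']|<pdfaid:part>\s*(\d)\s*</pdfaid:part>"""
)
_CONF = re.compile(
    rb"""pdfaid:conformance\s*=\s*["']([A-Za-z])["']|<pdfaid:conformance>\s*([A-Za-z])\s*</pdfaid:conformance>"""
)


def _value(match: Optional[re.Match]) -> Optional[str]:
    """Return whichever alternative of the pattern matched, as text."""
    if match is None:
        return None
    raw = match.group(1) or match.group(2)
    return raw.decode("ascii")


def pdfa_claim(data: bytes) -> Optional[str]:
    """Return the claimed flavour, e.g. "PDF/A-2b", or None if no claim."""
    part = _value(_PART.search(data))
    if part is None:
        return None
    conformance = _value(_CONF.search(data))
    return f"PDF/A-{part}{conformance.lower()}" if conformance else f"PDF/A-{part}"
```

For example, passing the bytes of a file read in binary mode would return a string such as `"PDF/A-2b"`, or `None` when no PDF/A identification is found.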

Viewers that recognize PDF/A, such as Adobe Acrobat Reader, notify the user that the file is being displayed in PDF/A viewing mode. Some viewers also let users leave PDF/A viewing mode or remove the PDF/A information from the file.

A PDF/A viewer must meet certain criteria to display a PDF/A file correctly. These include rendering annotations exactly as specified, ensuring that form fields are not changed from the original, using only the embedded color profiles, using only the embedded fonts, ignoring any linearization information in the file, and ignoring any data that the PDF/A standard does not describe.

Disadvantages of PDF/A

PDF/A documents require their content to be embedded, including color information, text, images, fonts, and other elements. As a result, a PDF/A file can be considerably larger than its PDF equivalent. In addition, because PDF/A-3 allows arbitrary files to be embedded, some archivists are concerned that the embedded files themselves may not be preservable over the long term.

According to the PDF Association, converting PDF files to PDF/A can introduce problems. A typical problem when converting to PDF/A-2 is that glyphs are altered.

A glyph is the specific visual shape used to render a character such as a letter. For example, the letter "a" can be drawn in many different ways, but despite the differences, each of those glyphs is still recognized as an "a."

Detecting this problem requires a careful visual check. This may not seem like a major issue, but it means that a stored PDF/A file that meets the standard might later be opened, viewed, or printed with undesirable results.

The root cause is that PDF/A requires text to be represented unambiguously so that it cannot be misinterpreted; if the conversion alters glyphs while satisfying this requirement, the file still conforms but no longer looks like the original. Problems during file generation can also corrupt the Unicode mapping of the text.

Differences between PDF/A and PDF

The following table summarizes the main differences between the PDF/A and PDF document formats.

Criteria	PDF/A	PDF
Intended Storage Duration	Long-term archiving	Not designed for long-term archiving
Graphics and Fonts	Must be embedded within the file	Embedding is optional
Executable Content	Does not allow JavaScript, audio, video, or other executable content	Can contain executable content
External References	Does not allow content (such as fonts or images) to be referenced from outside the file; ordinary hyperlinks are permitted	Allows external references
Encryption	Does not allow encryption	Allows encryption
File Size	Larger, because all required resources are embedded	Often smaller, because resources can be referenced externally
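Some of the restrictions in the table can be spotted with a crude heuristic. The sketch below scans raw PDF bytes for names associated with encryption and with executable or multimedia content; the marker list and the `forbidden_features` helper are assumptions made for illustration. It is not a validator: objects stored inside compressed object streams will not be seen, and a match does not prove that the feature is actually in use.

```python
import re
from typing import Dict, List

# PDF names that suggest features PDF/A does not allow (illustrative list).
_FORBIDDEN: Dict[bytes, str] = {
    b"/Encrypt": "encryption",
    b"/JavaScript": "JavaScript (executable content)",
    b"/Launch": "launch action (runs external programs)",
    b"/Movie": "movie (video) content",
    b"/Sound": "sound (audio) content",
}


def forbidden_features(data: bytes) -> List[str]:
    """List features whose marker names appear in the raw PDF bytes."""
    found: List[str] = []
    for marker, label in _FORBIDDEN.items():
        # The lookahead stops /Encrypt from matching names like /EncryptMetadata.
        if re.search(re.escape(marker) + rb"(?![A-Za-z0-9])", data):
            found.append(label)
    return found
```

An empty list is only weak evidence; the design favors simplicity over completeness, so a real check should still go through a PDF/A validator.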

The downside of PDF/A

As the table shows, a drawback of the PDF/A format is its larger file size. Because nothing may be referenced externally, everything needed to render the document has to be embedded in the file, so that its content can still be retrieved correctly long after it was created.