PDF/A (Archival Portable Document Format), commonly referred to as archival PDF, is an ISO-standardized variant of PDF that is widely used to preserve electronic documents and files digitally. It is designed to keep information retrievable for a much longer period of time than a traditional PDF.
The traditional PDF document format can rely on a lot of external information when a document is created. This external information can include fonts from external font libraries, third-party encryption algorithms, or an external color scheme.
A drawback of relying on external information is that standards and resources often change in the long run, so it may no longer be possible to retrieve the information from the document as it was originally encoded. Long-term data storage is therefore not reliable with traditional PDF encoding, even though PDF is one of the most widely used document formats.
PDF/A was a joint venture between NPES (The Association for Suppliers of Printing, Publishing and Converting Technologies) and AIIM (the Association for Information and Image Management). These organizations collaborated to develop a PDF version that avoids the use of external information in document creation, so that the resulting document can be stored for longer periods of time.
Converting PDF to text and extracting images from PDF are very common tasks for developers, and both can be handled with Bytescout PDF Extractor SDK.
A PDF/A file can be recognized by its PDF/A-specific metadata. This metadata represents a claim of conformance, but it does not ensure conformance.
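To illustrate how such a conformance claim can be read, here is a minimal Python sketch. It assumes the claim is stored in the file's XMP metadata as the `pdfaid:part` and `pdfaid:conformance` properties, and that the XMP packet is stored uncompressed so it can be found by scanning the raw bytes. The function name `claimed_pdfa_level` is hypothetical. As noted above, a match only shows what the file claims, not that it actually conforms.

```python
import re

# Assumption: the claim appears in the XMP packet either as an XML element
# (<pdfaid:part>1</pdfaid:part>) or as an attribute (pdfaid:part="1").
_PART = re.compile(rb'''pdfaid:part(?:>|=["'])\s*(\d)''')
_CONF = re.compile(rb'''pdfaid:conformance(?:>|=["'])\s*([A-Za-z])''')


def claimed_pdfa_level(data: bytes):
    """Return the claimed level such as 'PDF/A-1b', or None if no claim is found.

    This reads the claim only; it does not validate the file.
    """
    part = _PART.search(data)
    if part is None:
        return None
    level = "PDF/A-" + part.group(1).decode("ascii")
    conf = _CONF.search(data)
    if conf is not None:
        level += conf.group(1).decode("ascii").lower()
    return level
```

For a file on disk, the bytes could be read with `open(path, "rb").read()` and passed to the function. A file that returns a level here still needs a proper validation tool to confirm that it really meets the standard.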
PDF/A viewers such as Adobe Acrobat Reader alert the user when PDF/A viewing mode is activated. Viewers may even allow users to turn off PDF/A viewing mode or remove the PDF/A information from the file.
A PDF/A viewer must meet certain criteria to display a PDF/A file correctly. These include ensuring that annotations are rendered precisely, ensuring that form fields do not change from the original, showing only the embedded color profile, using only the embedded fonts, ignoring any linearization information provided by the file, and ignoring any data that is not described by the PDF/A standards.
PDF/A documents require their content to be embedded. This content includes color information, text, images, fonts, and other elements. It is understandable, then, that a PDF/A file is often larger than its PDF equivalent. In addition, PDF/A-3 allows arbitrary files to be embedded into PDF/A files, and some archivists are concerned that this will be problematic.
According to the PDF Association, some problems may be encountered when converting PDF files to PDF/A files. A typical problem when converting PDF files to PDF/A-2 files is that glyphs are altered.
A glyph is a concrete shape used to represent an elemental symbol such as a letter. Many different shapes can all be recognized as the same symbol when reading or writing. For example, the letter “a” can be written in many different ways, but despite the differences it will still be recognized as an “a.”
Discovering this problem requires a careful visual check. This may not seem like a huge problem, but it means that a PDF/A file that meets the standard and is stored might, later on, be opened and viewed or printed with undesirable results.
The root cause of the glyph problem is that, in PDF/A, the text is meant to be identified unambiguously so that it cannot be reproduced incorrectly, but this breaks down when a glyph changes during conversion. In addition, issues in how the file was generated may affect the Unicode mapping.
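As a rough illustration of what a Unicode mapping check could look like, the sketch below inspects a table that maps character codes to Unicode text. It is an assumption-laden example: the table is taken to be already extracted from a font's mapping data (the extraction itself is not shown), the function name `suspicious_mappings` is hypothetical, and the set of disallowed code points (U+0000, U+FEFF, U+FFFE) reflects the restriction usually cited for PDF/A-2 and should be confirmed against the standard. Flagging empty mappings is an extra choice made for this sketch. A check like this cannot detect an altered glyph shape, which still needs the visual check described above.

```python
# Assumption for this sketch: code points that must not appear as mapping
# targets. Confirm against the PDF/A part you are targeting.
FORBIDDEN = {0x0000, 0xFEFF, 0xFFFE}


def suspicious_mappings(to_unicode: dict) -> dict:
    """Return {character code: reason} for entries that look unreliable.

    `to_unicode` is a hypothetical, already-extracted table that maps each
    character code (int) to the Unicode text (str) it should represent.
    """
    problems = {}
    for code, text in sorted(to_unicode.items()):
        if not text:
            # Not a rule from the standard; an empty target yields no text.
            problems[code] = "empty mapping"
        elif any(ord(ch) in FORBIDDEN for ch in text):
            problems[code] = "forbidden code point"
    return problems
```

For example, calling `suspicious_mappings({1: "a", 2: ""})` reports only code 2, because code 1 maps to ordinary text.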
The following table summarizes the differences between the PDF/A and PDF document formats.
Criteria | PDF/A | PDF |
Storage Duration | Long-term storage. | Short-term storage. |
Graphics and Fonts | Embedded within the PDF/A file. | No need to embed within the document. |
Executable Content | Doesn’t allow video, audio, and other executable content. | Can embed executable content within the document. |
External References (Links) | Doesn’t allow external links and references in the document. | Allows external links and references. |
Encryption | Doesn’t allow encryption in the documents. | Allows encryption in the documents. |
File Size | Larger file size due to embedded information. | Smaller file size due to external referencing. |
A drawback of the PDF/A file format, as mentioned in the table, is the larger file size. Because the file cannot rely on external references, everything has to be embedded within it so that, after a long period of time, the information can be retrieved exactly as it was originally embedded.
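The restrictions on encryption and executable content listed in the table can be pre-screened before attempting a conversion. The following Python sketch is a heuristic only: it scans the raw bytes of a PDF for a few name tokens that usually accompany those features. The token list and the function name `prescreen` are assumptions made for illustration, not a complete rule set. Tokens stored inside compressed objects will be missed, and a hit can occasionally be a false positive.

```python
import re

# Illustrative, incomplete list of PDF name tokens that hint at features
# PDF/A does not allow (assumption for this sketch).
MARKERS = {
    "encryption": rb"/Encrypt\b",
    "JavaScript": rb"/(?:JavaScript|JS)\b",
    "launch action": rb"/Launch\b",
    "embedded media": rb"/(?:Movie|Sound)\b",
}


def prescreen(data: bytes) -> list:
    """Return labels of disallowed-looking features found in the raw bytes."""
    return [label for label, pattern in MARKERS.items()
            if re.search(pattern, data)]
```

An empty result does not prove that a file is ready for PDF/A. It only means that none of these particular tokens were visible in the uncompressed parts of the file.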