How to Extract Document Info and Metadata using PDF Extractor SDK in C# - ByteScout

PDF Extractor SDK can extract not only actual document content such as text and images, but also basic and detailed information about the PDF document, including its metadata.

The InfoExtractor class implements the IInfoExtractor interface. All properties and methods of this interface are listed on the online documentation page: https://docs.bytescout.com/pdf-extractor-sdk-t-bytescout-pdfextractor-iinfoextractor

Initialization and document loading are no different from the other SDK extractors:

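// Create the extractor and set the registration data
InfoExtractor extractor = new InfoExtractor();
extractor.RegistrationName = "demo";
extractor.RegistrationKey = "demo";
// Load the PDF to inspect ("sample.pdf" is a placeholder path)
extractor.LoadDocumentFromFile("sample.pdf");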

Basic document properties such as the title, author, subject, keywords, and bookmarks can be read through the corresponding extractor properties:

Console.WriteLine("Author:       " + extractor.Author);
Console.WriteLine("Subject:      " + extractor.Subject);
Console.WriteLine("Title:        " + extractor.Title);
Console.WriteLine("Keywords:     " + extractor.Keywords);
Console.WriteLine("Bookmarks:    " + extractor.Bookmarks);

To check whether the document is encrypted, use the Encrypted property. The EncryptionAlgorithm property reports the algorithm with which the PDF was encrypted:

Console.WriteLine("Encrypted:    " + extractor.Encrypted);
Console.WriteLine($"EncryptionAlgorithm: {extractor.EncryptionAlgorithm}");

To check whether the document contains user-defined properties beyond the standard property names, use the InfoExtractor.CustomProperties dictionary, which holds the user-added properties in the PDF.
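As a rough illustration of walking such a dictionary, the sketch below formats every key/value pair for display. It is only a sketch under assumptions: the helper name FormatProperties and the sample data are hypothetical, and the generic signature is used because the exact key and value types of the SDK dictionary are not stated here. The code runs without the SDK, using a plain Dictionary as a stand-in.

```csharp
using System;
using System.Collections.Generic;

public static class CustomPropertiesSketch
{
    // Hypothetical helper: turns each key/value pair into a "key = value" line.
    // Generic so it does not assume particular key or value types.
    public static List<string> FormatProperties<TKey, TValue>(
        IEnumerable<KeyValuePair<TKey, TValue>> properties)
    {
        var lines = new List<string>();
        foreach (KeyValuePair<TKey, TValue> property in properties)
        {
            lines.Add($"{property.Key} = {property.Value}");
        }
        return lines;
    }

    public static void Main()
    {
        // Stand-in data (assumption); with the SDK, the extractor's
        // custom properties dictionary would be passed here instead.
        var sample = new Dictionary<string, string>
        {
            { "Department", "Finance" },
            { "ReviewedBy", "J. Doe" }
        };

        foreach (string line in FormatProperties(sample))
        {
            Console.WriteLine(line);
        }
    }
}
```

An empty result simply means the document carries no user-defined properties.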

The SDK also allows reading metadata streams in XMP format, an XML-based metadata format embedded in the PDF. The InfoExtractor.GetMetadata() method returns the metadata stream available in the document in XMP format.
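Because XMP is XML, the returned metadata can be inspected with the standard .NET XML classes. The following is a minimal sketch, not part of the SDK: it assumes the XMP packet is already available as a string, and it uses a hand-written sample packet in its place so that it runs on its own. The class and method names (XmpSketch, ReadTitle, ReadCreatorTool) are hypothetical; the namespace URIs are the standard RDF, Dublin Core, and XMP ones.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XmpSketch
{
    // Standard XML namespaces used inside XMP packets.
    static readonly XNamespace Rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static readonly XNamespace Dc = "http://purl.org/dc/elements/1.1/";
    static readonly XNamespace Xmp = "http://ns.adobe.com/xap/1.0/";

    // Hand-written sample packet (assumption) standing in for the
    // XMP text obtained from the document.
    public const string SampleXmp = @"<x:xmpmeta xmlns:x='adobe:ns:meta/'>
  <rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
    <rdf:Description rdf:about=''
        xmlns:dc='http://purl.org/dc/elements/1.1/'
        xmlns:xmp='http://ns.adobe.com/xap/1.0/'>
      <dc:title><rdf:Alt><rdf:li xml:lang='x-default'>Sample Document</rdf:li></rdf:Alt></dc:title>
      <xmp:CreatorTool>Sample Writer 1.0</xmp:CreatorTool>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>";

    // Returns the first dc:title entry, or null if the packet has none.
    public static string ReadTitle(string xmpText)
    {
        XDocument doc = XDocument.Parse(xmpText);
        return doc.Descendants(Dc + "title")
                  .Descendants(Rdf + "li")
                  .Select(li => li.Value)
                  .FirstOrDefault();
    }

    // Returns the xmp:CreatorTool element value, or null if it is absent.
    public static string ReadCreatorTool(string xmpText)
    {
        XDocument doc = XDocument.Parse(xmpText);
        return doc.Descendants(Xmp + "CreatorTool")
                  .Select(e => e.Value)
                  .FirstOrDefault();
    }

    public static void Main()
    {
        Console.WriteLine("Title:       " + ReadTitle(SampleXmp));
        Console.WriteLine("CreatorTool: " + ReadCreatorTool(SampleXmp));
    }
}
```

Real XMP packets may store some simple properties as attributes of rdf:Description rather than as child elements, so a production reader would need to check both forms.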


About the Author

ByteScout Team of Writers. ByteScout has a team of professional writers proficient in different technical topics. We select the best writers to cover interesting and trending topics for our readers. We love developers and we hope our articles help you learn about programming and programmers.