ByteScout PDF Suite – C# – Convert PDF To XML With Images with PDF Extractor SDK

How to convert PDF to XML with images using PDF Extractor SDK in C# and ByteScout PDF Suite

This tutorial demonstrates how to convert a PDF document to XML, including its images, in C# using PDF Extractor SDK.

The sample code below lets you try the feature without writing your own code from scratch. ByteScout PDF Suite is a bundle of six SDK libraries for working with PDF, from generating rich PDF reports to extracting data from PDF documents and converting them to HTML. The bundle includes PDF (Generator) SDK, PDF Renderer SDK, PDF Extractor SDK, PDF to HTML SDK, PDF Viewer SDK and PDF Generator SDK for JavaScript.

This C# sample is meant to speed up development with ByteScout PDF Suite. Copy the code into your C# application and adjust the file names to match your project. Treat it as a starting point: production code will need additions such as error handling to make it more robust.

A free trial version of ByteScout PDF Suite is available on our website. C# and other programming languages are supported.

On-demand (REST Web API) version:
Web API (on-demand version)

On-premise offline SDK for Windows:
60 Day Free Trial (on-premise)

Program.cs

using System;
using Bytescout.PDFExtractor;

namespace PDF2XML
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create Bytescout.PDFExtractor.XMLExtractor instance
            XMLExtractor extractor = new XMLExtractor();
            extractor.RegistrationName = "demo";
            extractor.RegistrationKey = "demo";

            // Load sample PDF document
            extractor.LoadDocumentFromFile("sample1.pdf");

            // Uncomment this line to get rid of empty nodes in XML
            //extractor.PreserveFormattingOnTextExtraction = false;

            // Set output image format
            extractor.ImageFormat = OutputImageFormat.PNG;
            
            // Save images to external files
            extractor.SaveImages = ImageHandling.OuterFile;
            extractor.ImageFolder = "images"; // Folder for external images
            extractor.SaveXMLToFile("result_with_external_images.xml");

            // Embed images into XML as Base64 encoded string
            extractor.SaveImages = ImageHandling.Embed;
            extractor.SaveXMLToFile("result_with_embedded_images.xml");

            // Cleanup: release resources held by the extractor
            extractor.Dispose();
        }
    }
}
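
After running the sample, it can be useful to confirm that the output files were actually produced. The following is a minimal sketch, not part of the ByteScout sample, that uses only the standard .NET System.IO and System.Xml.Linq APIs. The file and folder names are taken from Program.cs above; the namespace PDF2XMLCheck, the class OutputCheck and its methods are hypothetical names chosen for illustration. The sketch assumes nothing about the XML schema the SDK produces: it only checks that each file is well-formed XML and reports element and file counts.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;

namespace PDF2XMLCheck // hypothetical namespace, not part of the ByteScout sample
{
    public static class OutputCheck
    {
        // Loads the file and returns the number of XML elements (root included).
        // XDocument.Load throws XmlException if the file is not well-formed XML.
        public static int CountElements(string xmlPath)
        {
            return XDocument.Load(xmlPath).Descendants().Count();
        }

        // Returns the number of files directly inside the folder,
        // or 0 if the folder does not exist.
        public static int CountFiles(string folder)
        {
            return Directory.Exists(folder) ? Directory.GetFiles(folder).Length : 0;
        }

        public static void Main()
        {
            // File and folder names as used in Program.cs
            string[] xmlFiles =
            {
                "result_with_external_images.xml",
                "result_with_embedded_images.xml"
            };

            foreach (string path in xmlFiles)
            {
                if (!File.Exists(path))
                {
                    Console.WriteLine(path + ": not found");
                    continue;
                }

                Console.WriteLine(path + ": " + CountElements(path) + " elements, "
                    + new FileInfo(path).Length + " bytes");
            }

            Console.WriteLine("images: " + CountFiles("images") + " files");
        }
    }
}
```

Build this as a separate console project and run it from the folder where Program.cs wrote its output. If the source PDF contains images, one would expect the embedded variant to be the larger of the two XML files, since it stores the image data as Base64 text, and the images folder to be non-empty after the external-file run.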