ByteScout Data Extraction Suite – C# – Find PDF Table And Extract As XML with PDF Extractor SDK

How to find a PDF table and extract it as XML with PDF Extractor SDK in C# and ByteScout Data Extraction Suite

Learn to find a PDF table and extract it as XML with PDF Extractor SDK in C#

Finding a table in a PDF and extracting it as XML with PDF Extractor SDK is straightforward in C# using the source code below. ByteScout Data Extraction Suite bundles three SDK products for extracting data from PDFs, scans, images, and spreadsheets: PDF Extractor SDK, Data Extraction SDK, and Barcode Reader SDK.

This C# sample for ByteScout Data Extraction Suite shows the API calls needed to find a PDF table and extract it as XML with PDF Extractor SDK. It sets the minimum number of rows and columns a table must have to be detected, loads the document, searches each page for tables, and saves each table it finds to a separate XML file. Copy the code into your C# application and follow the instructions. Review the sample to check that it meets your project's requirements.

To try other C# source code samples, download the free trial version of ByteScout Data Extraction Suite from our website.

On-demand (REST Web API) version:
Web API (on-demand version)

On-premise offline SDK for Windows:
60 Day Free Trial (on-premise)

Program.cs

using System.Diagnostics;
using Bytescout.PDFExtractor;

namespace FindTableAndExtractAsXml
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create Bytescout.PDFExtractor.XMLExtractor instance
            XMLExtractor xmlExtractor = new XMLExtractor();
            xmlExtractor.RegistrationName = "demo";
            xmlExtractor.RegistrationKey = "demo";

            // Create Bytescout.PDFExtractor.TableDetector instance
            TableDetector tableDetector = new TableDetector();
            tableDetector.RegistrationKey = "demo";
            tableDetector.RegistrationName = "demo";

            // We should define what kind of tables we should detect.
            // So we set min required number of columns to 3 ...
            tableDetector.DetectionMinNumberOfColumns = 3;
            // ... and we set min required number of rows to 3
            tableDetector.DetectionMinNumberOfRows = 3;

            // Load sample PDF document
            xmlExtractor.LoadDocumentFromFile(@".\sample3.pdf");
            tableDetector.LoadDocumentFromFile(@".\sample3.pdf");

            // Get page count
            int pageCount = tableDetector.GetPageCount();

            for (int i = 0; i < pageCount; i++)
            {
                int t = 1;
                // Find first table and continue if found
                if (tableDetector.FindTable(i))
                {
                    do
                    {
                        // Set extraction area for XML extractor to rectangle received from the table detector
                        xmlExtractor.SetExtractionArea(tableDetector.FoundTableLocation);
                        // Export the table to XML file
                        xmlExtractor.SavePageXMLToFile(i, "page-" + i + "-table-" + t + ".xml");
                        t++;
                    } 
                    while (tableDetector.FindNextTable()); // search next table
                }
            }

            // Cleanup
            xmlExtractor.Dispose();
            tableDetector.Dispose();

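            // Open first output file in default associated application (for demo purposes).
            // The file exists only if a table was found on the first page, so check before opening.
            if (System.IO.File.Exists("page-0-table-1.xml"))
            {
                ProcessStartInfo processStartInfo = new ProcessStartInfo("page-0-table-1.xml");
                processStartInfo.UseShellExecute = true;
                Process.Start(processStartInfo);
            }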
        }
    }
}
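
Once the sample has run, it can help to check what each exported file contains before wiring it into a larger pipeline. The following is a minimal sketch, not part of the ByteScout sample: it uses only the standard System.Xml.Linq API and assumes nothing about the XML schema the SDK writes, beyond the files being well-formed XML. The names InspectExtractedTables, TableXmlInspector, and CountElements are hypothetical and chosen for illustration. It looks for files matching the page-N-table-M.xml naming pattern used above and prints how often each element name occurs in each file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

namespace InspectExtractedTables
{
    // Hypothetical helper, not part of the ByteScout SDK.
    public static class TableXmlInspector
    {
        // Counts how often each element name occurs in an XML document.
        // No particular schema is assumed; the input only has to be well-formed XML.
        public static SortedDictionary<string, int> CountElements(XDocument document)
        {
            var counts = new SortedDictionary<string, int>(StringComparer.Ordinal);
            foreach (XElement element in document.Descendants())
            {
                string name = element.Name.LocalName;
                // TryGetValue leaves current at 0 when the name has not been seen yet.
                counts.TryGetValue(name, out int current);
                counts[name] = current + 1;
            }
            return counts;
        }

        public static void Main(string[] args)
        {
            // Directory to scan; defaults to the current directory,
            // which is where the sample above writes its output.
            string directory = args.Length > 0 ? args[0] : ".";

            // Match the file naming pattern used by the sample above.
            IEnumerable<string> paths = Directory
                .GetFiles(directory, "page-*-table-*.xml")
                .OrderBy(p => p, StringComparer.Ordinal);

            foreach (string path in paths)
            {
                XDocument document = XDocument.Load(path);
                Console.WriteLine(Path.GetFileName(path));
                foreach (KeyValuePair<string, int> entry in CountElements(document))
                {
                    Console.WriteLine("  " + entry.Key + ": " + entry.Value);
                }
            }
        }
    }
}
```

Counting element names is a deliberately schema-agnostic first look: it shows which elements the exported files actually use, so that later parsing code can be written against the real output rather than a guessed structure.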