ByteScout PDF Extractor SDK – VB.NET – Find Table in PDF And Extract As XML

Home
/
Articles
/
ByteScout PDF Extractor SDK – VB.NET – Find Table in PDF And Extract As XML

printable version:
ByteScout-PDF-Extractor-SDK-VB-NET-Find-Table-in-PDF-And-Extract-As-XML.pdf

How to find table in PDF and extract as XML in VB.NET using ByteScout PDF Extractor SDK

The tutorial shows how to find table in PDF and extract as XML in VB.NET

On this page you will learn from code samples for programming in VB.NET.Writing of the code to find table in PDF and extract as XML in VB.NET can be done by developers of any level using ByteScout PDF Extractor SDK. ByteScout PDF Extractor SDK is the Software Development Kit (SDK) that is designed to help developers with data extraction from unstructured documents like pdf, tiff, scans, images, scanned and electronic forms. The library is powered by OCR, computer vision and AI to provide unique functionality like table detection, automatic table structure extraction, data restoration, data restructuring and reconstruction. Supports PDF, TIFF, PNG, JPG images as input and can output CSV, XML, JSON formatted data. Includes full set of utilities like pdf splitter, pdf merger, searchable pdf maker. It can find table in PDF and extract as XML in VB.NET.

Fast application programming interfaces of ByteScout PDF Extractor SDK for VB.NET plus the instruction and the code below will help you quickly learn how to find table in PDF and extract as XML. In order to implement the functionality, you should copy and paste this code for VB.NET below into your code editor with your app, compile and run your application. Detailed tutorials and documentation are available along with installed ByteScout PDF Extractor SDK if you’d like to dive deeper into the topic and the details of the API.

ByteScout PDF Extractor SDK free trial version is available on our website. VB.NET and other programming languages are supported.

On-demand (REST Web API) version:
Web API (on-demand version)

On-premise offline SDK for Windows:
60 Day Free Trial (on-premise)

Program.vb

      Imports Bytescout.PDFExtractor

Class Program
	Friend Shared Sub Main(args As String())

        ' Create Bytescout.PDFExtractor.XMLExtractor instance
        Dim xmlExtractor As New XMLExtractor()
        xmlExtractor.RegistrationName = "demo"
        xmlExtractor.RegistrationKey = "demo"

        ' Create Bytescout.PDFExtractor.TableDetector instance
        Dim tableDetector As New TableDetector()
        tableDetector.RegistrationName = "demo"
        tableDetector.RegistrationKey = "demo"

        ' We should define what kind of tables we should detect.
        ' So we set min required number of columns to 3 ...
        tableDetector.DetectionMinNumberOfColumns = 3
        ' ... and we set min required number of rows to 3
        tableDetector.DetectionMinNumberOfRows = 3

		' Load sample PDF document
        xmlExtractor.LoadDocumentFromFile(".\sample3.pdf")
        tableDetector.LoadDocumentFromFile(".\sample3.pdf")

		' Get page count
        Dim pageCount As Integer = tableDetector.GetPageCount()

		For i As Integer = 0 To pageCount - 1
            Dim t As Integer = 1
            ' Find first table and continue if found
            If (tableDetector.FindTable(i)) Then
                Do
                    ' Set extraction area for XML extractor to rectangle received from the table detector
                    xmlExtractor.SetExtractionArea(tableDetector.FoundTableLocation)
                    ' Export the table to XML file
                    xmlExtractor.SavePageXMLToFile(i, "page-" + i.ToString() + "-table-" + t.ToString() + ".xml")
                    t = t + 1
                Loop While tableDetector.FindNextTable()
            End If
        Next

        ' Cleanup
		xmlExtractor.Dispose()
		tableDetector.Dispose()

        ' Open first output file in default associated application (for demo purposes)
        System.Diagnostics.Process.Start("page-0-table-1.xml")

	End Sub
End Class