This page provides a VB.NET code sample that finds tables in a PDF and extracts them as XML. Developers of any level can write this code using ByteScout PDF Extractor SDK. ByteScout PDF Extractor SDK is a Software Development Kit (SDK) designed to help developers extract data from unstructured documents such as PDFs, TIFFs, scans, images, and scanned and electronic forms. The library uses OCR, computer vision and AI to provide functionality such as table detection, automatic table structure extraction, data restoration, data restructuring and reconstruction. It supports PDF, TIFF, PNG and JPG images as input and can output data formatted as CSV, XML or JSON. It also includes a full set of utilities such as a PDF splitter, a PDF merger and a searchable PDF maker.
The fast application programming interfaces of ByteScout PDF Extractor SDK for VB.NET, together with the instructions and code below, will help you quickly learn how to find a table in a PDF and extract it as XML. To implement this functionality, copy and paste the VB.NET code below into your code editor, then compile and run your application. Detailed tutorials and documentation are installed along with ByteScout PDF Extractor SDK if you'd like to dive deeper into the topic and the details of the API.
A free trial version of ByteScout PDF Extractor SDK is available on our website. VB.NET and other programming languages are supported.
Imports Bytescout.PDFExtractor

Class Program

    Friend Shared Sub Main(args As String())

        ' Create Bytescout.PDFExtractor.XMLExtractor instance
        Dim xmlExtractor As New XMLExtractor()
        xmlExtractor.RegistrationName = "demo"
        xmlExtractor.RegistrationKey = "demo"

        ' Create Bytescout.PDFExtractor.TableDetector instance
        Dim tableDetector As New TableDetector()
        tableDetector.RegistrationName = "demo"
        tableDetector.RegistrationKey = "demo"

        ' We should define what kind of tables we should detect.
        ' So we set min required number of columns to 3 ...
        tableDetector.DetectionMinNumberOfColumns = 3
        ' ... and we set min required number of rows to 3
        tableDetector.DetectionMinNumberOfRows = 3

        ' Load sample PDF document
        xmlExtractor.LoadDocumentFromFile(".\sample3.pdf")
        tableDetector.LoadDocumentFromFile(".\sample3.pdf")

        ' Get page count
        Dim pageCount As Integer = tableDetector.GetPageCount()

        For i As Integer = 0 To pageCount - 1
            Dim t As Integer = 1

            ' Find first table and continue if found
            If (tableDetector.FindTable(i)) Then
                Do
                    ' Set extraction area for XML extractor to rectangle received from the table detector
                    xmlExtractor.SetExtractionArea(tableDetector.FoundTableLocation)

                    ' Export the table to XML file
                    xmlExtractor.SavePageXMLToFile(i, "page-" + i.ToString() + "-table-" + t.ToString() + ".xml")

                    t = t + 1
                Loop While tableDetector.FindNextTable()
            End If
        Next

        ' Cleanup
        xmlExtractor.Dispose()
        tableDetector.Dispose()

        ' Open first output file in default associated application (for demo purposes)
        System.Diagnostics.Process.Start("page-0-table-1.xml")

    End Sub

End Class
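Once the sample has run, you may want to check what the exported file contains before writing code that depends on it. The following is a minimal sketch using only the standard .NET System.Xml.Linq API, not the ByteScout SDK. It assumes the file page-0-table-1.xml was produced by the sample above and sits in the working directory. The element names in the output are not described on this page, so the sketch makes no assumption about them and simply reports whatever it finds. The module name InspectTableXml is hypothetical.

```vbnet
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module InspectTableXml

    Sub Main()
        ' Assumption: this file was written by the table extraction sample
        ' (first table found on page index 0).
        Dim xmlPath As String = "page-0-table-1.xml"

        ' Load the exported XML with the standard LINQ to XML API
        Dim doc As XDocument = XDocument.Load(xmlPath)

        ' Report the root element; no particular schema is assumed
        Console.WriteLine("Root element: " & doc.Root.Name.LocalName)

        ' Count how many times each element name occurs in the document
        Dim counts = doc.Descendants().
            GroupBy(Function(e) e.Name.LocalName).
            Select(Function(g) New With {.Name = g.Key, .Count = g.Count()})

        For Each entry In counts
            Console.WriteLine(entry.Name & ": " & entry.Count.ToString())
        Next
    End Sub

End Module
```

The printed element names give a quick picture of how the extractor represents rows and cells in your document, which you can then query directly with Descendants or Elements.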