ByteScout PDF Extractor SDK - VBScript - Find PDF Table And Extract As XML - ByteScout

How to find a PDF table and extract it as XML in VBScript with ByteScout PDF Extractor SDK

Every ByteScout tool ships with VBScript source code samples, which you can find here or in the installation folder of the ByteScout product. ByteScout PDF Extractor SDK is a software development kit that helps developers extract data from unstructured documents such as PDFs, TIFFs, scans, images, and scanned or electronic forms. The library uses OCR, computer vision and AI to provide features such as table detection, automatic table structure extraction, data restoration, and data restructuring and reconstruction. It accepts PDF, TIFF, PNG and JPG files as input and can output data as CSV, XML or JSON. It also includes utilities such as a PDF splitter, a PDF merger and a searchable PDF maker. With it, you can find a table in a PDF and extract it as XML from VBScript.

The SDK sample below shows how to make your application find a PDF table and extract it as XML in VBScript with the help of ByteScout PDF Extractor SDK. Copy and paste the code into your VBScript application and follow the instructions. Test the sample code to check whether it meets the needs and requirements of your project.

A free trial version of ByteScout PDF Extractor SDK is available on our website. Documentation and source code samples are included.

Try it today: Get 60 Day Free Trial or sign up for Web API

FindTableAndExtractAsXML.vbs
      
' Create Bytescout.PDFExtractor.TableDetector object
Set tableDetector = CreateObject("Bytescout.PDFExtractor.TableDetector")
tableDetector.RegistrationName = "demo"
tableDetector.RegistrationKey = "demo"

' Create Bytescout.PDFExtractor.XMLExtractor object
Set xmlExtractor = CreateObject("Bytescout.PDFExtractor.XMLExtractor")
xmlExtractor.RegistrationName = "demo"
xmlExtractor.RegistrationKey = "demo"

' We should define what kind of tables we should detect.
' So we set min required number of columns to 3 ...
tableDetector.DetectionMinNumberOfColumns = 3
' ... and we set min required number of rows to 3
tableDetector.DetectionMinNumberOfRows = 3

' Load sample PDF document into both objects
tableDetector.LoadDocumentFromFile "..\..\sample3.pdf"
xmlExtractor.LoadDocumentFromFile "..\..\sample3.pdf"

' Get page count
pageCount = tableDetector.GetPageCount()

' Iterate through pages
For i = 0 To pageCount - 1

    ' Index of the current table on this page
    t = 0

    ' Find first table and continue if found
    If (tableDetector.FindTable(i)) Then

        Do
            ' Set extraction area for the XML extractor to the rectangle received from the table detector
            xmlExtractor.SetExtractionArea _
                tableDetector.GetFoundTableRectangle_Left(), _
                tableDetector.GetFoundTableRectangle_Top(), _
                tableDetector.GetFoundTableRectangle_Width(), _
                tableDetector.GetFoundTableRectangle_Height()

            ' Export the table to an XML file
            xmlExtractor.SavePageXMLToFile i, "page-" & CStr(i) & "-table-" & CStr(t) & ".xml"

            t = t + 1

        ' Repeat while more tables are found on the page
        Loop While tableDetector.FindNextTable()

    End If

Next

' Release the COM objects
Set xmlExtractor = Nothing
Set tableDetector = Nothing
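
After the sample has run, it can be useful to confirm that an output file is well-formed XML. The following is a minimal sketch of such a check, not part of the ByteScout sample. It assumes that MSXML 6.0 is installed (it normally ships with Windows) and that the sample has already produced a file named page-0-table-0.xml in the current folder, which is the name the loop above generates for the first table on the first page. The element names inside the file depend on the SDK output and are not assumed here, so the script prints only generic information.

```vbscript
' Hypothetical follow-up check: confirm that a generated file is well-formed XML.
' The file name is an assumption: it follows the naming pattern used by the
' sample for page 0, table 0.
Option Explicit

Dim xmlPath, doc
xmlPath = "page-0-table-0.xml"

' MSXML 6.0 is assumed to be available on the machine.
Set doc = CreateObject("MSXML2.DOMDocument.6.0")

' Load synchronously so that the result is available right after load()
doc.async = False

If doc.load(xmlPath) Then
    ' Element names depend on the SDK output, so only generic facts are printed
    WScript.Echo "Root element: " & doc.documentElement.nodeName
    WScript.Echo "Child nodes: " & doc.documentElement.childNodes.length
Else
    ' parseError.reason describes why the file could not be loaded or parsed
    WScript.Echo "Could not parse " & xmlPath & ": " & doc.parseError.reason
End If

Set doc = Nothing
```

Save the check as a separate .vbs file in the folder that holds the generated XML files and run it with cscript so that the messages are written to the console.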
