Converting PDF to CSV in VB.NET after finding tables in the document is made easier with PDF Extractor SDK. The SDK streamlines many routine development tasks, including extracting data from PDF to CSV, XLS, and XLSX.
PDF Extractor SDK is just one of several Software Development Kits (SDKs) developed by ByteScout. ByteScout SDKs can also be used to generate barcodes and decode them from images, PDFs, and scanned documents. PDF Extractor SDK is built with OCR and image recognition technology.
We prepared sample code so you can test PDF Extractor SDK functionality, specifically the conversion of PDF to CSV in VB.NET. Copy and paste this code into your application and run it. It saves you the time and effort of writing and testing the code yourself. This VB.NET code snippet is also available from our GitHub.
We also have a FREE ByteScout trial version that you can download from our website. Besides nifty source code samples, it comes with some helpful programming tutorials.
Imports Bytescout.PDFExtractor

Class Program

    Friend Shared Sub Main(args As String())

        ' Create Bytescout.PDFExtractor.CSVExtractor instance
        Dim csvExtractor As New CSVExtractor()
        csvExtractor.RegistrationName = "demo"
        csvExtractor.RegistrationKey = "demo"

        ' Create Bytescout.PDFExtractor.TableDetector instance
        Dim tableDetector As New TableDetector()
        tableDetector.RegistrationName = "demo"
        tableDetector.RegistrationKey = "demo"

        ' We should define what kind of tables we should detect.
        ' So we set min required number of columns to 3 ...
        tableDetector.DetectionMinNumberOfColumns = 3
        ' ... and we set min required number of rows to 3
        tableDetector.DetectionMinNumberOfRows = 3
        ' Set table detection mode to "bordered tables" - best for tables with closed solid borders.
        tableDetector.ColumnDetectionMode = ColumnDetectionMode.BorderedTables

        ' Load sample PDF document
        csvExtractor.LoadDocumentFromFile(".\sample3.pdf")
        tableDetector.LoadDocumentFromFile(".\sample3.pdf")

        ' Get page count
        Dim pageCount As Integer = tableDetector.GetPageCount()

        ' Iterate through pages
        For i As Integer = 0 To pageCount - 1

            Dim t As Integer = 1

            ' Find first table and continue if found
            If (tableDetector.FindTable(i)) Then
                Do
                    ' Set extraction area for CSV extractor to rectangle received from the table detector
                    csvExtractor.SetExtractionArea(tableDetector.FoundTableLocation)

                    ' Export the table to CSV file
                    csvExtractor.SavePageCSVToFile(i, "page-" + i.ToString() + "-table-" + t.ToString() + ".csv")

                    t = t + 1
                Loop While tableDetector.FindNextTable()
            End If

        Next

        ' Cleanup
        csvExtractor.Dispose()
        tableDetector.Dispose()

        ' Open first output file in default associated application (for demo purposes)
        System.Diagnostics.Process.Start("page-0-table-1.csv")

    End Sub

End Class