ByteScout PDF Suite – VB.NET – Find text in PDF with regex using PDF Extractor SDK

How to find text in PDF with regex using PDF Extractor SDK in VB.NET and ByteScout PDF Suite

Learn to find text in PDF with regex using PDF Extractor SDK in VB.NET

This sample shows how to find text in a PDF with regular expressions using PDF Extractor SDK, and how to run it in your VB.NET application. It enables regex search on a TextExtractor instance, searches each page for dates in the format 12/31/1999, and prints the location and font details of every match. ByteScout PDF Suite is a bundle of six SDK libraries for working with PDF, from generating rich PDF reports to extracting data from PDF documents and converting them to HTML. The bundle includes PDF (Generator) SDK, PDF Renderer SDK, PDF Extractor SDK, PDF to HTML SDK, PDF Viewer SDK, and PDF Generator SDK for JavaScript.

These VB.NET code samples help developers speed up coding when using ByteScout PDF Suite. Follow the instructions to get started from scratch, and copy the VB.NET code into your project. Use of ByteScout PDF Suite in VB.NET is also described in the documentation included with the product.

ByteScout provides a free trial version of ByteScout PDF Suite along with documentation and source code samples.

On-demand (REST Web API) version:
Web API (on-demand version)

On-premise offline SDK for Windows:
60 Day Free Trial (on-premise)

Program.vb

Imports System.Drawing
Imports Bytescout.PDFExtractor

Class Program
    Friend Shared Sub Main(args As String())

        ' Create Bytescout.PDFExtractor.TextExtractor instance
        Dim extractor As New TextExtractor()
        extractor.RegistrationName = "demo"
        extractor.RegistrationKey = "demo"

        ' Load sample PDF document
        extractor.LoadDocumentFromFile(".\Invoice.pdf")

        extractor.RegexSearch = True ' Enable the regular expressions

        Dim pageCount As Integer = extractor.GetPageCount()

        ' Search through pages
        For i As Integer = 0 To pageCount - 1

            ' Search dates in format 12/31/1999
            Dim regexPattern As String = "[0-9]{2}/[0-9]{2}/[0-9]{4}"
            ' See the complete regular expressions reference at https://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx

            ' Search each page for the pattern
            If extractor.Find(i, regexPattern, False) Then
                Do
                    Console.WriteLine("")
                    Console.WriteLine("Found on page " & i & " at location " & extractor.FoundText.Bounds.ToString())
                    Console.WriteLine("")

                    ' Iterate through each element in the found text
                    For Each element As ISearchResultElement In extractor.FoundText.Elements
                        Console.WriteLine("   Text: " & element.Text)
                        Console.WriteLine("   Font is bold: " & element.FontIsBold.ToString())
                        Console.WriteLine("   Font is italic: " & element.FontIsItalic.ToString())
                        Console.WriteLine("   Font name: " & element.FontName)
                        Console.WriteLine("   Font size: " & element.FontSize.ToString())
                        Console.WriteLine("   Font color: " & element.FontColor.ToString())
                        Console.WriteLine()
                    Next

                Loop While extractor.FindNext()

            End If
        Next

        ' Cleanup
        extractor.Dispose()

        Console.WriteLine()
        ' Console.ReadLine() waits for the Enter key, so the prompt says so
        Console.WriteLine("Press Enter to continue...")
        Console.ReadLine()
    End Sub

End Class
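
DatePatternDemo.vb

Before running the search against a real PDF, it can help to check what the date pattern actually matches. The sketch below does not use PDF Extractor SDK at all. It applies the same pattern to a plain string with the standard .NET Regex class, on the assumption that the SDK interprets patterns with .NET regex syntax, as the reference link in the sample's comments suggests. The module name DatePatternDemo, the helper FindAll, the sample string, and the stricter pattern are all hypothetical and not part of the original sample. Build it as a separate console project, since it defines its own Main.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module DatePatternDemo

    ' Same pattern as in Program.vb: two digits, slash, two digits, slash, four digits
    Public Const LoosePattern As String = "[0-9]{2}/[0-9]{2}/[0-9]{4}"

    ' Hypothetical stricter variant (an assumption, not from the original sample):
    ' month 01-12, day 01-31, with word boundaries around the whole date
    Public Const StrictPattern As String = "\b(0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])/[0-9]{4}\b"

    ' Returns every substring of input that matches pattern
    Public Function FindAll(input As String, pattern As String) As String()
        Dim matches As MatchCollection = Regex.Matches(input, pattern)
        Dim results(matches.Count - 1) As String
        For i As Integer = 0 To matches.Count - 1
            results(i) = matches(i).Value
        Next
        Return results
    End Function

    Sub Main()
        ' Made-up text standing in for a line extracted from an invoice
        Dim sample As String = "Invoice date 12/31/1999, due 01/15/2000, ref 99/99/9999."

        Console.WriteLine("Loose:  " & String.Join(", ", FindAll(sample, LoosePattern)))
        Console.WriteLine("Strict: " & String.Join(", ", FindAll(sample, StrictPattern)))
    End Sub

End Module
```

The loose pattern accepts any digits in the month and day positions, so it also reports 99/99/9999 as a date. The stricter variant rejects that string while still matching 12/31/1999 and 01/15/2000. Whether the extra strictness is worth it depends on how clean the text in your PDF documents is.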