ByteScout PDF Suite – VB.NET – Find email addresses in PDF using regex with PDF Extractor SDK

How to find email addresses in PDF using regex with PDF Extractor SDK in VB.NET and ByteScout PDF Suite

This step-by-step tutorial shows how to write robust VB.NET code that finds email addresses in a PDF using regex with PDF Extractor SDK.

These source code samples are grouped by programming language and by the functions they use. What is ByteScout PDF Suite? It is a bundle of six SDK products for working with PDF, from generating rich PDF reports to extracting data from PDF documents and converting them to HTML. The bundle includes PDF (Generator) SDK, PDF Renderer SDK, PDF Extractor SDK, PDF to HTML SDK, PDF Viewer SDK, and PDF Generator SDK for JavaScript. This sample uses PDF Extractor SDK to find email addresses in a PDF with a regular expression from your VB.NET application.

The VB.NET sample below shows the API calls and options needed to find email addresses in a PDF using regex with PDF Extractor SDK: it sets the registration values, loads a document, enables regular expression search, and searches each page for an email pattern. Copy the code into your VB.NET application and adjust the file name and registration values. Treat the sample as a starting point; further improvement of the code will make it more robust.

A free trial version of ByteScout PDF Suite is available for download from our website, along with other source code samples for VB.NET.

Program.vb

Imports Bytescout.PDFExtractor

Module Program

    Sub Main()

        Try
            ' Create Bytescout.PDFExtractor.TextExtractor instance
            Using extractor As TextExtractor = New TextExtractor()
                extractor.RegistrationName = "demo"
                extractor.RegistrationKey = "demo"

                ' Load sample PDF document
                extractor.LoadDocumentFromFile("samplePDF_EmailAddress.pdf")

                ' Enable regular expression search
                extractor.RegexSearch = True

                Dim pageCount As Integer = extractor.GetPageCount()

                ' Regular expression that matches email addresses.
                ' Note: the {2,6} quantifier limits the top-level domain to 2-6 letters.
                ' See the complete regular expressions reference at https://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx
                Dim regexPattern As String = "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b"

                ' Search through pages
                For i As Integer = 0 To pageCount - 1

                    ' Search each page for the pattern
                    If extractor.Find(i, regexPattern, False) Then

                        Do
                            ' Iterate through each element in the found text
                            For Each element As ISearchResultElement In extractor.FoundText.Elements
                                Console.WriteLine("Found Email Addresses: " & element.Text)
                            Next
                        Loop While extractor.FindNext()

                    End If
                Next
            End Using

        Catch ex As Exception
            Console.WriteLine("Error: " & ex.Message)
        End Try

        Console.WriteLine()
        Console.WriteLine("Press enter key to continue...")
        Console.ReadLine()

    End Sub

End Module
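
To see what the email pattern in Program.vb matches before running it against a PDF, it can help to try it on a plain string. The following is a minimal sketch, not part of the ByteScout sample: it uses the standard .NET System.Text.RegularExpressions.Regex class rather than PDF Extractor SDK, and it assumes the SDK's regex search follows .NET regular expression syntax, as the MSDN reference in the code comment suggests. The module name EmailPatternCheck, the helper FindEmails, and the sample text are hypothetical names introduced for illustration. Build it as a separate console project so that its Sub Main does not clash with the one in Program.vb.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module EmailPatternCheck

    ' Same pattern as in Program.vb
    Private Const EmailPattern As String = "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b"

    ' Hypothetical helper (not part of the ByteScout API):
    ' returns every substring of the input text that matches the pattern
    Function FindEmails(text As String) As List(Of String)
        Dim results As New List(Of String)()
        For Each m As Match In Regex.Matches(text, EmailPattern)
            results.Add(m.Value)
        Next
        Return results
    End Function

    Sub Main()
        ' Made-up sample text standing in for text extracted from a PDF page
        Dim sample As String = "Contact sales@example.com or support@mail.example.org for help."

        For Each email As String In FindEmails(sample)
            Console.WriteLine("Matched: " & email)
        Next
    End Sub

End Module
```

Because the final part of the pattern is [A-Za-z]{2,6}, addresses whose top-level domain is longer than six letters are not matched. If such addresses may appear in your documents, consider widening the upper bound of that quantifier in the pattern you pass to extractor.Find.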