ByteScout Sensitive Data Suite – VB.NET – Remove Sensitive Text Using Regex Expressions

How to remove sensitive text using regex expressions in VB.NET and ByteScout Sensitive Data Suite

This VB.NET sample shows how to remove sensitive text from a PDF document using regular expressions.

The sample walks through the steps for removing sensitive text with regular expressions and shows how to run them in your VB.NET application. ByteScout Sensitive Data Suite is a set of SDK tools from ByteScout for working with sensitive and personal data. With these tools you can analyze, redact, remove, or black out sensitive data in documents and PDF files, including removing text matched by a regular expression from VB.NET.

Want to learn quickly? The ByteScout Sensitive Data Suite API for VB.NET, together with the guidelines and the code below, shows how to remove sensitive text using regular expressions. Copy the code into your VB.NET application and follow the instructions. Even if the functionality works as is, test it with your own data and in your production environment before relying on it.

A free trial version of ByteScout Sensitive Data Suite is available. Source code samples are included to help you with your VB.NET app.

On-demand (REST Web API) version:
Web API (on-demand version)

On-premise offline SDK for Windows:
60 Day Free Trial (on-premise)

Module1.vb

Imports System
Imports System.Diagnostics
Imports System.Drawing
Imports Bytescout.PDFExtractor

Class Program

    Shared Sub Main(ByVal args As String())

        ' Create Bytescout.PDFExtractor.Remover instance
        Dim remover As New Remover("demo", "demo")

        ' Load sample PDF document
        remover.LoadDocumentFromFile("samplePDF_EmailAddress.pdf")

        ' Prepare TextExtractor
        Using textExtractor As New TextExtractor("demo", "demo")

            ' Load document into TextExtractor
            textExtractor.LoadDocumentFromFile("samplePDF_EmailAddress.pdf")

            ' Search for email addresses.
            ' See the complete regular expressions reference at https://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx
            Dim regexPattern As String = "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b"

            ' Enable RegexSearch
            textExtractor.RegexSearch = True

            ' Set word matching options
            textExtractor.WordMatchingMode = WordMatchingMode.None

            Dim searchResults() As ISearchResult = textExtractor.FindAll(0, regexPattern, caseSensitive:=False)

            ' Remove the text objects found by the search.
            ' NOTE: The removed text might be larger than the specified rectangle. Currently the Remover is unable
            ' to split PDF text objects.
            remover.RemoveText(searchResults, "result1.pdf")

        End Using

        ' Clean up.
        remover.Dispose()

        Console.WriteLine()
        Console.WriteLine("Press any key to continue and open the result PDF file in the default PDF viewer...")
        Console.ReadKey()

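        ' Open the result with the default PDF viewer.
        ' UseShellExecute = True is required on .NET Core / .NET 5+ to open a document by file name.
        Process.Start(New ProcessStartInfo("result1.pdf") With {.UseShellExecute = True})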

    End Sub

End Class
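
Before running a pattern against real documents, it can help to check what it matches on plain strings. The following is a minimal sketch that uses only the standard .NET `System.Text.RegularExpressions` classes, not the ByteScout SDK. The module name `RegexPatternCheck`, the helper `FindMatches`, and the sample text are hypothetical and exist only for illustration. It assumes the SDK's regex search behaves like the .NET regex engine, which the sample does not confirm, so treat the result as a first check rather than a guarantee.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexPatternCheck

    ' Same email pattern as in Module1.vb above.
    Public Const EmailPattern As String = "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b"

    ' Hypothetical helper: returns every substring of the input that the pattern matches.
    ' IgnoreCase mirrors caseSensitive:=False in the sample above.
    Public Function FindMatches(ByVal input As String, ByVal pattern As String) As String()
        Dim matches As MatchCollection = Regex.Matches(input, pattern, RegexOptions.IgnoreCase)
        Dim results(matches.Count - 1) As String
        For i As Integer = 0 To matches.Count - 1
            results(i) = matches(i).Value
        Next
        Return results
    End Function

    Sub Main()
        ' Made-up sample text; replace it with text taken from your own documents.
        Dim sample As String = "Contact john.doe@example.com or SALES@Example.org for details."
        For Each found As String In FindMatches(sample, EmailPattern)
            Console.WriteLine(found)
        Next
    End Sub

End Module
```

Note that the `{2,6}` quantifier limits the top-level domain to six letters, so an address with a longer top-level domain is only partially matched or missed entirely. Widen the range if your documents contain such addresses.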