ByteScout Sensitive Data Suite – VB.NET – Remove Sensitive Data From Scanned Document



How to remove sensitive data from a scanned document in VB.NET with ByteScout Sensitive Data Suite

Learn to code in VB.NET to remove sensitive data from a scanned document with this step-by-step tutorial

This easy-to-understand sample source code shows how to remove sensitive data from a scanned document in VB.NET. ByteScout Sensitive Data Suite is a bundle of ByteScout components for working with sensitive and personal data. With these components you can analyze, redact, remove, or black out sensitive data in documents and PDF files.

The following code snippet for ByteScout Sensitive Data Suite works best when you need to quickly remove sensitive data from a scanned document in your VB.NET application. Copy the VB.NET code below into your project and follow the steps marked in its comments. Using ByteScout Sensitive Data Suite in VB.NET is also described in the documentation included with the product.

You can download a free trial version of ByteScout Sensitive Data Suite from our website; it includes this and other source code samples for VB.NET.

On-demand (REST Web API) version:
Web API (on-demand version)

On-premise offline SDK for Windows:
60 Day Free Trial (on-premise)

Module1.vb

Imports System.IO
Imports Bytescout.PDFExtractor

Class Program

    Shared Sub Main(ByVal args As String())

        Dim searchablePDFStream As New MemoryStream()

        ' STEP-1: Make searchable PDF
        ' STEP-2: Get search text result from that searchable PDF
        ' STEP-3: Remove sensitive data

        ' Create Bytescout.PDFExtractor.SearchablePDFMaker instance
        Using searchablePDFMaker As New SearchablePDFMaker("demo", "demo")

            ' Load sample PDF document
            searchablePDFMaker.LoadDocumentFromFile("sampleScannedPDF_EmailAddress.pdf")

            ' Set the location of language data files
            searchablePDFMaker.OCRLanguageDataFolder = "c:\Program Files\Bytescout PDF Extractor SDK\ocrdata\"

            ' Set OCR language: "eng" for English, "deu" for German, "fra" for French, "spa" for Spanish, etc.,
            ' according to the files in the "ocrdata" folder
            searchablePDFMaker.OCRLanguage = "eng"

            ' Set PDF document rendering resolution
            searchablePDFMaker.OCRResolution = 300

            ' Run OCR and write the searchable PDF to the memory stream
            searchablePDFMaker.MakePDFSearchable(searchablePDFStream)


            ' Prepare TextExtractor
            Using textExtractor As New TextExtractor("demo", "demo")

                ' Load stream into TextExtractor
                textExtractor.LoadDocumentFromStream(searchablePDFStream)

                ' Search for email addresses
                ' See the complete regular expressions reference at https://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx
                Dim regexPattern As String = "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b"

                ' Enable RegexSearch
                textExtractor.RegexSearch = True

                ' Set word matching options
                textExtractor.WordMatchingMode = WordMatchingMode.None

                Dim searchResults() As ISearchResult = textExtractor.FindAll(0, regexPattern, caseSensitive:=False)

                ' Create Bytescout.PDFExtractor.Remover2 instance
                Using remover As New Remover2("demo", "demo")

                    ' Load the searchable PDF produced in STEP-1
                    remover.LoadDocumentFromStream(searchablePDFStream)

                    ' Mask removed text
                    remover.MaskRemovedText = True

                    ' Make output file unsearchable
                    remover.MakePDFUnsearchable = True

                    ' Provide text to remove
                    remover.AddTextToRemove(searchResults)

                    ' Remove the text objects found by the search and save the result
                    remover.PerformRemoval("result1.pdf")

                End Using

            End Using

        End Using

        Console.WriteLine()
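        Console.WriteLine("Press any key to continue and open the result PDF file in the default PDF viewer...")
        Console.ReadKey()

        ' Open the result with the default viewer (UseShellExecute must be set explicitly on .NET Core / .NET 5+)
        Process.Start(New ProcessStartInfo("result1.pdf") With {.UseShellExecute = True})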

    End Sub

End Class
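
The email pattern used in the sample can be tried on plain text before running it against OCR output. The sketch below uses only the standard System.Text.RegularExpressions classes, not the ByteScout SDK. The module name EmailPatternCheck and the helper FindEmails are hypothetical names introduced for illustration, and RegexOptions.IgnoreCase is an assumption meant to mirror caseSensitive:=False in the sample. Text recognized by OCR may contain misread characters, so matches in a real scanned document can differ.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module EmailPatternCheck

    ' Same pattern as in the sample above
    Private Const EmailPattern As String = "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b"

    ' Hypothetical helper (not part of the SDK): returns every match of the pattern in the given text
    Public Function FindEmails(ByVal source As String) As List(Of String)
        Dim found As New List(Of String)()
        For Each m As Match In Regex.Matches(source, EmailPattern, RegexOptions.IgnoreCase)
            found.Add(m.Value)
        Next
        Return found
    End Function

    Sub Main()
        Dim sample As String = "Contact john.doe@example.com or SALES@Example.org for details."

        ' Print each address the pattern finds in the sample string
        For Each email As String In FindEmails(sample)
            Console.WriteLine(email)
        Next
    End Sub

End Module
```

Note that the {2,6} quantifier limits the top-level domain to six letters, so the final \b prevents a match on addresses with a longer one; widen the upper bound if such addresses must be removed as well.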