ByteScout Data Extraction Suite – VBScript – Find text in PDF using regex with PDF Extractor SDK



How to find text in a PDF using regex with PDF Extractor SDK in VBScript and ByteScout Data Extraction Suite

This tutorial demonstrates how to find text in a PDF using a regular expression with PDF Extractor SDK in VBScript.

The sample shows the steps for finding text in a PDF with a regular expression using PDF Extractor SDK, and how to run it in a VBScript application. ByteScout Data Extraction Suite bundles three SDK products for extracting data from PDFs, scans, images, and spreadsheets: PDF Extractor SDK, Data Extraction SDK, and Barcode Reader SDK. The regex search shown here uses PDF Extractor SDK.

The VBScript sample for ByteScout Data Extraction Suite shows the API calls and options needed to find text in a PDF using regex with PDF Extractor SDK. Follow the steps and copy the VBScript code into your own script.

You can download a free trial version of ByteScout Data Extraction Suite from our website. It includes this and other source code samples for VBScript.

Two versions are available: an on-demand REST Web API, and an on-premise offline SDK for Windows with a 60-day free trial.

FindTextUsingRegex.vbs

' Create Bytescout.PDFExtractor.TextExtractor object
Set extractor = CreateObject("Bytescout.PDFExtractor.TextExtractor")
extractor.RegistrationName = "demo"
extractor.RegistrationKey = "demo"

' Load sample PDF document
extractor.LoadDocumentFromFile("..\..\Invoice.pdf")

extractor.RegexSearch = True ' Turn on the regex search
pattern = "[0-9]{2}/[0-9]{2}/[0-9]{4}" ' Search dates in format 'mm/dd/yyyy'

' Get page count
pageCount = extractor.GetPageCount()

' Page indexes are zero-based, so the first page is reported as page #0
For i = 0 To pageCount - 1
    If extractor.Find(i, pattern, False) Then ' Parameters are: page index, string to find, case sensitivity
        Do
            extractedString = extractor.FoundText.Text
            MsgBox "Found match on page #" & CStr(i) & ": " & extractedString
            extractor.ResetExtractionArea()
        Loop While extractor.FindNext
    End If
Next

MsgBox "Done"

Set extractor = Nothing
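
Before running the search against a real document, it can help to check the pattern on a known string. The following is a minimal sketch that uses VBScript's built-in RegExp object rather than the SDK; the sample text is made up for illustration, and the SDK's regex engine may not behave identically to VBScript's for more advanced patterns.

```vbscript
' Sanity-check the date pattern with VBScript's built-in RegExp object.
' This does not use the SDK; sampleText is a hypothetical stand-in
' for text that might appear in a PDF such as Invoice.pdf.
Dim sampleText, re, matches, m
sampleText = "Invoice date: 01/15/2024, due date: 02/14/2024"

Set re = New RegExp
re.Pattern = "[0-9]{2}/[0-9]{2}/[0-9]{4}" ' Same pattern as in the sample above
re.Global = True                          ' Return every match, not only the first
re.IgnoreCase = True                      ' Case-insensitive, like passing False to Find

Set matches = re.Execute(sampleText)
For Each m In matches
    ' FirstIndex is the zero-based offset of the match in sampleText
    WScript.Echo "Match at offset " & m.FirstIndex & ": " & m.Value
Next
```

Run it with cscript.exe so that WScript.Echo prints to the console. Note that the pattern only checks the digit layout: it accepts strings such as 99/99/9999 and cannot distinguish mm/dd/yyyy from dd/mm/yyyy, so validate the matched values separately if that matters.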