PDF Extractor SDK Explained: Advanced Text Search Using Regular Expressions - ByteScout


In this tutorial, we will see how to find text in a PDF using regular expressions (regex). I have a PDF file that contains some phone numbers. We will write one regex for phone numbers and check whether it extracts all of them. To begin, copy the PDF file into the project in the Solution Explorer window. In the program, we first create an instance of the text extractor, then load the document and enable regex search. We iterate through all the pages, searching each one for matching text, and display every match we find. It’s very simple, so let’s get started.
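The walkthrough never shows the value of regexPattern, so it helps to see what a phone-number regex could look like before it goes anywhere near the PDF. The following is a minimal sketch that uses only the standard .NET Regex class. The pattern, the class name PhonePatternSketch, and the sample text are all hypothetical, and the pattern assumes US-style numbers such as (555) 123-4567. Whether the SDK's regex search accepts exactly the same syntax is also an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PhonePatternSketch
{
    // Hypothetical pattern (not from the article): an optional pair of
    // parentheses around a 3-digit area code, then 3 digits and 4 digits,
    // each group separated by an optional '-', '.', or space.
    public const string RegexPattern = @"\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}";

    // Returns every substring of 'text' that matches the pattern, in order.
    public static List<string> FindPhoneNumbers(string text)
    {
        var results = new List<string>();
        foreach (Match match in Regex.Matches(text, RegexPattern))
        {
            results.Add(match.Value);
        }
        return results;
    }

    public static void Main()
    {
        // Made-up sample text standing in for the text of a PDF page.
        string sample = "Office: (555) 123-4567, mobile: 555-987-6543.";
        foreach (string number in FindPhoneNumbers(sample))
        {
            Console.WriteLine(number);
        }
    }
}
```

Trying the pattern on plain strings like this makes it easier to tell a pattern problem from a PDF extraction problem later on.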

How to Use Regular Expressions for Text Search

First, create an instance of the text extractor and pass in the registration name and key. Load the document with extractor.LoadDocumentFromFile("sample_program3.pdf"), then enable regex search by setting extractor.RegexSearch = true. Next, iterate over all the pages: for (int pageIndex = 0; pageIndex < extractor.GetPageCount(); pageIndex++).

Inside the loop, check whether the page contains a match with if (extractor.Find(pageIndex, regexPattern, false)), and keep searching until every match has been found. For each match, display what was found by looping over its search result elements: foreach (SearchResultElement elm in extractor.FoundText.Elements).

Display each element with Console.WriteLine(elm.Text). When we run the program, the output shows that the regex finds all the phone numbers in the PDF.
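Put together, the steps above might look like the following sketch. It assumes the project references the ByteScout PDF Extractor SDK (namespace Bytescout.PDFExtractor) and that sample_program3.pdf sits next to the executable. Several details are not shown in the walkthrough and are assumptions here: the "demo" registration values, the regexPattern value, the do/while loop with FindNext() for repeating the search on a page, and the final Dispose() call. Check them against the documentation for your SDK version.

```csharp
using System;
using Bytescout.PDFExtractor; // requires a reference to the PDF Extractor SDK

class Program
{
    static void Main()
    {
        // Create the text extractor and pass the registration name and key.
        // "demo" is a placeholder (assumption); use your own values.
        TextExtractor extractor = new TextExtractor();
        extractor.RegistrationName = "demo";
        extractor.RegistrationKey = "demo";

        // Load the PDF that contains the phone numbers.
        extractor.LoadDocumentFromFile("sample_program3.pdf");

        // Treat the search string as a regular expression.
        extractor.RegexSearch = true;

        // Hypothetical phone-number pattern; the article does not show its own.
        string regexPattern = @"\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}";

        // Search every page of the document.
        for (int pageIndex = 0; pageIndex < extractor.GetPageCount(); pageIndex++)
        {
            // Third argument is false, as in the walkthrough.
            if (extractor.Find(pageIndex, regexPattern, false))
            {
                do
                {
                    // Print each element of the current match.
                    foreach (SearchResultElement elm in extractor.FoundText.Elements)
                    {
                        Console.WriteLine(elm.Text);
                    }
                }
                while (extractor.FindNext()); // assumed: advances to the next match
            }
        }

        // Release the extractor's resources (assumed cleanup step).
        extractor.Dispose();
    }
}
```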
