ByteScout PDF Extractor SDK - C# - Find Text in PDF using Regex - ByteScout

How to find text in a PDF with regular expressions in C# using ByteScout PDF Extractor SDK

Write C# code to find text in a PDF using regex with this tutorial

These source code samples are listed and grouped by programming language and by the functions they use. ByteScout PDF Extractor SDK is an SDK that helps developers extract data from unstructured documents, PDF files, images, and scanned and electronic forms. It includes AI functions such as automatic table detection, automatic table extraction and restructuring, text recognition, and text restoration from PDF and scanned documents. It also includes PDF to CSV, PDF to XML, PDF to JSON, and PDF to searchable PDF conversion, as well as methods for low-level data extraction. Among other things, it can find text in a PDF using regular expressions in C#.

You can save time on writing and testing code by taking the C# sample below, which finds text in a PDF using regular expressions with ByteScout PDF Extractor SDK, and using it in your application. To try it, copy the code into your project in your code editor, then compile and run the application. Treat the sample as a starting point: adding error handling (for example, for a missing input file) will make it more robust.

Download the free trial version of ByteScout PDF Extractor SDK from our website; it includes this and other C# source code samples.

Try ByteScout PDF Extractor SDK today: get the 60-day free trial (on-premise version) or sign up for the Web API (on-demand version).

Program.cs
      
using System;
using Bytescout.PDFExtractor;

namespace FindText
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create Bytescout.PDFExtractor.TextExtractor instance
            TextExtractor extractor = new TextExtractor();
            extractor.RegistrationName = "demo";
            extractor.RegistrationKey = "demo";

            // Load sample PDF document
            extractor.LoadDocumentFromFile(@".\Invoice.pdf");

            // Enable the regular expressions
            extractor.RegexSearch = true;

            // Search dates in format 12/31/1999 (two-digit month and day, four-digit year)
            // See the complete regular expressions reference at https://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx
            string regexPattern = "[0-9]{2}/[0-9]{2}/[0-9]{4}";

            int pageCount = extractor.GetPageCount();

            // Search through pages (page indexes start at 0)
            for (int i = 0; i < pageCount; i++)
            {
                // Search each page for the pattern
                if (extractor.Find(i, regexPattern, false))
                {
                    do
                    {
                        Console.WriteLine();
                        Console.WriteLine("Found on page " + i + " at location " + extractor.FoundText.Bounds);
                        Console.WriteLine();

                        // Iterate through each element in the found text
                        foreach (ISearchResultElement element in extractor.FoundText.Elements)
                        {
                            Console.WriteLine(" Text: " + element.Text);
                            Console.WriteLine(" Font is bold: " + element.FontIsBold);
                            Console.WriteLine(" Font is italic: " + element.FontIsItalic);
                            Console.WriteLine(" Font name: " + element.FontName);
                            Console.WriteLine(" Font size: " + element.FontSize);
                            Console.WriteLine(" Font color: " + element.FontColor);
                            Console.WriteLine();
                        }
                    }
                    while (extractor.FindNext()); // Move to the next match on the same page
                }
            }

            // Cleanup
            extractor.Dispose();

            Console.WriteLine();
            Console.WriteLine("Press any key to continue...");
            Console.ReadKey(); // ReadKey matches the "any key" prompt; ReadLine would wait for Enter
        }
    }
}
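Because the sample's comment points to the .NET regular expressions reference, you can check the date pattern on its own with the standard System.Text.RegularExpressions.Regex class before running it against a PDF. The sketch below assumes that the SDK interprets the pattern with .NET regex syntax; the sample strings are hypothetical and are not taken from Invoice.pdf.

```csharp
using System;
using System.Text.RegularExpressions;

class RegexCheck
{
    static void Main()
    {
        // Same pattern as in Program.cs: two digits, slash, two digits, slash, four digits.
        // Assumption: the SDK uses .NET regex syntax (the sample links to the .NET reference).
        const string pattern = "[0-9]{2}/[0-9]{2}/[0-9]{4}";

        // Hypothetical text fragments, as they might appear in an invoice
        string[] samples =
        {
            "Invoice date: 12/31/1999",
            "Due 1/5/2000",                      // single-digit month and day: no match
            "Shipped 03/15/2021 and 04/01/2021"  // two matches in one string
        };

        foreach (string sample in samples)
        {
            // Collect every non-overlapping match of the pattern in the string
            MatchCollection matches = Regex.Matches(sample, pattern);
            Console.WriteLine($"\"{sample}\": {matches.Count} match(es)");

            foreach (Match m in matches)
            {
                Console.WriteLine("  " + m.Value);
            }
        }
    }
}
```

Note that the pattern only checks the shape of a date. It requires exactly two digits for the month and day, so "1/5/2000" is not found, and it does not validate value ranges, so a string such as "13/45/2020" would also match.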

ON-PREMISE VERSION

Get 60 Day Free Trial or Visit ByteScout PDF Extractor SDK page

Explore ByteScout PDF Extractor SDK documentation

WEB API

Sign Up for free Web API key

Explore Web API Documentation