How to find text in PDF file and get coordinates in ASP.NET, C#, VB.NET, VBScript using PDF Extractor SDK - ByteScout

The sample source code below shows how to find text in PDF files and get its coordinates using Bytescout PDF Extractor SDK.

Source code snippets are provided below for ASP.NET, C#, VB.NET, and VBScript.

Let’s see the code and we’ll analyze it later in this article.

ASP.NET

using System;
using System.Drawing;
using Bytescout.PDFExtractor;

namespace FindText
{
	public partial class _Default : System.Web.UI.Page
	{
		protected void Page_Load(object sender, EventArgs e)
		{
			// This test file will be copied to the project directory on the pre-build event (see the project properties).
			String inputFile = Server.MapPath("sample1.pdf");

			// Create Bytescout.PDFExtractor.TextExtractor instance
			TextExtractor extractor = new TextExtractor();
			extractor.RegistrationName = "demo";
			extractor.RegistrationKey = "demo";
			
			// Load sample PDF document
			extractor.LoadDocumentFromFile(inputFile);

			Response.Clear();
			Response.ContentType = "text/html";

			Rectangle location;
			int pageIndex;

			Response.Write("Searching for \"ipsum\" string:<br/><br/>");
			
			// Search for "ipsum" string
			if (extractor.Find("ipsum", out pageIndex, out location))
			{
				do
				{
					Response.Write("Found on page " + pageIndex + " at location " + location.ToString() + "<br/>");

				} while (extractor.FindNext(out pageIndex, out location));
			}

			Response.End();
		}
	}
}

C#

using System;
using System.Drawing;
using Bytescout.PDFExtractor;

namespace FindText
{
	class Program
	{
		static void Main(string[] args)
		{
			// Create Bytescout.PDFExtractor.TextExtractor instance
			TextExtractor extractor = new TextExtractor();
			extractor.RegistrationName = "demo";
			extractor.RegistrationKey = "demo";

			// Load sample PDF document
			extractor.LoadDocumentFromFile("sample1.pdf");
			
			int pageCount = extractor.GetPageCount();
			RectangleF location;

			for (int i = 0; i < pageCount; i++)
			{
				// Search each page for "ipsum" string
				if (extractor.Find(i, "ipsum", false, out location))
				{
					do
					{
						Console.WriteLine("Found on page " + i + " at location " + location.ToString());

					}
					while (extractor.FindNext(out location));
				}
			}
			
			Console.WriteLine();
			Console.WriteLine("Press any key to continue...");
			Console.ReadLine();
		}
	}
}

VB.NET

Imports System.Drawing
Imports Bytescout.PDFExtractor

Class Program
	Friend Shared Sub Main(args As String())
		' Create Bytescout.PDFExtractor.TextExtractor instance
		Dim extractor As New TextExtractor()
		extractor.RegistrationName = "demo"
		extractor.RegistrationKey = "demo"

		' Load sample PDF document
		extractor.LoadDocumentFromFile("sample1.pdf")

		Dim location As Rectangle
		Dim pageIndex As Integer

		' Search for "ipsum" string
		If extractor.Find("ipsum", pageIndex, location) Then
			Do
				Console.WriteLine("Found on page " & pageIndex & " at location " & location.ToString())
			Loop While extractor.FindNext(pageIndex, location)
		End If

		Console.WriteLine()
		Console.WriteLine("Press any key to continue...")
		Console.ReadLine()
	End Sub
End Class

VBScript

' Create Bytescout.PDFExtractor.TextExtractor object
Set extractor = CreateObject("Bytescout.PDFExtractor.TextExtractor")
extractor.RegistrationName = "demo"
extractor.RegistrationKey = "demo"

' Load sample PDF document
extractor.LoadDocumentFromFile("....sample1.pdf")

' Get page count
pageCount = extractor.GetPageCount()

For i = 0 To pageCount - 1
	' Parameters are: page index, string to find, case sensitivity
	If extractor.Find(i, "ipsum", False) Then
		Do
			MsgBox "Found word 'ipsum' on page #" & CStr(i) & _
				" at left=" & CStr(extractor.GetFoundTextRectangle_Left) & _
				"; top=" & CStr(extractor.GetFoundTextRectangle_Top) & _
				"; width=" & CStr(extractor.GetFoundTextRectangle_Width) & _
				"; height=" & CStr(extractor.GetFoundTextRectangle_Height)
		Loop While extractor.FindNext
	End If
Next

MsgBox "Done"

Set extractor = Nothing

All of the code snippets achieve the same functionality. Let’s review the C# snippet here.

This example uses the Bytescout.PDFExtractor library. If you want to code along, you need to install the Bytescout SDK on your machine. The SDK is available for download from the ByteScout website.

First, we create an instance of the “TextExtractor” class and set its registration name and key. We use the “demo” name and key here, which has its limitations but is fine for this demo. For production use, replace these with your actual registration name and key.

// Create Bytescout.PDFExtractor.TextExtractor instance
TextExtractor extractor = new TextExtractor();
extractor.RegistrationName = "demo";
extractor.RegistrationKey = "demo";

We load the input PDF file into the text extractor instance using the “LoadDocumentFromFile” method. A stream can also serve as the input source, through the “LoadDocumentFromStream” method.

// Load sample PDF document
extractor.LoadDocumentFromFile("sample1.pdf");
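
As a rough sketch of the stream-based alternative, the snippet below opens the same sample file as a FileStream and hands it to “LoadDocumentFromStream”. It assumes the method accepts a System.IO.Stream and that the stream should stay open while the document is in use. Both are assumptions, so check the SDK reference for your version.

```csharp
using System;
using System.IO;
using Bytescout.PDFExtractor;

class LoadFromStreamSketch
{
	static void Main()
	{
		TextExtractor extractor = new TextExtractor();
		extractor.RegistrationName = "demo";
		extractor.RegistrationKey = "demo";

		// Assumption: LoadDocumentFromStream takes a System.IO.Stream.
		using (FileStream stream = File.OpenRead("sample1.pdf"))
		{
			extractor.LoadDocumentFromStream(stream);

			// Work with the document while the stream is still open.
			int pageCount = extractor.GetPageCount();
			Console.WriteLine("Pages: " + pageCount);
		}
	}
}
```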

Next, we get the number of pages in the PDF using the “GetPageCount” method, so that we can loop through every page and search it.

int pageCount = extractor.GetPageCount();

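Lastly, we use the “Find” method to search each page of the input file for the word “ipsum”. Its parameters are the page index, the string to find, a case-sensitivity flag, and an output parameter that receives the location of the match.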

// Search each page for "ipsum" string
if (extractor.Find(i, "ipsum", false, out location))
{
  do
  {
    Console.WriteLine("Found on page " + i + " at location " + location.ToString());
  }
  while (extractor.FindNext(out location));
}

The do/while loop calls “FindNext” until no more occurrences remain on the current page, and the coordinates of each occurrence are printed to the console.
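
If you would rather keep the matches than print them immediately, the same loop can fill a list. The sketch below is one possible way to do that, using only the “Find” and “FindNext” overloads from the C# sample in this article. The FindAll helper and the class name are hypothetical and not part of the SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using Bytescout.PDFExtractor;

class CollectMatchesSketch
{
	// Hypothetical helper (not an SDK method): returns (page index, rectangle) pairs
	// for every occurrence of the given text in the loaded document.
	static List<KeyValuePair<int, RectangleF>> FindAll(TextExtractor extractor, string text)
	{
		var matches = new List<KeyValuePair<int, RectangleF>>();
		int pageCount = extractor.GetPageCount();
		RectangleF location;

		for (int i = 0; i < pageCount; i++)
		{
			// Find the first occurrence on page i, then keep asking for the next one.
			if (extractor.Find(i, text, false, out location))
			{
				do
				{
					matches.Add(new KeyValuePair<int, RectangleF>(i, location));
				}
				while (extractor.FindNext(out location));
			}
		}

		return matches;
	}

	static void Main()
	{
		TextExtractor extractor = new TextExtractor();
		extractor.RegistrationName = "demo";
		extractor.RegistrationKey = "demo";
		extractor.LoadDocumentFromFile("sample1.pdf");

		foreach (var match in FindAll(extractor, "ipsum"))
		{
			RectangleF r = match.Value;
			Console.WriteLine("Page {0}: left={1}, top={2}, width={3}, height={4}",
				match.Key, r.Left, r.Top, r.Width, r.Height);
		}
	}
}
```

Collecting the results first makes it easier to sort, filter, or pass the coordinates on to other code, for example to highlight the matches.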

That’s all, folks. I hope you find this article useful for understanding how to find text using the Bytescout SDK.

Happy Coding!
