ByteScout PDF Suite – C# – Index PDF Files with PDF Extractor SDK

How to index PDF files with PDF Extractor SDK in C# using ByteScout PDF Suite

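This tutorial demonstrates how to index PDF files with PDF Extractor SDK in C#. For every PDF file in a folder, the sample prints the document metadata (page count, author, title, producer, subject, creation date) and the first two lines of text from the first page.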

ByteScout PDF Suite is a bundle of six SDK libraries for working with PDF, from generating rich PDF reports to extracting data from PDF documents and converting them to HTML. The bundle includes PDF (Generator) SDK, PDF Renderer SDK, PDF Extractor SDK, PDF to HTML SDK, PDF Viewer SDK and PDF Generator SDK for Javascript. The sample on this page uses PDF Extractor SDK from C#.

To save time on writing and testing, copy the C# sample below into your application and adjust the folder path passed to Directory.GetFiles so that it points at your own PDF files.

ByteScout provides a free trial version of ByteScout PDF Suite along with documentation and source code samples.

On-demand (REST Web API) version:
Web API (on-demand version)

On-premise offline SDK for Windows:
60 Day Free Trial (on-premise)

Program.cs

using System;
using System.IO;
using Bytescout.PDFExtractor;

namespace IndexPDFFiles
{
	class Program
	{
		static void Main(string[] args)
		{
			// Create an InfoExtractor instance to read document metadata
			InfoExtractor infoExtractor = new InfoExtractor();
			infoExtractor.RegistrationName = "demo";
			infoExtractor.RegistrationKey = "demo";

			// Create a TextExtractor instance to read page text
			TextExtractor textExtractor = new TextExtractor();
			textExtractor.RegistrationName = "demo";
			textExtractor.RegistrationKey = "demo";

			// Process every PDF file in the directory (the path is relative to the working directory)
			foreach (string file in Directory.GetFiles(@"..\..\..\..", "*.pdf"))
			{
				infoExtractor.LoadDocumentFromFile(file);

				Console.WriteLine("File Name:      " + Path.GetFileName(file));
				Console.WriteLine("Page Count:     " + infoExtractor.GetPageCount());
				Console.WriteLine("Author:         " + infoExtractor.Author);
				Console.WriteLine("Title:          " + infoExtractor.Title);
				Console.WriteLine("Producer:       " + infoExtractor.Producer);
				Console.WriteLine("Subject:        " + infoExtractor.Subject);
				Console.WriteLine("CreationDate:   " + infoExtractor.CreationDate);
				Console.WriteLine("Text (first 2 lines): ");

				// Print the first two lines of text from the first page (index 0).
				// ReadLine() returns null when the text has fewer lines;
				// Console.WriteLine then prints an empty line.
				textExtractor.LoadDocumentFromFile(file);
				using (StringReader stringReader = new StringReader(textExtractor.GetTextFromPage(0)))
				{
					Console.WriteLine(stringReader.ReadLine());
					Console.WriteLine(stringReader.ReadLine());
				}
				Console.WriteLine();
			}

			// Cleanup
			infoExtractor.Dispose();
			textExtractor.Dispose();
			
			Console.WriteLine();
			Console.WriteLine("Press Enter to continue...");
			Console.ReadLine();
		}
	}
}
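
The sample above prints exactly two lines per document, even when the first page contains fewer. As a minimal sketch of one way to handle short pages, the helper below returns at most a given number of lines and never yields null entries. FirstLines and PreviewDemo are hypothetical names introduced here for illustration; they are not part of PDF Extractor SDK, and the sample string merely stands in for the text returned by textExtractor.GetTextFromPage(0).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class PreviewDemo
{
    // Hypothetical helper (not part of PDF Extractor SDK): returns up to
    // maxLines lines from the given text, or fewer if the text is shorter.
    public static List<string> FirstLines(string text, int maxLines)
    {
        var lines = new List<string>();
        using (var reader = new StringReader(text ?? string.Empty))
        {
            string line;
            // Stop at the line limit or at the end of the text, whichever comes first.
            while (lines.Count < maxLines && (line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }

    static void Main()
    {
        // Assumed stand-in for the string returned by textExtractor.GetTextFromPage(0).
        string pageText = "Invoice 1001\r\nACME Corp.\r\nTotal: 42.00";

        foreach (string line in FirstLines(pageText, 2))
        {
            Console.WriteLine(line);
        }

        // A one-line page yields a single entry rather than a null second line.
        Console.WriteLine(FirstLines("Only line", 2).Count);
    }
}
```

In the sample program, the two Console.WriteLine(stringReader.ReadLine()) calls could be swapped for a loop over FirstLines(textExtractor.GetTextFromPage(0), 2), which also makes the number of preview lines easy to change.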