ByteScout Data Extraction Suite – C# – Index PDF Files with PDF Extractor SDK

How to index PDF files with PDF Extractor SDK in C# using ByteScout Data Extraction Suite

This tutorial demonstrates how to index PDF files with PDF Extractor SDK in C#.

The sample source code below shows how to index PDF files with PDF Extractor SDK in C#: it lists the PDF files in a folder and prints each file's metadata (page count, author, title, producer, subject, creation date) followed by the first two lines of its text. ByteScout Data Extraction Suite is a bundle of three SDK tools for extracting data from PDFs, scans, images, and spreadsheets: PDF Extractor SDK, Data Extraction SDK, and Barcode Reader SDK.

To save time on writing and testing code, you can take the C# sample below and use it in your application: copy and paste the code into your code editor, then compile and run the application. Run it against your own PDF files to check that it works properly with your data.

ByteScout provides a free trial version of ByteScout Data Extraction Suite along with documentation and source code samples.

On-demand (REST Web API) version:
Web API (on-demand version)

On-premise offline SDK for Windows:
60 Day Free Trial (on-premise)

Program.cs

using System;
using System.IO;
using Bytescout.PDFExtractor;

namespace IndexPDFFiles
{
	class Program
	{
		static void Main(string[] args)
		{
			// Create Bytescout.PDFExtractor.InfoExtractor instance
			InfoExtractor infoExtractor = new InfoExtractor();
			infoExtractor.RegistrationName = "demo";
			infoExtractor.RegistrationKey = "demo";

			TextExtractor textExtractor = new TextExtractor();
			textExtractor.RegistrationName = "demo";
			textExtractor.RegistrationKey = "demo";

			// List all PDF files in directory
			foreach (string file in Directory.GetFiles(@"..\..\..\..", "*.pdf"))
			{
				infoExtractor.LoadDocumentFromFile(file);

				Console.WriteLine("File Name:      " + Path.GetFileName(file));
				Console.WriteLine("Page Count:     " + infoExtractor.GetPageCount());
				Console.WriteLine("Author:         " + infoExtractor.Author);
				Console.WriteLine("Title:          " + infoExtractor.Title);
				Console.WriteLine("Producer:       " + infoExtractor.Producer);
				Console.WriteLine("Subject:        " + infoExtractor.Subject);
				Console.WriteLine("CreationDate:   " + infoExtractor.CreationDate);
				Console.WriteLine("Text (first 2 lines): ");

				// Load a couple of lines from each document
				textExtractor.LoadDocumentFromFile(file);
				using (StringReader stringReader = new StringReader(textExtractor.GetTextFromPage(0)))
				{
					Console.WriteLine(stringReader.ReadLine());
					Console.WriteLine(stringReader.ReadLine());
				}
				Console.WriteLine();
			}

			// Cleanup
			infoExtractor.Dispose();
			textExtractor.Dispose();

			Console.WriteLine();
			Console.WriteLine("Press any key to continue...");
			// ReadKey returns on any key press, matching the prompt above
			Console.ReadKey();
		}
	}
}
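
The sample above prints the first two lines of each document by calling ReadLine twice, which prints blank lines when the first page starts with empty lines or holds fewer than two lines of text. The following is a minimal sketch of a more defensive preview step. It uses only the .NET base class library; TextPreview and FirstLines are hypothetical names introduced here for illustration and are not part of PDF Extractor SDK.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration; not part of PDF Extractor SDK.
public static class TextPreview
{
    // Returns up to maxLines non-blank lines from the start of the text.
    // A null or empty input yields an empty list instead of throwing.
    public static List<string> FirstLines(string text, int maxLines)
    {
        var lines = new List<string>();
        if (string.IsNullOrEmpty(text) || maxLines <= 0)
        {
            return lines;
        }

        using (var reader = new StringReader(text))
        {
            string line;
            // ReadLine returns null once the end of the text is reached
            while (lines.Count < maxLines && (line = reader.ReadLine()) != null)
            {
                // Skip lines that contain only whitespace
                if (line.Trim().Length > 0)
                {
                    lines.Add(line);
                }
            }
        }

        return lines;
    }
}
```

In the sample, the text returned by textExtractor.GetTextFromPage(0) could be passed to TextPreview.FirstLines with a maxLines of 2, and each returned line written with Console.WriteLine.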