ByteScout PDF Suite – C# – Make Searchable PDF Discarding Existing Content with PDF Extractor SDK

How to make a searchable PDF, discarding existing content, with PDF Extractor SDK in C# using ByteScout PDF Suite

Step-by-step tutorial on how to make a searchable PDF, discarding existing content, with PDF Extractor SDK in C#

These coding instructions let you try out the feature without having to write your own code. ByteScout PDF Suite is a bundle of 6 SDK products for working with PDF, from generating rich PDF reports to extracting data from PDF documents and converting them to HTML. The bundle includes PDF (Generator) SDK, PDF Renderer SDK, PDF Extractor SDK, PDF to HTML SDK, PDF Viewer SDK and PDF Generator SDK for JavaScript. This tutorial uses PDF Extractor SDK from C# to make a PDF searchable while discarding the existing text in the document.

Want to save time? Instead of writing and testing the code yourself, you can take the C# code below, which uses ByteScout PDF Suite to make a searchable PDF while discarding existing content, and use it in your application. To implement the functionality, copy and paste the code into your code editor, then compile and run your application. The same C# sample can be reused in one or many applications.

All these programming tutorials, along with source code samples and the ByteScout free trial version, are available for download from our website.

On-demand (REST Web API) version:
Web API (on-demand version)

On-premise offline SDK for Windows:
60 Day Free Trial (on-premise)

Program.cs

using System.Diagnostics;
using Bytescout.PDFExtractor;

// To make OCR work, your project should reference "Bytescout.PDFExtractor.dll" and "Bytescout.PDFExtractor.OCRExtension.dll".

namespace MakeSearchablePDFDiscardingExistingContent
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create Bytescout.PDFExtractor.SearchablePDFMaker instance
            SearchablePDFMaker searchablePDFMaker = new SearchablePDFMaker();
            searchablePDFMaker.RegistrationName = "demo";
            searchablePDFMaker.RegistrationKey = "demo";

            // Load sample PDF document
            searchablePDFMaker.LoadDocumentFromFile("sample_ocr_withText.pdf");
            
            // Set the location of language data files
            searchablePDFMaker.OCRLanguageDataFolder = @"c:\Program Files\Bytescout PDF Extractor SDK\ocrdata\";

            // Set OCR language
            searchablePDFMaker.OCRLanguage = "eng"; // "eng" for English, "deu" for German, "fra" for French, "spa" for Spanish, etc., according to the files in the "ocrdata" folder

            // Set PDF document rendering resolution used for OCR
            searchablePDFMaker.OCRResolution = 300;

            // Discard existing text in the document
            searchablePDFMaker.DiscardExistingDocumentText = true;

            // Make the PDF searchable and save the result to a file
            searchablePDFMaker.MakePDFSearchable("output.pdf");

            // Cleanup
            searchablePDFMaker.Dispose();

            // Open output file in default associated application
            ProcessStartInfo processStartInfo = new ProcessStartInfo("output.pdf");
            processStartInfo.UseShellExecute = true;
            Process.Start(processStartInfo);
        }
    }
}
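
The sample above assumes that "sample_ocr_withText.pdf" and the "ocrdata" folder both exist. As a minimal sketch, the paths could be checked before the SDK is called, so that a missing file is reported clearly up front. The OcrPreflight class and its Check method are hypothetical helpers written for this illustration and are not part of ByteScout PDF Extractor SDK. The sketch also assumes that each language data file in the folder is named after its language code (for example a file starting with "eng."), which should be verified against your installation.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration only; not part of the ByteScout SDK.
static class OcrPreflight
{
    // Returns a list of human-readable problems.
    // An empty list means the input file and language data look usable.
    public static List<string> Check(string inputPdf, string languageDataFolder, string language)
    {
        var problems = new List<string>();

        if (!File.Exists(inputPdf))
        {
            problems.Add("Input PDF not found: " + inputPdf);
        }

        if (!Directory.Exists(languageDataFolder))
        {
            problems.Add("OCR language data folder not found: " + languageDataFolder);
        }
        else if (Directory.GetFiles(languageDataFolder, language + ".*").Length == 0)
        {
            // Assumption: data files are named "<language code>.<extension>".
            problems.Add("No data file for language '" + language + "' in " + languageDataFolder);
        }

        return problems;
    }
}
```

In Main, the check could run before LoadDocumentFromFile, for example by calling OcrPreflight.Check("sample_ocr_withText.pdf", @"c:\Program Files\Bytescout PDF Extractor SDK\ocrdata\", "eng"), printing each returned message with Console.WriteLine and returning early if the list is not empty.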