ByteScout Data Extraction Suite – C# – Repair Text in PDF with PDF Extractor SDK

Repair text in PDF with PDF Extractor SDK in C# with ByteScout Data Extraction Suite

A simple tutorial on how to repair text in a PDF with PDF Extractor SDK in C#

The sample source code on this page shows how to repair text in a PDF with PDF Extractor SDK in C#. ByteScout Data Extraction Suite is a bundle of three SDK tools for extracting data from PDFs, scans, images, and spreadsheets: PDF Extractor SDK, Data Extraction SDK, and Barcode Reader SDK.

The C# sample below shows the properties you set when calling the API to repair text in a PDF: the OCR mode, the OCR language data folder, the OCR language, and the rendering resolution. Copy the code into your C# project, add a reference to ByteScout Data Extraction Suite if you have not already, and run it against your own PDF to check the extracted text.

A trial version can be downloaded for free from our website. It includes this and other C# source code samples.

On-demand (REST Web API) version:
Web API (on-demand version)

On-premise offline SDK for Windows:
60 Day Free Trial (on-premise)

Program.cs

using System;
using Bytescout.PDFExtractor;

namespace RepairText
{
    class Program
    {
        static void Main(string[] args)
        {
            try
            {
                // Read all text from the PDF file, repairing damaged fonts via OCR
                using (TextExtractor extractor = new TextExtractor())
                {
                    // Load PDF document
                    extractor.LoadDocumentFromFile("sample.pdf");

                    // Set the font repairing OCR mode 
                    extractor.OCRMode = OCRMode.TextFromImagesAndVectorsAndRepairedFonts;

                    // Set the location of OCR language data files
                    extractor.OCRLanguageDataFolder = @"c:\Program Files\Bytescout PDF Extractor SDK\ocrdata\";

                    // Set OCR language: "eng" for English, "deu" for German, "fra" for French,
                    // "spa" for Spanish, etc., matching the files in the "ocrdata" folder.
                    // Find more language files at https://github.com/bytescout/ocrdata
                    extractor.OCRLanguage = "eng";

                    // Set PDF document rendering resolution
                    extractor.OCRResolution = 300;

                    //Read all text
                    string allText = extractor.GetText();

                    Console.WriteLine("Extracted Text: \n\n" + allText);
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
            }

            Console.ReadLine();
        }
    }
}
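
The sample above depends on the OCR language data folder containing a file for the selected language. A small pre-flight check before calling the extractor can make a missing file easier to diagnose. The following is only a sketch: `OcrDataCheck`, `FindMissingLanguages`, and `PreflightDemo` are hypothetical names and are not part of the SDK, and the `<code>.traineddata` file naming is an assumption based on the common Tesseract convention, so verify it against the actual contents of your "ocrdata" folder.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

namespace RepairText
{
    // Hypothetical helper, not part of ByteScout PDF Extractor SDK.
    static class OcrDataCheck
    {
        // Returns the language codes that have no data file in the given folder.
        // ASSUMPTION: each language is stored as "<code>.traineddata".
        public static List<string> FindMissingLanguages(string ocrDataFolder, params string[] languageCodes)
        {
            var missing = new List<string>();
            foreach (string code in languageCodes)
            {
                string path = Path.Combine(ocrDataFolder, code + ".traineddata");
                if (!File.Exists(path))
                {
                    missing.Add(code);
                }
            }
            return missing;
        }
    }

    class PreflightDemo
    {
        static void Main(string[] args)
        {
            // Same folder and language as in the sample above
            string folder = @"c:\Program Files\Bytescout PDF Extractor SDK\ocrdata\";
            List<string> missing = OcrDataCheck.FindMissingLanguages(folder, "eng");

            if (missing.Count > 0)
            {
                Console.WriteLine("Missing OCR language data: " + string.Join(", ", missing));
            }
            else
            {
                Console.WriteLine("OCR language data found.");
            }
        }
    }
}
```

In a real project the check would run before `extractor.GetText()`, so the program can report the missing language file instead of relying on whatever error the OCR step produces.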