ByteScout Document Parser SDK – C# – Parse with OCR

How to parse with OCR in C# with ByteScout Document Parser SDK

A step-by-step tutorial on parsing scanned documents with OCR in C#

Every ByteScout tool ships with example C# source code, which you can find here or in the folder where the ByteScout product is installed. ByteScout Document Parser SDK is a robust offline data extraction platform for template-based data extraction and processing. It supports high loads with millions of input documents. Templates can be created and updated quickly, with no special technical knowledge required. The SDK can also parse scanned documents with OCR in C#.

The code snippet below for ByteScout Document Parser SDK is a quick way to add OCR parsing to your C# application. Follow the steps from scratch, or simply copy the ready-to-use C# sample code into your project.

A free trial version of ByteScout Document Parser SDK is available. Source code samples are included to help you with your C# app.

On-demand version: REST Web API
On-premise offline SDK for Windows: 60-day free trial

Program.cs

using System;
using ByteScout.DocumentParser;

// This example demonstrates parsing of scanned documents
// using the Optical Character Recognition (OCR).

namespace GeneralExample
{
    class Program
    {
        static void Main(string[] args)
        {
            string template = @".\DigitalOcean.yml";
            string inputDocument = @".\DigitalOcean-scanned.jpg";

            // Create and activate DocumentParser instance
            using (DocumentParser documentParser = new DocumentParser("demo", "demo"))
            {
                // Enable Optical Character Recognition (OCR) in Auto mode
                // (DocumentParser automatically detects if OCR is required).
                documentParser.OCRMode = OCRMode.Auto;

                // Set PDF document rendering resolution
                documentParser.OCRResolution = 300;

                // Set the location of OCR language data files
                documentParser.OCRLanguageDataFolder = @"c:\Program Files\ByteScout Document Parser SDK\ocrdata";

                // Set OCR language
                // "eng" for English, "deu" for German, "fra" for French, etc. - according to files in "ocrdata" folder
                documentParser.OCRLanguage = "eng";
                // Find more language files at https://github.com/bytescout/ocrdata

                // Note: The OCRLanguage can be overridden in a template. 
                // See the Template Creation Guide.

                
                
                // You can also apply various preprocessing filters
                // to improve the recognition on low-quality scans.

                // Automatically deskew skewed scans
                //documentParser.OCRImagePreprocessingFilters.AddDeskew();

                // Remove vertical or horizontal lines (sometimes helps to avoid OCR engine's page segmentation errors)
                //documentParser.OCRImagePreprocessingFilters.AddVerticalLinesRemover();
                //documentParser.OCRImagePreprocessingFilters.AddHorizontalLinesRemover();

                // Repair broken letters
                //documentParser.OCRImagePreprocessingFilters.AddDilate();

                // Remove noise
                //documentParser.OCRImagePreprocessingFilters.AddMedian();

                // Apply Gamma Correction
                //documentParser.OCRImagePreprocessingFilters.AddGammaCorrection(1.4);

                // Add Contrast
                //documentParser.OCRImagePreprocessingFilters.AddContrast(20);


                // Load template
                documentParser.AddTemplate(template);

                Console.WriteLine("Template loaded.");
                Console.WriteLine();

                Console.WriteLine($"Parsing \"{inputDocument}\" with OCR...");
                Console.WriteLine();

                // Parse document data to JSON format
                string jsonString = documentParser.ParseDocument(inputDocument, OutputFormat.JSON);

                // Display parsed data in console
                Console.WriteLine("Parsed data in JSON format:");
                Console.WriteLine();
                Console.WriteLine(jsonString);
            }

            Console.WriteLine();
            // Console.ReadLine() waits for the Enter key, so the prompt says so.
            Console.WriteLine("Press Enter to continue...");
            Console.ReadLine();
        }
    }
}
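
The sample above only prints the JSON string returned by ParseDocument. If you want to work with individual values, one option is to walk the JSON generically. The following is a minimal sketch, not part of the SDK: the JsonFlattener class, its Flatten method, and the sample JSON are hypothetical and used for illustration only, and the real structure of the parser output may differ. It assumes System.Text.Json is available (built into .NET Core 3.0 and later; a NuGet package on .NET Framework).

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper, not part of ByteScout Document Parser SDK.
static class JsonFlattener
{
    // Returns every leaf value of a JSON document keyed by its path,
    // for example "fields.total.value" or "pages[0]".
    public static Dictionary<string, string> Flatten(string json)
    {
        var result = new Dictionary<string, string>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            Walk(doc.RootElement, "", result);
        }
        return result;
    }

    private static void Walk(JsonElement element, string path, Dictionary<string, string> result)
    {
        switch (element.ValueKind)
        {
            case JsonValueKind.Object:
            {
                // Descend into each property, extending the path with its name.
                foreach (JsonProperty property in element.EnumerateObject())
                {
                    string childPath = path.Length == 0 ? property.Name : path + "." + property.Name;
                    Walk(property.Value, childPath, result);
                }
                break;
            }
            case JsonValueKind.Array:
            {
                // Descend into each array item, extending the path with its index.
                int index = 0;
                foreach (JsonElement item in element.EnumerateArray())
                {
                    Walk(item, path + "[" + index + "]", result);
                    index++;
                }
                break;
            }
            default:
                // Leaf values are stored as text via JsonElement.ToString().
                result[path] = element.ToString();
                break;
        }
    }
}

class FlattenDemo
{
    static void Main()
    {
        // Made-up JSON for illustration only; the real parser output may be shaped differently.
        string jsonString = "{\"templateName\":\"Sample\",\"fields\":{\"total\":{\"value\":\"12.50\"}},\"pages\":[1,2]}";

        foreach (KeyValuePair<string, string> entry in JsonFlattener.Flatten(jsonString))
        {
            Console.WriteLine(entry.Key + " = " + entry.Value);
        }
    }
}
```

In the sample program, you would pass the jsonString returned by ParseDocument to JsonFlattener.Flatten instead of the made-up string. Because the walk does not assume any particular schema, it keeps working if a template adds or renames fields.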