ByteScout PDF Extractor SDK - C# - Check If OCR Is Required for PDF - ByteScout

How to check if OCR is required for a PDF in C# with ByteScout PDF Extractor SDK

Every ByteScout tool ships with example C# source code, which you can find here or in the installation folder of the ByteScout product. You can check whether OCR is required for a PDF in C# with ByteScout PDF Extractor SDK. ByteScout PDF Extractor SDK is a Software Development Kit (SDK) designed to help developers extract data from unstructured documents such as PDFs, TIFFs, scans, images, and scanned and electronic forms. The library uses OCR, computer vision, and AI to provide features such as table detection, automatic table structure extraction, data restoration, and data restructuring and reconstruction. It supports PDF, TIFF, PNG, and JPG input and can output data as CSV, XML, or JSON. It also includes a full set of utilities such as a PDF splitter, a PDF merger, and a searchable PDF maker.

The SDK sample below shows how to make your application check whether OCR is required for a PDF in C# with ByteScout PDF Extractor SDK. To use it, copy the C# sample code into your application's code editor and add a reference to ByteScout PDF Extractor SDK if you haven't already. More detailed documentation and tutorials are installed along with ByteScout PDF Extractor SDK if you'd like to dive deeper into the topic and the details of the API.

A free trial version of ByteScout PDF Extractor SDK is available on our website. Source code samples are included to help you with your C# application.

Try it today: Get 60 Day Free Trial or sign up for Web API

CheckIfOCRIsRequired.csproj
      
<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="15.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <Import Project="$(MSBuildExtensionsPath)\$(MSBuildToolsVersion)\Microsoft.Common.props" Condition="Exists('$(MSBuildExtensionsPath)\$(MSBuildToolsVersion)\Microsoft.Common.props')" />
  <PropertyGroup>
    <Configuration Condition=" '$(Configuration)' == '' ">Debug</Configuration>
    <Platform Condition=" '$(Platform)' == '' ">AnyCPU</Platform>
    <ProjectGuid>{99735776-2956-463D-9795-EBCE16928C30}</ProjectGuid>
    <OutputType>Exe</OutputType>
    <RootNamespace>CheckIfOCRIsRequired</RootNamespace>
    <AssemblyName>CheckIfOCRIsRequired</AssemblyName>
    <TargetFrameworkVersion>v2.0</TargetFrameworkVersion>
    <FileAlignment>512</FileAlignment>
  </PropertyGroup>
  <PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Debug|AnyCPU' ">
    <PlatformTarget>AnyCPU</PlatformTarget>
    <DebugSymbols>true</DebugSymbols>
    <DebugType>full</DebugType>
    <Optimize>false</Optimize>
    <OutputPath>bin\Debug\</OutputPath>
    <DefineConstants>DEBUG;TRACE</DefineConstants>
    <ErrorReport>prompt</ErrorReport>
    <WarningLevel>4</WarningLevel>
  </PropertyGroup>
  <PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Release|AnyCPU' ">
    <PlatformTarget>AnyCPU</PlatformTarget>
    <DebugType>pdbonly</DebugType>
    <Optimize>true</Optimize>
    <OutputPath>bin\Release\</OutputPath>
    <DefineConstants>TRACE</DefineConstants>
    <ErrorReport>prompt</ErrorReport>
    <WarningLevel>4</WarningLevel>
  </PropertyGroup>
  <ItemGroup>
    <Reference Include="Bytescout.PDFExtractor, Version=9.1.0.3170, Culture=neutral, PublicKeyToken=f7dd1bd9d40a50eb, processorArchitecture=MSIL">
      <SpecificVersion>False</SpecificVersion>
      <HintPath>c:\Program Files\Bytescout PDF Extractor SDK\net2.00\Bytescout.PDFExtractor.dll</HintPath>
    </Reference>
    <Reference Include="System" />
    <Reference Include="System.Data" />
    <Reference Include="System.Xml" />
  </ItemGroup>
  <ItemGroup>
    <Compile Include="Program.cs" />
  </ItemGroup>
  <ItemGroup>
    <None Include="InputFiles\sample_ocr_not_required.pdf">
      <CopyToOutputDirectory>Always</CopyToOutputDirectory>
    </None>
    <None Include="InputFiles\sample_ocr_required.pdf">
      <CopyToOutputDirectory>Always</CopyToOutputDirectory>
    </None>
  </ItemGroup>
  <Import Project="$(MSBuildToolsPath)\Microsoft.CSharp.targets" />
</Project>

Program.cs
      
using Bytescout.PDFExtractor;
using System;

namespace CheckIfOCRIsRequired
{
    class Program
    {
        static void Main(string[] args)
        {
            try
            {
                // Loop through all files in the directory and check whether OCR is required for each
                foreach (string filePath in System.IO.Directory.GetFiles("InputFiles"))
                {
                    _CheckOCRRequired(filePath);
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine("Error: " + ex.Message);
            }

            Console.WriteLine("Press enter key to exit...");
            Console.ReadLine();
        }

        /// <summary>
        /// Checks whether OCR is recommended for the first page of a document,
        /// configures OCR if so, and prints the extracted text.
        /// </summary>
        /// <param name="filePath">Path of the document to check.</param>
        private static void _CheckOCRRequired(string filePath)
        {
            using (TextExtractor extractor = new TextExtractor())
            {
                extractor.RegistrationKey = "demo";
                extractor.RegistrationName = "demo";

                // Load document
                extractor.LoadDocumentFromFile(filePath);

                Console.WriteLine("\n*******************\n\nFilePath: {0}", filePath);

                // Only the first page (index 0) is checked
                int pageIndex = 0;

                // Check whether OCR is recommended for the page
                if (extractor.IsOCRRecommendedForPage(pageIndex))
                {
                    Console.WriteLine("\nOCR Recommended: True");

                    // Enable Optical Character Recognition (OCR)
                    // in .Auto mode (SDK automatically checks if needs to use OCR or not)
                    extractor.OCRMode = OCRMode.Auto;

                    // Set the location of language data files
                    extractor.OCRLanguageDataFolder = @"c:\Program Files\Bytescout PDF Extractor SDK\ocrdata\";

                    // Set OCR language:
                    // "eng" for English, "deu" for German, "fra" for French, "spa" for Spanish etc - according to files in "ocrdata" folder
                    // Find more language files at https://github.com/bytescout/ocrdata
                    extractor.OCRLanguage = "eng";

                    // Set PDF document rendering resolution
                    extractor.OCRResolution = 300;
                }
                else
                {
                    Console.WriteLine("\nOCR Recommended: False");
                }

                // Read all text
                var allExtractedText = extractor.GetText();

                Console.WriteLine("\nExtracted Text:\n{0}\n\n", allExtractedText);
            }
        }
    }
}
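
The sample above checks only the first page (index 0). A document that mixes scanned and electronic pages may need the same check on every page. The following is a minimal sketch of that loop, not part of the ByteScout sample: the class OcrPageScan and the method PagesNeedingOcr are hypothetical names introduced here for illustration, and the per-page check is passed in as a delegate so that the sketch compiles without the SDK.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of ByteScout PDF Extractor SDK.
static class OcrPageScan
{
    // Returns the zero-based indexes of the pages for which
    // the supplied check reports that OCR is recommended.
    public static List<int> PagesNeedingOcr(int pageCount, Predicate<int> isOcrRecommended)
    {
        if (pageCount < 0)
            throw new ArgumentOutOfRangeException("pageCount");
        if (isOcrRecommended == null)
            throw new ArgumentNullException("isOcrRecommended");

        List<int> pages = new List<int>();
        for (int pageIndex = 0; pageIndex < pageCount; pageIndex++)
        {
            // Ask the caller-supplied check about each page in turn
            if (isOcrRecommended(pageIndex))
                pages.Add(pageIndex);
        }
        return pages;
    }
}
```

Assuming the page count is obtained from the SDK (how to get it is not shown in the sample above), the helper could be called as OcrPageScan.PagesNeedingOcr(pageCount, extractor.IsOCRRecommendedForPage), and OCR could then be configured only when the returned list is not empty.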

MORE INFORMATION

Get 60 Day Free Trial or Visit ByteScout PDF Extractor SDK page

Explore ByteScout PDF Extractor SDK documentation

WEB API VERSION

Sign Up for free Web API key

Explore Web API Documentation
