ByteScout PDF Extractor SDK – VB.NET – Check If OCR Is Required for PDF

Check if OCR is required for a PDF in VB.NET using ByteScout PDF Extractor SDK

Tutorial: how to check if OCR is required for a PDF in VB.NET

These coding tutorials are designed to help you test the features without needing to write your own code. ByteScout PDF Extractor SDK can help you check whether OCR is required for a PDF in VB.NET. The SDK helps developers extract data from unstructured documents, PDFs, images, and scanned and electronic forms. It includes AI functions such as automatic table detection, automatic table extraction and restructuring, and text recognition and text restoration from PDF and scanned documents. It also includes PDF to CSV, PDF to XML, PDF to JSON, and PDF to searchable PDF functions, as well as methods for low-level data extraction.

You can save a lot of time on writing and testing code by taking the code below and using it in your application. The sample checks the first page of every PDF in the InputFiles folder, reports whether OCR is recommended for it, and prints the extracted text. To check whether OCR is required for a PDF in your VB.NET project or application, copy and paste the code and run your app. Implementing a VB.NET application typically involves multiple stages of software development. Even if the functionality works, please test it with your own data and in your production environment.

A free trial version of ByteScout PDF Extractor SDK is available on our website. Source code samples are included to help you with your VB.NET application.

<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="15.0" xmlns="">
  <Import Project="$(MSBuildExtensionsPath)\$(MSBuildToolsVersion)\Microsoft.Common.props" Condition="Exists('$(MSBuildExtensionsPath)\$(MSBuildToolsVersion)\Microsoft.Common.props')" />
  <PropertyGroup>
    <Configuration Condition=" '$(Configuration)' == '' ">Debug</Configuration>
    <Platform Condition=" '$(Platform)' == '' ">AnyCPU</Platform>
    <ProjectGuid>{80667702-F68F-42E8-AF48-A3F9D8C879CF}</ProjectGuid>
    <OutputType>Exe</OutputType>
    <StartupObject>CheckIfOCRIsRequired.Program</StartupObject>
    <RootNamespace>CheckIfOCRIsRequired</RootNamespace>
    <AssemblyName>CheckIfOCRIsRequired</AssemblyName>
    <FileAlignment>512</FileAlignment>
    <MyType>Console</MyType>
    <TargetFrameworkVersion>v2.0</TargetFrameworkVersion>
  </PropertyGroup>
  <PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Debug|AnyCPU' ">
    <PlatformTarget>AnyCPU</PlatformTarget>
    <DebugSymbols>true</DebugSymbols>
    <DebugType>full</DebugType>
    <DefineDebug>true</DefineDebug>
    <DefineTrace>true</DefineTrace>
    <OutputPath>bin\Debug\</OutputPath>
    <DocumentationFile>CheckIfOCRIsRequired.xml</DocumentationFile>
    <NoWarn>42016,41999,42017,42018,42019,42032,42036,42020,42021,42022</NoWarn>
  </PropertyGroup>
  <PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Release|AnyCPU' ">
    <PlatformTarget>AnyCPU</PlatformTarget>
    <DebugType>pdbonly</DebugType>
    <DefineDebug>false</DefineDebug>
    <DefineTrace>true</DefineTrace>
    <Optimize>true</Optimize>
    <OutputPath>bin\Release\</OutputPath>
    <DocumentationFile>CheckIfOCRIsRequired.xml</DocumentationFile>
    <NoWarn>42016,41999,42017,42018,42019,42032,42036,42020,42021,42022</NoWarn>
  </PropertyGroup>
  <PropertyGroup>
    <OptionExplicit>On</OptionExplicit>
  </PropertyGroup>
  <PropertyGroup>
    <OptionCompare>Binary</OptionCompare>
  </PropertyGroup>
  <PropertyGroup>
    <OptionStrict>Off</OptionStrict>
  </PropertyGroup>
  <PropertyGroup>
    <OptionInfer>On</OptionInfer>
  </PropertyGroup>
  <ItemGroup>
    <Reference Include="Bytescout.PDFExtractor, Version=, Culture=neutral, PublicKeyToken=f7dd1bd9d40a50eb, processorArchitecture=MSIL">
      <SpecificVersion>False</SpecificVersion>
      <HintPath>c:\Program Files\Bytescout PDF Extractor SDK\net2.00\Bytescout.PDFExtractor.dll</HintPath>
    </Reference>
    <Reference Include="System" />
    <Reference Include="System.Data" />
    <Reference Include="System.Deployment" />
    <Reference Include="System.Xml" />
  </ItemGroup>
  <ItemGroup>
    <Import Include="Microsoft.VisualBasic" />
    <Import Include="System" />
    <Import Include="System.Collections" />
    <Import Include="System.Collections.Generic" />
    <Import Include="System.Data" />
    <Import Include="System.Diagnostics" />
  </ItemGroup>
  <ItemGroup>
    <Compile Include="Program.vb" />
  </ItemGroup>
  <ItemGroup>
    <None Include="InputFiles\sample_ocr_not_required.pdf">
      <CopyToOutputDirectory>Always</CopyToOutputDirectory>
    </None>
    <None Include="InputFiles\sample_ocr_required.pdf">
      <CopyToOutputDirectory>Always</CopyToOutputDirectory>
    </None>
  </ItemGroup>
  <Import Project="$(MSBuildToolsPath)\Microsoft.VisualBasic.targets" />
</Project>

Imports Bytescout.PDFExtractor

Module Program

    Sub Main()

        Try
            ' Loop through all files in the directory and check whether OCR is required
            For Each filePath As String In System.IO.Directory.GetFiles("InputFiles")
                _CheckOCRRequired(filePath)
            Next
        Catch ex As Exception
            Console.WriteLine("Error: " & ex.Message)
        End Try

        Console.WriteLine("Press enter key to exit...")
        Console.ReadLine()

    End Sub

    ''' <summary>
    ''' Check whether an OCR operation is required
    ''' </summary>
    ''' <param name="filePath">Path to the PDF document to check</param>
    Private Sub _CheckOCRRequired(ByVal filePath As String)

        ' Read all file content...
        Using extractor As TextExtractor = New TextExtractor()

            extractor.RegistrationKey = "demo"
            extractor.RegistrationName = "demo"

            ' Load document
            extractor.LoadDocumentFromFile(filePath)

            Console.WriteLine("{1}*******************{1}{1}FilePath: {0}", filePath, vbLf)

            ' Only the first page (index 0) is checked in this sample
            Dim pageIndex As Int32 = 0

            ' Check whether OCR is recommended for the page
            If (extractor.IsOCRRecommendedForPage(pageIndex)) Then

                Console.WriteLine("{0}OCR Recommended: True", vbLf)

                ' Enable Optical Character Recognition (OCR)
                ' in .Auto mode (the SDK automatically decides whether OCR is needed)
                extractor.OCRMode = OCRMode.Auto

                ' Set the location of OCR language data files
                extractor.OCRLanguageDataFolder = "c:\Program Files\Bytescout PDF Extractor SDK\ocrdata\"

                ' Set OCR language
                ' "eng" for English, "deu" for German, "fra" for French, "spa" for Spanish etc. - according to files in "ocrdata" folder
                ' Find more language files at
                extractor.OCRLanguage = "eng"

                ' Set PDF document rendering resolution
                extractor.OCRResolution = 300

            Else
                Console.WriteLine("{0}OCR Recommended: False", vbLf)
            End If

            ' Read all text
            Dim allExtractedText = extractor.GetText()
            Console.WriteLine("{1}Extracted Text:{1}{0}{1}{1}", allExtractedText, vbLf)

        End Using

    End Sub

End Module
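The sample above checks only the first page (index 0), so a scanned page later in the document would go unnoticed. The following is a minimal sketch that checks every page. It reuses the calls from the sample (TextExtractor, LoadDocumentFromFile, IsOCRRecommendedForPage). It also assumes that TextExtractor exposes a GetPageCount() method returning the number of pages; please verify this against the SDK reference for your version. The module name OcrPageScan and the function name PagesRecommendingOcr are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports Bytescout.PDFExtractor

Module OcrPageScan

    ' Returns the zero-based indexes of the pages for which the SDK
    ' recommends OCR. An empty list means IsOCRRecommendedForPage
    ' returned False for every page of the document.
    Function PagesRecommendingOcr(ByVal filePath As String) As List(Of Integer)
        Dim result As New List(Of Integer)()

        Using extractor As New TextExtractor()
            extractor.RegistrationKey = "demo"
            extractor.RegistrationName = "demo"
            extractor.LoadDocumentFromFile(filePath)

            ' ASSUMPTION: GetPageCount() returns the number of pages in the loaded document
            Dim pageCount As Integer = extractor.GetPageCount()

            ' Check each page instead of only the first one
            For pageIndex As Integer = 0 To pageCount - 1
                If extractor.IsOCRRecommendedForPage(pageIndex) Then
                    result.Add(pageIndex)
                End If
            Next
        End Using

        Return result
    End Function

    Sub Main()
        ' Input file taken from the sample project; replace it with your own PDF
        Dim pages As List(Of Integer) = PagesRecommendingOcr("InputFiles\sample_ocr_required.pdf")

        If pages.Count = 0 Then
            Console.WriteLine("OCR not required")
        Else
            Console.WriteLine("OCR recommended for {0} page(s)", pages.Count)
        End If
    End Sub

End Module
```

Collecting the page indexes, rather than stopping at the first match, lets you report or process only the affected pages. If you only need a yes/no answer for the whole document, you could return as soon as IsOCRRecommendedForPage returns True.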

