ByteScout PDF Extractor SDK – VB.NET – OCR Analyser



Imports System.Drawing
Imports Bytescout.PDFExtractor

' This example demonstrates OCRAnalyzer, a utility class that analyzes scanned documents
' in PDF or raster image formats to find the Optical Character Recognition (OCR) parameters
' that give the highest recognition quality.

' For OCR to work, add the following references to your project:
' 'Bytescout.PDFExtractor.dll', 'Bytescout.PDFExtractor.OCRExtension.dll'.

Class Program

    Friend Shared Sub Main(args As String())

        ' Input document
        Dim inputDocument As String = ".\sample_ocr.pdf"

        ' Document page index
        Dim pageIndex As Integer = 0

        ' Area of the document page to perform the analysis (optional).
        ' RectangleF.Empty means the full page.
        Dim rectangle As RectangleF = RectangleF.Empty ' New RectangleF(100, 50, 350, 250)

        ' Location of "tessdata" folder containing language data files
        Dim ocrLanguageDataFolder As String = "c:\Program Files\Bytescout PDF Extractor SDK\Redistributable\net2.00\tessdata\"

        ' OCR language
        Dim ocrLanguage As String = "eng" ' "eng" for English, "deu" for German, "fra" for French, "spa" for Spanish, etc., according to the files in the tessdata folder
        ' Find more language files at

        ' Create OCRAnalyzer instance and activate it with your registration information
        Using ocrAnalyzer As New OCRAnalyzer("demo", "demo")

            ' Display analysis progress
            AddHandler ocrAnalyzer.ProgressChanged, Sub(sender, message, progress, ByRef cancel)
                                                        System.Console.WriteLine(message)
                                                    End Sub

            ' Load document to OCRAnalyzer
            ocrAnalyzer.LoadDocumentFromFile(inputDocument)

            ' Setup OCRAnalyzer
            ocrAnalyzer.OCRLanguage = ocrLanguage
            ocrAnalyzer.OCRLanguageDataFolder = ocrLanguageDataFolder

            ' Set page area for analysis (optional)
            ocrAnalyzer.SetExtractionArea(rectangle)

            ' Perform analysis and get results
            Dim analysisResults As OCRAnalysisResults = ocrAnalyzer.AnalyzeByOCRConfidence(pageIndex)

            ' Now extract page text using detected OCR parameters

            Dim outputDocument As String = ".\result.txt"

            ' Create TextExtractor instance
            Using textExtractor As TextExtractor = New TextExtractor("demo", "demo")

                ' Load document to TextExtractor
                textExtractor.LoadDocumentFromFile(inputDocument)

                ' Setup TextExtractor
                textExtractor.OCRMode = OCRMode.Auto
                textExtractor.OCRLanguageDataFolder = ocrLanguageDataFolder
                textExtractor.OCRLanguage = ocrLanguage

                ' Apply analysis results to TextExtractor instance
                ocrAnalyzer.ApplyResults(analysisResults, textExtractor)

                ' Set extraction area (optional)
                textExtractor.SetExtractionArea(rectangle)

                ' Save extracted text to file
                textExtractor.SaveTextToFile(outputDocument)

                ' Open output file in default associated application (for demonstration purposes)
                System.Diagnostics.Process.Start(outputDocument)

            End Using

        End Using

    End Sub
End Class
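
The sample above depends on the "tessdata" folder containing a data file for the chosen OCR language. The following is a minimal sketch of a pre-flight check that could run before the OCRAnalyzer is created. It uses only the .NET base class library, not the SDK. The module name TessdataCheck and both function names are hypothetical, and the "<code>.traineddata" file naming is an assumption based on the usual Tesseract convention, so verify it against the files actually present in your tessdata folder.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

' Hypothetical helper module; not part of the ByteScout SDK.
Module TessdataCheck

    ' Returns the language codes found in a tessdata folder, sorted ordinally.
    ' Assumes language files are named "<code>.traineddata" (Tesseract convention).
    ' Returns an empty list if the folder does not exist.
    Function GetAvailableLanguages(tessdataFolder As String) As List(Of String)
        Dim codes As New List(Of String)()

        If Not Directory.Exists(tessdataFolder) Then
            Return codes
        End If

        For Each filePath As String In Directory.GetFiles(tessdataFolder, "*.traineddata")
            codes.Add(Path.GetFileNameWithoutExtension(filePath))
        Next

        codes.Sort(StringComparer.Ordinal)
        Return codes
    End Function

    ' True when a data file for the requested language code is present.
    ' The comparison is case-sensitive.
    Function IsLanguageAvailable(tessdataFolder As String, language As String) As Boolean
        Return GetAvailableLanguages(tessdataFolder).Contains(language)
    End Function

End Module
```

In the sample above, calling IsLanguageAvailable(ocrLanguageDataFolder, ocrLanguage) at the start of Main and stopping with a clear message when it returns False would report a missing language file before any analysis starts.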
