ByteScout PDF Suite – VB.NET – OCR (optical character recognition) and PDF with PDF Extractor SDK



OCR (optical character recognition) and PDF with PDF Extractor SDK in VB.NET with ByteScout PDF Suite

Learn OCR (optical character recognition) and PDF with PDF Extractor SDK in VB.NET

We regularly add to and update our sample code library so you can quickly learn, step by step, how to use OCR (optical character recognition) on PDF documents with PDF Extractor SDK in VB.NET. ByteScout PDF Suite is a bundle of 6 SDK products for working with PDF, from generating rich PDF reports to extracting data from PDF documents and converting them to HTML. The bundle includes PDF (Generator) SDK, PDF Renderer SDK, PDF Extractor SDK, PDF to HTML SDK, PDF Viewer SDK and PDF Generator SDK for JavaScript.

The VB.NET code below shows how to extract text from a scanned PDF with OCR using PDF Extractor SDK from ByteScout PDF Suite. To use it in your VB.NET project or application, copy and paste the code, add the references listed in its comments, and run your app.

Visit our website to get a free trial version of ByteScout PDF Suite. The free trial includes many source code samples to help you with your VB.NET project.

On-demand (REST Web API) version:
Web API (on-demand version)

On-premise offline SDK for Windows:
60 Day Free Trial (on-premise)

Program.vb

Imports Bytescout.PDFExtractor

' This example demonstrates the use of Optical Character Recognition (OCR) to extract text 
' from scanned PDF documents and raster images.

' To make OCR work you should add the following references to your project:
' "Bytescout.PDFExtractor.dll", "Bytescout.PDFExtractor.OCRExtension.dll".

Class Program

    Friend Shared Sub Main(args As String())

        ' Create Bytescout.PDFExtractor.TextExtractor instance
        Dim extractor As New TextExtractor()
        extractor.RegistrationName = "demo"
        extractor.RegistrationKey = "demo"

        ' Load sample PDF document
        extractor.LoadDocumentFromFile("sample_ocr.pdf")

        ' Enable Optical Character Recognition (OCR)
        ' in .Auto mode (the SDK automatically checks whether OCR is needed)
        extractor.OCRMode = OCRMode.Auto

        ' Set the location of OCR language data files
        extractor.OCRLanguageDataFolder = "c:\Program Files\Bytescout PDF Extractor SDK\ocrdata"
        
        ' Set OCR language: "eng" for English, "deu" for German, "fra" for French,
        ' "spa" for Spanish, etc., according to the files in the "ocrdata" folder.
        ' Find more language files at https://github.com/bytescout/ocrdata
        extractor.OCRLanguage = "eng"
        
        ' Set PDF document rendering resolution
        extractor.OCRResolution = 300
        
        
        ' You can also apply various preprocessing filters
        ' to improve the recognition on low-quality scans.
        
        ' Automatically deskew skewed scans
        'extractor.OCRImagePreprocessingFilters.AddDeskew()

        ' Remove vertical or horizontal lines (sometimes helps to avoid OCR engine's page segmentation errors)
        'extractor.OCRImagePreprocessingFilters.AddVerticalLinesRemover()
        'extractor.OCRImagePreprocessingFilters.AddHorizontalLinesRemover()
        
        ' Repair broken letters
        'extractor.OCRImagePreprocessingFilters.AddDilate()

        ' Remove noise
        'extractor.OCRImagePreprocessingFilters.AddMedian()
        
        ' Apply Gamma Correction
        'extractor.OCRImagePreprocessingFilters.AddGammaCorrection()
        
        ' Add Contrast
        'extractor.OCRImagePreprocessingFilters.AddContrast(20)


        ' (!) You can use the new OCRAnalyzer class to find an optimal set of image preprocessing
        ' filters for your specific document.
        ' See "OCR Analyser" example.


        ' Save extracted text to file
        extractor.SaveTextToFile("output.txt")

        ' Cleanup
        extractor.Dispose()

        ' Open output file in default associated application
        System.Diagnostics.Process.Start("output.txt")

    End Sub
    
End Class
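
The sample above points OCRLanguageDataFolder at a folder and selects a language code with OCRLanguage. A pre-flight check that the matching language file is present can help explain an OCR failure before the SDK is even called. The following is a minimal sketch using only System.IO. The module name OcrPreflight and the function LanguageDataExists are hypothetical and not part of the SDK, and the ".traineddata" file extension is an assumption about how the files in the "ocrdata" folder are named.

```vb
Imports System.IO

Module OcrPreflight

    ' Hypothetical helper, not part of the ByteScout SDK.
    ' Returns True when <folder>\<language>.traineddata exists.
    ' The ".traineddata" extension is an assumption about the
    ' naming of the language files in the "ocrdata" folder.
    Function LanguageDataExists(folder As String, language As String) As Boolean
        If String.IsNullOrEmpty(folder) OrElse String.IsNullOrEmpty(language) Then
            Return False
        End If
        Return File.Exists(Path.Combine(folder, language & ".traineddata"))
    End Function

End Module
```

It could be called in Main before loading the document, with the same folder and language values that are assigned to OCRLanguageDataFolder and OCRLanguage, so the program can print a clear message and exit when the file is missing.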