The sample shows instructions and algorithm of how to read PDF with scanned image and hindi text and how to make it run in your VB.NET application. ByteScout PDF Suite: the bundle that provides six different SDK libraries to work with PDF from generating rich PDF reports to extracting data from PDF documents and converting them to HTML. This bundle includes PDF (Generator) SDK, PDF Renderer SDK, PDF Extractor SDK, PDF to HTML SDK, PDF Viewer SDK and PDF Generator SDK for Javascript. It can read PDF with scanned image and hindi text in VB.NET.
Want to save time? You will save a lot of time on writing and testing code as you may just take the VB.NET code from ByteScout PDF Suite for read PDF with scanned image and hindi text below and use it in your application. Follow the instructions from scratch to work and copy the VB.NET code. Check VB.NET sample code samples to see if they respond to your needs and requirements for the project.
All these programming tutorials along with source code samples and ByteScout free trial version are available for download from our website.
On-demand (REST Web API) version:
Web API (on-demand version)
On-premise offline SDK for Windows:
60 Day Free Trial (on-premise)
Imports System.Drawing.Imaging Imports System.IO Imports Bytescout.PDF Imports Bytescout.PDFExtractor Module Program Sub Main() Try ' Files Dim fileName As String = "hindi_text_with_image.pdf" Dim destFileName As String = "output_hindi_text_with_image.pdf" Dim destFileName_serachable As String = "output_hindi_text_with_image_searchable.pdf" ' Read all text from pdf file Dim allTextExtracted As String = "" Using extractor As New TextExtractor ' Load PDF document extractor.LoadDocumentFromFile(fileName) ' Read all text to a variable allTextExtracted = extractor.GetText End Using ' Get image from pdf file Dim memoryStream As MemoryStream = New MemoryStream Using extractor As New ImageExtractor ' Load PDF document extractor.LoadDocumentFromFile(fileName) If extractor.GetFirstImage Then extractor.SaveCurrentImageToStream(memoryStream, ImageFormat.Png) End If End Using ' Load image from file to System.Drawing.Image object (we need it to get the image resolution) Using sysImage As System.Drawing.Image = System.Drawing.Image.FromStream(memoryStream) ' Compute image size in PDF units (Points) Dim widthInPoints As Single = (sysImage.Width / sysImage.HorizontalResolution * 72.0F) Dim heightInPoints As Single = (sysImage.Height / sysImage.VerticalResolution * 72.0F) ' Create new PDF document Dim outPdfDocument As Document = New Document outPdfDocument.RegistrationName = "demo" outPdfDocument.RegistrationKey = "demo" ' Create page of computed size Dim page As Page = New Page(widthInPoints, heightInPoints) ' Add page to the document outPdfDocument.Pages.Add(page) Dim canvas As Canvas = page.Canvas ' Create Bytescout.PDF.Image object from loaded image Dim pdfImage As Image = New Image(sysImage) ' Draw the image canvas.DrawImage(pdfImage, 0, 0, widthInPoints, heightInPoints) ' Dispose the System.Drawing.Image object to free resources sysImage.Dispose() ' Create brush Dim transparentBrush As SolidBrush = New SolidBrush(New ColorGray(0)) ' ... and make it transparent transparentBrush.Opacity = 0 ' Draw text with transparent brush ' Need to set Font which supports hindi characters. Dim font16 As Font = New Font("Arial Unicode MS", 16) canvas.DrawString(allTextExtracted, font16, transparentBrush, 40, 40) ' Save document to file outPdfDocument.Save(destFileName) End Using 'Make PDF file with hindi text searchable to OCR. Using searchablePDFMaker As New SearchablePDFMaker 'Load PDF document searchablePDFMaker.LoadDocumentFromFile(destFileName) ' Set the location of "tessdata" folder containing language data files ' It used following files for hindi language support. Need to put these files into "testdata" folder. Below location contains these files. ' https://github.com/tesseract-ocr/tessdata/tree/3.04.00 ' hin.traineddata ' hin.cube.bigrams ' hin.cube.lm ' hin.cube.nn ' hin.cube.params ' hin.cube.word-freq ' hin.tesseract_cube.nn ' Set the location of "tessdata" folder containing language data files searchablePDFMaker.OCRLanguageDataFolder = "c:\Program Files\Bytescout PDF Extractor SDK\net2.00\tessdata" ' Set OCR language searchablePDFMaker.OCRLanguage = "hin" ' Need to set Font which supports hindi characters searchablePDFMaker.LabelingFont = "Arial Unicode MS" ' Set PDF document rendering resolution searchablePDFMaker.OCRResolution = 300 ' Make PDF document searchable searchablePDFMaker.MakePDFSearchable(destFileName_serachable) End Using ' Open document in default PDF viewer app Process.Start(destFileName_serachable) Catch ex As Exception Console.WriteLine("ERROR:" + ex.Message) End Try Console.WriteLine() Console.WriteLine("Press any key to exit...") Console.ReadLine() End Sub End Module
60 Day Free Trial or Visit ByteScout PDF Suite Home Page
Explore ByteScout PDF Suite Documentation
Explore Samples
Sign Up for ByteScout PDF Suite Online Training
Get Your API Key
Explore Web API Docs
Explore Web API Samples
60 Day Free Trial or Visit ByteScout PDF Suite Home Page
Explore ByteScout PDF Suite Documentation
Explore Samples
Sign Up for ByteScout PDF Suite Online Training
Get Your API Key
Explore Web API Docs
Explore Web API Samples
also available as: