In this article, we're going to see how to make a searchable PDF from a scanned PDF. We have one sample file here. It contains some data, but it is a scanned image, so the text cannot be selected or searched. Using the ByteScout PDF Extractor SDK, we will convert this scanned PDF into a searchable PDF while retaining its layout. Let's see how to do this. I'm going to copy the sample file and paste it into the Solution Explorer window.
First of all, we need a SearchablePDFMaker instance. We will load the document and specify the OCR language data folder. These files are installed when you install the ByteScout SDK. The folder contains the language data files that the OCR engine uses when converting a scanned PDF into a text PDF; it is essentially an image-processing step. Then we specify the language the document is written in, and the SDK selects the matching files from the specified folder. Finally, we specify the resolution and generate the output.
Now I am going to create an instance of the searchable PDF maker inside a using statement: using (SearchablePDFMaker searchablePDFMaker = new SearchablePDFMaker("demo", "demo")). The two arguments are the registration name and the registration key; I'm using the demo values here. Then let us load the document from the file.
Specify the data folder: searchablePDFMaker.OCRLanguageDataFolder = @"C:\Program Files\ByteScout PDF Extractor SDK\net4.00\tessdata"; you can just copy and paste this path. Now specify the language: searchablePDFMaker.OCRLanguage = "eng"; then specify the resolution: searchablePDFMaker.OCRResolution = 300;
Then generate the output with searchablePDFMaker.MakePDFSearchable("output.pdf"); and run the program. It is done. Let's see what was generated in the bin folder. We can see that the layout is the same as the original, but now the text can be searched.
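Putting the steps together, the whole program might look like the sketch below. The constructor, the three OCR properties and MakePDFSearchable come from the walkthrough above. The Bytescout.PDFExtractor namespace, the LoadDocumentFromFile call and the input file name sample_ocr.pdf are assumptions, since the article does not show them, so check them against your installed SDK version. The project also needs a reference to the SDK assembly.

```csharp
using System;
using Bytescout.PDFExtractor; // assumed namespace of the PDF Extractor SDK

class Program
{
    static void Main()
    {
        // "demo", "demo" are the trial registration name and key used in the article.
        using (SearchablePDFMaker searchablePDFMaker = new SearchablePDFMaker("demo", "demo"))
        {
            // Load the scanned PDF. The method name and the file name are
            // assumptions; the article only says "load the document from the file".
            searchablePDFMaker.LoadDocumentFromFile("sample_ocr.pdf");

            // Folder with the OCR language data files installed with the SDK.
            searchablePDFMaker.OCRLanguageDataFolder =
                @"C:\Program Files\ByteScout PDF Extractor SDK\net4.00\tessdata";

            // Language of the text in the scanned document.
            searchablePDFMaker.OCRLanguage = "eng";

            // Resolution (DPI) at which pages are rendered for OCR.
            searchablePDFMaker.OCRResolution = 300;

            // Run OCR and write the searchable PDF next to the executable.
            searchablePDFMaker.MakePDFSearchable("output.pdf");
        }

        Console.WriteLine("Created output.pdf");
    }
}
```

The using block disposes the maker when processing finishes. If the SDK is installed elsewhere, adjust the OCRLanguageDataFolder path so that it points to the folder that actually holds the language files.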