PDF Extractor SDK Explained: Extract Text from PDF - ByteScout

The first program is a very basic one: we take a single PDF and see how to extract its text using ByteScout PDF Extractor SDK. I am going to open Visual Studio and create a console application on the desktop. Next, I need to add a reference to the PDF Extractor SDK. I have already installed the SDK, so its references should already be available on my machine. In the list of references, I select ByteScout PDF Extractor SDK, and the reference is added to the project.


How to Extract Text from PDF Files

Next, I need a sample PDF file to extract the text from. This sample contains about four pages of text. I copy and paste it into the project in the Solution Explorer window and also set it to be copied to the output directory, so the program can find it when it runs. Now we are all set to get the text from the PDF file.

Now let me add the namespace with using Bytescout.PDFExtractor; and create the extractor with TextExtractor extractor = new TextExtractor();. After that, I set the registration name and key (the RegistrationName and RegistrationKey properties). I'm using the demo key here; if you use the SDK in production, you have to use your own key, which you get when you purchase the product. The next step is to load the file into the extractor with the LoadDocumentFromFile method. You can also load a document from a stream, but in our case we simply load it from the file: extractor.LoadDocumentFromFile("sample_program1.pdf");. The next step is var output = extractor.GetText();, which gets the text.
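Putting the steps so far together, here is a minimal sketch of the console program. It assumes the .NET namespace is Bytescout.PDFExtractor, that the demo registration name and key are both "demo", and that TextExtractor implements IDisposable so it can sit in a using block. The second part shows the stream-based alternative; the LoadDocumentFromStream(Stream) signature is an assumption here, so check it against the SDK reference for your version.

```csharp
using System;
using System.IO;
using Bytescout.PDFExtractor;

class Program
{
    // Creates an extractor with the demo credentials used in this walkthrough.
    // Assumption: "demo"/"demo" are the demo values; use your own name and key in production.
    static TextExtractor CreateExtractor()
    {
        TextExtractor extractor = new TextExtractor();
        extractor.RegistrationName = "demo";
        extractor.RegistrationKey = "demo";
        return extractor;
    }

    static void Main()
    {
        // Load the sample PDF directly from the file copied to the output directory.
        using (TextExtractor extractor = CreateExtractor())
        {
            extractor.LoadDocumentFromFile("sample_program1.pdf");

            // GetText() returns the text of all pages as one string.
            var output = extractor.GetText();
            Console.WriteLine(output);
        }

        // Alternative: load the same PDF from a stream.
        // The stream is kept open until extraction has finished, as a precaution.
        using (FileStream stream = File.OpenRead("sample_program1.pdf"))
        using (TextExtractor extractor = CreateExtractor())
        {
            extractor.LoadDocumentFromStream(stream);
            Console.WriteLine(extractor.GetText());
        }
    }
}
```

The stream variant is handy when the PDF comes from a web upload or a database rather than from disk; for this walkthrough, loading from the file is all we need.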

We also have the option of extracting text from a start page to an end page, but in this case we are going to take all the text. To display the output, we write Console.WriteLine(output); followed by Console.ReadLine(); so that the console window stays open. I guess we are all set. After execution, we get all the text from the PDF. That is how we extract the text; now, if we want to save the output to a file, let's see which method we can use. There are several methods for saving the text to a file; here we use extractor.SaveTextToFile("output.txt"); and execute the program again.
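The sketch below combines saving the whole document with SaveTextToFile and reading a page range. The page-range part is an assumption on my side: it uses GetPageCount() and GetTextFromPage(int) with zero-based page indexes, and the startPage/endPage values are hypothetical, chosen only for illustration. As before, the "demo" credentials and the IDisposable usage are assumptions.

```csharp
using System;
using Bytescout.PDFExtractor;

class Program
{
    static void Main()
    {
        using (TextExtractor extractor = new TextExtractor())
        {
            // Demo credentials (assumption); replace with your own in production.
            extractor.RegistrationName = "demo";
            extractor.RegistrationKey = "demo";
            extractor.LoadDocumentFromFile("sample_program1.pdf");

            // Save the text of the whole document to output.txt.
            extractor.SaveTextToFile("output.txt");

            // Hypothetical page range: the first two pages, clamped to the page count.
            // Assumption: page indexes are zero-based.
            int pageCount = extractor.GetPageCount();
            int startPage = 0;
            int endPage = Math.Min(1, pageCount - 1);

            for (int page = startPage; page <= endPage; page++)
            {
                string pageText = extractor.GetTextFromPage(page);
                Console.WriteLine("--- Page {0} ---", page + 1);
                Console.WriteLine(pageText);
            }
        }

        // Keep the console window open until Enter is pressed.
        Console.ReadLine();
    }
}
```

Clamping endPage to pageCount - 1 keeps the loop inside the document even if the sample PDF is shorter than the requested range.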

Now we have the output text in output.txt as well. This is how we can extract text from a PDF by using ByteScout PDF Extractor SDK.

Text Extraction with PDF Extractor SDK