In this tutorial, we will learn how to extract text from a multi-column PDF document, such as a periodical, newspaper, or school newspaper. We will use this three-column PDF document as the source and preserve its text formatting in the output.
Screenshot of Source File
First, let’s open our PDF file in the PDF Multitool.
Next, in the left navigation panel, click Extract as TXT under the Data Extraction folder.
It will open a small window where you can set some of the text extraction options. In this demonstration, we will cover only a few of them. The OCR Settings are very useful for scanned PDF files. To learn more about them, please check out this tutorial.
We will use the default settings and click on the Extract to File button to save the TXT file.
Great! We have converted the PDF document to TXT successfully.
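To give a rough idea of what preserving a multi-column layout in plain text involves, here is a minimal Python sketch. It is not how PDF Multitool works internally. It assumes we already have each word's position on the page (the `words` list, the `render_layout` function, and the `char_width` and `line_height` values are all hypothetical, made up for this illustration) and simply pads each line with spaces so that words from the same column start at the same character position.

```python
from collections import defaultdict


def render_layout(words, char_width=6.0, line_height=12.0):
    """Place (x, y, text) words on a character grid so columns stay aligned.

    x and y are page coordinates in points, with y measured from the top.
    char_width and line_height are assumed values for this illustration.
    """
    # Group words into text rows by their vertical position.
    rows = defaultdict(list)
    for x, y, text in words:
        rows[round(y / line_height)].append((x, text))

    lines = []
    for row in sorted(rows):
        line = ""
        for x, text in sorted(rows[row]):
            # Convert the horizontal position to a character column.
            col = int(x / char_width)
            # Keep at least one space between words that would touch.
            if line and col <= len(line):
                col = len(line) + 1
            # Pad with spaces up to the column, then append the word.
            line = line.ljust(col) + text
        lines.append(line)
    return "\n".join(lines)


# Hypothetical word positions for a page with three columns.
words = [
    (0, 0, "Local"), (36, 0, "news"), (120, 0, "Sports"), (240, 0, "Weather"),
    (0, 12, "Council"), (120, 12, "Final"), (240, 12, "Sunny"),
]

output = render_layout(words)
print(output)
```

In the printed result, the words of the second and third columns start at the same character position on every line, which is the effect we see in the TXT file when the three-column format is preserved.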
In this tutorial, we learned how to convert a multi-column PDF document into text. We preserved the three-column format in the TXT output and covered some of the basic functions in the PDF Extractor.
Screenshot of Output TXT