PDF is one of the most widely used formats for electronic documents, especially for documents shared outside the office. You will most likely come across PDF files at work almost every other day.
One of the most common problems with PDF documents is converting them to other formats for editing or other uses. For example, you might want to open a chart from a PDF document in Excel and modify it, only to find that you cannot.
Because the PDF format is often used to protect documents from changes, you will from time to time be stuck with a document that is hard to edit to suit your needs.
The PDF Extractor SDK by ByteScout can help with this problem. Among many other things, it lets developers convert PDF files, including charts, into Excel.
In this article, we walk through the process of converting PDF files to Excel with this tool.
PDF stands for Portable Document Format. It is a digital format used to save and share electronic documents. One of the main reasons the PDF format is so widely used is that it is independent of the operating system, software, and hardware.
You do not need the software, hardware, or operating system that created a PDF file in order to view or share it. A PDF file carries a complete description of the document layout as well as all the information needed to display it. This allows it to be viewed and printed while preserving both its content and its visual appearance.
Another advantage of the PDF format is that PDF files are compact. The format supports built-in compression, and its file structure is designed to keep file sizes small.
Given these advantages, it is clear why the PDF format is so widely used: it allows files to be shared without losing their original formatting.
Data within a PDF file cannot easily be modified or altered. Some people consider this an advantage, others a disadvantage.
Many Excel users find this a problem, particularly when they want to modify the data within a PDF file, so they look for ways to convert the PDF file to Excel. Several tools can do this, making it easy to move data such as charts and tables from a PDF file into an Excel sheet.
One of these tools is the PDF Extractor SDK developed by ByteScout. It offers many options for converting a PDF file to Excel.
The tool is easy to integrate into an existing system and can be used from several programming languages. Whether you are using C#, Visual Basic 6, Visual Basic .NET, or running a VBScript file from the command line, you will be able to convert your PDF file.
In the steps below, we explain how to convert a PDF file to an Excel spreadsheet using C#. Code for the other languages mentioned above is also included.
Let us look at the complete C# code first and then go through the conversion step by step.
using System.IO;
using Bytescout.PDFExtractor;
using System.Diagnostics;

namespace PDF2CSV2XLS
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create Bytescout.PDFExtractor.XLSExtractor instance
            XLSExtractor extractor = new XLSExtractor();
            extractor.RegistrationName = "demo";
            extractor.RegistrationKey = "demo";

            // Remove the output of a previous run, if any
            File.Delete("output.xls");

            // Load sample PDF document
            extractor.LoadDocumentFromFile("sample3.pdf");

            // Save the spreadsheet to file
            extractor.SaveToXLSFile("output.xls");

            // Open the spreadsheet in the default associated application.
            // UseShellExecute = true is required on .NET Core and .NET 5+ to open a document file.
            Process.Start(new ProcessStartInfo("output.xls") { UseShellExecute = true });
        }
    }
}
// Create Bytescout.PDFExtractor.XLSExtractor instance
XLSExtractor extractor = new XLSExtractor();
extractor.RegistrationName = "demo";
extractor.RegistrationKey = "demo";
The first step is creating an extractor instance, as seen in the code above. At this point we also set the registration name and the registration key. The registration key is sent to you when you purchase the ByteScout SDK package. In this demo, we use the demo values, which cannot be used in a production environment.
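Since the demo values cannot be used in production, one option is to keep the real registration name and key out of the source code. The sketch below reads them from environment variables and falls back to "demo" when a variable is not set. The variable names BYTESCOUT_REG_NAME and BYTESCOUT_REG_KEY and the RegistrationSettings class are hypothetical choices for illustration; the SDK itself does not read these variables.

```csharp
using System;

// Hypothetical helper: the environment variable names are our own choice,
// the ByteScout SDK does not look them up by itself.
public static class RegistrationSettings
{
    public const string NameVariable = "BYTESCOUT_REG_NAME";
    public const string KeyVariable = "BYTESCOUT_REG_KEY";

    // Returns the value of the environment variable, or the fallback
    // when the variable is unset or empty.
    public static string Read(string variable, string fallback)
    {
        string value = Environment.GetEnvironmentVariable(variable);
        return string.IsNullOrEmpty(value) ? fallback : value;
    }
}
```

With this helper, the assignments in the sample would become extractor.RegistrationName = RegistrationSettings.Read(RegistrationSettings.NameVariable, "demo"); and the equivalent line for RegistrationKey.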
// Load sample PDF document
extractor.LoadDocumentFromFile("sample3.pdf");
In the second step, we load the PDF document to be converted by calling extractor.LoadDocumentFromFile("sample3.pdf"), where sample3.pdf is the name of the PDF document.
You can also pass a file path instead of just the file name, for example extractor.LoadDocumentFromFile(@".\sample3.pdf"), which loads the document from the given path. The @ prefix makes this a C# verbatim string, so the backslash in the path is not treated as an escape character.
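Before calling LoadDocumentFromFile, it can help to resolve the path and confirm that the file exists and looks like a PDF, so that a wrong path fails with a clear message. The sketch below is plain .NET and makes no SDK calls; the PdfInput class is hypothetical. It relies on the fact that PDF files normally begin with the bytes %PDF-. Some PDF readers tolerate a few bytes before that header, so this check is stricter than a viewer would be.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical pre-check to run before extractor.LoadDocumentFromFile.
public static class PdfInput
{
    private static readonly byte[] Signature = Encoding.ASCII.GetBytes("%PDF-");

    // Resolves a relative path against the current working directory,
    // confirms the file exists and starts with "%PDF-".
    // Returns the full path on success, throws otherwise.
    public static string ResolveAndCheck(string path)
    {
        string fullPath = Path.GetFullPath(path);
        if (!File.Exists(fullPath))
            throw new FileNotFoundException("Input PDF not found.", fullPath);

        byte[] header = new byte[Signature.Length];
        int read = 0;
        using (FileStream stream = File.OpenRead(fullPath))
        {
            // Stream.Read may return fewer bytes than requested, so loop.
            while (read < header.Length)
            {
                int n = stream.Read(header, read, header.Length - read);
                if (n == 0) break; // end of file reached early
                read += n;
            }
        }

        for (int i = 0; i < Signature.Length; i++)
        {
            // A file shorter than the signature also fails here.
            if (read <= i || header[i] != Signature[i])
                throw new InvalidDataException("Not a PDF file (missing %PDF- header): " + fullPath);
        }
        return fullPath;
    }
}
```

The load step would then read extractor.LoadDocumentFromFile(PdfInput.ResolveAndCheck(@".\sample3.pdf"));.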
In the third step, we specify the Optical Character Recognition (OCR) options; the short sample above does not include this step. OCR recognizes text in scanned pages and images, and it can be used with many languages. OCR is what makes it possible to extract chart data from a PDF document whose content is stored as images rather than as text.
A further benefit of OCR is that even rotated charts in a poorly scanned PDF document can be converted into Excel.
You also have the option of specifying the location of the chart on the page so that just that area is converted.
// Save the spreadsheet to file
extractor.SaveToXLSFile("output.xls");
Finally, we save the converted document by passing the output file name to extractor.SaveToXLSFile, as seen in the code above.
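Instead of the fixed name output.xls, you may prefer to save the spreadsheet next to the source PDF with the same base name, removing any result of a previous run first, as the C# sample does with File.Delete. The sketch below only computes and prepares the path that would be passed to SaveToXLSFile; the OutputPaths class is hypothetical and makes no SDK calls.

```csharp
using System;
using System.IO;

// Hypothetical helper that derives the spreadsheet path from the PDF path.
public static class OutputPaths
{
    // "reports/sample3.pdf" -> "reports/sample3.xls"
    public static string ForInput(string pdfPath)
    {
        return Path.ChangeExtension(pdfPath, ".xls");
    }

    // Deletes a previous result so an old spreadsheet cannot be mistaken
    // for fresh output. File.Delete does not throw if the file is absent.
    public static string PrepareFor(string pdfPath)
    {
        string xlsPath = ForInput(pdfPath);
        File.Delete(xlsPath);
        return xlsPath;
    }
}
```

The save step would then read extractor.SaveToXLSFile(OutputPaths.PrepareFor("sample3.pdf"));.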
The code below shows how to convert PDF documents to Excel in the other languages.
Each one follows the same steps as the C# example above, so we do not go through them in detail.
' Create Bytescout.PDFExtractor.XLSExtractor object
Set extractor = CreateObject("Bytescout.PDFExtractor.XLSExtractor")
extractor.RegistrationName = "demo"
extractor.RegistrationKey = "demo"

' Load sample PDF document
extractor.LoadDocumentFromFile "../../sample3.pdf"

extractor.SaveToXLSFile "output.XLS"

MsgBox "Data has been extracted to 'output.XLS' file."
Imports System.IO
Imports Bytescout.PDFExtractor
Imports System.Diagnostics

Class Program
    Friend Shared Sub Main(args As String())
        ' Create Bytescout.PDFExtractor.XLSExtractor instance
        Dim extractor As New XLSExtractor()
        extractor.RegistrationName = "demo"
        extractor.RegistrationKey = "demo"

        ' Remove the output of a previous run, if any
        File.Delete("output.xls")

        ' Load sample PDF document
        extractor.LoadDocumentFromFile("sample3.pdf")

        ' Save the spreadsheet to file
        extractor.SaveToXLSFile("output.xls")

        ' Open the spreadsheet in the default associated application
        ' (UseShellExecute = True is required on .NET Core and .NET 5+)
        Process.Start(New ProcessStartInfo("output.xls") With {.UseShellExecute = True})
    End Sub
End Class
If WScript.Arguments.Length < 2 Then
    WScript.Echo "Usage: PDFToXLS.vbs ""input.PDF"" ""output.XLS"""
    WScript.Quit
End If

' Create Bytescout.PDFExtractor.XLSExtractor object
Set extractor = CreateObject("Bytescout.PDFExtractor.XLSExtractor")
extractor.RegistrationName = "demo"
extractor.RegistrationKey = "demo"

WScript.Echo "Loading file from " & WScript.Arguments.Item(0)

' Load sample PDF document
extractor.LoadDocumentFromFile WScript.Arguments.Item(0)

WScript.Echo "Saving file to " & WScript.Arguments.Item(1)
extractor.SaveToXLSFile WScript.Arguments.Item(1)

WScript.Echo "Success: Data has been extracted to '" & WScript.Arguments.
REM running the VBS through the command line
cscript.exe PDFToXLS-CommandLine.vbs "../../sample3.pdf" "output.xls"
pause
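The VBScript version reads the input and output file names from the command line and prints a usage message when fewer than two are given. A C# console program could apply the same check before creating the extractor. The sketch below covers only the argument handling; the CommandLine class, the TryParse method, and the usage text are hypothetical, and the returned names would then be passed to LoadDocumentFromFile and SaveToXLSFile.

```csharp
using System;

// Hypothetical argument check mirroring the VBScript sample:
// two arguments are required, the input PDF and the output XLS.
public static class CommandLine
{
    // Returns true and sets both names when at least two arguments are given;
    // otherwise prints a usage message and returns false.
    public static bool TryParse(string[] args, out string inputPdf, out string outputXls)
    {
        inputPdf = null;
        outputXls = null;
        if (args == null || args.Length < 2)
        {
            Console.WriteLine("Usage: PDFToXLS.exe \"input.pdf\" \"output.xls\"");
            return false;
        }
        inputPdf = args[0];
        outputXls = args[1];
        return true;
    }
}
```

In Main, the program would return early when CommandLine.TryParse(args, out string input, out string output) is false, mirroring the WScript.Quit call in the VBScript version.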