In this article, we’ll review how to parse PDF invoices and get result data in CSV format using ByteScout Document Parser SDK and SharePoint.
Basically, we’ll be following these steps.
We won’t go into the details of creating a SharePoint extension with Visual Studio. Instead, we’ll focus on the code.
The full source code for this sample can be found in this GitHub repository.
Let’s dive into a demo of the end result. The following screencast demonstrates it all.
Now, let’s review the source code. The full listing comes first, followed by a step-by-step analysis.
using Microsoft.SharePoint;
using Newtonsoft.Json.Linq;
using System;
using System.Globalization;
using System.IO;
using System.Net;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.Web.UI.WebControls.WebParts;
using ByteScout.DocumentParser;
using System.Text;

namespace ExtractDataWebPart.VisualWebPart1
{
    /// <summary>
    /// Extract data from PDF invoices using PDF.co Document
    /// Parser (and its default invoice parser template)
    /// on a SharePoint folder and then put them back
    /// as CSV files on the same SharePoint folder.
    /// </summary>
    public partial class VisualWebPart1UserControl : UserControl
    {
        public SPWeb CurrentWeb { get; set; }

        // Name of the document library that holds the PDF files and receives the CSV files
        const string DestinationLibName = "Shared Documents";

        protected void Page_Load(object sender, EventArgs e)
        {
        }

        protected void StartButton_Click(object sender, EventArgs e)
        {
            //string DestinationLibName = FolderTextBox.Text;
            SPSite site = SPContext.Current.Site;
            SPWeb web = CurrentWeb;

            // Reopen the site and web with elevated privileges before processing
            SPSecurity.RunWithElevatedPrivileges(delegate ()
            {
                using (SPSite ElevatedSite = new SPSite(site.ID))
                {
                    using (SPWeb ElevatedWeb = ElevatedSite.OpenWeb(web.ID))
                    {
                        ConvertDocuments(ElevatedWeb);
                    }
                }
            });

            LogTextBox.Text += "\n";
            LogTextBox.Text += "Done...\n";
        }

        private void ConvertDocuments(SPWeb web)
        {
            try
            {
                var spLibrary = web.Folders[DestinationLibName];
                var spfileColl = spLibrary.Files;

                foreach (SPFile file in spfileColl)
                {
                    string inputDocument = file.Name;

                    // Create DocumentParser instance
                    using (DocumentParser documentParser = new DocumentParser("demo", "demo"))
                    {
                        // Add an internal generic template for typical invoices.
                        // Note, if it does not parse all required fields, you should create
                        // own template using Template Editor application.
                        documentParser.AddTemplate("internal://invoice");

                        LogTextBox.Text += $"Parsing \"{inputDocument}\"...";

                        // Parse document data in CSV format
                        string ret = documentParser.ParseDocument(file.OpenBinaryStream(), OutputFormat.CSV);

                        // Append parsed data to the log
                        LogTextBox.Text += "Parsing results in CSV format:";
                        LogTextBox.Text += ret;

                        var DestinationFile = inputDocument.Split('.')[0] + ".csv";
                        SaveToSharePoint(ret, DestinationFile);

                        LogTextBox.Text += String.Format("Generated CSV file saved as \"{0}\\{1}\" file. \n", DestinationLibName, DestinationFile);
                    }
                }
            }
            catch (Exception ex)
            {
                LogTextBox.Text += ex.ToString() + " \n";
            }
        }

        private void SaveToSharePoint(string data, string DestinationFile)
        {
            byte[] bytes = Encoding.ASCII.GetBytes(data);

            // Upload file to SharePoint document library
            // Create a stream over the CSV bytes
            using (MemoryStream stream = new MemoryStream(bytes))
            {
                // Get handle of library
                SPFolder spLibrary = CurrentWeb.Folders[DestinationLibName];

                // Replace existing file
                var replaceExistingFile = true;

                // Upload document to library
                SPFile spfile = spLibrary.Files.Add(DestinationFile, stream, replaceExistingFile);
                spLibrary.Update();
            }
        }
    }
}
We can divide the program into the following logical steps:
The entry point is the “StartButton_Click” event handler. As the name suggests, it runs when the Start button is clicked.
In this handler, we get the current site and web, reopen them with elevated privileges through SPSecurity.RunWithElevatedPrivileges, and then call the “ConvertDocuments” method.
At the start of “ConvertDocuments”, we iterate through all files in the “Shared Documents” library and process them one by one.
var spLibrary = web.Folders[DestinationLibName];
var spfileColl = spLibrary.Files;

foreach (SPFile file in spfileColl)
{
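Note that this loop visits every file in the library, so CSV files produced by an earlier run (or any other non-PDF document) would also be passed to the parser. A minimal sketch of an extension check that could guard the loop is shown here; the class and method names (InvoiceFileFilter, IsPdf) are hypothetical helpers for illustration and are not part of the original sample or of the SDK.

```csharp
using System;
using System.IO;

public static class InvoiceFileFilter
{
    // Hypothetical helper: returns true when the file name ends in ".pdf",
    // ignoring case. A null or extension-less name yields false.
    public static bool IsPdf(string fileName)
    {
        return string.Equals(Path.GetExtension(fileName), ".pdf",
                             StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        // Made-up file names, only to show which ones would be parsed.
        foreach (var name in new[] { "invoice1.pdf", "INVOICE2.PDF", "invoice1.csv", "notes.docx" })
        {
            Console.WriteLine("{0}: {1}", name, IsPdf(name) ? "parse" : "skip");
        }
    }
}
```

Inside the foreach loop, a first line such as `if (!InvoiceFileFilter.IsPdf(file.Name)) continue;` would then skip everything that is not a PDF.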
Then, we create an instance of the DocumentParser class from ByteScout Document Parser SDK. For this sample, we’re using demo keys, which give output with a watermark. In production, replace them with the license keys you receive when you purchase a license.
// Create DocumentParser instance
using (DocumentParser documentParser = new DocumentParser("demo", "demo"))
{
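One possible way to keep production license keys out of the source code is to read them from configuration and fall back to the demo values when nothing is set. The sketch below uses environment variables; the variable names and the ParserLicense class are assumptions made for this example and are not defined by the SDK. In a SharePoint farm, a web.config setting or another configuration store may be a better fit.

```csharp
using System;

public static class ParserLicense
{
    // Hypothetical environment variable names, chosen for this example only.
    private const string NameVariable = "BYTESCOUT_REGISTRATION_NAME";
    private const string KeyVariable = "BYTESCOUT_REGISTRATION_KEY";

    // Falls back to the "demo" value used in the sample when nothing is configured.
    public static string ValueOrDemo(string value)
    {
        return string.IsNullOrWhiteSpace(value) ? "demo" : value;
    }

    public static string RegistrationName
    {
        get { return ValueOrDemo(Environment.GetEnvironmentVariable(NameVariable)); }
    }

    public static string RegistrationKey
    {
        get { return ValueOrDemo(Environment.GetEnvironmentVariable(KeyVariable)); }
    }
}
```

The parser could then be created with `new DocumentParser(ParserLicense.RegistrationName, ParserLicense.RegistrationKey)`, which behaves exactly like the sample when the variables are missing.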
Now that we have the DocumentParser object initialized, it’s time to provide a template used to parse data.
// Add an internal generic template for typical invoices.
// Note, if it does not parse all required fields, you should create
// own template using Template Editor application.
documentParser.AddTemplate("internal://invoice");
In this case, we’re using the generic built-in template for invoice parsing. If it doesn’t capture all the fields you need, you can create your own template with the Template Editor application.
Next, we’re invoking the ParseDocument method, with an argument specifying CSV output type. It’ll return a string containing invoice data in CSV format. Easy!
// Parse document data in CSV format
string ret = documentParser.ParseDocument(file.OpenBinaryStream(), OutputFormat.CSV);
Finally, we build the CSV file name from the input file name and save the output to the same document library.
var DestinationFile = inputDocument.Split('.')[0] + ".csv";
SaveToSharePoint(ret, DestinationFile);
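Two details of this step are worth a closer look. Splitting on the first dot shortens names such as "invoice.2021.pdf" to "invoice.csv", and the ASCII encoding used in SaveToSharePoint replaces any non-ASCII character with a question mark. The sketch below shows one possible alternative using standard .NET calls (Path.ChangeExtension and Encoding.UTF8); the CsvOutput class, its method names, and the sample CSV text are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class CsvOutput
{
    // Replaces only the final extension, so dots inside the name are kept.
    public static string CsvNameFor(string inputName)
    {
        return Path.ChangeExtension(inputName, ".csv");
    }

    // UTF-8 keeps characters that ASCII encoding would replace with '?'.
    public static byte[] ToBytes(string csv)
    {
        return Encoding.UTF8.GetBytes(csv);
    }

    public static void Main()
    {
        string name = "invoice.2021.pdf";
        Console.WriteLine(name.Split('.')[0] + ".csv"); // invoice.csv
        Console.WriteLine(CsvNameFor(name));            // invoice.2021.csv

        // Made-up CSV text containing non-ASCII characters.
        string csv = "Vendor,Total\nCafé Müller,12.50";
        bool asciiRoundTrip = Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(csv)) == csv;
        bool utf8RoundTrip = Encoding.UTF8.GetString(ToBytes(csv)) == csv;
        Console.WriteLine(asciiRoundTrip); // False
        Console.WriteLine(utf8RoundTrip);  // True
    }
}
```

Whether UTF-8 is the right choice depends on the application that will open the CSV files, so treat this as a starting point rather than a drop-in replacement.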
That’s all! It’s that easy and convenient to integrate ByteScout Document Parser SDK into SharePoint. The SDK also works with scanned documents and images!
See you onboard!