ByteScout PDF Extractor SDK - C++ - Find Table And Extract As CSV from PDF - ByteScout

ByteScout PDF Extractor SDK – C++ – Find Table And Extract As CSV from PDF

  • Home
  • /
  • Articles
  • /
  • ByteScout PDF Extractor SDK – C++ – Find Table And Extract As CSV from PDF

How to find table and extract as CSV from PDF in C++ with ByteScout PDF Extractor SDK

This tutorial will show how to find table and extract as CSV from PDF in C++

Learn how to find table and extract as CSV from PDF in C++ with this source code sample. What is ByteScout PDF Extractor SDK? It is the SDK that helps developers to extract data from unstructured documents, pdf, images, scanned and electronic forms. Includes AI functions like automatic table detection, automatic table extraction and restructuring, text recognition and text restoration from pdf and scanned documents. Includes PDF to CSV, PDF to XML, PDF to JSON, PDF to searchable PDF functions as well as methods for low level data extraction. It can help you to find table and extract as CSV from PDF in your C++ application.

This code snippet below for ByteScout PDF Extractor SDK works best when you need to quickly find table and extract as CSV from PDF in your C++ application. In order to implement the functionality, you should copy and paste this code for C++ below into your code editor with your app, compile and run your application. Test C++ sample code examples whether they respond your needs and requirements for the project.

Free trial version of ByteScout PDF Extractor SDK is available for download from our website. Get it to try other source code samples for C++.

Try it today: Get 60 Day Free Trial or sign up for Web API

FindTableAndExtractAsCsv.cpp
      
#include "stdafx.h" #include "comip.h" #import "c:\\Program Files\\Bytescout PDF Extractor SDK\\net4.00\\Bytescout.PDFExtractor.tlb" raw_interfaces_only using namespace Bytescout_PDFExtractor; int _tmain(int argc, _TCHAR* argv[]) { // Initialize COM. HRESULT hr = CoInitializeEx(NULL, COINIT_APARTMENTTHREADED); // Create CSVExtractor instance _CSVExtractorPtr pICSVExtractor(__uuidof(CSVExtractor)); pICSVExtractor->put_RegistrationName(_bstr_t(L"DEMO")); pICSVExtractor->put_RegistrationKey(_bstr_t(L"DEMO")); // Create TableDetector instance _TableDetectorPtr pITableDetector(__uuidof(TableDetector)); pITableDetector->put_RegistrationName(_bstr_t(L"DEMO")); pITableDetector->put_RegistrationKey(_bstr_t(L"DEMO")); // Set table detection mode to "bordered tables" - best for tables with closed solid borders. pITableDetector->put_ColumnDetectionMode(ColumnDetectionMode_BorderedTables); // We should define what kind of tables we should detect. // So we set min required number of columns to 2 ... pITableDetector->put_DetectionMinNumberOfColumns(2); // ... and we set min required number of rows to 2 pITableDetector->put_DetectionMinNumberOfRows(2); // Load sample PDF document _bstr_t inputFile(L"..\\..\\sample3.pdf"); pICSVExtractor->LoadDocumentFromFile(inputFile); pITableDetector->LoadDocumentFromFile(inputFile); // Get page count long pageCount; pITableDetector->GetPageCount(&pageCount); for (int pageIndex = 0; pageIndex < pageCount; pageIndex++) { int t = 1; VARIANT_BOOL vbResult; // Find first table and continue if found pITableDetector->FindTable(pageIndex, &vbResult); if (vbResult == VARIANT_TRUE) { do { float left, top, width, height; pITableDetector->GetFoundTableRectangle_Left(&left); pITableDetector->GetFoundTableRectangle_Top(&top); pITableDetector->GetFoundTableRectangle_Width(&width); pITableDetector->GetFoundTableRectangle_Height(&height); // Set extraction area for CSV extractor to rectangle received from the table detector pICSVExtractor->SetExtractionArea(left, top, width, height); // Export the table to CSV file CString fileName; fileName.Format(L"page-%d-table-%d.csv", pageIndex, t); pICSVExtractor->SavePageCSVToFile(pageIndex, _bstr_t(fileName)); t++; pITableDetector->FindNextTable(&vbResult); } while (vbResult == VARIANT_TRUE); } } pICSVExtractor->Release(); pITableDetector->Release(); CoUninitialize(); return 0; }

Try it today: Get 60 Day Free Trial or sign up for Web API

FindTableAndExtractAsCsv.sln
      
Microsoft Visual Studio Solution File, Format Version 12.00 # Visual Studio 2013 VisualStudioVersion = 12.0.40629.0 MinimumVisualStudioVersion = 10.0.40219.1 Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "FindTableAndExtractAsCsv", "FindTableAndExtractAsCsv.vcxproj", "{74A23FC0-E323-4980-9363-41326A457785}" EndProject Global GlobalSection(SolutionConfigurationPlatforms) = preSolution Debug|Win32 = Debug|Win32 Release|Win32 = Release|Win32 EndGlobalSection GlobalSection(ProjectConfigurationPlatforms) = postSolution {74A23FC0-E323-4980-9363-41326A457785}.Debug|Win32.ActiveCfg = Debug|Win32 {74A23FC0-E323-4980-9363-41326A457785}.Debug|Win32.Build.0 = Debug|Win32 {74A23FC0-E323-4980-9363-41326A457785}.Release|Win32.ActiveCfg = Release|Win32 {74A23FC0-E323-4980-9363-41326A457785}.Release|Win32.Build.0 = Release|Win32 EndGlobalSection GlobalSection(SolutionProperties) = preSolution HideSolutionNode = FALSE EndGlobalSection EndGlobal

Try it today: Get 60 Day Free Trial or sign up for Web API

stdafx.cpp
      
// stdafx.cpp : source file that includes just the standard includes // CPPExample.pch will be the pre-compiled header // stdafx.obj will contain the pre-compiled type information #include "stdafx.h" // TODO: reference any additional headers you need in STDAFX.H // and not in this file

Try it today: Get 60 Day Free Trial or sign up for Web API

stdafx.h
      
// stdafx.h : include file for standard system include files, // or project specific include files that are used frequently, but // are changed infrequently // #pragma once #include "targetver.h" #include <stdio.h> #include <tchar.h> // TODO: reference additional headers your program requires here #include <atlstr.h>

Try it today: Get 60 Day Free Trial or sign up for Web API

targetver.h
      
#pragma once // Including SDKDDKVer.h defines the highest available Windows platform. // If you wish to build your application for a previous Windows platform, include WinSDKVer.h and // set the _WIN32_WINNT macro to the platform you wish to support before including SDKDDKVer.h. #include <SDKDDKVer.h>

Try it today: Get 60 Day Free Trial or sign up for Web API

MORE INFORMATION

Get 60 Day Free Trial or Visit ByteScout PDF Extractor SDK page

Explore ByteScout PDF Extractor SDK documentation

WEB API VERSION

Sign Up for free Web API key

Explore Web API Documentation

Tutorials:

prev
next