In this tutorial, we show how to use PDF Extractor SDK to perform several common PDF tasks in C#.
PDF Extractor SDK is a toolkit of PDF and image extraction engines for C# and VB.NET. You can integrate it into your app to extract data from PDF documents automatically.
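In this brief guide, we cover the following features of PDF Extractor SDK in C#: extracting tables to CSV, making a PDF searchable, splitting a PDF by keyword, rotating a document, extracting images, and merging PDF files.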
The following snippet uses PDF Extractor SDK in C# to detect tables in a PDF and save each one to its own CSV file.
Just copy-paste the following C# source code to see the program in action.
using System;
using Bytescout.PDFExtractor;

namespace ExtractTextByPages
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create Bytescout.PDFExtractor.CSVExtractor instance
            CSVExtractor extractor = new CSVExtractor();
            extractor.RegistrationName = "demo";
            extractor.RegistrationKey = "demo";

            // Create Bytescout.PDFExtractor.TableDetector instance
            TableDetector tdetector = new TableDetector();
            tdetector.RegistrationKey = "demo";
            tdetector.RegistrationName = "demo";

            // Load the same sample PDF document into both objects
            extractor.LoadDocumentFromFile("sample3.pdf");
            tdetector.LoadDocumentFromFile("sample3.pdf");

            // Get page count
            int pageCount = tdetector.GetPageCount();

            for (int i = 0; i < pageCount; i++)
            {
                // Table counter for the current page (starts from 1)
                int j = 1;

                // Find the first table on the page and continue if found
                if (tdetector.FindTable(i))
                {
                    do
                    {
                        // Set extraction area for CSV extractor to the rectangle given by the table detector
                        extractor.SetExtractionArea(
                            tdetector.GetFoundTableRectangle_Left(),
                            tdetector.GetFoundTableRectangle_Top(),
                            tdetector.GetFoundTableRectangle_Width(),
                            tdetector.GetFoundTableRectangle_Height()
                        );

                        // Save the table into a CSV file
                        extractor.SavePageCSVToFile(i, "page-" + i + "-table-" + j + ".csv");

                        j++;
                    }
                    while (tdetector.FindNextTable()); // Search for the next table
                }
            }

            // Open first output file in default associated application
            System.Diagnostics.Process.Start("page-0-table-1.csv");
        }
    }
}
The following snippet makes a PDF searchable in C# with the help of ByteScout PDF Extractor SDK.
using Bytescout.PDFExtractor;

// To make OCR work you should add to your project references to
// Bytescout.PDFExtractor.dll and Bytescout.PDFExtractor.OCRExtension.dll

namespace MakeSearchablePDF
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create Bytescout.PDFExtractor.SearchablePDFMaker instance
            SearchablePDFMaker searchablePDFMaker = new SearchablePDFMaker();
            searchablePDFMaker.RegistrationName = "demo";
            searchablePDFMaker.RegistrationKey = "demo";

            // Load sample PDF document
            searchablePDFMaker.LoadDocumentFromFile("sample_ocr.pdf");

            // Set the location of "tessdata" folder containing language data files
            searchablePDFMaker.OCRLanguageDataFolder = @"c:\Program Files\Bytescout PDF Extractor SDK\Redistributable\net2.00\tessdata\";

            // Set OCR language:
            // "eng" for English, "deu" for German, "fra" for French, "spa" for Spanish etc.,
            // according to files in /tessdata
            searchablePDFMaker.OCRLanguage = "eng";

            // Set PDF document rendering resolution
            searchablePDFMaker.OCRResolution = 300;

            // Run OCR and save the searchable PDF to file
            searchablePDFMaker.MakePDFSearchable("output.pdf");

            // Open output file in default associated application
            System.Diagnostics.Process.Start("output.pdf");
        }
    }
}
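The sample above notes that the OCR language code must match the files present in the "tessdata" folder. As a rough sketch, the helper below lists the codes available in a given folder before you pick one. It uses only the standard .NET file APIs. `OcrLanguages` and `Available` are hypothetical names, not part of PDF Extractor SDK, and the `<code>.traineddata` file naming is an assumption based on the usual Tesseract convention, so check it against the files in your own installation.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of PDF Extractor SDK.
public static class OcrLanguages
{
    // Returns the language codes found in a tessdata folder, sorted alphabetically.
    // Assumption: language data files are named "<code>.traineddata".
    public static string[] Available(string tessdataFolder)
    {
        // A missing folder simply means no languages are available
        if (!Directory.Exists(tessdataFolder))
            return new string[0];

        return Directory.GetFiles(tessdataFolder, "*.traineddata")
            .Select(path => Path.GetFileNameWithoutExtension(path))
            .OrderBy(code => code, StringComparer.Ordinal)
            .ToArray();
    }
}
```

You could call `OcrLanguages.Available(...)` with the same folder you assign to `OCRLanguageDataFolder` and check that the code you plan to assign to `OCRLanguage` is in the returned array.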
If you need to make a PDF unsearchable, you can follow this step-by-step tutorial.
Use the next code snippet if you need to split a PDF by keyword in C# using PDF Extractor SDK. The sample searches every page for the keyword and saves each matching page as a separate one-page PDF; change the keyword to suit your document.
using Bytescout.PDFExtractor;
using System.IO;

namespace FindAndExtractPageExample
{
    class Program
    {
        static void Main(string[] args)
        {
            string inputFile = "sample.pdf";
            string keyword = "demographic";

            TextExtractor extractor = new TextExtractor("demo", "demo");
            extractor.LoadDocumentFromFile(inputFile);

            // Search each page for keyword
            for (int i = 0; i < extractor.GetPageCount(); i++)
            {
                if (extractor.Find(i, keyword, false))
                {
                    // Extract the page containing the keyword
                    ExtractPage(inputFile, i, "page" + i + ".pdf");
                }
            }
        }

        private static void ExtractPage(string inputFile, int pageIndex, string outputFile)
        {
            DocumentSplitter splitter = new DocumentSplitter("demo", "demo");

            if (pageIndex == 0)
            {
                if (splitter.GetPageCount(inputFile) == 1)
                {
                    // No splitting required if there is only one page
                    File.Copy(inputFile, outputFile);
                }
                else
                {
                    // Split at the second page (page numeration starts from 1 in this function).
                    // The first part will be our sought-for 1-page document.
                    splitter.Split(inputFile, outputFile, "waste", 2);
                    File.Delete("waste"); // delete the waste part
                }
            }
            else
            {
                if (pageIndex == splitter.GetPageCount(inputFile) - 1)
                {
                    // If this is the last page, just split on it.
                    // The second part will be our sought-for 1-page document.
                    splitter.Split(inputFile, "waste", outputFile, pageIndex + 1);
                    File.Delete("waste"); // delete the waste part
                }
                else
                {
                    // If the required page is in the middle of the document, we need two split operations:
                    splitter.Split(inputFile, "waste", "part", pageIndex + 1);
                    File.Delete("waste");
                    splitter.Split("part", outputFile, "waste", 2);
                    File.Delete("part");
                    File.Delete("waste");
                }
            }
        }
    }
}
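The three cases in `ExtractPage` come down to index arithmetic: the page index is zero-based, while the split position is a one-based page number. The sketch below isolates that arithmetic so it can be checked without the SDK. `SinglePagePlanner` and `SplitPoints` are hypothetical names introduced here for illustration, and the helper makes no SDK calls; it only returns the split positions that the sample passes to `Split`, in order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of PDF Extractor SDK.
public static class SinglePagePlanner
{
    // Returns the one-based split positions needed, in order, to isolate
    // the page at zero-based pageIndex in a document of pageCount pages.
    // When two positions are returned, the second applies to the tail
    // produced by the first split, not to the original document.
    public static List<int> SplitPoints(int pageIndex, int pageCount)
    {
        if (pageCount < 1)
            throw new ArgumentOutOfRangeException(nameof(pageCount));
        if (pageIndex < 0 || pageIndex >= pageCount)
            throw new ArgumentOutOfRangeException(nameof(pageIndex));

        var points = new List<int>();

        // Single-page document: nothing to split, a plain file copy is enough
        if (pageCount == 1)
            return points;

        // Cut off everything before the target page;
        // the target then becomes page 1 of the remaining part
        if (pageIndex > 0)
            points.Add(pageIndex + 1);

        // Cut off everything after the target page
        if (pageIndex < pageCount - 1)
            points.Add(2);

        return points;
    }
}
```

For a five-page document, `SplitPoints(0, 5)` yields a single split at page 2, `SplitPoints(4, 5)` a single split at page 5, and `SplitPoints(2, 5)` a split at page 3 followed by a split at page 2 of the remaining part, which mirrors the branches in the sample.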
PDF Extractor SDK can rotate a PDF file by a given angle in C#, VB.NET, and ASP.NET. If you need to rotate a document, just copy-paste the code snippet below into your project.
using System.Diagnostics;
using Bytescout.PDFExtractor;

namespace RotateDocument
{
    class Program
    {
        static void Main(string[] args)
        {
            string inputFile = "sample1.pdf";

            using (DocumentRotator rotator = new DocumentRotator("demo", "demo"))
            {
                rotator.Rotate(inputFile, "result.pdf", RotationAngle.Deg90);
            }

            Process.Start("result.pdf");
        }
    }
}
This snippet is useful when you need to extract all images from a PDF. Just copy-paste it into your C# project; it saves each image as a PNG file.
using System;
using System.Drawing.Imaging;
using Bytescout.PDFExtractor;

namespace ExtractAllImages
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create Bytescout.PDFExtractor.ImageExtractor instance
            ImageExtractor extractor = new ImageExtractor();
            extractor.RegistrationName = "demo";
            extractor.RegistrationKey = "demo";

            // Load sample PDF document
            extractor.LoadDocumentFromFile("sample1.pdf");

            int i = 0;

            // Initialize image enumeration
            if (extractor.GetFirstImage())
            {
                do
                {
                    string outputFileName = "image" + i + ".png";

                    // Save image to file
                    extractor.SaveCurrentImageToFile(outputFileName, ImageFormat.Png);

                    i++;
                }
                while (extractor.GetNextImage()); // Advance image enumeration
            }

            // Open first output file in default associated application
            System.Diagnostics.Process.Start("image0.png");
        }
    }
}
The snippet below merges several PDF files into one in C# using ByteScout PDF Extractor SDK.
using System.Diagnostics;
using Bytescout.PDFExtractor;

namespace MergeDocuments
{
    class Program
    {
        static void Main(string[] args)
        {
            string[] inputFiles = new string[] { "sample1.pdf", "sample2.pdf", "sample3.pdf" };

            using (DocumentMerger merger = new DocumentMerger("demo", "demo"))
            {
                merger.Merge(inputFiles, "result.pdf");
            }

            Process.Start("result.pdf");
        }
    }
}
You can also check out the live demo showing how to merge PDF files using PDF Extractor SDK.
These are just a few uses of the PDF Extractor SDK toolkit in C#. If you’d like to learn more, don’t hesitate to check our SDK documentation.