ByteScout PDF Extractor SDK - C# - PDF To XML With Images - ByteScout

ByteScout PDF Extractor SDK – C# – PDF To XML With Images

  • Home
  • /
  • Articles
  • /
  • ByteScout PDF Extractor SDK – C# – PDF To XML With Images

PDF to XML with images in C# using ByteScout PDF Extractor SDK

How To: tutorial on PDF to XML with images in C#

ByteScout tutorials explain the material for programmers who use C#. ByteScout PDF Extractor SDK was made to help with PDF to XML with images in C#. ByteScout PDF Extractor SDK is the SDK that helps developers to extract data from unstructured documents, pdf, images, scanned and electronic forms. Includes AI functions like automatic table detection, automatic table extraction and restructuring, text recognition and text restoration from pdf and scanned documents. Includes PDF to CSV, PDF to XML, PDF to JSON, PDF to searchable PDF functions as well as methods for low level data extraction.

Fast application programming interfaces of ByteScout PDF Extractor SDK for C# plus the instruction and the C# code below will help you quickly learn PDF to XML with images. In order to implement this functionality, you should copy and paste code below into your app using code editor. Then compile and run your application. Enjoy writing a code with ready-to-use sample C# codes to implement PDF to XML with images using ByteScout PDF Extractor SDK.

ByteScout PDF Extractor SDK is available as free trial. You may get it from our website along with all other source code samples for C# applications.

Try it today: Get 60 Day Free Trial or sign up for Web API

PDF2XML.NETCore.csproj
      
<?xml version="1.0" encoding="utf-8"?> <Project Sdk="Microsoft.NET.Sdk"> <PropertyGroup> <OutputType>Exe</OutputType> <TargetFramework>netcoreapp2.0</TargetFramework> <EnableDefaultCompileItems>false</EnableDefaultCompileItems> <GenerateAssemblyCompanyAttribute>false</GenerateAssemblyCompanyAttribute> <GenerateAssemblyConfigurationAttribute>false</GenerateAssemblyConfigurationAttribute> <GenerateAssemblyFileVersionAttribute>false</GenerateAssemblyFileVersionAttribute> <GenerateAssemblyInformationalVersionAttribute>false</GenerateAssemblyInformationalVersionAttribute> <GenerateAssemblyProductAttribute>false</GenerateAssemblyProductAttribute> <GenerateAssemblyTitleAttribute>false</GenerateAssemblyTitleAttribute> <GenerateAssemblyVersionAttribute>false</GenerateAssemblyVersionAttribute> <GenerateAssemblyCopyrightAttribute>false</GenerateAssemblyCopyrightAttribute> <GenerateAssemblyTrademarkAttribute>false</GenerateAssemblyTrademarkAttribute> <GenerateAssemblyCultureAttribute>false</GenerateAssemblyCultureAttribute> <GenerateAssemblyDescriptionAttribute>false</GenerateAssemblyDescriptionAttribute> </PropertyGroup> <ItemGroup> <Compile Include="Program.cs" /> <Compile Include="Properties\AssemblyInfo.cs" /> <None Include="..\..\sample1.pdf" Link="sample1.pdf"> <CopyToOutputDirectory>Always</CopyToOutputDirectory> </None> </ItemGroup> <ItemGroup> <PackageReference Include="Microsoft.Windows.Compatibility" Version="2.0.0" /> </ItemGroup> <ItemGroup> <Reference Include="Bytescout.PDFExtractor"> <SpecificVersion>False</SpecificVersion> <HintPath>c:\Program Files\Bytescout PDF Extractor SDK\netcoreapp2.0\Bytescout.PDFExtractor.dll</HintPath> </Reference> <Reference Include="Bytescout.PDFExtractor.OCRExtension"> <SpecificVersion>False</SpecificVersion> <HintPath>c:\Program Files\Bytescout PDF Extractor SDK\netcoreapp2.0\Bytescout.PDFExtractor.OCRExtension.dll</HintPath> </Reference> </ItemGroup> </Project>

Try it today: Get 60 Day Free Trial or sign up for Web API

PDF2XML.VS2005.csproj
      
<Project DefaultTargets="Build" xmlns="http://schemas.microsoft.com/developer/msbuild/2003"> <PropertyGroup> <Configuration Condition=" '$(Configuration)' == '' ">Debug</Configuration> <Platform Condition=" '$(Platform)' == '' ">AnyCPU</Platform> <ProductVersion>8.0.50727</ProductVersion> <SchemaVersion>2.0</SchemaVersion> <ProjectGuid>{5FD39006-92D8-4ACC-B2EB-A43ECBC67F99}</ProjectGuid> <OutputType>Exe</OutputType> <AppDesignerFolder>Properties</AppDesignerFolder> <RootNamespace>PDF2XML</RootNamespace> <AssemblyName>PDF2XML</AssemblyName> </PropertyGroup> <PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Debug|AnyCPU' "> <DebugSymbols>true</DebugSymbols> <DebugType>full</DebugType> <Optimize>false</Optimize> <OutputPath>bin\Debug\</OutputPath> <DefineConstants>DEBUG;TRACE</DefineConstants> <ErrorReport>prompt</ErrorReport> <WarningLevel>4</WarningLevel> </PropertyGroup> <PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Release|AnyCPU' "> <DebugType>pdbonly</DebugType> <Optimize>true</Optimize> <OutputPath>bin\Release\</OutputPath> <DefineConstants>TRACE</DefineConstants> <ErrorReport>prompt</ErrorReport> <WarningLevel>4</WarningLevel> </PropertyGroup> <ItemGroup> <Reference Include="Bytescout.PDFExtractor, Version=1.10.0.73, Culture=neutral, PublicKeyToken=f7dd1bd9d40a50eb, processorArchitecture=MSIL"> <SpecificVersion>False</SpecificVersion> </Reference> <Reference Include="System" /> <Reference Include="System.Data" /> <Reference Include="System.Xml" /> </ItemGroup> <ItemGroup> </ItemGroup> <ItemGroup> <Compile Include="Program.cs" /> <Compile Include="Properties\AssemblyInfo.cs" /> </ItemGroup> <ItemGroup> <Content Include="..\..\sample1.pdf"> <Link>sample1.pdf</Link> <CopyToOutputDirectory>Always</CopyToOutputDirectory> </Content> </ItemGroup> <Import Project="$(MSBuildBinPath)\Microsoft.CSharp.targets" /> <!-- To modify your build process, add your task inside one of the targets below and uncomment it. Other similar extension points exist, see Microsoft.Common.targets. <Target Name="BeforeBuild"> </Target> <Target Name="AfterBuild"> </Target> --> </Project>

Try it today: Get 60 Day Free Trial or sign up for Web API

PDF2XML.VS2008.csproj
      
<Project DefaultTargets="Build" xmlns="http://schemas.microsoft.com/developer/msbuild/2003" ToolsVersion="3.5"> <PropertyGroup> <Configuration Condition=" '$(Configuration)' == '' ">Debug</Configuration> <Platform Condition=" '$(Platform)' == '' ">AnyCPU</Platform> <ProductVersion>9.0.21022</ProductVersion> <SchemaVersion>2.0</SchemaVersion> <ProjectGuid>{5FD39006-92D8-4ACC-B2EB-A43ECBC67F99}</ProjectGuid> <OutputType>Exe</OutputType> <AppDesignerFolder>Properties</AppDesignerFolder> <RootNamespace>PDF2XML</RootNamespace> <AssemblyName>PDF2XML</AssemblyName> <OldToolsVersion>2.0</OldToolsVersion> <TargetFrameworkVersion>v3.5</TargetFrameworkVersion> </PropertyGroup> <PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Debug|AnyCPU' "> <DebugSymbols>true</DebugSymbols> <DebugType>full</DebugType> <Optimize>false</Optimize> <OutputPath>bin\Debug\</OutputPath> <DefineConstants>DEBUG;TRACE</DefineConstants> <ErrorReport>prompt</ErrorReport> <WarningLevel>4</WarningLevel> </PropertyGroup> <PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Release|AnyCPU' "> <DebugType>pdbonly</DebugType> <Optimize>true</Optimize> <OutputPath>bin\Release\</OutputPath> <DefineConstants>TRACE</DefineConstants> <ErrorReport>prompt</ErrorReport> <WarningLevel>4</WarningLevel> </PropertyGroup> <ItemGroup> <Reference Include="Bytescout.PDFExtractor, Version=1.10.0.73, Culture=neutral, PublicKeyToken=f7dd1bd9d40a50eb, processorArchitecture=MSIL"> <SpecificVersion>False</SpecificVersion> </Reference> <Reference Include="System" /> <Reference Include="System.Data" /> <Reference Include="System.Xml" /> </ItemGroup> <ItemGroup> </ItemGroup> <ItemGroup> <Compile Include="Program.cs" /> <Compile Include="Properties\AssemblyInfo.cs" /> </ItemGroup> <ItemGroup> <Content Include="..\..\sample1.pdf"> <Link>sample1.pdf</Link> <CopyToOutputDirectory>Always</CopyToOutputDirectory> </Content> </ItemGroup> <Import Project="$(MSBuildToolsPath)\Microsoft.CSharp.targets" /> <!-- To modify your build process, add your task inside one of the targets below and uncomment it. Other similar extension points exist, see Microsoft.Common.targets. <Target Name="BeforeBuild"> </Target> <Target Name="AfterBuild"> </Target> --> </Project>

Try it today: Get 60 Day Free Trial or sign up for Web API

PDF2XML.VS2010.csproj
      
<Project DefaultTargets="Build" xmlns="http://schemas.microsoft.com/developer/msbuild/2003" ToolsVersion="4.0"> <PropertyGroup> <Configuration Condition=" '$(Configuration)' == '' ">Debug</Configuration> <Platform Condition=" '$(Platform)' == '' ">AnyCPU</Platform> <ProductVersion> </ProductVersion> <SchemaVersion>2.0</SchemaVersion> <ProjectGuid>{5FD39006-92D8-4ACC-B2EB-A43ECBC67F99}</ProjectGuid> <OutputType>Exe</OutputType> <AppDesignerFolder>Properties</AppDesignerFolder> <RootNamespace>PDF2XML</RootNamespace> <AssemblyName>PDF2XML</AssemblyName> <OldToolsVersion>3.5</OldToolsVersion> <TargetFrameworkVersion>v4.0</TargetFrameworkVersion> </PropertyGroup> <PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Debug|AnyCPU' "> <DebugSymbols>true</DebugSymbols> <DebugType>full</DebugType> <Optimize>false</Optimize> <OutputPath>bin\Debug\</OutputPath> <DefineConstants>DEBUG;TRACE</DefineConstants> <ErrorReport>prompt</ErrorReport> <WarningLevel>4</WarningLevel> </PropertyGroup> <PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Release|AnyCPU' "> <DebugType>pdbonly</DebugType> <Optimize>true</Optimize> <OutputPath>bin\Release\</OutputPath> <DefineConstants>TRACE</DefineConstants> <ErrorReport>prompt</ErrorReport> <WarningLevel>4</WarningLevel> </PropertyGroup> <ItemGroup> <Reference Include="Bytescout.PDFExtractor, Version=1.10.0.73, Culture=neutral, PublicKeyToken=f7dd1bd9d40a50eb, processorArchitecture=MSIL"> <SpecificVersion>False</SpecificVersion> </Reference> <Reference Include="System" /> <Reference Include="System.Data" /> <Reference Include="System.Xml" /> </ItemGroup> <ItemGroup> </ItemGroup> <ItemGroup> <Compile Include="Program.cs" /> <Compile Include="Properties\AssemblyInfo.cs" /> </ItemGroup> <ItemGroup> <Content Include="..\..\sample1.pdf"> <Link>sample1.pdf</Link> <CopyToOutputDirectory>Always</CopyToOutputDirectory> </Content> </ItemGroup> <Import Project="$(MSBuildToolsPath)\Microsoft.CSharp.targets" /> <!-- To modify your build process, add your task inside one of the targets below and uncomment it. Other similar extension points exist, see Microsoft.Common.targets. <Target Name="BeforeBuild"> </Target> <Target Name="AfterBuild"> </Target> --> </Project>

Try it today: Get 60 Day Free Trial or sign up for Web API

Program.cs
      
using System; using Bytescout.PDFExtractor; namespace PDF2XML { class Program { static void Main(string[] args) { // Create Bytescout.PDFExtractor.XMLExtractor instance XMLExtractor extractor = new XMLExtractor(); extractor.RegistrationName = "demo"; extractor.RegistrationKey = "demo"; // Load sample PDF document extractor.LoadDocumentFromFile("sample1.pdf"); // Uncomment this line to get rid of empty nodes in XML //extractor.PreserveFormattingOnTextExtraction = false; // Set output image format extractor.ImageFormat = OutputImageFormat.PNG; // Save images to external files extractor.SaveImages = ImageHandling.OuterFile; extractor.ImageFolder = "images"; // Folder for external images extractor.SaveXMLToFile("result_with_external_images.xml"); // Embed images into XML as Base64 encoded string extractor.SaveImages = ImageHandling.Embed; extractor.SaveXMLToFile("result_with_embedded_images.xml"); // Cleanup extractor.Dispose(); } } }

Try it today: Get 60 Day Free Trial or sign up for Web API

MORE INFORMATION

Get 60 Day Free Trial or Visit ByteScout PDF Extractor SDK page

Explore ByteScout PDF Extractor SDK documentation

WEB API VERSION

Sign Up for free Web API key

Explore Web API Documentation

Tutorials:

prev
next