ByteScout PDF Extractor SDK - VB.NET - Find Text in PDF with Regex - ByteScout

How to find text in PDF with regex in VB.NET with ByteScout PDF Extractor SDK

Write code in VB.NET to find text in PDF with regex with this step-by-step tutorial

Finding text in a PDF with a regular expression is straightforward in VB.NET using the source code below. ByteScout PDF Extractor SDK helps developers extract data from unstructured documents, PDFs, images, and scanned and electronic forms. It includes AI functions such as automatic table detection, automatic table extraction and restructuring, text recognition, and text restoration from PDF and scanned documents. It also provides PDF to CSV, PDF to XML, PDF to JSON, and PDF to searchable PDF conversion, as well as methods for low-level data extraction.

The code below is a quick way to add regex search over PDF text to a VB.NET application. To use it, copy the code into your project, add a reference to the Bytescout.PDFExtractor assembly, then compile and run the application. The sample loads Invoice.pdf from the output directory; the project files below copy it there on every build. Using ByteScout PDF Extractor SDK in VB.NET is also covered in the documentation included with the product.

A trial version of ByteScout PDF Extractor SDK can be downloaded for free from our website. It also includes source code samples for VB.NET and other programming languages.

Try it today: Get 60 Day Free Trial or sign up for Web API

FindText.VS2005.vbproj
      
<?xml version="1.0" encoding="utf-8"?>
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003" DefaultTargets="Build">
  <PropertyGroup>
    <Configuration Condition=" '$(Configuration)' == '' ">Debug</Configuration>
    <Platform Condition=" '$(Platform)' == '' ">AnyCPU</Platform>
    <ProductVersion>8.0.50727</ProductVersion>
    <SchemaVersion>2.0</SchemaVersion>
    <ProjectGuid>{EA267CB0-792B-4CBF-ACCC-7560A1451771}</ProjectGuid>
    <OutputType>Exe</OutputType>
    <AppDesignerFolder>Properties</AppDesignerFolder>
    <RootNamespace>FindText</RootNamespace>
    <AssemblyName>FindText</AssemblyName>
  </PropertyGroup>
  <PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Debug|AnyCPU' ">
    <DebugSymbols>true</DebugSymbols>
    <DebugType>full</DebugType>
    <Optimize>false</Optimize>
    <OutputPath>bin\Debug\</OutputPath>
    <DefineConstants>DEBUG,TRACE</DefineConstants>
    <ErrorReport>prompt</ErrorReport>
    <WarningLevel>4</WarningLevel>
  </PropertyGroup>
  <PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Release|AnyCPU' ">
    <DebugType>pdbonly</DebugType>
    <Optimize>true</Optimize>
    <OutputPath>bin\Release\</OutputPath>
    <DefineConstants>TRACE</DefineConstants>
    <ErrorReport>prompt</ErrorReport>
    <WarningLevel>4</WarningLevel>
  </PropertyGroup>
  <ItemGroup>
    <Import Include="Microsoft.VisualBasic" />
    <Import Include="System" />
  </ItemGroup>
  <ItemGroup>
    <Content Include="..\..\Invoice.pdf">
      <Link>Invoice.pdf</Link>
      <CopyToOutputDirectory>Always</CopyToOutputDirectory>
    </Content>
  </ItemGroup>
  <ItemGroup>
    <Reference Include="Bytescout.PDFExtractor, Version=1.0.0.12, Culture=neutral, processorArchitecture=MSIL">
      <SpecificVersion>False</SpecificVersion>
    </Reference>
    <Reference Include="System" />
    <Reference Include="System.Data" />
    <Reference Include="System.Drawing" />
    <Reference Include="System.Xml" />
  </ItemGroup>
  <ItemGroup>
    <Compile Include="Program.vb" />
    <Compile Include="Properties\AssemblyInfo.vb" />
  </ItemGroup>
  <Import Project="$(MSBuildBinPath)\Microsoft.VisualBasic.Targets" />
</Project>

FindText.VS2008.vbproj
      
<?xml version="1.0" encoding="utf-8"?>
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003" DefaultTargets="Build" ToolsVersion="3.5">
  <PropertyGroup>
    <Configuration Condition=" '$(Configuration)' == '' ">Debug</Configuration>
    <Platform Condition=" '$(Platform)' == '' ">AnyCPU</Platform>
    <ProductVersion>9.0.21022</ProductVersion>
    <SchemaVersion>2.0</SchemaVersion>
    <ProjectGuid>{EA267CB0-792B-4CBF-ACCC-7560A1451771}</ProjectGuid>
    <OutputType>Exe</OutputType>
    <AppDesignerFolder>Properties</AppDesignerFolder>
    <RootNamespace>FindText</RootNamespace>
    <AssemblyName>FindText</AssemblyName>
    <OldToolsVersion>2.0</OldToolsVersion>
    <TargetFrameworkVersion>v3.5</TargetFrameworkVersion>
  </PropertyGroup>
  <PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Debug|AnyCPU' ">
    <DebugSymbols>true</DebugSymbols>
    <DebugType>full</DebugType>
    <Optimize>false</Optimize>
    <OutputPath>bin\Debug\</OutputPath>
    <DefineConstants>DEBUG,TRACE</DefineConstants>
    <ErrorReport>prompt</ErrorReport>
    <WarningLevel>4</WarningLevel>
  </PropertyGroup>
  <PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Release|AnyCPU' ">
    <DebugType>pdbonly</DebugType>
    <Optimize>true</Optimize>
    <OutputPath>bin\Release\</OutputPath>
    <DefineConstants>TRACE</DefineConstants>
    <ErrorReport>prompt</ErrorReport>
    <WarningLevel>4</WarningLevel>
  </PropertyGroup>
  <ItemGroup>
    <Import Include="Microsoft.VisualBasic" />
    <Import Include="System" />
  </ItemGroup>
  <ItemGroup>
    <Content Include="..\..\Invoice.pdf">
      <Link>Invoice.pdf</Link>
      <CopyToOutputDirectory>Always</CopyToOutputDirectory>
    </Content>
  </ItemGroup>
  <ItemGroup>
    <Reference Include="Bytescout.PDFExtractor, Version=1.0.0.12, Culture=neutral, processorArchitecture=MSIL">
      <SpecificVersion>False</SpecificVersion>
    </Reference>
    <Reference Include="System" />
    <Reference Include="System.Data" />
    <Reference Include="System.Drawing" />
    <Reference Include="System.Xml" />
  </ItemGroup>
  <ItemGroup>
    <Compile Include="Program.vb" />
    <Compile Include="Properties\AssemblyInfo.vb" />
  </ItemGroup>
  <Import Project="$(MSBuildToolsPath)\Microsoft.VisualBasic.Targets" />
</Project>

FindText.VS2010.vbproj
      
<?xml version="1.0" encoding="utf-8"?>
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003" DefaultTargets="Build" ToolsVersion="4.0">
  <PropertyGroup>
    <Configuration Condition=" '$(Configuration)' == '' ">Debug</Configuration>
    <Platform Condition=" '$(Platform)' == '' ">AnyCPU</Platform>
    <ProductVersion> </ProductVersion>
    <SchemaVersion>2.0</SchemaVersion>
    <ProjectGuid>{EA267CB0-792B-4CBF-ACCC-7560A1451771}</ProjectGuid>
    <OutputType>Exe</OutputType>
    <AppDesignerFolder>Properties</AppDesignerFolder>
    <RootNamespace>FindText</RootNamespace>
    <AssemblyName>FindText</AssemblyName>
    <OldToolsVersion>3.5</OldToolsVersion>
    <TargetFrameworkVersion>v4.0</TargetFrameworkVersion>
  </PropertyGroup>
  <PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Debug|AnyCPU' ">
    <DebugSymbols>true</DebugSymbols>
    <DebugType>full</DebugType>
    <Optimize>false</Optimize>
    <OutputPath>bin\Debug\</OutputPath>
    <DefineConstants>DEBUG,TRACE</DefineConstants>
    <ErrorReport>prompt</ErrorReport>
    <WarningLevel>4</WarningLevel>
  </PropertyGroup>
  <PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Release|AnyCPU' ">
    <DebugType>pdbonly</DebugType>
    <Optimize>true</Optimize>
    <OutputPath>bin\Release\</OutputPath>
    <DefineConstants>TRACE</DefineConstants>
    <ErrorReport>prompt</ErrorReport>
    <WarningLevel>4</WarningLevel>
  </PropertyGroup>
  <ItemGroup>
    <Import Include="Microsoft.VisualBasic" />
    <Import Include="System" />
  </ItemGroup>
  <ItemGroup>
    <Content Include="..\..\Invoice.pdf">
      <Link>Invoice.pdf</Link>
      <CopyToOutputDirectory>Always</CopyToOutputDirectory>
    </Content>
  </ItemGroup>
  <ItemGroup>
    <Reference Include="Bytescout.PDFExtractor, Version=1.0.0.12, Culture=neutral, processorArchitecture=MSIL">
      <SpecificVersion>False</SpecificVersion>
    </Reference>
    <Reference Include="System" />
    <Reference Include="System.Data" />
    <Reference Include="System.Drawing" />
    <Reference Include="System.Xml" />
  </ItemGroup>
  <ItemGroup>
    <Compile Include="Program.vb" />
    <Compile Include="Properties\AssemblyInfo.vb" />
  </ItemGroup>
  <Import Project="$(MSBuildToolsPath)\Microsoft.VisualBasic.Targets" />
</Project>

Program.vb
      
Imports System.Drawing
Imports Bytescout.PDFExtractor

Class Program

    Friend Shared Sub Main(args As String())

        ' Create Bytescout.PDFExtractor.TextExtractor instance
        Dim extractor As New TextExtractor()
        extractor.RegistrationName = "demo"
        extractor.RegistrationKey = "demo"

        ' Load sample PDF document
        extractor.LoadDocumentFromFile(".\Invoice.pdf")

        extractor.RegexSearch = True ' Enable the regular expressions

        Dim pageCount As Integer = extractor.GetPageCount()

        ' Search through pages
        For i As Integer = 0 To pageCount - 1

            ' Search dates in format 12/31/1999
            Dim regexPattern As String = "[0-9]{2}/[0-9]{2}/[0-9]{4}"
            ' See the complete regular expressions reference at https://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx

            ' Search each page for the pattern
            If extractor.Find(i, regexPattern, False) Then
                Do
                    Console.WriteLine("")
                    Console.WriteLine(("Found on page " & i & " at location ") + extractor.FoundText.Bounds.ToString())
                    Console.WriteLine("")

                    ' Iterate through each element in the found text
                    For Each element As ISearchResultElement In extractor.FoundText.Elements
                        Console.WriteLine(" Text: " + element.Text)
                        Console.WriteLine(" Font is bold: " + element.FontIsBold.ToString())
                        Console.WriteLine(" Font is italic:" + element.FontIsItalic.ToString())
                        Console.WriteLine(" Font name: " + element.FontName)
                        Console.WriteLine(" Font size:" + element.FontSize.ToString())
                        Console.WriteLine(" Font color:" + element.FontColor.ToString())
                        Console.WriteLine()
                    Next
                Loop While extractor.FindNext()
            End If
        Next

        ' Cleanup
        extractor.Dispose()

        Console.WriteLine()
        Console.WriteLine("Press any key to continue...")
        Console.ReadLine()

    End Sub

End Class
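Before running a pattern against a PDF, it can help to try it on a plain string first. The sketch below does that with the standard .NET Regex class only; it does not call the SDK. It assumes the SDK accepts the same .NET regular expression syntax, as the MSDN reference linked in Program.vb suggests. The module name PatternCheck, the sample string, and the stricter pattern are made up for illustration and are not part of the ByteScout sample.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PatternCheck

    Sub Main()
        ' Same pattern as in Program.vb: two digits, slash, two digits, slash, four digits
        Dim loosePattern As String = "[0-9]{2}/[0-9]{2}/[0-9]{4}"

        ' Hypothetical stricter variant: month 01-12, day 01-31, word boundaries on both sides
        Dim strictPattern As String = "\b(0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])/[0-9]{4}\b"

        ' Made-up sample text standing in for text extracted from a PDF page
        Dim sample As String = "Invoice date: 12/31/1999, ref 99/99/2020, due 01/15/2000"

        ' Run both patterns over the sample and list every match with its position
        For Each pattern As String In New String() {loosePattern, strictPattern}
            Console.WriteLine("Pattern: " & pattern)
            For Each m As Match In Regex.Matches(sample, pattern)
                Console.WriteLine("  Match '" & m.Value & "' at index " & m.Index.ToString())
            Next
        Next
    End Sub

End Module
```

On this sample the original pattern also matches 99/99/2020, which is not a valid date, while the stricter variant matches only 12/31/1999 and 01/15/2000. Neither pattern checks calendar validity (for example, 02/31/2000 still matches), so parse the matched text with Date.TryParseExact if that matters.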

MORE INFORMATION

Get 60 Day Free Trial or Visit ByteScout PDF Extractor SDK page

Explore ByteScout PDF Extractor SDK documentation

WEB API VERSION

Sign Up for free Web API key

Explore Web API Documentation
