Document Parser SDK power your app with the reading of tables, fields, and unstructured text
from PDF and scans.
Key Benefits
Decrease time-to-market by with easy to make and easy to use extraction templates
No developers are required to create templates
Built-in invoice auto-parsing template
Re-use the library of pre-made templates that you can adapt to your needs
AI-powered PDF classification engine is included
Available as Web API, on-prem API, and library
Technical Features
AI-powered advanced PDF extractor engine on the core engine developed by ByteScout and battle-tested on millions of documents;
No coding required Document Parser Template Editor (available as an online version and as offline desktop version);
Supports PDF, JPG, PNG, TIFF images as input;
Extracts multiple documents inside a single PDF file;
Reads multiple tables located on the same pages;
Supports malformed and damaged PDF documents;
ML-powered OCR with document cleaning preprocessing filters for improved text recognition quality;
OCR (image to text) supports English, Spanish, Dutch, German, French, and dozens of other languages supported;
Mixed OCR languages are supported;
Data output in JSON, YAML, XML and CSV formats;
Available as Web API, on-prem API Server, on-prem library .NET and ASP.NET. Also available as ActiveX/COM object (through .NET Interop wrapper) for using from Delphi, VC++, VB6, VBScript, JScript, and other languages;