ByteScout Cloud API Server is a ready-to-use Web API server that can be deployed in less than 30 minutes on your own in-house server or on a private cloud server. It can store data in local storage on the in-house server or in an Amazon AWS S3 bucket. Data is processed solely on the server by the built-in ByteScout-powered engine; no cloud services are used to process your data.
On-demand (REST Web API) version:
Web API (on-demand version)
On-premise offline SDK for Windows:
60 Day Free Trial (on-premise)
---
# Template that demonstrates parsing of multi-page table using only
# regular expressions for the table start, end, and rows.
# If regular expression cannot be written for every table row (for example,
# if the table contains empty cells), try the second method demonstrated
# in `MultiPageTable-template2.yml` template.
templateVersion: 3
templatePriority: 0
sourceId: Multipage Table Test
detectionRules:
  keywords:
  - Sample document with multi-page table
fields:
  total:
    type: regex
    expression: TOTAL {{DECIMAL}}
    dataType: decimal
tables:
- name: table1
  start:
    # regular expression to find the table start in document
    expression: Item\s+Description\s+Price\s+Qty\s+Extended Price
  end:
    # regular expression to find the table end in document
    expression: TOTAL\s+\d+\.\d\d
  row:
    # regular expression to find table rows
    expression: '^\s*(?<itemNo>\d+)\s+(?<description>.+?)\s+(?<price>\d+\.\d\d)\s+(?<qty>\d+)\s+(?<extPrice>\d+\.\d\d)'
  columns:
  - name: itemNo
    type: integer
  - name: description
    type: string
  - name: price
    type: decimal
  - name: qty
    type: integer
  - name: extPrice
    type: decimal
  multipage: true
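
The row expression in the template above can be tried out locally before the template is sent to the server. The following is a minimal sketch, not part of the ByteScout sample: it assumes the template's `(?<name>...)` named groups can be rewritten as Python's `(?P<name>...)` with the rest of the pattern unchanged, and the helper `parse_row` and both sample lines are hypothetical rather than taken from `MultiPageTable.pdf`.

```python
import re
from decimal import Decimal

# Python's `re` module writes named groups as (?P<name>...), so the
# template's (?<name>...) groups are rewritten; nothing else is changed.
ROW_PATTERN = re.compile(
    r"^\s*(?P<itemNo>\d+)\s+(?P<description>.+?)\s+"
    r"(?P<price>\d+\.\d\d)\s+(?P<qty>\d+)\s+(?P<extPrice>\d+\.\d\d)"
)


def parse_row(line):
    """Return the row's columns as a dict, or None if the line does not match."""
    match = ROW_PATTERN.match(line)
    if match is None:
        return None
    # Convert the captured text to the column types declared in the template.
    return {
        "itemNo": int(match.group("itemNo")),
        "description": match.group("description"),
        "price": Decimal(match.group("price")),
        "qty": int(match.group("qty")),
        "extPrice": Decimal(match.group("extPrice")),
    }


# Hypothetical lines made up for illustration.
sample_row = "  1   Blue widget, large   12.50   3   37.50"
incomplete_row = "  2   Red widget           4   "  # price cells are empty

print(parse_row(sample_row))
print(parse_row(incomplete_row))
```

The second line has empty price cells and does not match, which is the situation the template comments describe as the reason to try `MultiPageTable-template2.yml` instead.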
import requests  # pip install requests

# NOTE: This sample assumes Cloud API Server is hosted at "https://localhost".
# If it is not, replace this value with your hosting URL.

# Base URL for Cloud API Server requests
BASE_URL = "https://localhost"

# Source PDF file URL
SourceFileUrl = "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/document-parser/MultiPageTable.pdf"
# Destination JSON file name
DestinationFile = ".\\result.json"

# Template text. Use Document Parser SDK (https://bytescout.com/products/developer/documentparsersdk/index.html)
# to create templates.
# Read template from file:
with open(".\\MultiPageTable-template1.yml", mode="r", encoding="utf-8", errors="ignore") as file_read:
    Template = file_read.read()


def main(args=None):
    PerformDocumentParser(SourceFileUrl, Template, DestinationFile)


def PerformDocumentParser(uploadedFileUrl, template, destinationFile):
    # Request content: the source file URL and the template text
    data = {
        'url': uploadedFileUrl,
        'template': template
    }

    # Prepare URL for 'Document Parser' API request
    url = "{}/pdf/documentparser".format(BASE_URL)

    # Execute request and get response as JSON
    response = requests.post(url, data=data)
    if response.status_code == 200:
        result = response.json()

        if not result["error"]:
            # Get URL of result file
            resultFileUrl = result["url"]
            # Download result file
            r = requests.get(resultFileUrl, stream=True)
            if r.status_code == 200:
                with open(destinationFile, 'wb') as file:
                    for chunk in r:
                        file.write(chunk)
                print(f"Result file saved as \"{destinationFile}\" file.")
            else:
                # Report the status of the download request, not the parser request
                print(f"Request error: {r.status_code} {r.reason}")
        else:
            # Show service reported error
            print(result["message"])
    else:
        print(f"Request error: {response.status_code} {response.reason}")


if __name__ == '__main__':
    main()
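
The script above relies on three fields of the decoded response: `error`, `message`, and `url`. As a hedged sketch, that check can be pulled into a small helper that is testable without a running server. The function name `extract_result_url` and the sample payloads are hypothetical; only the three field names come from the script.

```python
def extract_result_url(payload):
    """Return the result file URL from a decoded response body.

    Raises RuntimeError with the service's message when the payload
    reports an error. Field names follow the script above; everything
    else here is an illustrative assumption.
    """
    if payload.get("error"):
        raise RuntimeError(payload.get("message", "Unknown service error"))
    return payload["url"]


# Hypothetical payloads made up for illustration.
ok_payload = {"error": False, "url": "https://localhost/result.json"}
failed_payload = {"error": True, "message": "Template is invalid"}

print(extract_result_url(ok_payload))
try:
    extract_result_url(failed_payload)
except RuntimeError as exc:
    print(f"Service error: {exc}")
```

Raising an exception instead of printing lets the caller decide how to report the failure, for example by logging it or retrying.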