ByteScout Cloud API Server is a ready-to-use Web API server that can be deployed in less than 30 minutes on your own in-house server or on a private cloud server. It can store data in local storage on your in-house server or in an Amazon AWS S3 bucket. Data is processed solely on the server by the built-in ByteScout-powered engine; no cloud services are used to process your data.
---
# Template that demonstrates parsing of multi-page table using only
# regular expressions for the table start, end, and rows.
# If regular expression cannot be written for every table row (for example,
# if the table contains empty cells), try the second method demonstrated
# in `MultiPageTable-template2.yml` template.
templateVersion: 3
templatePriority: 0
sourceId: Multipage Table Test
detectionRules:
  keywords:
  - Sample document with multi-page table
fields:
  total:
    type: regex
    expression: TOTAL {{DECIMAL}}
    dataType: decimal
tables:
- name: table1
  start:
    # regular expression to find the table start in document
    expression: Item\s+Description\s+Price\s+Qty\s+Extended Price
  end:
    # regular expression to find the table end in document
    expression: TOTAL\s+\d+\.\d\d
  row:
    # regular expression to find table rows
    expression: '^\s*(?<itemNo>\d+)\s+(?<description>.+?)\s+(?<price>\d+\.\d\d)\s+(?<qty>\d+)\s+(?<extPrice>\d+\.\d\d)'
  columns:
  - name: itemNo
    type: integer
  - name: description
    type: string
  - name: price
    type: decimal
  - name: qty
    type: integer
  - name: extPrice
    type: decimal
  multipage: true
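The row expression in the template uses `(?<name>...)` named groups, which Python's `re` module does not accept; Python writes them as `(?P<name>...)`. As a rough way to try such a pattern before sending it to the server, the sketch below rewrites the groups in Python syntax and applies the pattern to one made-up line. The `parse_row` helper and the sample row are illustrative assumptions, not part of the ByteScout API, and the server's own regex engine may differ from Python's in edge cases.

```python
import re
from decimal import Decimal

# Row pattern from the template, with named groups rewritten from
# (?<name>...) to Python's (?P<name>...) syntax.
ROW_PATTERN = re.compile(
    r"^\s*(?P<itemNo>\d+)\s+(?P<description>.+?)\s+"
    r"(?P<price>\d+\.\d\d)\s+(?P<qty>\d+)\s+(?P<extPrice>\d+\.\d\d)"
)


def parse_row(line):
    """Return the typed columns of one table row, or None if the line does not match."""
    match = ROW_PATTERN.match(line)
    if match is None:
        return None
    # Convert each captured group to the type declared in the template's columns list.
    return {
        "itemNo": int(match["itemNo"]),
        "description": match["description"],
        "price": Decimal(match["price"]),
        "qty": int(match["qty"]),
        "extPrice": Decimal(match["extPrice"]),
    }


# Hypothetical sample row, invented for illustration only.
row = parse_row("  1   Sample item one    10.00   2   20.00")
print(row)
```

A line that lacks any of the five columns, such as the table header itself, makes `parse_row` return `None`, which is the situation the template comment describes for tables with empty cells.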
import requests  # pip install requests

# Please NOTE: In this sample we're assuming Cloud API Server is hosted at "https://localhost".
# If it's not, then please replace this with your hosting URL.

# Base URL for Cloud API Server requests
BASE_URL = "https://localhost"

# Source PDF file URL
SourceFileUrl = "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/document-parser/MultiPageTable.pdf"
# Destination JSON file name
DestinationFile = ".\\result.json"

# Template text. Use Document Parser SDK (https://bytescout.com/products/developer/documentparsersdk/index.html)
# to create templates.
# Read template from file:
with open(".\\MultiPageTable-template1.yml", mode='r', encoding="utf-8", errors="ignore") as file_read:
    Template = file_read.read()


def main(args=None):
    PerformDocumentParser(SourceFileUrl, Template, DestinationFile)


def PerformDocumentParser(uploadedFileUrl, template, destinationFile):
    # Content
    data = {
        'url': uploadedFileUrl,
        'template': template
    }

    # Prepare URL for 'Document Parser' API request
    url = "{}/pdf/documentparser".format(BASE_URL)

    # Execute request and get response as JSON
    response = requests.post(url, data=data)
    if response.status_code == 200:
        result = response.json()

        if not result["error"]:
            # Get URL of result file
            resultFileUrl = result["url"]
            # Download result file
            r = requests.get(resultFileUrl, stream=True)
            if r.status_code == 200:
                with open(destinationFile, 'wb') as file:
                    for chunk in r:
                        file.write(chunk)
                print(f"Result file saved as \"{destinationFile}\" file.")
            else:
                # Report the status of the download request, not of the parser request
                print(f"Request error: {r.status_code} {r.reason}")
        else:
            # Show service reported error
            print(result["message"])
    else:
        print(f"Request error: {response.status_code} {response.reason}")


if __name__ == '__main__':
    main()
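The sample above reads three keys from the decoded response body: `error`, `message`, and `url`. One possible way to keep that check separate from the network calls is sketched below; the `extract_result_url` helper and the example payloads are hypothetical and only mirror the keys the sample already uses, so the real response may carry additional fields.

```python
def extract_result_url(payload):
    """Return the result file URL from a decoded response body.

    Raises RuntimeError when the service reports an error or no URL is present.
    """
    if payload.get("error"):
        # The service sets "error" to true and describes the problem in "message".
        raise RuntimeError(payload.get("message", "Unknown service error"))
    if "url" not in payload:
        raise RuntimeError("Response does not contain a result URL")
    return payload["url"]


# Hypothetical payloads, shaped after the keys used in the sample.
ok_payload = {"error": False, "url": "https://localhost/result.json"}
failed_payload = {"error": True, "message": "Template is not valid"}

print(extract_result_url(ok_payload))
try:
    extract_result_url(failed_payload)
except RuntimeError as exc:
    print(f"Service error: {exc}")
```

Raising an exception instead of printing lets the caller decide whether to retry, log, or abort.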