ByteScout Cloud API Server - Document Parser API - Python - Parse From Url - ByteScout


How to parse a document from a URL with the Document Parser API in Python and ByteScout Cloud API Server

Learn to write code that parses a document from a URL with the Document Parser API in Python: a simple how-to tutorial

Developers of any level can write Python code that parses a document from a URL using ByteScout Cloud API Server, which makes the Document Parser API available to Python applications. ByteScout Cloud API Server is a ready-to-use API server that can be installed and deployed in less than 30 minutes on your own Windows server or on a server in a cloud. It can save data and files on your local server-based file storage or in Amazon AWS S3 storage. Data is processed solely on the API server by the ByteScout engine; no cloud services or Internet connection are required for data processing.

If you want to speed up development, the Python code sample on this page shows how to implement this task with ByteScout Cloud API Server. Copy and paste the sample into your project, then run it. To see how it works with your own data, replace the source file URL and the template with your own and run the code again.

A trial version of ByteScout is available as a free download from our website, along with this and other source code samples for Python and other programming languages.

On-demand (REST Web API) version:
 Web API (on-demand version)

On-premise offline SDK for Windows:
 60 Day Free Trial (on-premise)

MultiPageTable-template1.yml
      
---
# Template that demonstrates parsing of multi-page table using only
# regular expressions for the table start, end, and rows.
# If regular expression cannot be written for every table row (for example,
# if the table contains empty cells), try the second method demonstrated
# in 'MultiPageTable-template2.yml' template.
templateVersion: 2
templatePriority: 0
sourceId: Multipage Table Test
detectionRules:
  keywords:
  - Sample document with multi-page table
fields:
  total:
    expression: TOTAL {{DECIMAL}}
tables:
- name: table1
  start:
    # regular expression to find the table start in document
    expression: Item\s+Description\s+Price\s+Qty\s+Extended Price
  end:
    # regular expression to find the table end in document
    expression: TOTAL\s+\d+\.\d\d
  row:
    # regular expression to find table rows
    expression: '^\s*(?<itemNo>\d+)\s+(?<description>.+?)\s+(?<price>\d+\.\d\d)\s+(?<qty>\d+)\s+(?<extPrice>\d+\.\d\d)'
  columns:
  - name: itemNo
    type: integer
  - name: description
    type: string
  - name: price
    type: decimal
  - name: qty
    type: integer
  - name: extPrice
    type: decimal
  multipage: true
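The row expression in the template can be tried locally before sending a request. The sketch below is an illustration only: it rewrites the template's `(?<name>...)` named groups into Python's `(?P<name>...)` syntax, because Python's `re` module does not accept the former, and applies the pattern to a sample line. The helper name `parse_row` and the sample row text are hypothetical and do not come from the demo PDF.

```python
import re

# Row expression from the template, with the (?<name>...) groups rewritten
# to Python's (?P<name>...) named-group syntax.
ROW_PATTERN = re.compile(
    r"^\s*(?P<itemNo>\d+)\s+(?P<description>.+?)\s+"
    r"(?P<price>\d+\.\d\d)\s+(?P<qty>\d+)\s+(?P<extPrice>\d+\.\d\d)"
)


def parse_row(line):
    """Return the named columns of one table row, or None if the line does not match."""
    match = ROW_PATTERN.search(line)
    return match.groupdict() if match else None


# Hypothetical row text, invented for illustration only.
sample = "  1   Widget, blue   12.50   3   37.50"
print(parse_row(sample))
```

All captured values are strings here; the conversion to the `integer` and `decimal` column types listed in the template is left to the server.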

ON-PREMISE OFFLINE SDK

60 Day Free Trial or Visit ByteScout Cloud API Server Home Page

Explore ByteScout Cloud API Server Documentation

Explore Samples

Sign Up for ByteScout Cloud API Server Online Training

ON-DEMAND REST WEB API

Get Your API Key

Explore Web API Docs

Explore Web API Samples

ParseFromUrl.py
      
import requests # pip install requests

# Please NOTE: In this sample we're assuming Cloud API Server is hosted at "https://localhost".
# If it's not, then please replace this with your hosting url.

# Base URL for Cloud API Server requests
BASE_URL = "https://localhost"

# Source PDF file url
SourceFileUrl = "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/document-parser/MultiPageTable.pdf"
# Destination JSON file name
DestinationFile = ".\\result.json"

# Template text. Use Document Parser SDK (https://bytescout.com/products/developer/documentparsersdk/index.html)
# to create templates.
# Read template from file:
with open(".\\MultiPageTable-template1.yml", mode='r', encoding="utf-8", errors="ignore") as file_read:
    Template = file_read.read()


def main(args = None):
    PerformDocumentParser(SourceFileUrl, Template, DestinationFile)


def PerformDocumentParser(uploadedFileUrl, template, destinationFile):
    # Content
    data = {
        'url': uploadedFileUrl,
        'template': template
    }

    # Prepare URL for 'Document Parser' API request
    url = "{}/pdf/documentparser".format(BASE_URL)

    # Execute request and get response as JSON
    response = requests.post(url, data=data)
    if response.status_code == 200:
        result = response.json()

        if result["error"] == False:
            # Get URL of result file
            resultFileUrl = result["url"]
            # Download result file
            r = requests.get(resultFileUrl, stream=True)
            if r.status_code == 200:
                with open(destinationFile, 'wb') as file:
                    for chunk in r:
                        file.write(chunk)
                print(f"Result file saved as \"{destinationFile}\" file.")
            else:
                # Report the status of the download request, not the parser request
                print(f"Request error: {r.status_code} {r.reason}")
        else:
            # Show service reported error
            print(result["message"])
    else:
        print(f"Request error: {response.status_code} {response.reason}")


if __name__ == '__main__':
    main()
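The sample above branches on three fields of the decoded response: `error`, `url`, and `message`. As a minimal sketch, that decision can be moved into a small function that is testable without a running server. The function name `result_url_from_response` and the sample payload values are hypothetical; only the three field names are taken from the sample.

```python
def result_url_from_response(payload):
    """Return the result file URL from a decoded response dict.

    Raises RuntimeError with the service message when the response
    reports an error.
    """
    if payload.get("error"):
        raise RuntimeError(payload.get("message", "Unknown service error"))
    return payload["url"]


# Hypothetical payloads, invented for illustration only.
ok = {"error": False, "url": "https://localhost/hypothetical/result.json"}
failed = {"error": True, "message": "Template is invalid"}

print(result_url_from_response(ok))
try:
    result_url_from_response(failed)
except RuntimeError as exc:
    print(f"Service error: {exc}")
```

Raising an exception instead of printing lets the caller decide how to report the failure, which may be more convenient when the parsing step is part of a larger script.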
