Our ByteScout SDK products are sunsetting as we focus on expanding new solutions.
The ByteScout team has released a set of new version updates and fixes for its SDKs and business tools.
ByteScout continues to fight against COVID-19 by offering our tools for rapid data extraction and document processing at NO CHARGE to any developers working on coronavirus prevention, analysis, treatment, or research projects in hospitals and non-profits.
Check the list of updated SDKs for developers and business tools:
Added property ‘TextExtractor.FuzzySearch’ that enables a ‘fuzzy’ text search algorithm. It allows finding ‘approximately equal’ strings.
Added ‘DocumentSplitter2’ class that splits document by found text.
Added ‘CSVExtractor.NormalizeCSV’ property. It makes CSV data produced from different document pages contain the same number of columns.
Added property ‘JSONExtractor.OutputStructure’ that allows changing the structure of the generated JSON to one of the predefined variants for easier postprocessing.
Added property ‘JSONExtractor.OutputTransformation’ that allows applying JSONPath expression to the generated JSON.
Added property ‘OCRPageCount’ to extractor classes that contains the number of pages for which OCR was performed.
‘JSONExtractor’ and ‘XMLExtractor’ now add to the generated JSON and XML result the number of processed pages and the number of pages for which OCR was performed.
Added property ‘OCRDetectLines’ to extractor classes that improves column detection in scanned documents.
Added property ‘ConsiderBackgroundColors’ to extractor classes that enables detection of background color under text objects. It may help to improve row and column detection in tables without borders but with color stripes.
Added properties ‘DocumentMerger.GenerateBookmarks’ and ‘DocumentMerger.BookmarkTitles’ to enable automatic generation of bookmarks pointing to the merged parts.
Improved PDF optimization in ‘DocumentSplitter’.
‘DocumentMerger’ now uses the first input document as the base for the merged document. This allows keeping document information properties and outlines.
DocumentMerger: added support for profiles.
MultimediaExtractor: added support for more media types.
Fixed ‘TextExtractor.FindAll()’ method ignoring the case sensitivity option.
Fixed issue with junk empty temporary files generated during OCR.
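The column normalization described for ‘CSVExtractor.NormalizeCSV’ can be pictured with a short Python sketch. This is not the SDK's implementation: the function name `normalize_rows` and the choice to pad short rows with empty cells are assumptions made only to illustrate what "the same number of columns" means for CSV data collected from several pages.

```python
import csv
import io


def normalize_rows(csv_text: str, fill: str = "") -> str:
    """Pad every row with `fill` so that all rows have the same column count.

    Hypothetical helper for illustration; not part of any ByteScout API.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    # The widest row defines the target column count.
    width = max((len(row) for row in rows), default=0)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        writer.writerow(row + [fill] * (width - len(row)))
    return out.getvalue()


if __name__ == "__main__":
    page1 = "Item,Qty,Price\nWidget,2,9.99\n"
    page2 = "Gadget,1\n"  # a later page that produced fewer columns
    print(normalize_rows(page1 + page2), end="")
```

With equal-width rows, downstream code can address cells by column index without checking the length of each row first.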
Added ‘fuzzy’ search algorithm to the ‘Find text’ tool.
Improved selection of OCR image preprocessing filters. It now allows setting filter parameters and changing the order of filters.
OCR language selector now allows selecting multiple languages to process documents in mixed languages.
‘Extract as JSON’ and ‘Extract as XML’ tools now add to the generated JSON and XML result the number of processed pages and the number of pages for which OCR was performed.
Added ‘Profile’ tab to tool property windows. It contains generated JSON profiles to use in SDK, PDF.co web API, or API Server products.
‘Merge documents’ tool: implemented automatic generation of bookmarks pointing to the merged parts.
The selection cursor changed to ‘full-view cross’.
Improved PDF optimization in the ‘Split document’ tool.
Improved merging of PDF documents.
Improvements and fixes in the ‘Find text’ tool.
The functionality of the ‘Convert to multipage TIFF’ tool has been moved to ‘Convert to bitmap’.
‘Embedded multimedia’ extraction tool now supports more media types.
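To give a feel for what a ‘fuzzy’ search for ‘approximately equal’ strings involves, here is a minimal Python sketch based on Levenshtein edit distance. The release notes do not say which algorithm the products use, so the word-by-word comparison, the function names `edit_distance` and `fuzzy_find`, and the `max_distance` threshold are all assumptions for illustration.

```python
import re


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character edits."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute (or keep) a character
        prev = curr
    return prev[-1]


def fuzzy_find(text: str, query: str, max_distance: int = 1):
    """Return (offset, word) pairs for words within max_distance edits of query.

    Hypothetical helper: compares whole words, case-insensitively.
    """
    hits = []
    for match in re.finditer(r"\w+", text):
        word = match.group()
        if edit_distance(word.lower(), query.lower()) <= max_distance:
            hits.append((match.start(), word))
    return hits


if __name__ == "__main__":
    # "Invoce" is a typical OCR or typing error for "invoice".
    print(fuzzy_find("Invoce total and invoice date", "invoice"))
```

A tolerance of one or two edits is usually enough to catch OCR misreads such as a dropped or swapped letter, while a larger tolerance quickly starts matching unrelated words.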
Added ‘AltName’ property to form fields. It contains a fixed identifier (‘Name’) of the form fields where the ID is missing, duplicated, or contains invalid characters. You can use the ‘AltName’ to retrieve the field from the ‘Document.Annotations’ collection in the same way as the original ‘Name’.
Added properties ‘ListBox.SelectedIndices’ and ‘ListBox.SelectedItems’ that allow selecting multiple items in a ‘ListBox’ form field.
Improved editable and not-editable ‘ComboBox’ fields appearance.
Fixed selected items’ appearance in the ‘ListBox’ form fields.
Fixed value assignment in the ‘RadioButton’ form fields.
Fixed invisible values in form fields in some cases.
Fixed digital signature appearance.
Fixed profiles parsing on platforms with non-English locale.
Added optional ‘Description’ parameter to the template and objects.
Added ‘{{LineEnd2}}’ macro that also takes into account the end of the document.
Added ‘{{EndOfDocument}}’ macro.
Added ‘OcrResolution’ parameter to the template options.
Added property ‘GenerateTimestamp’.
Added elapsed time to the JSON, YAML, and XML parsing results.
Template Editor: Added elapsed time to the parsing preview window.
Text extraction performance improved for some PDF documents.
Improved parsing of regex tables with subitems: the ‘subItemEnd’ expression is now optional, and only the ‘subItemStart’ expression is used to split a row item into subitems.
Fixed XML serialization of multiple parsing results (from a PDF file with multiple documents separated by the ‘documentStart’ parameter).
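The idea of splitting a table row into subitems using only a start expression can be sketched in Python as follows. This is an illustration under assumptions, not the parser's actual logic: the function name `split_subitems`, the sample row, and the `^- ` pattern are hypothetical, and the sketch simply treats each match of the start expression as the beginning of a subitem that runs until the next match or the end of the row.

```python
import re


def split_subitems(row_text: str, sub_item_start: str):
    """Split a row into (main_part, subitems) at every match of sub_item_start.

    Hypothetical helper: no end expression is needed because each subitem
    ends where the next one starts, or at the end of the row.
    """
    starts = [m.start() for m in re.finditer(sub_item_start, row_text, re.MULTILINE)]
    if not starts:
        return row_text.strip(), []
    main_part = row_text[:starts[0]].strip()
    bounds = starts + [len(row_text)]
    subitems = [row_text[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return main_part, subitems


if __name__ == "__main__":
    row = "Laptop 1 1200.00\n- Warranty 2y\n- Bag included\n"
    main_part, subitems = split_subitems(row, r"^- ")
    print(main_part)
    print(subitems)
```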
ByteScout Team of Writers
ByteScout has a team of professional writers proficient in different technical topics. We select the best writers to cover interesting and trending topics for our readers. We love developers and we hope our articles help you learn about programming and programmers.