ByteScout has updated its PDF Extractor SDK. Here are the three most important features in version 5.00.1626:
- Advanced text search (with support for regular expressions, word matching options and more).
- Image to text support (OCR – Optical Character Recognition, includes support for English, German, Spanish and other languages).
- Special mode to repair damaged text (for PDFs that display text correctly but produce damaged text when it is copied; some PDF generators cause this).
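To make the first feature more concrete, here is a minimal sketch of what a regular-expression search with a whole-word option looks like in general. It does not use the PDF Extractor SDK API: it works on a plain string that is assumed to have been extracted from a PDF already, and `find_matches` and its parameters are hypothetical names chosen for illustration.

```python
import re


def find_matches(text, pattern, whole_word=False, ignore_case=False):
    """Return (start, end, matched_text) for every match of pattern in text.

    Hypothetical helper for illustration only; not part of any SDK.
    """
    if whole_word:
        # Wrap the pattern in word boundaries so it cannot match inside a longer word.
        pattern = r"\b(?:" + pattern + r")\b"
    flags = re.IGNORECASE if ignore_case else 0
    return [(m.start(), m.end(), m.group(0)) for m in re.finditer(pattern, text, flags)]


# Sample text standing in for text extracted from a PDF (assumption).
sample = "Invoice INV-1042 was issued. The invoice total is 250 USD."

# Regular-expression search: find invoice numbers.
print(find_matches(sample, r"INV-\d+"))

# Whole-word, case-insensitive search for the word "invoice".
print(find_matches(sample, r"invoice", whole_word=True, ignore_case=True))
```

The offsets returned here are positions in the string; a real PDF search would typically also report the page and coordinates of each match.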
Simply put, OCR support has been added in PDF Extractor SDK 5.00.1626. For example, you can take a PDF that contains a scanned image of a contract, and the SDK will recognize and extract its text (although OCR does slow the process down a little).
The third feature (repair damaged text) applies to PDF files that print and display normally but yield only gibberish when you try to copy text from them, and in which searching does not work. In repair text mode, the ByteScout PDF Extractor SDK can recover the text from such a PDF file for searching or indexing (for example, to extract the text of the PDF file and insert it into a database).
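As an illustration of the indexing use case, the following sketch stores per-page text in a SQLite database and searches it. It assumes the text has already been recovered from the PDF and is available as a list of strings, one per page; the table name `page_text` and the functions `index_pages` and `search_pages` are hypothetical and not part of the SDK.

```python
import sqlite3


def index_pages(conn, document_name, pages):
    """Store one row per page of already-extracted text (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_text "
        "(document TEXT, page INTEGER, content TEXT)"
    )
    rows = [(document_name, number, text) for number, text in enumerate(pages, start=1)]
    conn.executemany("INSERT INTO page_text VALUES (?, ?, ?)", rows)
    conn.commit()


def search_pages(conn, term):
    """Return (document, page) pairs whose text contains term.

    Uses a simple LIKE substring match; wildcard characters in term are not escaped.
    """
    cursor = conn.execute(
        "SELECT document, page FROM page_text "
        "WHERE content LIKE ? ORDER BY document, page",
        (f"%{term}%",),
    )
    return cursor.fetchall()


# In-memory database and sample pages standing in for recovered PDF text (assumption).
conn = sqlite3.connect(":memory:")
index_pages(conn, "contract.pdf", [
    "This contract is between party A and party B.",
    "Payment is due within 30 days.",
])
print(search_pages(conn, "payment"))
```

For larger collections, a full-text index (such as SQLite's FTS5 extension, where available) would likely be a better fit than `LIKE`.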