Humans understand an image and its content merely by looking at it. Machines do not work the same way: they need something more tangible and organized in order to understand an image and produce output. Optical Character Recognition (OCR) is the process that helps a computer understand images. It enables a computer to recognize car plates from a traffic camera, and it kicks in to convert handwritten documents into digital copies. The primary objective is to make many jobs easier and faster.
Optical Character Recognition is the process that detects the text content in images and translates the images into encoded text that the computer can easily understand. It scans the image, text, and graphic elements and converts them into a bitmap, a matrix of black and white dots.
The image is then pre-processed: the brightness and contrast are adjusted to improve the precision of the process. OCR is not 100% precise, as it needs user/programmer involvement to correct the few elements missed during scanning. Natural Language Processing (NLP) is used to perform this error correction.
In Python, Optical Character Recognition is achievable using two different methods.
Before proceeding further, it is essential to comprehend how images are read and stored on our machines: every image is formed by merging small square boxes known as pixels.
The computer saves an image as a matrix of numbers. The dimensions of the matrix depend on the number of pixels in the picture. Suppose the previous image's dimensions are 280 x 200 (height x width); the dimensions are the number of pixels the image contains along its height and width.
Pixel values represent the brightness and intensity of the picture and range between 0 and 255, where 0 denotes black and 255 denotes white.
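To make the 0-255 scale concrete, here is a minimal sketch (the tiny 2 x 2 matrix is a made-up illustration, not from the article) that builds a greyscale "image" by hand and displays it:

import numpy as np
from skimage.io import imshow

# A hand-made 2 x 2 greyscale "image": 0 renders as black, 255 as white.
tiny_image = np.array([[0, 255],
                       [255, 0]], dtype=np.uint8)

imshow(tiny_image)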
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from skimage.io import imread, imshow

# Read the image from disk, inspect its matrix, and display it.
image = imread('MUFC.jpg')
print(image.shape)
imshow(image)
The image.shape attribute returns the dimensions of the image matrix.
The imread() function reads the image from the given path.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from skimage.io import imread, imshow

# Read the image in greyscale mode and display it.
grey_image = imread('MUFC.jpg', as_gray=True)
imshow(grey_image)
In imread(path, as_gray=True), the as_gray parameter converts the picture to black-and-white (greyscale) mode when its value is True.
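One detail worth verifying by running the snippet below: with as_gray=True, scikit-image returns the greyscale pixels as floats scaled to the range 0-1 rather than integers in 0-255, which can be checked like this:

from skimage.io import imread

grey_image = imread('MUFC.jpg', as_gray=True)

# The matrix dimensions and the value range of the greyscale pixels.
print(grey_image.shape)
print(grey_image.min(), grey_image.max())  # floats between 0.0 and 1.0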
Tesseract is an optical character recognition (OCR) engine, and PyTesseract is the Python wrapper that exposes it. It helps recognize and read the text embedded in images. PyTesseract also works as a stand-alone script, and it supports all image types sustained by the Pillow and Leptonica libraries, including jpeg, png, gif, bmp, tiff, and others. When used as a script, PyTesseract prints the recognized text instead of writing it to a file.
Python libraries are usually easy to set up; installation is often a single step if the user is familiar with pip. To use PyTesseract, the user needs two things: the PyTesseract package itself and the underlying Tesseract OCR engine installed on the system.
pip install pytesseract
Create the directory and initialize the project.
$ mkdir ocr_server && cd ocr_server && pipenv install --two
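Because PyTesseract is only a wrapper, the Tesseract engine must be installed separately (for example via the system package manager). As a minimal sketch, assuming the engine lives at the common Linux path below (adjust it for your install), the user can point the wrapper at the binary and confirm the two can talk:

import pytesseract

# If the Tesseract binary is not on the PATH, point the wrapper at it
# explicitly. The path below is an assumption for a typical Linux install.
pytesseract.pytesseract.tesseract_cmd = r'/usr/bin/tesseract'

# Prints the engine version if the wrapper can reach it.
print(pytesseract.get_tesseract_version())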
The user then creates a primary function that takes an image as input and returns its content as text.
try:
    from PIL import Image
except ImportError:
    import Image
import pytesseract

def ocr_core(filename):
    """
    This function will handle the core OCR processing of images.
    """
    text = pytesseract.image_to_string(Image.open(filename))
    return text

print(ocr_core('images/ocr_example_1.png'))
The function is quite simple: in the initial five lines, the user imports Image from the Pillow library and imports the PyTesseract library.
The picture referenced in the code (images/ocr_example_1.png):
The user then defines the ocr_core function, which takes a file name and returns the text contained in the image.
The result of this code is:
The OCR script handled this case easily because the input is digital text: picture-perfect and precise, unlike handwriting.
This time, a handwritten note is given as input to PyTesseract. Let's see the results.
Note:
Output:
Ad oviling writl
It is evident that OCR cannot extract text from handwriting as accurately as it did with the digital-text images in the examples above.
The Tesseract engine can also extract information about the orientation of the text in an image and its rotation. The orientation figure comes with a confidence value indicating how certain the engine is about the orientation it identified, so it can act as a guide. The script section denotes the writing system used in the text, and it also comes with its own confidence marker.
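As a minimal sketch, PyTesseract exposes this orientation-and-script detection through its image_to_osd function; reusing the example image from earlier (and assuming the engine's OSD data files are installed):

from PIL import Image
import pytesseract

# image_to_osd returns the orientation, rotation, and script of the text,
# along with the engine's confidence figures, as a plain-text report.
osd = pytesseract.image_to_osd(Image.open('images/ocr_example_1.png'))
print(osd)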
Tesseract allows the user to work with different languages; the language is specified through a built-in flag.
pytesseract.image_to_string(Image.open(filename), lang='ita')
Please refer to the below-mentioned example without the language flag:
Output:
Ques allarm e local solo in so seg one cend 911
Without the flag, the script assumes the text is in English and tries to decode it accordingly; hence, the output contains only the fragments the engine could extract as English.
Now, refer to the below-mentioned example with the language flag:
Output:
Questo allarme è locale solo in caso di segnalazione di incendio 911
Without the language flag, the OCR script missed some Italian words; after adding the flag, it was able to detect all the Italian content. Translation is not yet possible, but this is still notable. Tesseract's official documentation lists the supported languages.
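Language data files must be installed alongside the engine for each language used. As a hedged aside (the helper exists in recent versions of PyTesseract), the user can list which language packs the local engine actually has:

import pytesseract

# Lists the language codes (e.g. 'eng', 'ita') installed with the local
# Tesseract engine; 'ita' must appear for lang='ita' to work.
print(pytesseract.get_languages(config=''))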
Through Tesseract and the Python-Tesseract library, users can take images and receive their content back as text. This is Optical Character Recognition, and it can be of boundless use in many situations.
In the examples above, the user has built a scanner: it takes an image, returns the text contained in the picture, and can be integrated into an interface. That lets the functionality be offered in a more familiar medium and in a way that can serve many individuals simultaneously.
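As one possible sketch of such an interface (a hypothetical Flask endpoint, not part of the article's code), the ocr_core function from earlier could be wrapped in a small web service:

from flask import Flask, request, jsonify
from PIL import Image
import pytesseract

app = Flask(__name__)

def ocr_core(file):
    """Run OCR on an uploaded image file and return the text."""
    return pytesseract.image_to_string(Image.open(file))

@app.route('/ocr', methods=['POST'])
def ocr_endpoint():
    # Expects the image under the form field 'file'; the field name is
    # an assumption for this sketch.
    if 'file' not in request.files:
        return jsonify({'error': 'no file provided'}), 400
    text = ocr_core(request.files['file'])
    return jsonify({'text': text})

if __name__ == '__main__':
    app.run()

A client could then POST any supported image to /ocr and receive the recognized text as JSON, serving many users at once.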