How to Improve OCR Accuracy using Image Preprocessing

OCR stands for Optical Character Recognition. It is the process of converting photos of documents or scenes into machine-encoded text. Several tools are available for adding OCR to your system; two of the most popular and efficient are Tesseract OCR and Google Cloud Vision.

OCR tools use AI and machine learning, often together with a trained custom model. Text recognition depends on a variety of factors, but the quality of the output depends most heavily on the quality of the input image. This is why OCR engines provide guidelines on the quality and size of the input image; following them helps the engine produce accurate results.

This is where image preprocessing comes into play: it improves the quality of the input image so that the OCR engine can give you accurate output. The following image shows the processing operations used to enhance the quality of your input image.

Figure: OCR Accuracy

As the flow above shows, improving accuracy involves several other steps, but this article focuses on preprocessing.

The main objective of the preprocessing phase is to make it as easy as possible for the OCR system to distinguish a character or word from the background.

Before discussing these techniques, let’s understand how an OCR system comprehends an image. For an OCR system, an image is a multidimensional array: a 2D array if the image is grayscale or binary, and a 3D array if it is colored. Each cell in the matrix is called a pixel and stores an 8-bit integer (one per color channel in a colored image), so pixel values range from 0 to 255.
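To make this concrete, here is a minimal Python sketch in which nested lists stand in for the image array (the variable and function names are illustrative). It also shows one common way of collapsing a 3D color array into a 2D grayscale one, using the ITU-R BT.601 luma weights; this conversion is often done before binarization.

```python
# A grayscale image as a 2D array: rows of 8-bit pixel values (0-255).
gray = [
    [255, 255, 255],
    [255,   0, 255],
    [255, 255, 255],
]

# A color image as a 3D array: rows of pixels, each pixel an (R, G, B) triple.
color = [
    [(255, 255, 255), (200, 30, 30)],
    [(0, 0, 0), (30, 200, 30)],
]


def to_grayscale(rgb_image):
    """Collapse a 3D RGB array into a 2D grayscale array.

    Uses the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), one common choice.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


print(to_grayscale(color))  # [[255, 81], [0, 130]]
```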

Some of the most basic and essential preprocessing techniques are:

  1. Image Scaling
  2. Binarization
  3. De-skew (Skew Correction)

Let’s go through each of these techniques one by one.

1. Image Scaling

Image resizing is essential for image analysis. OCR engines tend to give the most accurate output for images of about 300 DPI. DPI (dots per inch) describes the resolution of the image, that is, the number of printed dots per inch.

The image's pixel format also matters for image analysis. Images can come in different pixel formats, which may affect OCR results. You can get a better idea of the available formats by studying the “System.Drawing.Imaging.PixelFormat” options in .NET. Based on personal experience, I would suggest converting images to the 32 bpp ARGB format (Format32bppArgb).

Image scaling can be achieved with the piece of code given below:
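As an illustration, here is a minimal Python sketch of scaling to 300 DPI, assuming the source DPI is known (for example from the file's metadata) and the image is a grayscale list of rows. `scale_factor` and `resize_nearest` are hypothetical names. The factor is the target DPI divided by the source DPI, so a 150 DPI scan needs a factor of 300 / 150 = 2 in each direction.

```python
TARGET_DPI = 300  # resolution recommended above for OCR input


def scale_factor(source_dpi, target_dpi=TARGET_DPI):
    """Factor by which width and height must change to reach target_dpi."""
    return target_dpi / source_dpi


def resize_nearest(image, factor):
    """Resize a 2D grayscale image (a list of rows) by `factor`
    using nearest-neighbour sampling."""
    src_h, src_w = len(image), len(image[0])
    dst_h = max(1, round(src_h * factor))
    dst_w = max(1, round(src_w * factor))
    # Map each destination pixel back to the source pixel it falls on.
    return [
        [image[y * src_h // dst_h][x * src_w // dst_w] for x in range(dst_w)]
        for y in range(dst_h)
    ]


# A 2x2 patch scanned at 150 DPI is doubled in each direction.
patch = [[0, 255], [255, 0]]
print(resize_nearest(patch, scale_factor(150)))
# [[0, 0, 255, 255], [0, 0, 255, 255], [255, 255, 0, 0], [255, 255, 0, 0]]
```

Nearest-neighbour sampling keeps the sketch short; for real scans, bilinear or bicubic interpolation usually produces smoother upscaled text.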

2. Binarization

Binarization means converting a colored image into one that consists only of black and white pixels (black pixel value = 0, white pixel value = 255). This is done by fixing a threshold, commonly 127, roughly half of the pixel range 0–255. If a pixel's value is greater than the threshold, it becomes a white pixel; otherwise, it becomes a black pixel.

The right threshold may differ depending on the image's contrast and brightness. A good practice is to find the minimum and maximum pixel values in the image and use the value midway between them as the threshold.

Binarization can be done using the following sample class:
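A minimal Python sketch of such a class, assuming the image is already grayscale and stored as a list of rows. The class name `Binarizer` and its methods are hypothetical. It uses a fixed threshold when one is given, and otherwise the midpoint between the darkest and brightest pixel.

```python
class Binarizer:
    """Convert a 2D grayscale image (list of rows, values 0-255) to black/white."""

    BLACK = 0
    WHITE = 255

    def __init__(self, threshold=None):
        # None means "derive the threshold from the image itself".
        self.threshold = threshold

    def compute_threshold(self, image):
        if self.threshold is not None:
            return self.threshold
        pixels = [p for row in image for p in row]
        # Midpoint between the minimum and maximum pixel values.
        return (min(pixels) + max(pixels)) / 2

    def binarize(self, image):
        t = self.compute_threshold(image)
        # Pixels brighter than the threshold become white; all others black.
        return [[self.WHITE if p > t else self.BLACK for p in row] for row in image]


dim = [[20, 60], [70, 110]]  # a dark, low-contrast image
print(Binarizer(127).binarize(dim))  # [[0, 0], [0, 0]]
print(Binarizer().binarize(dim))     # [[0, 0], [255, 255]]
```

The example shows why an image-dependent threshold helps: with the fixed value 127, every pixel of the dark image turns black, while the midpoint threshold (65 here) still separates the darker pixels from the brighter ones.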

3. Skew Correction

Skewed images directly impact the line segmentation of the OCR engine, which reduces its accuracy.

While scanning, a document is sometimes slightly skewed, that is, aligned at an angle to the horizontal. Detecting and correcting this skew is crucial before extracting information from the scanned image.

Skew Correction can be done using the following class:

In this method, we first take the binary image and then:

  • Project it horizontally (take the sum of foreground pixels along each row of the image matrix) to get a histogram along the height of the image, i.e., the count of foreground pixels for every row.
  • Rotate the image through a range of angles, stepping by a small interval called Delta, and recompute the histogram at each angle. For each histogram, calculate the difference between the peaks (variance can also be used as the metric). The angle that gives the maximum difference between peaks (or the maximum variance) corresponds to the skew angle of the image.
  • After finding the skew angle, correct the skew by rotating the image through an angle equal to the skew angle, in the opposite direction.
Figure: Skew Correction
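The steps above can be sketched in Python as follows, assuming a binary image stored as a list of rows with black (0) text on a white (255) background. All function names, the search range of ±10°, and the 1° Delta are illustrative assumptions. The score is the sum of squared differences between neighbouring row counts, one way of measuring the "difference between peaks"; variance would also work. Ties go to the smallest rotation.

```python
import math
from collections import Counter

BLACK, WHITE = 0, 255  # binary image: black text on a white background


def rotate_point(x, y, cx, cy, angle_deg):
    """Rotate (x, y) about (cx, cy); with y pointing down, positive angles
    turn clockwise on screen."""
    t = math.radians(angle_deg)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(t) - dy * math.sin(t),
            cy + dx * math.sin(t) + dy * math.cos(t))


def projection_score(points, cx, cy, angle_deg):
    """Rotate the foreground points, count them per row (horizontal
    projection) and sum the squared differences between neighbouring rows.
    Sharp peaks, i.e. straight text lines, give a high score."""
    if not points:
        return 0
    rows = Counter(round(rotate_point(x, y, cx, cy, angle_deg)[1])
                   for x, y in points)
    lo, hi = min(rows) - 1, max(rows) + 1  # pad with empty rows on both sides
    return sum((rows[r] - rows[r - 1]) ** 2 for r in range(lo + 1, hi + 1))


def find_skew_angle(image, limit=10.0, delta=1.0):
    """Try rotations in [-limit, limit] stepping by delta. The rotation with
    the best score straightens the lines; the skew is its negative."""
    h, w = len(image), len(image[0])
    cx, cy = (w - 1) / 2, (h - 1) / 2
    points = [(x, y) for y, row in enumerate(image)
              for x, p in enumerate(row) if p == BLACK]
    steps = int(round(2 * limit / delta))
    angles = [-limit + i * delta for i in range(steps + 1)]
    best = max(angles,
               key=lambda a: (projection_score(points, cx, cy, a), -abs(a)))
    return -best


def rotate_image(image, angle_deg):
    """Rotate a binary image about its centre (nearest neighbour);
    pixels mapped from outside the source become white."""
    h, w = len(image), len(image[0])
    cx, cy = (w - 1) / 2, (h - 1) / 2
    out = [[WHITE] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: find the source pixel that lands on (x, y).
            sx, sy = rotate_point(x, y, cx, cy, -angle_deg)
            sx, sy = round(sx), round(sy)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = image[sy][sx]
    return out


def deskew(image, **kwargs):
    skew = find_skew_angle(image, **kwargs)
    # Rotate by the skew angle in the opposite direction.
    return rotate_image(image, -skew), skew


# A single horizontal text line, artificially skewed by 5 degrees.
line = [[WHITE] * 61 for _ in range(61)]
for x in range(5, 56):
    line[30][x] = BLACK
skewed = rotate_image(line, 5.0)
fixed, angle = deskew(skewed)
print(angle)  # 5.0
```

Trying every angle on every foreground pixel is slow for full-page scans, so in practice you would likely downscale the image first or search coarse-to-fine.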

There are many other preprocessing techniques worth considering to improve OCR accuracy. Some well-known ones are:

  • Noise removal (denoising)
  • Thinning and skeletonization
  • Line or border removal

You now know the essentials of improving OCR accuracy with image preprocessing. If there is anything else you would like to know, get in touch with an expert at DEV IT.

Maulik Kansara

Sr. Software Developer (.Net) at Dev Information Technology Ltd.
He is an experienced .NET developer with a demonstrated history of working in the information technology and services industry. He is skilled in C#, VB.NET, WPF, WCF, XAML, MVVM, web services, Windows services, MSSQL, HTML, CSS, and ASP.NET. He is a strong engineering professional with a Bachelor's degree in Information Technology from the University of Rajasthan.
