CHAPTER 3
BINARIZATION
3.1 OBJECTIVE
Information acquisition from degraded historical documents has always been a challenging task due to the various forms of degradation they exhibit. Image binarization is therefore essential in the restoration of degraded historical documents. Although many algorithms have been proposed, there is still a need for an effective algorithm that can handle all kinds of degradation. In this chapter, an optimum binarization technique is proposed that addresses these issues using a combined approach of local image contrast and gradient. First, the input image is converted to grayscale and a Canny edge map is applied to extract text stroke edge pixels. The result is further enhanced with shape-based morphological operations. Finally, adaptive thresholding is applied to segment foreground and background pixels. To assess its quality, the proposed method has been tested on three public datasets (DIBCO 2009, 2010, 2011) obtained from the Pattern Recognition and Image Analysis (PRImA) research lab. The simulation results show that the proposed binarization method achieves improved performance, with an F-measure, NRM, MPM, and PSNR of 88.4461, 0.0708, 0.00265, and 18.420 respectively.
3.2 INTRODUCTION
Preservation of the cultural heritage contained in historical and ancient documents is very important throughout the world. To retain the quality of such original documents and make their information accessible, conversion into a machine-editable format is highly desirable. However, such ancient documents may suffer from various forms of degradation, especially complex backgrounds, poor quality, overlapping characters, and document ageing. Restoration and enhancement of degraded documents are therefore essential in today's digital library world.
Compared with traditional filtering methods, document image binarization plays a major role in any kind of document analysis, since it influences the subsequent segmentation, feature extraction, and classification processes. Global and local document image binarization are the most widely used approaches. The global approach determines a single threshold for the entire document, whereas the local approach determines a threshold for every pixel adaptively. Of the two, local binarization performs better than global binarization on degraded documents, and adaptive binarization is also better at removing noise while segmenting text. Earlier work in this direction includes an adaptive reconstructive filter for recognizing misrepresented text in documents, an adaptive mask size for connecting broken lines to increase the recognition rate, content-based text reconstruction for enhancing the text model against complex backgrounds and overlapping characters in historical documents, a multiscale scheme for enhancing Sauvola's method, text binarization methods aimed at better performance, and a picture-element-based binarization method for the recognition of ancient document images, which, however, cannot recognize punctuation marks.
A great number of techniques have been proposed in the literature for the binarization of grayscale or colour document images, but none of them is generic and efficient for all types of documents. The binarization techniques for grayscale images may be classified into two categories:
- Global thresholding (Otsu method)
- Local thresholding
Global thresholding methods are widely used in many document image analysis applications for their simplicity and efficiency. However, these methods are effective only when the original documents are of good quality, well contrasted, and have a clear bimodal histogram separating foreground text and background. For historical document images, which are generally noisy and of poor quality, global thresholding becomes unsuitable, because no single threshold can completely separate the foreground from the background when there is insufficient distinction between the gray ranges of background and foreground pixels. Such documents require a more detailed analysis, which local methods can provide. Local methods calculate a different threshold for each pixel based on information from its neighbourhood. They are more robust against uneven illumination, low contrast, and varying colours than global methods, but they are time consuming, since a separate threshold is computed for each pixel from its neighbourhood, and the computation becomes slower as the neighbourhood size increases. Hybrid methods, in contrast, combine global and local information for segmenting the image.
3.3 BINARIZATION
Binarization is the process of converting a multi-tone image into a bi-tonal image. In the case of document images, it is typical to map foreground text pixels to black and the rest of the image (background) to white. In many applications, binarization is a critical preprocessing step that facilitates other document processing tasks such as layout analysis and character recognition. How a scanned document is processed at an early stage depends on the document type: procedures such as edge extraction, binarization, noise filtering, and segmentation can be applied to maps, line drawings, and printed or handwritten text. Segmentation methods applied to already binarized images also remain an area of active research. The quality of the binarization can greatly affect system performance, as errors made in the binarization step propagate to downstream tasks. As a standalone application, binarization can serve as a noise removal process that increases document readability. The file size of binary images is often orders of magnitude smaller than that of the original gray or colour images, which makes them cheaper to store on disk. With the rise of digital archives, file size also becomes a concern when large numbers of images are viewed over the Internet; if a person can still recognize the text in the binary images, this compression is obtained with virtually no loss in semantic image content. In the last decade, a tremendous amount of progress has been made in the field of historical document binarization. In 2009, the first Document Image Binarization Contest (DIBCO) introduced the first dataset of real degraded images with pixel-level ground truth annotations. This enabled a standardized evaluation procedure allowing direct comparison between algorithms, which spurred research in the field and the creation of more datasets in new application domains. This chapter therefore reviews recent advances in historical document binarization, covering not only binarization methods but also pre-processing, post-processing, algorithm efficiency, datasets, evaluation, and definitions of ground truth. The process of binarization/thresholding is shown in Figure 3.1.
Figure 3.1 Process of Binarization/Thresholding
Binarization is the first step in the classification of 11th-century handwritten ancient Tamil script. Errors in this stage propagate to the later steps of classification, which is why a very robust thresholding method is important. If the pixel values of the characters and those of the background are roughly constant across the image, a single threshold value can be calculated for the whole image; the use of a single threshold for all image pixels is called global thresholding, and global thresholding algorithms determine the best global threshold value automatically. The role of pre-processing is to segment the pattern of interest from the background image. Typical preprocessing includes binarization, smoothing and noise removal, skew detection and correction, slant correction, and thinning. Binarization is the initial processing step, and given the degradation of the source document, either global or local thresholding approaches may be chosen. Binarization converts grayscale images into binary images and separates the foreground (text) from the background information. The most common approach is to select a proper threshold for the image intensity and then convert all intensity values above the threshold to one value ("white") and all intensity values below the threshold to the other value ("black"). Otsu's thresholding technique, which analyses the shape of the gray-level histogram, is one of the most popular and widespread global binarization algorithms. In contrast, image contrast defined from the local image minimum and maximum is superior to the image gradient when handling document images with difficult background variation. In the contrast-based approach, the ancient document image is binarized using local thresholds derived from the detected high-contrast image pixels; compared with the earlier method based on image contrast alone, the proposed method uses the image contrast to recognize the text stroke boundary and can produce highly accurate binarization results. Substantial work exists on the preprocessing of ancient scripts, including tasks such as noise removal, thinning, binarization, and segmentation, but work on the automated reading of ancient Indian scripts, particularly ancient Kannada script, is minimal. Hence, in this research work, an attempt is made at the automatic recognition of ancient English characters and document scripts. A region-based local binarization algorithm for handwritten ancient documents is used for degraded documents, and OCR techniques assist both language script recognition and noise elimination.
3.3.1 Importance of Binarization
Binarization is considered one of the important preprocessing steps of document image analysis. Image binarization separates pixel values into two groups: in one convention, white is the background and black is the foreground, and in the other it is the reverse. Binarization is reported to play a major role in bringing finesse to document image processing. It is usually performed on grayscale images whose information is inherently binary in nature, such as text or graphics. Binarization aims to choose a threshold value automatically for separating foreground from background data. Choosing the right threshold is, in general, a trial-and-error process. It is a complex objective when the contrast between text pixels and background is low (e.g., text printed on a gray background), when the background bleeds into the text pixels, when text strokes are thin, or when the page is illuminated non-uniformly during data capture. Many methods address this problem by modelling background or foreground pixels with statistical distributions while relying on thresholds that change spatially or adaptively. Perfect results may not be achieved whether global or adaptive methods are used; the performance achieved depends on the quality of the original image with respect to line gaps, ragged edges on region boundaries, and extraneous regions of ON/OFF pixel values.

Scanning and printing of documents degrade their visibility, rendering them difficult to understand. Several techniques have been proposed for document binarization, and since the threshold is the central parameter, they are classified into global and local thresholding techniques. Global thresholding is well suited to images having a uniform distribution of background and foreground contrast. Local thresholding is used for images with degraded text, background noise, poor illumination, and contrast variation, where it is difficult to label most pixels as background or foreground. On this basis, the algorithms currently adopted are classified as global and local binarization: the former uses a single threshold value for the entire image, while the latter depends on threshold values estimated locally. One of the most conventional approaches to binarizing a script image is the estimation of a global threshold, where a single threshold value is identified for the whole image area from the available local or global information. In local thresholding, threshold values are determined locally: a single threshold is attributed to a specified region and changes from region to region over the image. Different degrees of degradation in the source image cause the binarization process to fail and degrade its quality; the reasons for degradation range from poor-quality sources to faulty image acquisition in a restricted environment. Hence, designing an efficient binarization method that detects possible imperfections is important, even if care is taken at later stages.
3.3.2 Challenges in Historical Documents
Historical documents are a source of accurate and meaningful cultural and scientific knowledge that can be used in information retrieval. These documents are usually available in old libraries and in the reference centres of several government departments, and their importance has often been evidenced by their scientific, legal, and cultural values. Recently, there has been growing interest in transforming these documents into digital form, converting them into digitally readable text, which requires scanning, extracting the text, and storing it in a database. This process involves automatically converting handwritten documents from scanned images, taking advantage of recognition methods to extract textual information. However, handwritten historical documents in general suffer from various issues referred to as document degradations. These degradations can be corrected with the help of document binarization methods so that accurate information can be extracted using OCR tools. To achieve this, several approaches have been developed for various document binarization tasks, including palm leaf manuscripts, music scores, floor plan parsing, and historical documents. Although these tasks are inherently similar, historical documents present particular challenges that make accurate binarization difficult. Thus, the main goal of this chapter is to review the existing algorithms developed for degraded historical document binarization.
3.4 ISSUES IN BINARIZATION OF HISTORICAL DOCUMENTS
Before reviewing degraded-document binarization techniques, it is important to understand the nature of the defects and degradations found in historical documents, since they cause difficulties during the binarization process. The main issues and artefacts affecting the binarization of historical documents are:
- Uneven illumination
- Contrast variation
- Bleed-through degradation
- Faded ink or faint characters
- Smear or Show Through
- Blur
- Thin or weak text
- Deteriorated documents
Uneven Illumination
In optical imaging, incident light is attenuated exponentially along the light path due to the scattering of particles in the medium, so light microscopy images degrade and suffer from uneven illumination. As a combined outcome of background objects and overlaps in fluorescence absorption and emission spectra, the light is scattered, leading to unevenly illuminated images. This leads to difficulties in document image analysis. In the case of documents, when the lighting conditions result in uneven illumination, recognizing characters or text from the document image with an OCR system gives severely degraded results and hinders efficient document recognition. Generally, for OCR to provide good accuracy, the process consists of transforming the image from grayscale to binary and then extracting the text; however, due to uneven illumination, the binary images contain artefacts, which leads to incorrect text extraction in the affected regions.
Contrast Variation
Contrast variation can be defined as a variation in brightness. Most often, contrast refers to the difference between high-intensity and low-intensity pixels in an image; it can also be measured as the difference between the pixel values of an object and those of the background. Factors such as a noisy environment, sunlight, illumination, and occlusion often cause contrast variation, which is highly non-linear. The contrast variation within historical and handwritten documents makes it difficult for document image-analysis algorithms to apply traditional threshold-based methods to separate the foreground text from the background. Such issues can be addressed by applying image enhancement methods before performing binarization.
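One common pre-enhancement choice (an illustrative option, not a method prescribed by this work) is contrast-limited adaptive histogram equalization (CLAHE), sketched below with OpenCV; the clip limit and tile size are assumed illustrative values.

```python
# A minimal sketch of local contrast enhancement before binarization,
# assuming OpenCV; "degraded_document.png" is a placeholder file name.
import cv2

gray = cv2.imread("degraded_document.png", cv2.IMREAD_GRAYSCALE)

# CLAHE stretches the contrast tile by tile, limiting amplification of noise.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(gray)

cv2.imwrite("contrast_enhanced.png", enhanced)
```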
Bleed-Through Degradation
Bleed-through or ink bleeding happens when a document is written on both sides of the paper and the ink from one side starts to appear on the other side. Bleed-through degradation poses a major threat to document binarization: ink oozes through one side of the page and spreads over to the other side, ruining the text there. Many approaches have been suggested to counter bleed-through, and researchers working on this issue face two major challenges. The first is the difficulty of getting access to high-resolution degraded documents except in connection with a particular digitization project or library. The second, common to all restoration techniques, arises when analysing the outcomes quantitatively, because actual ground truth is unavailable. This can be addressed either by synthesizing an image with a given level of degradation from a corresponding ground truth, or by constructing a ground-truth image from the original degraded images. Performance can also be analysed without ground truth by quantifying how the restoration affects a secondary step, such as the performance of an OCR system on the document image.
Faded Ink or Faint Characters
There is considerable historical, public, and political interest in analysing the huge range of official organizational papers and transforming them into digital libraries and archives. Most of these official papers were typewritten, which raises various recognition issues. First, every individual glyph (character) in the document may appear either fainter or much stronger than the glyphs around it, unlike in other printed documents; this is directly related to both the shape of the striking head of the particular key and the force applied while typing. Second, a wide range of typewritten documents survive only as carbon copies on extremely thin paper (Japanese paper) with a prominent texture. Because of the mechanical nature of the typing process (the key must be pressed harder for the print to appear on both the original and the carbon copy), most carbon copies appear blurry. Issues such as repeated use and ageing, tears, stains, rust from paper clips, punch holes, the disintegration of parts of documents, and discolouration further degrade historical typewritten documents.
Smear or Show Through
After documents have been digitized, further challenges arise in the form of noise and low-resolution components appearing over the documents, which negatively affects their visual appearance. A historical document may suffer from many forms of degradation, introduced over time and differing in nature, but one of the largest issues is show-through. Many documents in the past were written on both sides of the paper, and the show-through problem appears when ink impressions from one side start to appear on the other side, making the document difficult to read. Such documents have to be restored to be easily readable. Removing the show-through also reduces the compressed image size significantly, allowing the images to be downloaded faster over the Internet, and yields a clean background.
Blur
When it comes to document degradation, two different types of blurring appear in documents: motion blur and out-of-focus blur. In general, motion blur artefacts are caused by relative motion between the camera and the object, or by a sudden rapid movement of the camera, whereas out-of-focus blur occurs when the light fails to converge in the image plane. To address the blur issue, recent research has turned towards tools for assessing blur in document images in order to predict OCR accuracy, providing feedback that helps the user capture new images with better prospects for OCR.
Thin or Weak Text
Many documents written in the past contain very thin or weak text. In general, these documents were written with ink or sometimes drawn with paint. Owing to the quality of the ink or paint used in historic documents, they degrade over time as the ink fades and shrinks. In other cases, the use of low-quality ink and the nature of the paper also resulted in thin or weak text, which makes it difficult to apply binarization methods and extract the text accurately. Researchers have recently shown increasing interest in historical document image analysis, which has brought forward various new challenges. Degradations such as weak or thin text have encouraged researchers to devise enhancement and binarization algorithms capable of fixing these issues, and subsequent algorithms such as skew detection, recognition, and page or line segmentation have then been built on the binarized data.
Deteriorated Documents
Original paper-based documents may comprise various forms of media (such as ink, graphite, and watercolour) and formats (such as rolled maps, spreadsheets, and record books). These documents may be of great importance, as they carry informational, evidential, associational, and intrinsic value. A document containing historical, legal, or scientific data retains great evidential value as long as the original condition of the media, substrate, format, and images is not radically altered by modification or deterioration. However, misuse is not the only way documents suffer deterioration, loss, and damage: poor storage, handling, and environmental conditions, as well as inherent instability, also contribute. Severe damage and deterioration can likewise be caused by environmental factors, especially for inherently unstable documents.
3.5 DEGRADED DOCUMENT BINARIZATION METHODS
Trends in image binarization can broadly be classified into global, local, and hybrid thresholding approaches. Methods developed using local and global approaches have been discussed extensively in articles on degraded image binarization. More recently, hybrid thresholding, which takes advantage of both local and global thresholding, has also become a trending topic. The following subsections discuss these binarization approaches.
3.5.1 Global Thresholding-Based Binarization Methods (Otsu Thresholding)
The term global thresholding refers to an approach to image thresholding in which a single threshold value is set for the whole image. In thresholding, pixels are included in or excluded from the output image based on this threshold. For a grayscale image, all intensity values are first examined and a global threshold is selected, usually from the histogram of intensities in the image; each pixel value is then compared with the threshold and either included in or excluded from the resulting image. The final output marks the locations of pixels belonging to the region of interest according to the calculated threshold. One of the most common and widely used global thresholding methods is Otsu's method. Similarly, Kittler and Illingworth introduced a thresholding method that can efficiently extract text from the background. However, these global thresholding methods work well on relatively simple, decent-quality documents but fail to deliver accurate results when documents exhibit degradations such as uneven illumination or noisy backgrounds.
Otsu's method is often used as a global thresholding method. When combined with Canny edge detection, it preserves Canny's good qualities, such as fine detection, good edge localization, and a single response to a single edge; these qualities improve the capability to suppress false edges. In the field of computer vision and image processing, the Otsu method performs clustering-based image thresholding and is generally acknowledged as one of the best methods for choosing a threshold value automatically. Its basic principle is to separate the image pixels into two or more groups (classes) and to search for the best threshold value by maximizing the between-class variance. A multi-level Otsu thresholding method, an extended form of Otsu's method in which more than two threshold levels divide the image, can be used to mark true edges in the image. Thresholding algorithms generally use the gray-level histogram as an efficient tool for selecting the threshold level. To convert a gray-level image into a binary image, a threshold point is selected from the gray-level pixel values and each pixel value is compared with it: if the pixel value is less than the threshold, the pixel is set to zero, and if it is equal to or higher than the threshold, it is set to one.
Otsu's method treats the gray-level intensities present in the image as values to be clustered into two sets, foreground (black) and background (white). It is important to select an adequate gray-level threshold for extracting the foreground from the background. To select the threshold, the image histogram is considered: when it has a deep, sharp valley between two peaks representing foreground and background, the threshold can be chosen at the bottom of this valley. However, for most images it is difficult to detect the valley bottom reliably, and in many cases none of these techniques evaluates the goodness of the chosen threshold.
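As an illustration of global thresholding, the short sketch below applies Otsu's method with OpenCV; the file name is a placeholder and the snippet is not part of the proposed pipeline.

```python
# A minimal sketch of global (Otsu) thresholding, assuming OpenCV is installed;
# "degraded_document.png" is a placeholder file name.
import cv2

# Read the document image as a single-channel grayscale image.
gray = cv2.imread("degraded_document.png", cv2.IMREAD_GRAYSCALE)

# Otsu's method searches the gray-level histogram for the threshold that
# maximizes the between-class variance; the first return value is that threshold.
threshold, binary = cv2.threshold(gray, 0, 255,
                                  cv2.THRESH_BINARY + cv2.THRESH_OTSU)

print("Otsu threshold:", threshold)
cv2.imwrite("otsu_binary.png", binary)
```

On a well-contrasted page this single threshold is usually adequate, but, as noted above, it fails when the histogram is not clearly bimodal.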
3.5.2 Adaptive Thresholding-Based Binarization Methods
Adaptive (or local) thresholding methods, on the other hand, compute a threshold for each pixel (or set of pixels) in the image depending on the content of its neighbourhood. Unlike global thresholding, where a single threshold is calculated for the whole image, local/adaptive thresholding calculates thresholds either per pixel or per group of pixels, usually based on a local window of the image. An image with multiple objects of similar pixel values can thus be categorized into different classes, with a threshold computed for each group of pixels locally. One of the classic examples of local/adaptive thresholding calculates a local threshold from the neighbourhood window centred at pixel (i, j): the maximum (max(i, j)) and minimum (min(i, j)) intensities are found, their mean is estimated, and the threshold is set from this local window. The local contrast, i.e., the difference between the maximum (max(i, j)) and minimum (min(i, j)) intensities, is also computed and compared with a local contrast threshold (for example, k = 20). If the difference is lower than this threshold (k = 20), the whole neighbourhood is assigned to a single class, either foreground or background. Since such methods rely on the contrast difference, they are most suitable when contrast values are large. For degraded documents, Niblack introduced an algorithm that estimates a threshold for each pixel of an image from the local mean and standard deviation computed in a rectangular sliding window. It has been found that, although Niblack's method correctly identifies text in degraded document images, it also suffers from background noise.
To deal with the issues faced by Niblack's method, an improved method was proposed by Sauvola and Pietikainen that usually performs better than Niblack's but often introduces thin and broken text because of the extensive sliding-window operations. A multi-stage document binarization approach has also been considered: its stages include noise correction and contrast improvement with a Wiener filter, followed by segmentation of the text from the background based on the Sauvola and Pietikainen approach. In the next steps, an intensity analysis identifies the background area of the image, and a final threshold is generated from the original and resulting images; this threshold is then used for the final binarization. Although this method performs well for document image binarization, its main disadvantage is that it deals only with areas containing textual information and is less effective elsewhere. Other work has sought to enhance Niblack's method so as to address its shortcomings. In a more recent study, a novel algorithm was introduced for degraded document image binarization: background noise removal and document quality enhancement are performed first, a variant of Sauvola's binarization method is then applied, and finally post-processing finds small connected components in the image and removes the unnecessary ones.
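For comparison, the sketch below computes Niblack and Sauvola thresholds with scikit-image; the window size and k values are illustrative choices, not parameters reported in the works cited above.

```python
# A sketch of Niblack and Sauvola local thresholding, assuming scikit-image;
# window_size and k are illustrative settings.
import numpy as np
from skimage import io, img_as_float
from skimage.filters import threshold_niblack, threshold_sauvola

gray = img_as_float(io.imread("degraded_document.png", as_gray=True))

# Niblack: T(x, y) = m(x, y) + k * s(x, y), computed in a sliding window.
t_niblack = threshold_niblack(gray, window_size=25, k=-0.2)

# Sauvola: T(x, y) = m(x, y) * (1 + k * (s(x, y) / R - 1)), which suppresses
# much of the background noise that plain Niblack tends to keep.
t_sauvola = threshold_sauvola(gray, window_size=25, k=0.2)

binary_niblack = gray > t_niblack   # True = background (white), False = text
binary_sauvola = gray > t_sauvola

io.imsave("niblack.png", (binary_niblack * 255).astype(np.uint8))
io.imsave("sauvola.png", (binary_sauvola * 255).astype(np.uint8))
```

Sauvola's normalization of the standard deviation by the dynamic range R is what reduces the background noise kept by plain Niblack.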
3.5.3 Hybrid Thresholding-Based Binarization Methods
The preceding subsections discussed degraded document issues and the global and adaptive thresholding approaches used during binarization. Local and global thresholding each have advantages and disadvantages, and the strengths of both can be combined to provide a better mechanism. This idea has proved to be an efficient solution, and researchers have proposed methods combining the two approaches, referred to as hybrid binarization methods. Hybrid binarization offers several benefits in terms of computation time, flexibility, efficiency, robustness, and accuracy of background and foreground region extraction. In one such approach, different thresholding methods were combined to provide significantly better outcomes by coupling a global method (Otsu's) with an adaptive one (Sauvola's) for better image binarization. In another, the algorithm first applies iterative global thresholding, then detects noisy areas, and finally applies the iterative global thresholding locally within each noisy area.
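The sketch below illustrates the general hybrid idea under simple assumptions (it is not the specific method referenced above): a global Otsu pass decides most pixels, and Sauvola's local threshold re-decides pixels lying in low-contrast windows. The window size and contrast cut-off are illustrative values.

```python
# A hedged sketch of a hybrid global + local scheme, assuming scikit-image
# and SciPy; it is only one plausible way to combine the two families.
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter
from skimage import io, img_as_float
from skimage.filters import threshold_otsu, threshold_sauvola

gray = img_as_float(io.imread("degraded_document.png", as_gray=True))

# Global pass: a single Otsu threshold for the whole page.
t_global = threshold_otsu(gray)
binary = gray > t_global            # True = background, False = text

# Local contrast (max - min in a sliding window) flags windows where a
# single global threshold is unreliable (faint strokes, stains, shadows).
win = 25
contrast = maximum_filter(gray, size=win) - minimum_filter(gray, size=win)
unreliable = contrast < 0.3         # illustrative cut-off, not a tuned value

# Local pass: re-decide only the unreliable pixels with Sauvola's threshold.
t_local = threshold_sauvola(gray, window_size=win, k=0.2)
binary[unreliable] = gray[unreliable] > t_local[unreliable]

io.imsave("hybrid.png", (binary * 255).astype(np.uint8))
```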
3.5.4 Machine Learning-Based Binarization Methods
Traditional global and local thresholding methods have difficulty binarizing images with noisy and non-uniform backgrounds. Machine learning approaches are an alternative way to overcome these difficulties and have been widely applied to document binarization tasks. An early example is a neural network-based algorithm for character extraction from document images, in which a multilayer neural network is trained with document samples and their corresponding ground truth for pixel-level classification. The network inherently extracts features from pixels by considering their neighbourhood and can produce adaptive binarization results; however, it requires substantial training data and training time. To reduce the training time, a two-stage binarization approach has been proposed. In the first stage, a region-based binarization is applied to the input image to generate a binary output, using global and local threshold values extracted from the histogram of the entire image and from pixel-level intensities. In the second stage, a neural network is applied to the extracted binary image to distinguish characters from the background. The multilayer perceptron is a feed-forward structure with a high capability for learning complex patterns in challenging tasks such as cleaning handwritten data and document binarization. Feed-forward neural networks have also been used for document binarization in old manuscripts, relying on a classification scheme based on a multilayer perceptron: document images and their corresponding ground truth are fed into the network to learn the sample patterns, and at test time the learned network is applied to the input document image to generate the binarization result. Experimental results demonstrated that such a multilayer perceptron (MLP) network can accurately separate foreground information from degraded documents. Although artificial neural networks have a high capacity for learning complex pattern recognition tasks, they still misclassify pixels in degraded document binarization. Deep neural networks, which are hierarchical neural networks, have demonstrated vast representational capacity and high performance in various document image binarization tasks, and recurrent neural network models have also been applied to document image binarization.
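A toy sketch of the pixel-classification idea is given below, using scikit-learn's MLPClassifier on raw gray-level neighbourhoods; the file names, patch size, and network size are assumptions for illustration, and the sketch is far simpler than the published systems described above.

```python
# A toy pixel-level MLP binarizer, assuming scikit-image and scikit-learn and a
# training pair (e.g., a DIBCO image and its pixel-accurate ground truth).
import numpy as np
from skimage import io, img_as_float
from skimage.util import view_as_windows
from sklearn.neural_network import MLPClassifier

PATCH = 5  # each pixel is described by its 5x5 gray-level neighbourhood

def pixel_features(gray):
    """Flattened PATCH x PATCH neighbourhood around every pixel."""
    pad = PATCH // 2
    padded = np.pad(gray, pad, mode="edge")
    return view_as_windows(padded, (PATCH, PATCH)).reshape(-1, PATCH * PATCH)

# Training pair: a degraded image and its ground truth (placeholder file names).
train = img_as_float(io.imread("train_image.png", as_gray=True))
gt = img_as_float(io.imread("train_gt.png", as_gray=True)) < 0.5   # True = text

clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=50)
clf.fit(pixel_features(train), gt.ravel())

# At test time, the learned network labels every pixel of a new document.
test = img_as_float(io.imread("test_image.png", as_gray=True))
pred = clf.predict(pixel_features(test)).reshape(test.shape)
io.imsave("mlp_binary.png", ((~pred) * 255).astype(np.uint8))   # text -> black
```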
3.6 PROBLEM STATEMENT
Historical document images are, in general, more difficult to binarize than modern scanned documents. This is partly due to their degraded state and partly because many historical documents are digitized with cameras, which lack the controlled illumination conditions of scanners. Some camera-produced images have uneven illumination due to bad lighting or because the page is not flat (e.g., curved edges near book bindings). This work addresses the challenges in the enhancement and segmentation of ancient historical documents, such as varying degradation, unwanted symbols or marks, embedded noise, and text engraved with considerable skew. Ancient historical documents suffer from several degradation factors, such as non-uniform illumination, contrast issues, poor image quality during binarization, and complex backgrounds.
3.7 PROPOSED METHODOLOGY
The block diagram of the proposed system is given in Figure 3.2 below.
Figure 3.2 Proposed System Module
A. Proposed Module
The proposed method includes the following four modules:
- RGB to gray conversion
- Canny Edge Map Detector
- Morphological Operation
- Adaptive Thresholding
B. Module Description
1) RGB to Gray Conversion: The input image is captured and represented as an RGB image in electronic systems. To preserve the brightness (luminance) of the image, the input 24-bit colour image is converted into an 8-bit grayscale image. The intensities of the RED, GREEN, and BLUE channels are combined in proportions of approximately 30%, 59%, and 11% respectively to obtain the grayscale image.
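A minimal sketch of this weighted conversion is shown below; the weights are the standard luminance coefficients (≈30%, 59%, 11%) and the file name is a placeholder.

```python
# A sketch of luminance-weighted RGB-to-gray conversion, assuming OpenCV/NumPy.
import numpy as np
import cv2

# OpenCV loads images as BGR, so convert to RGB before applying R/G/B weights.
rgb = cv2.cvtColor(cv2.imread("degraded_document.png"), cv2.COLOR_BGR2RGB)

# 24-bit colour -> 8-bit gray: Y = 0.299 R + 0.587 G + 0.114 B
weights = np.array([0.299, 0.587, 0.114])
gray = (rgb.astype(np.float64) @ weights).astype(np.uint8)

# cv2.cvtColor(..., cv2.COLOR_BGR2GRAY) applies the same weighting internally.
```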
2) Canny Edge Map Detector:
A wide range of edges can be detected using the multistage algorithm of the Canny edge map detector. It is used to extract meaningful information from images and reduces the amount of data required to process the image. It provides a low error rate, good localization, and a single (minimal) response per edge, and it is one of the most widely used edge detection operators due to its optimality and simplicity.
The algorithm to perform canny edge map detection is given as follows:
- The image is smoothed by applying a Gaussian filter.
Gaussian filter kernel equation of size (2p+1)×(2p+1) is given by:
H_{ij} = \frac{1}{2\pi\sigma^{2}} \exp\left( -\frac{(i-(p+1))^{2} + (j-(p+1))^{2}}{2\sigma^{2}} \right), \quad 1 \le i, j \le 2p+1 \qquad (3.1)
- Intensity gradients are determined by using the following equation
G = \sqrt{G_x^{2} + G_y^{2}}, \qquad \theta = \tan^{-1}\left( \frac{G_y}{G_x} \right) \qquad (3.2)

where G_x and G_y represent the gradients in the horizontal and vertical directions respectively.
- Apply non-maximum suppression to thin the detected edges.
- Estimate potential edges by applying a double threshold.
- Finally, track edges by hysteresis: weak edges that are not connected to strong edges are suppressed.
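The steps above are bundled into a single call in common libraries; the sketch below uses OpenCV's Canny implementation with illustrative smoothing and hysteresis-threshold values, not the settings used in the reported experiments.

```python
# A sketch of the Canny stage, assuming OpenCV; parameter values are illustrative.
import cv2

gray = cv2.imread("degraded_document.png", cv2.IMREAD_GRAYSCALE)

# Step 1: Gaussian smoothing (equation 3.1) to suppress noise.
smoothed = cv2.GaussianBlur(gray, (5, 5), 1.4)

# Steps 2-5: gradient computation, non-maximum suppression, double thresholding,
# and edge tracking by hysteresis are performed inside cv2.Canny; the two
# arguments are the low and high hysteresis thresholds.
edges = cv2.Canny(smoothed, 50, 150)

cv2.imwrite("canny_edge_map.png", edges)
```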
3) Morphological Operation: The structure and shape of objects can be described using morphological operations that operate on sets (objects in an image). The output of a morphological operation depends on the size of the original image and on the structuring element.
There are four basic morphological operations, namely dilation, erosion, opening, and closing. The opening of A by B is obtained by erosion followed by dilation, and the closing of A by B is obtained by dilation followed by erosion. Dilation expands an image by adding pixels to the object boundaries, whereas erosion is the opposite: it shrinks the image by removing pixels from the boundaries. The addition and deletion of pixels depend on the structuring element.
Rules for Dilation and Erosion
Dilation:
The highest pixel value among all pixels in the input neighbourhood is assigned to the output pixel in this morphological operation. For example, in a binary image the output pixel is set to 1 if any of the input pixels is 1.
Erosion:
The lowest pixel value among all pixels in the input neighbourhood is assigned to the output pixel in this morphological operation. For example, in a binary image the output pixel is set to 0 if any of the input pixels is 0.
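The sketch below applies the four operations with OpenCV to a binary edge map; the 3×3 rectangular structuring element is an illustrative choice.

```python
# A sketch of dilation, erosion, opening, and closing, assuming OpenCV.
import cv2

binary = cv2.imread("canny_edge_map.png", cv2.IMREAD_GRAYSCALE)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))

dilated = cv2.dilate(binary, kernel)                        # expand strokes
eroded = cv2.erode(binary, kernel)                          # shrink strokes
opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)   # erosion then dilation
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)  # dilation then erosion
```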
4) Modified Adaptive Thresholding: Thresholding is a technique commonly used to obtain a binary image from a grayscale image. Adaptive thresholding changes its threshold value according to the pixel values present in the image, which makes it effective for images with strong illumination variation and shadows. Thresholding can be used to segment an image: a threshold value is computed for every pixel, and thresholding is then performed by assigning higher intensity values to the foreground and lower intensity values to the background, where higher and lower are determined by comparing each value with its threshold. Adaptive thresholding thus accounts for spatial variations in illumination.
The proposed approach traverses the image from left to right and thus covers the entire image. If the pre-processed image value F′(x, y) exceeds the threshold T, text regions are identified. The threshold T [15] is estimated based on the concavities of the histogram C(F′).
Here q(F′) denotes the image probability mass function. Once the convex hull of the histogram is computed, the deepest concavity points become candidate thresholds. The following sigmoid function is used when the gradients of the background and the text regions are similar, to model the behaviour of the pixel values in the background region.
(3.3)
where r is a weighting parameter, the difference between the average black and white pixel values appears as a separate term, and q1 and q2 are constants.
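As a simplified stand-in for this stage, the sketch below uses OpenCV's mean-based adaptive threshold, which computes a per-pixel threshold from a local window; it does not implement the histogram-concavity and sigmoid rule of equation (3.3), and the block size and offset are illustrative values.

```python
# A simplified adaptive-thresholding sketch, assuming OpenCV; this is NOT the
# modified adaptive thresholding of the proposed method, only a rough stand-in.
import cv2

gray = cv2.imread("degraded_document.png", cv2.IMREAD_GRAYSCALE)

binary = cv2.adaptiveThreshold(gray, 255,
                               cv2.ADAPTIVE_THRESH_MEAN_C,  # local-mean threshold
                               cv2.THRESH_BINARY,
                               25,                           # odd neighbourhood size
                               10)                           # offset subtracted from mean

cv2.imwrite("adaptive_binary.png", binary)
```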
Database details
The DIBCO (Document Image Binarization Contest) datasets mainly represent major European libraries. They contain historical document layouts that need to be digitized to preserve cultural heritage. The content of these datasets includes newspapers, magazines, images, and layouts, with regions covering graphics and text in different fonts and sizes.
Table 3.1 Sample images of DIBCO 2009 dataset
[Samples 1-5: degraded historical document images]
Table 3.2 Sample images of DIBCO 2010 dataset
[Samples 1-5: degraded historical document images]
Table 3.3 Sample images of DIBCO 2011 dataset
[Samples 1-5: degraded historical document images]
Tables 3.1, 3.2, and 3.3 show sample images from the DIBCO 2009, DIBCO 2010, and DIBCO 2011 datasets respectively.
3.8 RESULTS AND DISCUSSION
A. Simulation Analysis
To determine the quality of the proposed method, various document images have been considered. The proposed method is tested on 150 degraded historical documents and compared with the well-known Otsu method.
Figure 3.3 Binarization of a historical document image: (a) original document image, (b) RGB to gray conversion, (c) Canny edge map, (d) dilation, (e) erosion, (f) modified adaptive contrast output
Figure 3.3 illustrates each stage of the proposed binarization of a degraded historical document image, from the original image through gray conversion, Canny edge mapping, dilation, and erosion to the modified adaptive contrast output.
B. Performance Metric Analysis
F-measure:
The accuracy of binarization can be assessed using the F-measure; higher values indicate more accurate binarization.
\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (3.4)

\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (3.5)

\mathrm{F\text{-}measure} = \frac{2 \times \mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}} \qquad (3.6)

where TP, FP, and FN denote True Positives, False Positives, and False Negatives respectively.
PSNR (Peak Signal to Noise Ratio):
The restoration of the desired pixels is indicated by PSNR; higher values indicate better restoration, and typical values for this task lie in the range of 15 dB to 25 dB.
\mathrm{PSNR} = 10 \log_{10}\left( \frac{d^{2}}{\mathrm{MSE}} \right) \qquad (3.7)

where the Mean Square Error between the binarized image I and the ground truth image I′, both of size M × N, is

\mathrm{MSE} = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} \left( I(x,y) - I'(x,y) \right)^{2} \qquad (3.8)

and d denotes a constant (the difference between foreground and background values).
MPM (Misclassification Penalty Metric):
It evaluates the prediction against the ground truth (GT) by penalizing misclassified pixels according to their distance from the GT object boundary; a low MPM value indicates better performance.
\mathrm{MPM} = \frac{MP_{FN} + MP_{FP}}{2}, \quad MP_{FN} = \frac{\sum_{i=1}^{N_{FN}} d_{FN}^{i}}{D}, \quad MP_{FP} = \frac{\sum_{j=1}^{N_{FP}} d_{FP}^{j}}{D} \qquad (3.9)

where d_{FN}^{i} denotes the distance of the i-th false-negative pixel from the ground-truth object boundary, d_{FP}^{j} denotes the distance of the j-th false-positive pixel from that boundary, and D represents a normalization factor.
NRM (Negative Rate Metric):
It quantifies the pixel-wise mismatch between the output image and the ground truth image; better performance is indicated by a lower NRM value.
\mathrm{NRM} = \frac{NR_{FN} + NR_{FP}}{2}, \quad NR_{FN} = \frac{FN}{FN + TP}, \quad NR_{FP} = \frac{FP}{FP + TN} \qquad (3.10)

where TP, TN, FP, and FN denote True Positives, True Negatives, False Positives, and False Negatives respectively.
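For reference, the sketch below computes F-measure, PSNR, and NRM from a binarized result and its ground truth under the stated conventions (text = black, background = white, d = 1 on a {0, 1} scale); MPM is omitted because it additionally requires distances to the ground-truth text boundary.

```python
# A sketch of the evaluation metrics in equations (3.4)-(3.10), assuming NumPy
# and uint8 images where 0 = text (black) and 255 = background (white).
import numpy as np

def evaluate(binary, gt):
    pred_text = binary == 0          # predicted text pixels
    gt_text = gt == 0                # ground-truth text pixels

    tp = np.sum(pred_text & gt_text)
    fp = np.sum(pred_text & ~gt_text)
    fn = np.sum(~pred_text & gt_text)
    tn = np.sum(~pred_text & ~gt_text)

    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 100 * 2 * recall * precision / (recall + precision)  # percent scale

    # PSNR (3.7)-(3.8) with d = 1 on binary images scaled to {0, 1}.
    mse = np.mean((pred_text.astype(float) - gt_text.astype(float)) ** 2)
    psnr = 10 * np.log10(1.0 / mse)

    # NRM (3.10).
    nrm = 0.5 * (fn / (fn + tp) + fp / (fp + tn))
    return f_measure, psnr, nrm

# MPM would additionally need, for every misclassified pixel, its distance to
# the nearest ground-truth text boundary (e.g., via a distance transform).
```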
Table 3.4 Evaluation results of DIBCO 2009, DIBCO 2010 and DIBCO 2011 Dataset
Performance Metric | Existing Otsu Method | Proposed Method (DIBCO 2009) | Proposed Method (DIBCO 2010) | Proposed Method (DIBCO 2011) |
FM | 65.754 | 91.400 | 90.994 | 91.601 |
NRM | 0.0678 | 0.054 | 0.0437 | 0.0463 |
MPM | 0.0526 | 0.0339 | 0.0014 | 0.043 |
PSNR | 11.815 | 19.480 | 19.367 | 20.677 |
From Table 3.4, it is found that the proposed technique achieves improved results on the performance metrics F-measure, PSNR, NRM, and MPM.
3.9 SUMMARY
This chapter presented an optimum technique for the binarization of degraded ancient historical document images using a combined approach of local image contrast and gradient. The proposed method overcomes different degradation factors such as non-uniform illumination and complex backgrounds. The technique involves modified adaptive thresholding, Canny edge map detection, and morphological operations. Samples from the DIBCO 2009, DIBCO 2010, and DIBCO 2011 datasets were processed with the proposed method, the performance metrics F-measure, PSNR, NRM, and MPM were computed, and the results were compared with existing techniques. Significant improvement is achieved in the binarization of degraded ancient historical documents, helping to preserve ancient cultural heritage.