AN EFFICIENT BINARIZATION AND SEGMENTATION TECHNIQUE FOR THE RECOGNITION OF ANCIENT DEGRADED HISTORICAL DOCUMENTS

 

5.1 OBJECTIVE

Ancient historical documents are very important for preserving cultural heritage throughout the world. To retain access to the information they carry, such documents must be converted into a machine-editable format. Information acquisition from degraded historical documents has always been a challenging task owing to the many forms of degradation involved. Image binarization and segmentation are essential steps in the restoration of historical documents. This chapter proposes a method based on a double binarization technique: the former stage uses Otsu thresholding and the latter the Sauvola algorithm. Segmentation is then performed using the projection profile method and the bounding box method. The results provide evidence that the proposed approach achieves a high accuracy of 97%.

 

5.2 INTRODUCTION

Most of the activities of our ancestors are known only through the evidence given by ancient historical documents. Every country has its own distinct socio-cultural activities, and these historical documents also record various philosophies of living. In other words, such documents are vital for preserving our cultural heritage. It is hard to preserve them over many decades, as they undergo degradation; to maintain them, they must be converted into machine-editable documents. The primary step is pre-processing, which enhances the image and removes noise, since the documents contain large signal-dependent noise, non-uniform illumination, smear, low contrast, faint characters and stains. The image is then transformed into a binary image. After binarization, segmentation is performed, which is the key step in converting a document into a machine-editable format. Segmentation involves three processes: text line detection, word detection and character detection. Text line segmentation refers to extracting the lines from the image and depends on the skew angle. Words are then segmented from each text line, and finally characters are segmented from each word. It is important to carry out segmentation in this order, since the subsequent feature extraction and classification must produce the characters in the same order as the original historical document. The dataset used here is HDLAC 2011. The proposed binarization method is a double binarization technique involving Otsu thresholding and the Sauvola algorithm, and segmentation is done using the projection profile method and the bounding box method. Sample images from the Historical Document Layout Analysis Competition, obtained from the PRIMA Research Lab, are shown in Figure 5.1.

 

Figure 5.1 Sample images from the Historical Document Layout Analysis Competition, obtained from the PRIMA Research Lab.

 

5.2.1 Document Image Processing

Traditionally, paper is the medium of a document, presented using ink by handwritten or printed means. Through time, documents have also been written with ink on palm leaves, carved on stone or engraved on copper plates. Vast amounts of historical handwritten texts are available in libraries, and digitization of these documents helps preserve these age-old records. It thus gives a path to the objective of dealing with the flow of electronic documents in an efficient and integrated way. The final goal is to use computers to read and understand documents as humans do. The creation of good-quality digitized documents, for better human perception and comprehension, is the goal of the field of document image processing. Document image processing involves both text processing and graphics processing, and comprises three basic steps, namely

  • Document Image Analysis
  • Document Image Recognition
  • Document Image Understanding

5.2.1.1 Document Image Analysis

Document image analysis involves both the physical decomposition of a page and the derivation of the logical meaning or semantics of the salient fields or regions defined in the image. In general, the analysis involves the extraction and use of attributes and structural relationships in the document in order to label its components based on document type. In addition, text processing deals with the text components of a document image; its tasks include recognition of the text, determination of the skew angle and finding paragraphs, text lines and words.

 

 

 

5.2.1.2 Document Image Recognition

The next step is to analyze a document and segregate the text blocks, graphics blocks, picture blocks, etc., so as to facilitate the labeling of the blocks. The process of labeling the blocks is called document image recognition or identification.

5.2.1.3 Document Image Understanding

Document image understanding involves recognizing the characters on a page and assembling them into a format suitable for text processing. Automatic image interpretation is often desirable, as it is fast, robust, flexible, reliable and very accurate in performing sophisticated operations compared to human/manual interpretation. Machine recognition of characters involves the computer receiving input from different sources, processing it and recognizing it. Handwritten character recognition can be divided into two categories, namely offline and online methods. Online handwriting recognizers normally run on Personal Digital Assistants (PDAs) or Tablet PCs, and their performance is fairly acceptable for processing handwritten letters and symbols. In contrast, offline systems are equipped with scanners and printers.

5.2.2 Degraded documents

Degradations in the scanned text image occur from many sources, which can be categorized into four areas as given below

(a) Defects in the paper: yellowing, wrinkles, coffee stains, speckles, typeface, point size, spacing and typesetting imperfections (pasted-up layouts, slipped type), etc.

(b) Defects introduced during printing: toner dropout, bleeding and scatter, baseline variations, ink spread, strike-through and paper defects.

(c) Defects introduced during digitization through scanning or camera: skew (geometric deformation), mis-thresholding, resolution reduction, blur, sensor sensitivity noise, sampling, defocusing, low print contrast, non-uniform illumination and non-rectilinear camera positioning.

(d) Defects introduced during copying through photocopiers and fax machines: skew, streaking, shading and noise in electronic components.

5.2.3 Segmentation of degraded documents

            The segmentation phase is the most significant phase of the character recognition process, as its result directly affects the recognition results. The task of the segmentation phase is to divide the input document image into regions of interest. In an optical character recognition system, the regions of interest are usually characters and sub-characters. These characters and sub-characters are collectively called recognizable units, as they are the smallest possible components of a document image which can be obtained during the segmentation phase. The segmentation process itself consists of subtasks which have to be performed in order to extract the recognizable units from the document image. A text document does not necessarily contain only text; it may consist of both text and non-text regions. The non-text regions usually contain graphics, tables, diagrams, etc., and the text regions have to be extracted from such documents, a process commonly known as text/non-text classification. Similarly, the text within the document may be arranged in a multicolumn structure. As an example, news in newspapers has both properties described above, i.e. it often has graphics and is also in multicolumn format. The text regions of such documents have to be extracted before the actual segmentation of text regions begins; this extraction can be performed either manually or by a preprocessing stage. In this work, however, document images with uniform text regions are considered. In order to extract the recognizable units from text document images, the segmentation system performs the following subtasks.

  • Text Line Segmentation
  • Word Segmentation
  • Character Segmentation

5.2.3.1 Line Segmentation

Line segmentation is the first subtask of the whole segmentation process. It takes the input text document and splits the whole document image into non-overlapping text lines. As described earlier, the input document image should be preprocessed (binarized, de-skewed and noise-cleaned) before line segmentation proceeds. The line segmentation process extracts the individual text lines from a scanned document image. Two types of line segmentation approach are discussed below:

  1. Projection Profile Approach
  2. Connected Component Approach

5.2.3.1.1 Projection Profile Approach

The projection profile method is used first to segment the text lines of document images. In machine-printed documents the text lines are generally well spaced, i.e. separated from each other by prominent white spaces. This property of machine-printed text is commonly employed to split the entire document image into isolated text lines, and the technique which exploits it is known as projection profiles. Projection profiles can be either horizontal or vertical. The horizontal projection profile of a document image is used to demarcate the text lines within it and enables line segmentation along the horizontal axis.

Horizontal Projection Profile Approach – The horizontal projection profile of a binarized document image is evaluated by adding the pixel values along each row of the image.
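The following is a minimal sketch of this computation in Python/NumPy (the chapter's implementation is reported in MATLAB; this illustrative version assumes a binary image array with foreground pixels equal to 1):

```python
import numpy as np

def horizontal_projection(binary):
    """Foreground-pixel count of every row of a binary image
    (foreground = 1, background = 0)."""
    return binary.sum(axis=1)

def split_text_lines(binary):
    """Cut the page into text-line strips wherever the horizontal
    projection drops to zero (the white gaps between lines)."""
    profile = horizontal_projection(binary)
    lines, start = [], None
    for r, ink in enumerate(profile > 0):
        if ink and start is None:
            start = r                      # a new strip begins
        elif not ink and start is not None:
            lines.append(binary[start:r])  # strip ends at the gap
            start = None
    if start is not None:                  # strip touching the bottom edge
        lines.append(binary[start:])
    return lines
```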

Though it appears to be a very simple technique for line segmentation, in some cases it may lead to (1) over-segmentation and (2) under-segmentation.

Over-segmentation

As the name implies, over-segmentation means splitting something into more fragments than required. Over-segmentation in the line segmentation process occurs when a text line of the document image gets divided into two or more horizontal text regions. It usually occurs when ascenders or descenders, or both, are not attached to the middle region of the text line: the separation of ascenders and descenders shows up as a gap in the projection profile and is hence marked as a separate text line. In the example, the bounding boxes indicate the text lines obtained; the image actually has 8 text lines, but the segmentation results in 12. This over-splitting of the lines is clearly due to the separation of ascenders/descenders from the middle region, and the arrows indicate the gap in the profile and the corresponding over-segmentation location in the image. The text strips obtained from the profile of the document image fall into the following three categories:

  • Text strip containing upper region (ascenders) of text line (upper strip)
  • Text strip containing lower region (descenders) of a text line (lower strip)
  • Text strip comprising the middle region/middle and ascenders/middle and descenders/middle, ascenders and descenders (core strip)

The first two categories, i.e. upper strips and lower strips, are regarded as component text lines, as they cannot exist independently in a document. To determine the bounding box (coordinates) of the ground-truth text lines, all the segmented text line strips are analyzed, and the component text lines are combined with the corresponding core region to make up the complete text line. After segmenting the image, the text lines so obtained are classified into one of the categories defined above. The height of every text strip is calculated immediately after segmentation. To classify each text strip, a threshold value (line height) is required. It has been found that the threshold line height cannot be taken as the arithmetic mean of all the segmented text strips, since the heights of the lower strips, upper strips and core strips present in the document image can greatly affect the overall estimate. To repair over-segmentation, the distance of each component strip is measured from the neighboring core strips, and the component strip is combined with the core strip at minimum distance. If a component strip is at the same distance from two core strips, it is combined with the core strip whose height, after combination, remains smaller than the height of the other core strip.

Under-segmentation

Like over-segmentation, under-segmentation also leads to wrong text line segmentation of the document image. Under-segmentation means splitting something into fewer fragments than required. It occurs when line segmentation fails to split two or more overlapping text lines. In some text lines the descenders may overlap with the ascenders of the following text line, which results in horizontally overlapped lines, so that no continuous horizontal white space can be found between such lines. In such documents the inter-line spacing is usually not uniform.

5.2.3.1.2 Connected Component Approach

To solve the problem of under-segmentation, the connected component approach has been used. The connected component approach is a general approach to identifying the isolated components of an input image. Connected components are sets of connected pixels: connected pixels are grouped together into a set in which any pixel can be reached from any other pixel by passing through pixels contained in the set. These components of the image are used as basic units for further processing. Connected components are obtained from the given image by marking each pixel with a value commonly known as a label, a process hence called connected component labeling. Connected component labeling is used in various image processing tasks. In this process each group of connected pixels is represented by a unique label, and these labels help distinguish one group of connected pixels from another. In the case of binary images, connected component labeling is performed by assigning labels to the black pixels in such a way that all connected neighboring black pixels are assigned the same label. The connected components can be obtained under either a 4-connected or an 8-connected scheme.

In the 4-connected scheme the horizontally and vertically adjacent pixels are considered neighboring pixels, whereas in the 8-connected scheme, diagonally adjacent pixels are considered neighboring pixels in addition to the horizontally and vertically adjacent ones.

            The 8-connected scheme is used here to find the connected components of the input document image. Connected component analysis of an input binary document image results in a labeled binary image, where every pixel belongs to a particular group of pixels represented by a unique label. These labeled pixels are used to obtain the coordinates and dimensions of the connected components. The input document image is scanned from left to right for each row, and each black pixel encountered is assigned a label after examining its neighboring pixels. The labels assigned are then used to extract the connected components from the input image. The connected components so obtained are represented by minimal bounding rectangles (MBRs), sometimes also known as minimal bounding boxes. As the term itself suggests, minimal bounding rectangles are the smallest possible regions which enclose the connected components.
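As an illustration, both connectivity schemes and the resulting MBRs can be obtained with SciPy's labeling routines; this is a sketch of the procedure described above, not the chapter's own code:

```python
import numpy as np
from scipy import ndimage

def connected_components(binary, connectivity=8):
    """Label the foreground pixels of a binary image and return the
    label map together with the minimal bounding rectangles."""
    if connectivity == 8:
        # 8-connected: horizontal, vertical and diagonal neighbours
        structure = np.ones((3, 3), dtype=int)
    else:
        # 4-connected: horizontal and vertical neighbours only
        structure = np.array([[0, 1, 0],
                              [1, 1, 1],
                              [0, 1, 0]])
    labels, count = ndimage.label(binary, structure=structure)
    # one (row_slice, col_slice) pair per component = its MBR
    boxes = ndimage.find_objects(labels)
    return labels, boxes
```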

 

 

5.2.3.2 Word Segmentation

Word segmentation is the second subtask of the segmentation process. In this subtask the word boundaries which separate individual words are detected; word segmentation thus refers to the process of extracting individual words from the document image. The isolated text lines detected in the previous step are used as input to this phase. To extract individual words, the projection profile approach is used, wherein the vertical projection profile of every segmented text line is obtained.

Vertical Projection Profile: The vertical projection profile of a binarized document image is evaluated by adding the pixel values along each column of the image.

            Each text line is scanned from the top row to the bottom row for each column, and the total number of foreground pixels is counted for each scan, resulting in the profile curve of the text line. Empty space in the projection profile curve (where the foreground pixel count is zero) indicates a vertical gap in the text line and hence a word boundary. This technique works well when prominent gaps exist between adjacent words, as is often assumed; in practice, however, this assumption does not always hold. There are document images in which word separation is not uniform and small gaps exist within the words in addition to the gaps between the words. The text lines of document images may thus have two types of vertical gaps: inter-word gaps, which truly separate the words from each other, and intra-word gaps, which exist within the words. Inter-word gaps exist in all document images, but intra-word gaps may or may not exist; where they do, they cannot be treated as word boundaries, and their presence often leads to wrong word segmentation. These unwanted gaps are caused by the broken headlines of text lines. A gap-width test such as the one sketched below can filter them out.
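A minimal sketch of such a gap-width test follows; min_gap is a hypothetical tuning parameter, since the text does not specify how the inter/intra-word threshold is chosen:

```python
import numpy as np

def word_gaps(line, min_gap=6):
    """Return (start, end) column ranges of gaps in one text-line
    image that are wide enough to count as inter-word gaps; narrower
    zero-runs are treated as intra-word gaps and ignored."""
    profile = line.sum(axis=0)                # vertical projection
    gaps, start = [], None
    for c, v in enumerate(profile):
        if v == 0 and start is None:
            start = c                         # a zero-run begins
        elif v > 0 and start is not None:
            if c - start >= min_gap:          # wide enough: word gap
                gaps.append((start, c))
            start = None
    return gaps
```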

Sources of Errors in Word Segmentation

Word segmentation errors were measured on various document images selected from books and newspapers. A close analysis of the corresponding document images reveals that these errors occur due to under-segmentation or over-segmentation of some words. The presence of intra-word gaps comparable to inter-word gaps leads to over-segmentation of words. Similarly, in some cases the words within a text line are so close that the inter-word separation becomes smaller than the computed threshold, leading to under-segmentation. Under-segmentation of words is a common problem in newspaper images, as the separation between the words is manipulated to fit the news into the layout. It has been found that more than 90% of the word segmentation errors in newspapers are due to under-segmentation, whereas in document images taken from books under- and over-segmentation are equally likely to occur. The presence of punctuation marks, noise or unwanted strokes also leads to under-segmentation of words.

5.2.3.3 Character Segmentation

In the segmentation process, line and word segmentation are followed by character segmentation, the last subtask. The results of character segmentation heavily affect the results of the recognition system: it extracts the recognizable units from the segmented words, which are later recognized by the subsequent phase to generate the final text. Wrong character segmentation produces shapes which do not belong to any valid ancient historical character, and hence wrong character recognition occurs. The performance of the text recognition system therefore depends largely on character segmentation.

In character segmentation, the word is initially divided into three zones by using the headline and the baseline. The headline of each word is detected using the horizontal projection profile; the headline of the text line of which the word is a part is also used as a reference while determining the word's headline, and is itself determined immediately after the line segmentation process. Generally the headline of a text line or word is found by locating the head row, the row which contains the maximum number of foreground pixels; the highest peak in the horizontal projection profile of a text line or word represents the head row. The start and end of the headline are determined by scanning the rows adjacent to the head row: if the variation in foreground pixel count in an adjacent row is less than 35 percent, that row is considered part of the headline. The headline separates the upper zone from the middle and lower zones, and very often the upper- and middle-region characters are connected to the headline; to isolate these characters the headline of the word has to be removed. The most common approach used to separate the characters within a word is the vertical projection profile of the word (without its headline): the gaps in the profile mark the character boundaries. Though very common, this approach is not always applicable; the presence of shadow characters in a word blocks the vertical white space which separates two characters and therefore makes segmentation difficult. A sketch of the head-row search follows.
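In this sketch, applying the 35 percent variation rule relative to the head row's own pixel count is an assumption, as the text does not state the reference value:

```python
import numpy as np

def headline_band(word, variation=0.35):
    """Find the headline of a word image: the head row is the row
    with the maximum foreground-pixel count, and adjacent rows whose
    count differs from it by less than `variation` are included."""
    profile = word.sum(axis=1)                 # ink count per row
    head = int(np.argmax(profile))             # head row
    top = bottom = head
    while top > 0 and abs(profile[top - 1] - profile[head]) < variation * profile[head]:
        top -= 1
    while (bottom < len(profile) - 1
           and abs(profile[bottom + 1] - profile[head]) < variation * profile[head]):
        bottom += 1
    return top, bottom                         # headline spans these rows
```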

As an alternative, a connected component analysis is performed on the word without its headline to find all the connected components present in the word, and the minimal bounding regions of all connected components are obtained. These components are then categorized on the basis of the zone to which they belong. The head row helps to identify the upper-zone characters, as it separates the upper zone from the middle and lower zones. Although in this approach a fused consonant-descender combination is considered a single recognizable unit, some of the descenders exist independently (not touching the middle zone); the lower-zone characters are therefore identified with the help of the baseline, which separates the middle zone from the lower zone. The baseline is determined from the average character height, which is evaluated by performing connected component analysis on all segmented words. All the connected components which lie below the baseline are categorized as lower-zone characters. In the connected component approach a single character sometimes gets split into more than one connected component after removing the headline. To resolve this problem, the overlap of adjacent connected components is computed: if the overlap is more than 30%, they are considered a single component.

Sources of error in character segmentation

            The performance of the character recognition system heavily depends upon character segmentation; wrong character segmentation directly means wrong character recognition. The analysis performed on character segmentation reveals that in approximately 75% of cases, wrong character segmentation is caused by the presence of broken characters, which results in over-segmentation and hence degraded recognition performance. The problem of broken characters also occurs in upper- and lower-zone characters. Other causes of wrong character segmentation are touching characters in the middle and upper zones and the merging of characters due to the presence of noise. Misalignment of characters within the word also causes wrong segmentation: such a word can lead to the appearance of upper-zone characters after segmentation (because of over-segmentation due to the misalignment) even though there are no characters in the upper zone.

5.3 PROBLEM STATEMENT

Problems during binarization and segmentation lead to documents being damaged. During binarization, Niblack's method introduces a large amount of binarization noise in non-text regions as well, generating black noise in empty windows. Bernsen's method depends on the value of the threshold parameter (k) and also on the size of the window, and the threshold value does not reliably capture this dependency. Structural segmentation relies upon information about the structure of the required portion of the image, i.e. the region which is to be segmented. Region-growing segmentation assumes that neighboring pixels within one region have similar values. Statistical Region Merging (SRM) starts by building a graph of pixels using 4-connectedness, with edges weighted by the absolute value of the intensity difference; initially each pixel forms a single-pixel region, after which SRM sorts the edges in a priority queue and decides whether or not to merge the current regions belonging to the edge pixels using a statistical predicate. This generates edge-based issues over the pixel intensity and does not retain the required threshold value. Existing approaches therefore have difficulty performing the binarization and segmentation process well.

5.4 PROPOSED METHODOLOGY

Binarization is the process in which most of the noise is eliminated. Here the Otsu and Sauvola algorithms are used for binarization, and the projection profile and bounding box methods for segmentation.

 

 

Figure 5.2 Flow diagram of the proposed work

 

5.4.1 Binarization

5.4.1.1 Input Image

        Firstly, the input data is a historical document image, which can be any document of differing style and dimensions. This historical input image is fed to the pre-processing section, which operates on the image.

 

5.4.1.2 Pre-Processing

A true colour image can be converted to a grayscale image by preserving the luminance (brightness) of the image. A grayscale image is composed of different shades of gray. An RGB image is a combination of red, green and blue colors; the RGB image (24 bit) is three-dimensional. The grayscale image (8 bit) is obtained from the RGB image by combining approximately 30% of the red, 59% of the green and 11% of the blue channel (the standard luminance weights 0.299R + 0.587G + 0.114B).
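A minimal sketch of this conversion using the standard BT.601 luminance weights (0.299, 0.587, 0.114):

```python
import numpy as np

def rgb_to_gray(rgb):
    """Convert a 24-bit RGB image (H x W x 3, uint8) to an 8-bit
    grayscale image by weighting the channels by their luminance."""
    weights = np.array([0.299, 0.587, 0.114])    # R, G, B contributions
    gray = rgb.astype(np.float64) @ weights      # per-pixel dot product
    return gray.round().astype(np.uint8)
```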

A. Otsu Thresholding Algorithm

The Otsu thresholding algorithm obtains the threshold value for binarization of the handwritten document. It is one of the most widely used algorithms for obtaining a binary image: the best threshold value is selected by evaluating every possible value.

In image processing, Otsu's thresholding method is used for automatic binarization-level decision based on the shape of the histogram. It iterates through all possible threshold values and calculates a measure of spread for the pixel levels on each side of the threshold, i.e. the pixels that fall in either the foreground or the background. The aim is to find the threshold value where the sum of the foreground and background spreads is at its minimum.

 

Algorithm

The algorithm exhaustively searches for the threshold that minimizes the intra-class variance, defined as a weighted sum of the variances of the two classes; equivalently, it maximizes the between-class variance σ_b²(t):

  1. Compute the histogram and the probabilities p(i) of each intensity level.
  2. Set up the initial class probabilities ω_i(0) and class means μ_i(0).
  3. Step through all possible thresholds t = 1, … up to the maximum intensity:
    1. Update the class probabilities ω_i(t) and class means μ_i(t).
    2. Compute the between-class variance σ_b²(t).
  4. The desired threshold corresponds to the maximum of σ_b²(t).
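A compact NumPy sketch of this exhaustive search (illustrative, not the thesis's MATLAB code), using the standard closed form σ_b²(t) = (μ_T·ω_0(t) − μ(t))² / (ω_0(t)(1 − ω_0(t))) for 8-bit images:

```python
import numpy as np

def otsu_threshold(gray):
    """Return the Otsu threshold of an 8-bit grayscale image by
    maximising the between-class variance sigma_b^2(t)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()                  # p(i) for i = 0..255
    omega = np.cumsum(prob)                   # class-0 probability w0(t)
    mu = np.cumsum(prob * np.arange(256))     # cumulative first moment
    mu_total = mu[-1]                         # global mean intensity
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b2 = np.nan_to_num(sigma_b2)        # undefined endpoints -> 0
    return int(np.argmax(sigma_b2))           # t with maximum variance

# Usage: e.g. binary = gray > otsu_threshold(gray)
```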


B. Sauvola Thresholding Method

This is a local thresholding method and an improvement of Niblack's binarization method, especially when the background contains light texture, noise, stains or uneven illumination (Di Lu, et al., 2015). The Sauvola threshold is locally adaptive: it is calculated from the local gray-level mean and standard deviation. The main advantage of this method is that it detects background regions and prevents noise pixels from appearing in the output. The threshold value is obtained using the following equation,

T= m(1- k(1- σ/R ))                                                                                                       (5.1)

 

Where,         m is the local gray level mean

k is a user-defined parameter

R is the standard deviation’s dynamic range

σ is the Standard Deviation
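A per-pixel sketch of equation (5.1); the local mean m and standard deviation σ are obtained here by mean filtering, and the window size, k and R values shown are common defaults rather than values taken from this chapter:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sauvola_binarize(gray, window=25, k=0.2, R=128.0):
    """Apply T = m * (1 - k * (1 - sigma/R)) in a window x window
    neighbourhood of every pixel; ink maps to 1, background to 0."""
    g = gray.astype(np.float64)
    m = uniform_filter(g, size=window)            # local mean m
    m2 = uniform_filter(g * g, size=window)       # local mean of squares
    sigma = np.sqrt(np.maximum(m2 - m * m, 0.0))  # local std deviation
    T = m * (1.0 - k * (1.0 - sigma / R))         # equation (5.1)
    return (g < T).astype(np.uint8)               # dark pixel = ink = 1
```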

The size of the window used to compute the mean and standard deviation is user-defined. The relatively good performance on noisy and blurred documents and the computational efficiency are the main advantages of Sauvola's method. In this representation, black is represented by the value 1 and white by the value 0. The conventional Sauvola method consists of just one step, and its operation is governed by only three parameters. In spite of its advantages, Sauvola's method suffers from the high computational cost of calculating the required statistics, especially for large window sizes; to overcome this cost and improve the fault tolerance of internal circuits, stochastic computing can be used. Figure 5.3 shows (a) the input image and (b) the binarized image of (a).

 

Figure 5.3 (a) Input image 1, (b) Binarized image of (a)

 

 

Figure 5.4 (a) Input image 2, (b) Binarized image of (a)

Figure 5.4 shows (a) input image 2 and (b) the binarized image of (a).

5.4.2 Segmentation

In image processing, a projection profile is the projection of the sum of foreground pixels along an axis of a two-dimensional image. The projection profile method is mainly used for segmentation of text objects inside text documents, and the profile is calculated separately for each axis. The projection profile along the vertical axis is called the vertical projection profile, calculated for every column as the sum of all row pixel values inside the column. The horizontal projection profile is the projection profile along the horizontal axis, calculated for every row as the sum of all column pixel values inside the row. In the proposed method, the profile is calculated by summing the elements of the 2D pixel matrix along one axis, reducing an N × N matrix to an N × 1 vector; the threshold value is then fixed using the mean value of that N × 1 vector.

A. Line Segmentation

Extracting text lines is the initial step of segmentation; here the projection profile method is used. The projection profile method operates on a two-dimensional matrix, in this case the input image, and reduces it along one axis; depending on the axis, the result is a vertical or a horizontal projection profile. For text line extraction the row-wise profile is used: the M × N matrix of pixel values is converted into an M × 1 vector by adding the elements in each row (the horizontal projection profile as defined above). The threshold value is then calculated from the summed elements. Text line segmentation is illustrated in Figure 5.5.

 

Figure 5.5 Text line segmentation

B. Word Segmentation

This involves segmenting every word from the segmented lines. It is done using the column-wise profile, in which the M × N matrix is converted into a 1 × N vector by adding the elements in each column (the vertical projection profile as defined above). After calculating the mean of this vector, the threshold range is fixed. At the end of the process the words are extracted from the extracted text line. Word segmentation is shown in Figure 5.6.

Figure 5.6 Word segmentation

C. Character Segmentation

The bounding box method is a popular method for segmenting objects or parts of objects. In the proposed method, the bounding box technique is used for character segmentation. Before segmentation, noise components of fewer than 30 pixels are removed by thresholding to obtain a clean image. The unjoined characters are then separated by drawing a rectangular boundary around each one: each character is labelled using connected component analysis, the bounding box technique is applied to each character in the image, and once a character is detected a green rectangular box is drawn around it. Each segmented character image is resized to 25 × 50 pixels. Finally, the characters are segmented, as shown in Figure 5.7; a sketch of this procedure follows the figure.

Figure 5.7 Character Segmentation
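A sketch of this procedure under the stated settings (the 30-pixel speck filter and the 25 × 50 target size); the nearest-neighbour resize and the row/column interpretation of the target size are assumptions:

```python
import numpy as np
from scipy import ndimage

def segment_characters(word, min_pixels=30, out_shape=(50, 25)):
    """Label the connected components of a binarized word image,
    discard specks smaller than min_pixels, and return one cropped,
    resized image per character (out_shape is rows x columns)."""
    labels, count = ndimage.label(word, structure=np.ones((3, 3), int))
    chars = []
    for box in ndimage.find_objects(labels):   # one bounding box per label
        crop = word[box]
        if crop.sum() < min_pixels:            # noise speck: skip it
            continue
        rows = np.linspace(0, crop.shape[0] - 1, out_shape[0]).astype(int)
        cols = np.linspace(0, crop.shape[1] - 1, out_shape[1]).astype(int)
        chars.append(crop[np.ix_(rows, cols)]) # nearest-neighbour resize
    return chars
```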

 

5.5 RESULTS AND DISCUSSION

Various dataset images are taken and analysed using the projection profile and bounding box methods, and the performance of the proposed method is evaluated. The performance metrics for historical images are improved through the proposed method, which is implemented using MATLAB. As mentioned, the dataset used is HDLAC 2011; those images contain noise and some have complex backgrounds.

The PRIMA (Pattern Recognition and Image Analysis Research Lab) dataset contains images that range from grayscale to colour, from machine-printed to handwritten, and from real to synthetic. The proposed method has been tested on the HDLAC 2011 dataset.

Dataset Description

  1. HDLAC 2011 Dataset
  2. The complete dataset consists of printed documents of various types, such as books, newspapers, journals and legal documents.
  3. It covers 17 different languages and 11 scripts from the 17th to the early 20th century.

5.5.1 HDLAC 2011 Datasets

 

Figure 5.8 Sample images (Image 1, Image 2, Image 3, Image 4)

The images in Figure 5.8 are sample images of the HDLAC 2011 dataset, obtained from the PRIMA Research Lab.

Objective Evaluation Criteria

True Positive (TP)

A test result that detects the state when the state is present.

True Negative (TN)

A test result that does not detect the state when the state is absent.

False Positive (FP)

A test result that detects the state when the state is absent.

False Negative (FN)

A test result that does not detect the state when the state is present.

Precision

Precision is the proportion of positives that correspond to the presence of the state.

Precision = TP / (TP + FP)                                                                                          (5.2)

Recall

Recall measures the ability of a test to detect the state when the state is present.

 

Recall = TP / (TP + FN)                                                                                             (5.3)

F-Measure

The F-measure evaluates a test's accuracy as the weighted harmonic mean of precision and recall.

F-Measure = 2 × (Precision × Recall) / (Precision + Recall)                                         (5.4)

Accuracy

Accuracy refers to the closeness of measurements to the true value; for classification it is the proportion of correct test results.

Accuracy = (TP + TN) / (TP + TN + FP + FN)                                                          (5.5)
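For reference, equations (5.2) to (5.5) computed directly from the four counts:

```python
def evaluation_metrics(tp, tn, fp, fn):
    """Precision, recall, F-measure and accuracy, equations (5.2)-(5.5)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f_measure, accuracy
```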

 

Table 5.1 shows the performance results for different images.

 

FILE NAME    PRECISION   RECALL      F-MEASURE   ACCURACY
Image 1      0.978328    0.773779    0.864113    0.963779
Image 2      0.994809    0.857791    0.921233    0.980051
Image 3      0.961435    0.856558    0.905971    0.951996
Image 4      0.992091    0.863027    0.923069    0.976750
Image 5      0.987218    0.597694    0.944589    0.942095
Image 6      0.995629    0.815421    0.896559    0.950562
Image 7      0.997309    0.876053    0.932757    0.981518
Image 8      0.999624    0.885937    0.939353    0.968921
Image 9      0.998532    0.875833    0.933167    0.973874
Image 10     0.999939    0.873813    0.932631    0.979924
Image 11     0.998673    0.826832    0.904664    0.966268
Image 12     0.997448    0.791921    0.882881    0.968758
Image 13     0.995659    0.830521    0.905624    0.967484
Image 14     0.996142    0.792049    0.882449    0.961678
Image 15     0.998291    0.811577    0.895303    0.969484
Image 16     0.996100    0.799670    0.887150    0.960956
Image 17     0.998831    0.825687    0.904043    0.963291
Image 18     0.998391    0.781655    0.876828    0.967367
Image 19     0.997183    0.834762    0.908773    0.976425
Image 20     0.999266    0.833371    0.908810    0.978846
Image 21     0.995302    0.852773    0.918541    0.973992
Image 22     0.995108    0.806693    0.891049    0.975988
Image 23     0.995321    0.786327    0.878566    0.976914
Image 24     0.994566    0.814484    0.895562    0.972961
Image 25     0.992272    0.805442    0.889149    0.977450
Image 26     0.994566    0.814484    0.895562    0.972961
Image 27     0.993731    0.810824    0.893008    0.975885
Image 28     0.987171    0.810660    0.890250    0.975221
Image 29     0.994303    0.822750    0.900428    0.978085
Image 30     0.999287    0.848559    0.917776    0.981563
Image 31     0.991951    0.776937    0.871376    0.971672
Image 32     0.991428    0.774648    0.869733    0.971810
Image 33     0.994804    0.791617    0.881655    0.971533
Image 34     0.997247    0.817295    0.898348    0.981758
Image 35     0.997195    0.809140    0.893379    0.975637
Image 36     0.994494    0.733482    0.844275    0.962931
Image 37     0.992613    0.750986    0.855057    0.958432
Image 38     0.994263    0.780495    0.874505    0.980619
Image 39     0.997716    0.774995    0.872364    0.961904
Image 40     0.997584    0.762974    0.864647    0.969345

 

Table 5.1 Performance results for different images

Table 5.2 represents the overall performance results

 

PRECISION   RECALL    F-MEASURE   ACCURACY
99.40%      80.80%    89.60%      97%

Table 5.2 Overall performance results

 

Figure 5.9 plots the performance metrics of the proposed method.

Figure 5.9 Plotted performance metrics of proposed method

 

 

 

 

5.6 SUMMARY

The proposed method combines two binarization algorithms with the projection profile and bounding box methods for segmentation of manuscript documents. The main idea is to perform a double binarization: Otsu's threshold followed by Sauvola's threshold. The proposed approach is robust enough to handle degradations of the document images such as large signal-dependent noise, non-uniform illumination, smear, low contrast, faint characters and stains (Minhua Li, et al., 2010). Extensive experiments on the HDLAC series datasets show that the proposed method is appropriate for the binarization of historical images and that it performs better than state-of-the-art techniques in terms of F-score, recall and precision. The projection profile method is used to extract text lines from the document images, and the bounding box method extracts the individual characters. The proposed binarization and segmentation method has been applied to the HDLAC 2011 images and achieves an overall accuracy of 97%. Future work involves feature extraction and classification of the images.

 

 

 
