In this activity, we are to extract handwritten text from a scanned document, shown below.
Notice that the image is tilted. The angle of rotation was calculated by copying the image into MS PowerPoint and drawing a line along the writing guides. PowerPoint displays the "length" and "width" of the line, which are its x-component and y-component. The angle is the inverse tangent of the ratio of the y-component to the x-component, and the image is found to be tilted by 1.284 degrees.
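The angle calculation above can be sketched in a few lines of Python (not the Scilab used in this activity). The function name `tilt_angle_degrees` and the line dimensions are hypothetical, since I did not record the measured x- and y-components here; the values are chosen only to give a tilt of similar size.

```python
import math

def tilt_angle_degrees(dx, dy):
    """Angle (in degrees) of a line with x-component dx and y-component dy."""
    return math.degrees(math.atan2(dy, dx))

# Hypothetical line dimensions, NOT the values measured in PowerPoint.
width, height = 10.0, 0.224
angle = tilt_angle_degrees(width, height)
print(f"tilt = {angle:.3f} degrees")
```

Using `atan2` instead of dividing first avoids a division by zero if the line happens to be perfectly vertical.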
The image is rotated by using the mogrify command in Scilab, and the resulting image is shown below.
After the image is rotated, the next step is to remove the horizontal lines. The FT of the image was taken, and the frequencies that correspond to the horizontal lines were removed by multiplying the FT by an appropriate mask. A horizontal line is constant along x and varies only along y, so its frequencies lie along the vertical axis of the frequency domain. The mask therefore removes the frequencies along that axis.
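The masking step can be sketched as follows, in Python with NumPy rather than Scilab. This is only an illustration under my own assumptions: the function `remove_horizontal_lines`, the `keep_radius` parameter, and the synthetic test page are hypothetical, and the mask I actually used may have had a different shape.

```python
import numpy as np

def remove_horizontal_lines(image, keep_radius=1):
    """Zero the vertical axis of the centered spectrum, except near DC."""
    rows, cols = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    cy, cx = rows // 2, cols // 2          # zero frequency after fftshift

    mask = np.ones((rows, cols))
    mask[:, cx] = 0.0                      # remove all frequencies with fx = 0
    # Keep DC and its nearest neighbors so overall brightness survives.
    mask[cy - keep_radius:cy + keep_radius + 1, cx] = 1.0

    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    return filtered.real

# Synthetic page (an assumption): ruled lines every 4 rows plus one
# vertical pen stroke in column 10.
page = np.zeros((32, 32))
page[::4, :] += 1.0
page[:, 10] += 1.0

cleaned = remove_horizontal_lines(page)
```

In this toy case the ruled lines collapse to a uniform gray level while the vertical stroke keeps its contrast, because the stroke's frequencies lie along the horizontal axis, which the mask leaves untouched.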
Applying the mask to the FT of the image and transforming back resulted in the removal of the horizontal lines, shown below. (Recall that after taking the inverse transform, the image is rotated by 180 degrees. The image below has been corrected for that.)
I selected only a portion of the image containing a handwritten text so that it will be easier to process. The original image is in the first panel (1st row, 1st column) in the image below.
To enhance the text, I used the sharpen command in Scilab. The text was sharpened three times, as seen in the second panel (1st row, 2nd column) of the image above, and then binarized (third panel). The opening operator was then applied to remove the isolated pixels in the neighborhood of the handwritten text, followed by the closing operator to close the gaps in each letter. The image was then dilated to close the gaps further, especially in the letter 'D'.
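To show what the opening and closing operators do, here is a minimal sketch in Python with NumPy (not the Scilab commands used above). The 3x3 square structuring element, the function names, and the toy image are all assumptions made for illustration; the sharpening, binarizing, and thinning steps are not covered.

```python
import numpy as np

def dilate(img, size=3):
    """Binary dilation with a size x size square structuring element."""
    pad = size // 2
    padded = np.pad(img, pad, mode="constant", constant_values=False)
    rows, cols = img.shape
    out = np.zeros_like(img, dtype=bool)
    for dy in range(size):
        for dx in range(size):
            out |= padded[dy:dy + rows, dx:dx + cols]
    return out

def erode(img, size=3):
    """Binary erosion; pixels outside the image count as background."""
    pad = size // 2
    padded = np.pad(img, pad, mode="constant", constant_values=False)
    rows, cols = img.shape
    out = np.ones_like(img, dtype=bool)
    for dy in range(size):
        for dx in range(size):
            out &= padded[dy:dy + rows, dx:dx + cols]
    return out

def opening(img, size=3):
    return dilate(erode(img, size), size)   # removes small specks

def closing(img, size=3):
    return erode(dilate(img, size), size)   # fills small gaps

# Toy image (an assumption): a 3-pixel-thick stroke with a one-pixel gap
# at column 7, plus an isolated noise pixel in the corner.
text = np.zeros((9, 15), dtype=bool)
text[3:6, 2:7] = True
text[3:6, 8:13] = True
text[0, 0] = True

cleaned = closing(opening(text))
```

Here the opening deletes the isolated pixel and the closing bridges the gap in the stroke. Note that an opening with a 3x3 element also erases any stroke thinner than three pixels, which is one way parts of a handwritten letter can be lost.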
To make the handwriting one pixel thick, the thin operator is applied. Below is the final result.
The extracted handwriting is not as legible as I wanted it to be. In the original text, the handwriting is either "DENIO III" or "DERIO III", but in the extraction it looks like "DErIC III". The letter 'D' still looks like a 'D', maybe because our minds can connect the dots. The letter 'E' still looks like an 'E'. The next letter looks like a lowercase 'r', far from the original text. The letter 'I' is easy because it's just a straight line. The letter 'O' now looks like a 'C'. The 'III' is also correct, because the letters are just lines.
Other morphological operations could be applied at any step in this process and would probably give better results. I just haven't thought of them yet.
If we just want to find other instances of a word in the image, we can use template matching, correlating a sample image of the word with the image. To make the sample image of the word, I used font size 11 and font type Arial Bold Italic, shown below. The original image is binarized first.
By correlating the above image with the binarized image (above, left), we get the locations of the word DESCRIPTION in the images.
In all of the images, white areas indicate where the image is similar to the template, and the brightest spots are the positions of highest correlation, which indicate the locations of the word in the subimages. While the true locations of the word DESCRIPTION in the original image do have a good positive correlation with the template, it seems that almost all positions with text are very highly correlated with it. Areas with a high density of text, whether typewritten or handwritten, are highly correlated with the template.
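The behavior described above can be reproduced on a toy example. The sketch below, in Python with NumPy rather than Scilab, computes a plain (unnormalized, circular) correlation through the FFT. The function name `correlate`, the X-shaped template, and the small test image are assumptions for illustration, not the images used in this activity.

```python
import numpy as np

def correlate(image, template):
    """Circular cross-correlation of image with template via the FFT.

    scores[y, x] is the sum of image * template with the template's
    top-left corner placed at (y, x).
    """
    rows, cols = image.shape
    padded = np.zeros((rows, cols))
    padded[:template.shape[0], :template.shape[1]] = template
    spectrum = np.fft.fft2(image) * np.conj(np.fft.fft2(padded))
    return np.fft.ifft2(spectrum).real

# Hypothetical template: an X-shaped pattern with five white pixels.
template = np.array([[1, 0, 1],
                     [0, 1, 0],
                     [1, 0, 1]], dtype=float)

image = np.zeros((16, 16))
image[2:5, 3:6] = template      # a true copy of the template
image[10:13, 10:13] = 1.0       # a dense white block, not the template

scores = correlate(image, template)
true_match = scores[2, 3]
dense_block = scores[10, 10]
```

In this toy case the dense block scores exactly as high as the true match, since every white pixel of the template lands on a white pixel of the block. This is consistent with areas of dense text showing high correlation with the template.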
I give myself 8.5 points for this activity. I was able to extract the handwritten text up to the part where it is only one pixel thick. However, the resulting extraction is not very good, with some letters looking very different from the original text. The correlation also gave negative results, as areas which did not really contain the word DESCRIPTION have high values of correlation.