Please use this identifier to cite or link to this item: http://eprint.iitd.ac.in/handle/2074/1973

Title: Trainable script identification strategies for Indian languages
Authors: Chaudhury, S; Sheth, R
Keywords: multi-lingual documents; width-to-height ratio
Issue Date: 1999
Citation: Proceedings of the Fifth International Conference on Document Analysis and Recognition (ICDAR), pp. 657-660.
Abstract: Identifying the script in an image of a document page is of primary importance for a system that processes multi-lingual documents. In this paper, three trainable classification schemes are proposed for the identification of Indian scripts. The first scheme is based on a frequency-domain representation of the horizontal profile of the textual blocks. The other two schemes use connected components extracted from the textual region. We propose a novel Gabor filter-based feature extraction scheme for the connected components. We have also found that the frequency distribution of the width-to-height ratio of the connected components can be used for script recognition. Experiments show that the Gabor filter-based scheme provides the most reliable performance. However, the other two techniques are computationally more efficient.
URI: http://eprint.iitd.ac.in/dspace/handle/2074/1973
Appears in Collections: Electrical Engineering
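
The abstract mentions that the frequency distribution of the width-to-height ratio of connected components can serve as a script feature. The sketch below illustrates one way such a feature could be computed from a binary image. It is not the authors' implementation: the function names, the use of 4-connectivity, and the bin edges are all assumptions made for illustration, and the record does not say which classifier is trained on the resulting histogram.

```python
from collections import deque


def connected_components(image):
    """Return bounding boxes (top, left, bottom, right) of foreground regions.

    `image` is a list of rows holding 0 (background) or 1 (ink).
    4-connectivity is an assumption; the record does not specify it.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited ink pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def ratio_histogram(image, bin_edges=(0.5, 1.0, 2.0)):
    """Normalized histogram of bounding-box width/height ratios.

    The default (hypothetical) edges give four bins:
    ratio < 0.5, 0.5 <= ratio < 1, 1 <= ratio < 2, ratio >= 2.
    """
    counts = [0] * (len(bin_edges) + 1)
    boxes = connected_components(image)
    for top, left, bottom, right in boxes:
        ratio = (right - left + 1) / (bottom - top + 1)
        # Index of the bin = number of edges the ratio has reached.
        counts[sum(ratio >= edge for edge in bin_edges)] += 1
    total = len(boxes)
    if total == 0:
        return [0.0] * len(counts)
    return [n / total for n in counts]
```

Calling `ratio_histogram` on a text block yields a fixed-length vector that could be passed to any trainable classifier. The intuition is that scripts dominated by wide components put more mass in the upper bins than scripts built from narrow, tall components.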

Files in This Item:

File                  Size      Format
chaudhurytra1999.pdf  172.3 kB  Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

