The impact of discretization methods on Chinese handwriting identification

Identification based on Chinese handwriting is an interesting research topic in the field of pattern recognition and computer vision. Recently, many innovative methods and approaches have been developed for writer identification. Unlike characters of Western alphabets such as English, German, and French, some o...


Bibliographic Details
Main Author:	Wong, Yee Leng
Format:	Thesis
Published:	2010
Subjects:	QA75 Electronic computers. Computer science

collection	ePrints
description	Writer identification from Chinese handwriting is an active research topic in pattern recognition and computer vision, and many methods have recently been developed for it. Unlike the characters of Western alphabets used for English, German, and French, the characters of some other scripts, such as Korean, Arabic, and Chinese, have distinctive structural characteristics. Chinese characters in particular have a complex structure: they are built from numerous strokes that are warped into cursive shapes, and the character set is much larger. More features therefore need to be generated before the classification phase, and these features need to be well represented for identification. In this study, an improved discretization is implemented to transform the range of continuous quantitative values of each writer's features into a number of appropriate intervals, each denoted by an integer label. Several experiments were conducted with two types of datasets: pre-discretized and post-discretized. The pre-discretized datasets contain the original features obtained with the Direction-based Feature Extraction (DFE) technique; the post-discretized datasets contain the same extracted features after the discretization process has been applied. To assess identification performance reliably, 10-, 7- and 5-fold cross-validation (CV) was tested on both datasets. The experiments show that the best overall results are obtained with discretized data, with identification accuracy above 94.0%, compared with accuracy below 50.0% for pre-discretized data. It can be concluded that the discretization process represents the writers' features effectively and yields higher identification rates, which supports better forensic document analysis.
first_indexed	2024-03-05T18:33:00Z
id	utm.eprints-19120
institution	Universiti Teknologi Malaysia - ePrints
last_indexed	2024-03-05T18:33:00Z
record_format	dspace
spelling	utm.eprints-19120 2020-02-06T01:33:52Z http://eprints.utm.my/19120/ 2010-12 Thesis NonPeerReviewed Wong, Yee Leng (2010) The impact of discretization methods on Chinese handwriting identification. Masters thesis, Universiti Teknologi Malaysia, Faculty of Computer Science and Information Systems.
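
The abstract describes discretization as mapping continuous feature values into intervals that are each denoted by an integer label. The record does not specify the thesis's "improved discretization", so the sketch below substitutes plain equal-width binning purely as an illustration; the function names, the bin count, and the sample values are all hypothetical.

```python
from bisect import bisect_right


def equal_width_cut_points(values, n_bins):
    """Return the n_bins - 1 interior cut points that split [min, max] evenly.

    Equal-width binning is an assumption made for illustration; it is not
    the discretization method used in the thesis.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def discretize(values, cut_points):
    """Map each continuous value to the 0-based integer label of its interval."""
    return [bisect_right(cut_points, v) for v in values]


# Hypothetical continuous values of one feature across seven samples.
feature = [0.0, 1.0, 2.5, 4.0, 5.5, 7.0, 9.0]

cuts = equal_width_cut_points(feature, n_bins=3)  # [3.0, 6.0]
labels = discretize(feature, cuts)                # [0, 0, 0, 1, 1, 2, 2]
print(cuts, labels)
```

After this step each sample carries a small integer per feature instead of a raw measurement, which is the kind of "post-discretized" representation the abstract compares against the original features.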

Similar Items