Summary: | Isolated characters, especially Latin characters, usually contain many branches at their nodes, which makes it difficult to decide in which direction a traversal should continue. Furthermore, previously visited nodes often have to be revisited in order to visit all the nodes in one continuous route. In this thesis, some techniques for solving problems in Handwritten Character Recognition (HCR) involving isolated characters are proposed. HCR consists of three stages: pre-processing, feature extraction and classification. In pre-processing, a thinning algorithm was applied to remove redundant pixels from the binary character image. In feature extraction, Freeman Chain Code (FCC), which uses 8-neighbourhood directions labelled 1 to 8, was used as the data representation. However, the FCC representation depends on the route length and on the branches at the characters’ nodes. The larger the number of branches, which is common for isolated characters, the longer the time required for extraction. Here, an FCC extraction based on a Heuristic Randomized-based algorithm was proposed to reduce the route length and computational time. The experiments demonstrated that the proposed FCC extraction is superior in producing the shortest route length with minimum computational time, compared to the Enumeration-based algorithm, Genetic Algorithm and Ant Colony Optimization. The feature vector extracted using the FCC extraction was used as input to classification. There were 69 features: 64 from the chain codes and 5 from the original image. Support Vector Machine (SVM) and Artificial Neural Network (ANN) were chosen as classifiers for the character images. ANN performed better than SVM in terms of accuracy. 
The accuracy of ANN on sample data from the National Institute of Standards and Technology (NIST) database exceeded 96% for all upper-case and lower-case, 98% for all upper-case, lower-case and characters, and 90% for digits only.
|
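As an illustration of the FCC representation mentioned in the summary, the following is a minimal sketch that encodes an already ordered pixel route as a chain of direction labels from 1 to 8. The summary does not state which direction each label denotes, so the `DIRECTIONS` mapping (1 = east, proceeding counter-clockwise) and the function name `chain_code` are assumptions made for this example. The sketch also assumes the route is given; it does not implement the thesis's heuristic route search over branching nodes.

```python
# Assumed labelling: 1 = east, then counter-clockwise up to 8 = south-east.
# Offsets are (row, col) steps with rows increasing downward, as in image arrays.
DIRECTIONS = {
    (0, 1): 1,    # east
    (-1, 1): 2,   # north-east
    (-1, 0): 3,   # north
    (-1, -1): 4,  # north-west
    (0, -1): 5,   # west
    (1, -1): 6,   # south-west
    (1, 0): 7,    # south
    (1, 1): 8,    # south-east
}


def chain_code(path):
    """Return the list of direction labels between consecutive pixels of a route.

    `path` is an ordered list of (row, col) pixels in which each pixel is an
    8-neighbour of the previous one.
    """
    codes = []
    for (r0, c0), (r1, c1) in zip(path, path[1:]):
        step = (r1 - r0, c1 - c0)
        if step not in DIRECTIONS:
            raise ValueError(f"pixels {(r0, c0)} and {(r1, c1)} are not 8-neighbours")
        codes.append(DIRECTIONS[step])
    return codes


if __name__ == "__main__":
    # An inverted-V stroke: two steps up-right, then two steps down-right.
    route = [(2, 0), (1, 1), (0, 2), (1, 3), (2, 4)]
    print(chain_code(route))  # [2, 2, 8, 8]
```

A route with n pixels yields n - 1 codes, which is why a longer route, for example one that revisits nodes at branches, produces a longer chain code.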