Arabic script web page language identifications using decision tree neural networks

Bibliographic Details
Main Authors: Selamat, Ali; Ng, Choon Ching
Format: Article
Published: Elsevier Limited 2011
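Subjects: PL Languages and literatures of Eastern Asia, Africa, Oceania; QA75 Electronic computers. Computer science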

Abstract

In this paper, we propose a hybrid approach to Arabic script web page language identification based on decision tree and ARTMAP approaches. We use the decision tree approach to determine the general identity of a web document, that is, whether it is Arabic script-based or not. We then use the selected representations of the pages identified by the decision tree as input to the ARTMAP neural network, which further verifies the diversity of the languages detected by the algorithm. In our initial experiments, we found that, although the decision tree approach may achieve higher accuracy than ARTMAP, it may not be as reliable as ARTMAP when the task is extended to Arabic script web documents in other languages (e.g., Urdu, Arabic, and Persian). Therefore, we propose this hybrid decision tree-ARTMAP approach to improve the performance of Arabic script language identification on web documents in a variety of languages. The results show that the proposed approach outperforms both the decision tree and the default ARTMAP approaches.
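The abstract describes the hybrid only at a high level, so the following Python snippet is a minimal sketch of the two-stage idea under loudly stated assumptions, not the authors' method. Stage 1 is a decision stump (a depth-1 decision tree) on a hypothetical feature, the share of letters in the Unicode Arabic block. Stage 2 swaps in a simplified Fuzzy ARTMAP (one common ARTMAP variant, with complement coding, fast learning and match tracking) trained on hypothetical binary features that mark letters such as teh marbuta, Farsi yeh or yeh barree. The paper feeds ARTMAP with representations selected by the decision tree; those representations, the ARTMAP variant, its parameters (`rho`, `alpha`, `eps` here) and all names such as `HybridIdentifier` and `marker_features` are assumptions, and the toy corpus `TRAIN` is invented for illustration.

```python
# Hypothetical two-stage "decision tree -> ARTMAP" identifier. Features, parameters
# and the toy corpus are illustrative assumptions, not the paper's representation or data.
ARABIC_BLOCK = range(0x0600, 0x0700)          # Unicode "Arabic" block
ARABIC_SCRIPT_LANGS = {"arabic", "persian", "urdu"}
# Hypothetical stage-2 features: presence of letters whose use differs by language.
MARKERS = ["\u0629", "\u064A", "\u0643",       # teh marbuta, Arabic yeh, Arabic kaf
           "\u06CC", "\u06A9", "\u067E",       # Farsi yeh, keheh, peh
           "\u06D2", "\u0679", "\u06BA"]       # yeh barree, tteh, noon ghunna


def arabic_ratio(text):
    """Stage-1 feature: share of alphabetic characters that lie in the Arabic block."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum(ord(c) in ARABIC_BLOCK for c in letters) / len(letters)


def marker_features(text):
    """Stage-2 features: binary presence vector over MARKERS (values in [0, 1])."""
    return [1.0 if m in text else 0.0 for m in MARKERS]


def train_stump(values, labels):
    """Depth-1 decision tree: midpoint threshold t minimising errors of value >= t."""
    vs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(vs, vs[1:])] or vs
    return min(candidates, key=lambda t: sum((v >= t) != y for v, y in zip(values, labels)))


class SimplifiedFuzzyARTMAP:
    """Simplified Fuzzy ARTMAP: every category node maps directly to one label."""

    def __init__(self, rho=0.75, alpha=0.001, eps=0.001):
        self.rho, self.alpha, self.eps = rho, alpha, eps
        self.w, self.labels = [], []

    @staticmethod
    def _code(x):
        return list(x) + [1.0 - v for v in x]  # complement coding, so |inp| = len(x)

    def _ranked(self, inp):
        """(T_j, |inp ^ w_j|, j) sorted by choice T_j = |inp ^ w_j| / (alpha + |w_j|)."""
        out = []
        for j, w in enumerate(self.w):
            overlap = sum(min(a, b) for a, b in zip(inp, w))  # fuzzy AND, L1 norm
            out.append((overlap / (self.alpha + sum(w)), overlap, j))
        return sorted(out, reverse=True)

    def train(self, x, label):
        inp = self._code(x)
        rho = self.rho                          # baseline vigilance for this sample
        for _, overlap, j in self._ranked(inp):
            match = overlap / sum(inp)
            if match < rho:
                continue                        # fails the vigilance test
            if self.labels[j] == label:         # resonance with the right label
                self.w[j] = [min(a, b) for a, b in zip(inp, self.w[j])]  # fast learning
                return
            rho = match + self.eps              # match tracking: raise vigilance
        self.w.append(inp)                      # nothing resonated: new category
        self.labels.append(label)

    def predict(self, x):
        ranked = self._ranked(self._code(x))
        return self.labels[ranked[0][2]] if ranked else None


class HybridIdentifier:
    def fit(self, texts, langs):
        is_arabic = [lang in ARABIC_SCRIPT_LANGS for lang in langs]
        self.threshold = train_stump([arabic_ratio(t) for t in texts], is_arabic)
        self.artmap = SimplifiedFuzzyARTMAP()
        for text, lang, arabic in zip(texts, langs, is_arabic):
            if arabic:                          # stage 2 only sees Arabic-script pages
                self.artmap.train(marker_features(text), lang)
        return self

    def predict(self, text):
        if arabic_ratio(text) < self.threshold:  # stage 1: decision stump
            return "non-Arabic script"
        return self.artmap.predict(marker_features(text))  # stage 2: ARTMAP


TRAIN = [("هذه مدينة كبيرة", "arabic"), ("الكتاب في المكتبة", "arabic"),
         ("این کتاب خیلی خوب است", "persian"), ("پسر کوچک", "persian"),
         ("یہ بہت اچھی کتاب ہے", "urdu"), ("میں ٹھیک ہوں", "urdu"),
         ("This is an English web page", "english"),
         ("Ceci est une page web en français", "french")]

if __name__ == "__main__":
    clf = HybridIdentifier().fit([t for t, _ in TRAIN], [lang for _, lang in TRAIN])
    print(clf.predict("این صفحه به زبان فارسی است"))
```

Routing only the pages the stump accepts into the ARTMAP stage mirrors the division of labour described in the abstract: a cheap script-level test separates Arabic-script from other pages, and the ARTMAP stage, which adds categories incrementally through match tracking, separates the individual Arabic-script languages.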

Institution: Universiti Teknologi Malaysia - ePrints
Record ID: utm.eprints-28885
Citation: Selamat, Ali and Ng, Choon Ching (2011) Arabic script web page language identifications using decision tree neural networks. Pattern Recognition, 44 (1). pp. 133-144. ISSN 0031-3203
Status: Peer reviewed; published January 2011 (Elsevier Limited)
DOI: http://dx.doi.org/10.1016/j.patcog.2010.07.009
Repository URL: http://eprints.utm.my/28885/