Frequencies determination of characters for Bahasa Melayu: results of preliminary investigation

Bahasa Melayu (the Malay language) is spoken in Malaysia and in many neighbouring countries. It has a rich literature and deep cultural roots. Bahasa Melayu uses the Roman character set (i.e. A-Z), identical to that of English. The written language uses the character set as building blocks to...


Bibliographic Details
Main Authors: Shah, Asadullah; Saidin, Aznan Zuhid; Alshaikhli, Imad Fakhri Taha; Zeki, Akram M.
Format: Article
Language: English
Published: Elsevier 2011
Subjects: QA75 Electronic computers. Computer science
Online Access: http://irep.iium.edu.my/11804/1/frequencies_determination_bahasamelayu_asadullah.pdf
collection IIUM
description Bahasa Melayu (the Malay language) is spoken in Malaysia and in many neighbouring countries. It has a rich literature and deep cultural roots. Bahasa Melayu uses the Roman character set (i.e. A-Z), identical to that of English. The written language uses this character set as building blocks for words, phrases and sentences, together with punctuation marks and special signs, to create documents. This paper reports the results of a preliminary investigation of Malay text documents. For this purpose, articles written in Malay on various topics were scanned, giving a sample of approximately 31 thousand characters. Preliminary observations indicate that, on average, the character “A” accounts for 19% of the text, “N” for 10%, “E” for 9% and “I” for 8%. These are the characters with the highest frequencies of occurrence in the whole set, and they are expected to remain the most frequent characters in further investigation. Furthermore, the results indicate that character frequencies in Bahasa Melayu text are very close to those of Bahasa Indonesia but differ from those of English. The investigation also indicates that Bahasa Melayu and Bahasa Indonesia share a close phonetic structure with each other but not with English, although all three languages use the same character set.
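The kind of character count described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `letter_frequencies`, the choice to fold case and count only the letters A-Z (ignoring spaces, digits and punctuation), and the sample sentence are all assumptions made for the example. The record does not say how the paper normalised its percentages.

```python
from collections import Counter


def letter_frequencies(text: str) -> dict[str, float]:
    """Return each letter's share (in percent) of all A-Z letters in text.

    Assumption: case is folded and only the 26 Roman letters are counted;
    the paper may have normalised its counts differently.
    """
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    # most_common() orders the result from most to least frequent letter.
    return {ch: 100.0 * n / total for ch, n in counts.most_common()}


# Hypothetical sample sentence, not taken from the paper's corpus.
sample = "Bahasa Melayu ialah bahasa kebangsaan Malaysia"
freqs = letter_frequencies(sample)
for ch, pct in list(freqs.items())[:4]:
    print(f"{ch}: {pct:.1f}%")
```

On a corpus of the size mentioned in the abstract (about 31 thousand characters), the same function could be applied to the concatenated article text; percentages from a single short sentence such as the one above are not comparable with the reported averages.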
institution International Islamic University Malaysia
Record: http://irep.iium.edu.my/11804/ (deposited 2011-12-07, peer reviewed, PDF)
Citation: Shah, Asadullah and Saidin, Aznan Zuhid and Alshaikhli, Imad Fakhri Taha and Zeki, Akram M. (2011) Frequencies determination of characters for Bahasa Melayu: results of preliminary investigation. Procedia - Social and Behavioral Sciences, 27. pp. 233-240. ISSN 18770428
DOI: http://dx.doi.org/10.1016/j.sbspro.2011.10.603
topic QA75 Electronic computers. Computer science