Multiword phrases indexing for Malay-English cross-language information retrieval


Bibliographic Details
Main Authors: Rais, Nurjannaton Hidayah; Abdullah, Muhamad Taufik; Abdul Kadir, Rabiah
Format: Article
Language: English
Published: Asian Network for Scientific Information 2011
Online Access: http://psasir.upm.edu.my/id/eprint/22487/1/Multiword%20phrases%20indexing%20for%20Malay-English%20cross-language%20information%20retrieval.pdf
author Rais, Nurjannaton Hidayah
Abdullah, Muhamad Taufik
Abdul Kadir, Rabiah
collection UPM
description Cross-Language Information Retrieval (CLIR) is the process in which queries are given in one language and relevant documents written in a different language are returned. A popular approach to CLIR is to translate the query into the language of the documents being retrieved. One of the simplest and most effective query translation methods is dictionary look-up using a bilingual dictionary. However, limited dictionary coverage raises two problems: handling proper names and handling compound words. Concept words, consisting of proper names and compound words, were used in document indexing, query indexing, and query translation. We believed that concept-based indexing and translation would make it possible to translate proper names and compound words. A series of experiments was conducted to test the compound word and proper name translation methods in a CLIR system. The best retrieval performance was obtained by combining the query translation approach that selects all translations listed in the dictionary, an alternative weighting scheme, and proper name identification and translation. On the Malay and English document collections, this combination outperformed the query translation approach that selects all translations listed in the dictionary by 1.0% and 9%, respectively. The results show that translating proper names and compound words is important in query translation for Malay-English CLIR.
first_indexed 2024-03-06T07:54:05Z
format Article
id upm.eprints-22487
institution Universiti Putra Malaysia
language English
last_indexed 2024-03-06T07:54:05Z
publishDate 2011
publisher Asian Network for Scientific Information
record_format dspace
spelling upm.eprints-22487 2019-11-12T07:39:01Z http://psasir.upm.edu.my/id/eprint/22487/
Asian Network for Scientific Information 2011 Article PeerReviewed text en http://psasir.upm.edu.my/id/eprint/22487/1/Multiword%20phrases%20indexing%20for%20Malay-English%20cross-language%20information%20retrieval.pdf
Rais, Nurjannaton Hidayah and Abdullah, Muhamad Taufik and Abdul Kadir, Rabiah (2011) Multiword phrases indexing for Malay-English cross-language information retrieval. Information Technology Journal, 10 (8). pp. 1554-1562. ISSN 1812-5638; ESSN: 1812-5646. https://scialert.net/abstract/?doi=itj.2011.1554.1562 DOI: 10.3923/itj.2011.1554.1562
title Multiword phrases indexing for Malay-English cross-language information retrieval
url http://psasir.upm.edu.my/id/eprint/22487/1/Multiword%20phrases%20indexing%20for%20Malay-English%20cross-language%20information%20retrieval.pdf
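The abstract names dictionary-based query translation that keeps all listed translations, together with compound word (multiword phrase) handling, but it gives no implementation details. The following is a minimal Python sketch of that general idea, not the authors' implementation. It assumes a toy Malay-English dictionary, greedy longest-match segmentation of the query into dictionary phrases, and pass-through of terms missing from the dictionary as candidate proper names. `BILINGUAL_DICT`, `segment_longest_match`, and `translate_select_all` are hypothetical names, and the paper's alternative weighting scheme is not modelled.

```python
from typing import Dict, List

# Hypothetical toy Malay -> English dictionary. Keys may be single words
# or multiword phrases; values list every translation the dictionary gives.
BILINGUAL_DICT: Dict[str, List[str]] = {
    "kereta": ["car", "automobile"],
    "api": ["fire"],
    "kereta api": ["train"],  # compound word: "train", not "car fire"
    "stesen": ["station"],
}


def segment_longest_match(tokens: List[str],
                          lexicon: Dict[str, List[str]],
                          max_len: int = 3) -> List[str]:
    """Greedy left-to-right segmentation: at each position keep the longest
    n-gram (up to max_len tokens) found in the lexicon, else one token."""
    max_len = max(1, max_len)  # guard against a non-advancing loop
    units: List[str] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            # A single token is always accepted so the scan always advances.
            if n == 1 or candidate in lexicon:
                units.append(candidate)
                i += n
                break
    return units


def translate_select_all(query: str,
                         lexicon: Dict[str, List[str]]) -> List[str]:
    """Translate a query by keeping ALL dictionary translations of each unit.
    Units absent from the dictionary (assumed here to be possible proper
    names) are passed through untranslated."""
    terms: List[str] = []
    for unit in segment_longest_match(query.lower().split(), lexicon):
        terms.extend(lexicon.get(unit, [unit]))
    return terms


if __name__ == "__main__":
    print(translate_select_all("stesen kereta api", BILINGUAL_DICT))
    print(translate_select_all("kereta Proton", BILINGUAL_DICT))
```

With phrase lookup, "stesen kereta api" becomes `["station", "train"]`, whereas word-by-word lookup would have produced "car", "automobile" and "fire". In "kereta Proton", the out-of-dictionary term "proton" is simply carried over, which is the crudest possible proper-name strategy and only a stand-in for the identification step the abstract mentions.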