Multiword phrases indexing for Malay-English cross-language information retrieval
Cross-Language Information Retrieval (CLIR) is the process of submitting a query in one language and retrieving relevant documents written in a different language. A popular approach to CLIR is to translate the query into the language of the documents being retrieved. One of the...
Main Authors: | Rais, Nurjannaton Hidayah; Abdullah, Muhamad Taufik; Abdul Kadir, Rabiah |
---|---|
Format: | Article |
Language: | English |
Published: | Asian Network for Scientific Information 2011 |
Online Access: | http://psasir.upm.edu.my/id/eprint/22487/1/Multiword%20phrases%20indexing%20for%20Malay-English%20cross-language%20information%20retrieval.pdf |
_version_ | 1825946880911081472 |
---|---|
author | Rais, Nurjannaton Hidayah; Abdullah, Muhamad Taufik; Abdul Kadir, Rabiah |
author_facet | Rais, Nurjannaton Hidayah; Abdullah, Muhamad Taufik; Abdul Kadir, Rabiah |
author_sort | Rais, Nurjannaton Hidayah |
collection | UPM |
description | Cross-Language Information Retrieval (CLIR) is the process of submitting a query in one language and retrieving relevant documents written in a different language. A popular approach to CLIR is to translate the query into the language of the documents being retrieved. One of the simplest and most effective methods for query translation is dictionary look-up based on a bilingual dictionary. However, limited dictionary coverage gives rise to two problems: the handling of proper names and the handling of compound words. Relevant concept words, consisting of proper names and compound words, were applied in the document indexing, query indexing and query translation processes. We believe that concept-based indexing and translation make the translation of proper names and compound words possible. A series of experiments was conducted to test the compound word and proper name translation methods in a CLIR system. The best retrieval performance was obtained from the combination of the query translation approach that selects all translations listed in the dictionary, an alternative weighting scheme, and proper name identification and translation. For both the Malay and English document collections, this combination outperformed the query translation approach that selects all translations listed in the dictionary on its own, by 1.0 and 9%. The results show that proper name and compound word translation is important in query translation for Malay-English CLIR. |
first_indexed | 2024-03-06T07:54:05Z |
format | Article |
id | upm.eprints-22487 |
institution | Universiti Putra Malaysia |
language | English |
last_indexed | 2024-03-06T07:54:05Z |
publishDate | 2011 |
publisher | Asian Network for Scientific Information |
record_format | dspace |
spelling | upm.eprints-22487 2019-11-12T07:39:01Z http://psasir.upm.edu.my/id/eprint/22487/ Multiword phrases indexing for Malay-English cross-language information retrieval Rais, Nurjannaton Hidayah Abdullah, Muhamad Taufik Abdul Kadir, Rabiah Asian Network for Scientific Information 2011 Article PeerReviewed text en http://psasir.upm.edu.my/id/eprint/22487/1/Multiword%20phrases%20indexing%20for%20Malay-English%20cross-language%20information%20retrieval.pdf Rais, Nurjannaton Hidayah and Abdullah, Muhamad Taufik and Abdul Kadir, Rabiah (2011) Multiword phrases indexing for Malay-English cross-language information retrieval. Information Technology Journal, 10 (8). pp. 1554-1562. ISSN 1812-5638; ESSN: 1812-5646 https://scialert.net/abstract/?doi=itj.2011.1554.1562 10.3923/itj.2011.1554.1562 |
spellingShingle | Rais, Nurjannaton Hidayah Abdullah, Muhamad Taufik Abdul Kadir, Rabiah Multiword phrases indexing for Malay-English cross-language information retrieval |
title | Multiword phrases indexing for Malay-English cross-language information retrieval |
title_full | Multiword phrases indexing for Malay-English cross-language information retrieval |
title_fullStr | Multiword phrases indexing for Malay-English cross-language information retrieval |
title_full_unstemmed | Multiword phrases indexing for Malay-English cross-language information retrieval |
title_short | Multiword phrases indexing for Malay-English cross-language information retrieval |
title_sort | multiword phrases indexing for malay english cross language information retrieval |
url | http://psasir.upm.edu.my/id/eprint/22487/1/Multiword%20phrases%20indexing%20for%20Malay-English%20cross-language%20information%20retrieval.pdf |
work_keys_str_mv | AT raisnurjannatonhidayah multiwordphrasesindexingformalayenglishcrosslanguageinformationretrieval AT abdullahmuhamadtaufik multiwordphrasesindexingformalayenglishcrosslanguageinformationretrieval AT abdulkadirrabiah multiwordphrasesindexingformalayenglishcrosslanguageinformationretrieval |
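
The description mentions dictionary look-up query translation that selects all translations listed in the dictionary, together with special handling of compound words and proper names. The following is a minimal sketch of that general idea, not the authors' implementation. The dictionary entries, the function name `translate_query`, the `max_phrase_len` parameter and the longest-match strategy are all assumptions made for illustration, and passing unknown tokens through unchanged is only a simple stand-in for proper name handling.

```python
from typing import Dict, List

# Toy Malay-English dictionary. The entries are illustrative assumptions,
# not the bilingual resource used in the article.
TOY_DICTIONARY: Dict[str, List[str]] = {
    "kereta api": ["train"],        # compound word: must be matched as a phrase
    "kereta": ["car", "carriage"],
    "api": ["fire"],
    "baru": ["new", "recent"],
}


def translate_query(
    query: str,
    dictionary: Dict[str, List[str]] = TOY_DICTIONARY,
    max_phrase_len: int = 3,
) -> List[str]:
    """Translate a query by dictionary look-up, keeping every listed translation.

    Multiword entries are tried first (longest match), so a compound such as
    "kereta api" is translated as one unit rather than word by word. Tokens
    with no dictionary entry are kept as they are, which is a crude way to
    let proper names survive translation.
    """
    tokens = query.lower().split()
    translated: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(max_phrase_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in dictionary:
                translated.extend(dictionary[phrase])  # select all translations
                i += n
                break
        else:
            # No entry found: keep the original token (e.g. a proper name).
            translated.append(tokens[i])
            i += 1
    return translated


if __name__ == "__main__":
    print(translate_query("kereta api baru Kuala Lumpur"))
```

With the toy dictionary above, "kereta api" is translated as the single concept "train" instead of "car" plus "fire", which illustrates why phrase-level indexing matters for compound words. A real system would also need stemming, a proper name recogniser and a term weighting scheme, none of which are modelled here.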