Improving document relevancy using integrated language modeling techniques


Bibliographic Details
Main Authors: Balakrishnan, Vimala; Humaidi, N.; Lloyd-Yemoh, E.
Format: Article
Published: Faculty of Computer Science and Information Technology, University of Malaya 2016
Subjects: HF Commerce; QA75 Electronic computers. Computer science
_version_ 1825721116697559040
author Balakrishnan, Vimala
Humaidi, N.
Lloyd-Yemoh, E.
collection UM
description This paper presents an integrated language model to improve document relevancy for text queries. Specifically, an integrated stemming-lemmatization (S-L) model was developed and its retrieval performance was compared at three document levels: the top 5, 10 and 15 documents. A prototype search engine was developed and fifteen queries were executed. The mean average precisions showed that the S-L model outperformed the baseline (i.e. no language processing), the stemming model and the lemmatization model at all three document levels. These results were supported by the precision histograms, which showed that the integrated model improved document relevancy. However, it should be noted that the precision differences between the models were insignificant. Overall, the study found that combining the language processing techniques of stemming and lemmatization retrieves more relevant documents.
first_indexed 2024-03-06T05:45:28Z
format Article
id um.eprints-18453
institution Universiti Malaya
last_indexed 2024-03-06T05:45:28Z
publishDate 2016
publisher Faculty of Computer Science and Information Technology, University of Malaya
record_format dspace
spelling um.eprints-18453 2020-01-08T07:41:38Z http://eprints.um.edu.my/18453/ Article PeerReviewed Balakrishnan, Vimala and Humaidi, N. and Lloyd-Yemoh, E. (2016) Improving document relevancy using integrated language modeling techniques. Malaysian Journal of Computer Science, 29 (1). pp. 45-55. ISSN 0127-9084, DOI https://doi.org/10.22452/mjcs.vol29no1.4
title Improving document relevancy using integrated language modeling techniques
topic HF Commerce
QA75 Electronic computers. Computer science