Amharic <i>Adhoc</i> Information Retrieval System Based on Morphological Features
Information retrieval (IR) is one of the most important research and development areas due to the explosion of digital data and the need to access relevant information from huge corpora. Although IR systems function well for technologically advanced languages such as English, this is not the case...
Main Authors: | Tilahun Yeshambel, Josiane Mothe, Yaregal Assabie |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2022-01-01 |
Series: | Applied Sciences |
Subjects: | information retrieval; <i>adhoc</i> retrieval; Amharic; complex morphology; corpus; resources |
Online Access: | https://www.mdpi.com/2076-3417/12/3/1294 |
_version_ | 1827661648270196736 |
---|---|
author | Tilahun Yeshambel; Josiane Mothe; Yaregal Assabie |
author_facet | Tilahun Yeshambel; Josiane Mothe; Yaregal Assabie |
author_sort | Tilahun Yeshambel |
collection | DOAJ |
description | Information retrieval (IR) is one of the most important research and development areas due to the explosion of digital data and the need to access relevant information from huge corpora. Although IR systems function well for technologically advanced languages such as English, this is not the case for morphologically complex, under-resourced and less-studied languages such as Amharic. Amharic is a Semitic language characterized by a complex morphology in which thousands of words are generated from a single root form through inflection and derivation. This has made the development of Amharic natural language processing (NLP) tools a challenging task. Amharic <i>adhoc</i> retrieval also faces challenges due to the scarcity of linguistic resources, tools and standard evaluation corpora. In this research work, we investigate the impact of morphological features on the representation of Amharic documents and queries for <i>adhoc</i> retrieval. We also analyze the effects of stem-based and root-based text representation, and propose a new Amharic IR system architecture. Moreover, we present the resources and corpora we constructed for the evaluation of Amharic IR systems and other NLP tools. We conduct various experiments with a TREC-like approach for an Amharic IR test collection using a standard evaluation framework and measures. Our findings show that root-based text representation outperforms the conventional stem-based representation on Amharic IR. |
first_indexed | 2024-03-10T00:14:09Z |
format | Article |
id | doaj.art-8b02074b2fd44cf29adb06e481c768a1 |
institution | Directory of Open Access Journals |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-10T00:14:09Z |
publishDate | 2022-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-8b02074b2fd44cf29adb06e481c768a1; 2023-11-23T15:55:17Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2022-01-01; 12; 3; 1294; 10.3390/app12031294; Amharic <i>Adhoc</i> Information Retrieval System Based on Morphological Features; Tilahun Yeshambel (0); Josiane Mothe (1); Yaregal Assabie (2); (0) IT PhD Program, Addis Ababa University, Addis Ababa P.O. Box 1176, Ethiopia; (1) Université Jean-Jaurès, Université de Toulouse, Componsante INSPE, IRIT, UMR5505 CNRS, 118 Rte de Narbonne, F31400 Toulouse, France; (2) Department of Computer Science, Addis Ababa University, Addis Ababa P.O. Box 1176, Ethiopia; Information retrieval (IR) is one of the most important research and development areas due to the explosion of digital data and the need to access relevant information from huge corpora. Although IR systems function well for technologically advanced languages such as English, this is not the case for morphologically complex, under-resourced and less-studied languages such as Amharic. Amharic is a Semitic language characterized by a complex morphology in which thousands of words are generated from a single root form through inflection and derivation. This has made the development of Amharic natural language processing (NLP) tools a challenging task. Amharic <i>adhoc</i> retrieval also faces challenges due to the scarcity of linguistic resources, tools and standard evaluation corpora. In this research work, we investigate the impact of morphological features on the representation of Amharic documents and queries for <i>adhoc</i> retrieval. We also analyze the effects of stem-based and root-based text representation, and propose a new Amharic IR system architecture. Moreover, we present the resources and corpora we constructed for the evaluation of Amharic IR systems and other NLP tools. We conduct various experiments with a TREC-like approach for an Amharic IR test collection using a standard evaluation framework and measures. Our findings show that root-based text representation outperforms the conventional stem-based representation on Amharic IR.; https://www.mdpi.com/2076-3417/12/3/1294; information retrieval; <i>adhoc</i> retrieval; Amharic; complex morphology; corpus; resources |
spellingShingle | Tilahun Yeshambel Josiane Mothe Yaregal Assabie Amharic <i>Adhoc</i> Information Retrieval System Based on Morphological Features Applied Sciences information retrieval <i>adhoc</i> retrieval Amharic complex morphology corpus resources |
title | Amharic <i>Adhoc</i> Information Retrieval System Based on Morphological Features |
title_full | Amharic <i>Adhoc</i> Information Retrieval System Based on Morphological Features |
title_fullStr | Amharic <i>Adhoc</i> Information Retrieval System Based on Morphological Features |
title_full_unstemmed | Amharic <i>Adhoc</i> Information Retrieval System Based on Morphological Features |
title_short | Amharic <i>Adhoc</i> Information Retrieval System Based on Morphological Features |
title_sort | amharic i adhoc i information retrieval system based on morphological features |
topic | information retrieval; <i>adhoc</i> retrieval; Amharic; complex morphology; corpus; resources |
url | https://www.mdpi.com/2076-3417/12/3/1294 |
work_keys_str_mv | AT tilahunyeshambel amhariciadhociinformationretrievalsystembasedonmorphologicalfeatures AT josianemothe amhariciadhociinformationretrievalsystembasedonmorphologicalfeatures AT yaregalassabie amhariciadhociinformationretrievalsystembasedonmorphologicalfeatures |