Developing Lexico-Semantic Relations of Saraiki Nouns: A Corpus-Based Study
Main Authors: | Musarat Nazeer, Musarrat Azher, Azhar Pervaiz, Iqra Yasmeen |
---|---|
Format: | Article |
Language: | English |
Published: | Department of English, University of Chitral, 2024-01-01 |
Series: | University of Chitral Journal of Linguistics and Literature |
Subjects: | |
Online Access: | https://jll.uoch.edu.pk/index.php/jll/article/view/270/216 |
author | Musarat Nazeer, Musarrat Azher, Azhar Pervaiz, Iqra Yasmeen |
---|---|
collection | DOAJ |
description | Saraiki, the fourth most widely spoken language in Pakistan and also spoken in parts of India and Afghanistan, is of significant geographical, historical, and cultural importance. However, it remains neglected: its distinctive linguistic features have not been properly documented or identified. The current study identifies the lexico-semantic categories of Saraiki nouns and then develops their hierarchical relationships (Miller et al., 1993). This quantitative research is designed to contribute to the development of a Saraiki WordNet (SWN) and falls within the field of Natural Language Processing (NLP). A 3-million-word corpus was compiled from data collected across different genres of Saraiki, including newspapers, academic essays, literary texts, and religious books. Both the expansion and merge approaches were used to analyze the data. A wordlist of the 1,500 most frequent nouns was extracted from the corpus using AntConc 3.4.4.0 and then tagged manually in Microsoft Excel 2010. The 39 most frequent nouns in this wordlist were used to develop 173 related synsets, and the lexico-semantic relationships among these nouns were identified with the help of 30 hierarchies (Miller et al., 1993). Its scope is limited to areas such as Bahawalpur, Multan, and Muzaffarabad. The study is intended as a milestone for Saraiki language learners, SWN development, Saraiki lexical resources, and online Saraiki-language (SL) dictionaries, and as a guide for researchers. |
first_indexed | 2024-04-24T10:56:39Z |
format | Article |
id | doaj.art-b3189654843c4ded82a885bc08fd9757 |
institution | Directory of Open Access Journals |
issn | 2617-3611, 2663-1512 |
language | English |
last_indexed | 2024-04-24T10:56:39Z |
publishDate | 2024-01-01 |
publisher | Department of English, University of Chitral |
record_format | Article |
series | University of Chitral Journal of Linguistics and Literature |
affiliation | Musarat Nazeer: M.Phil. Scholar, Department of English, University of Sargodha, Sargodha, Punjab, Pakistan; Musarrat Azher: Associate Professor, Department of Linguistics and Language Studies, University of Sargodha, Pakistan; Azhar Pervaiz: Assistant Professor, Department of Linguistics and Language Studies, University of Sargodha, Pakistan; Iqra Yasmeen: M.Phil. Scholar, Department of English, University of Sargodha, Sargodha, Pakistan |
volume | 8 |
issue | I |
pages | 162-182 |
title | Developing Lexico-Semantic Relations of Saraiki Nouns: A Corpus-Based Study |
topic | corpus-based study, saraiki nouns, lexico-semantic relations, wordnet, nlp |
url | https://jll.uoch.edu.pk/index.php/jll/article/view/270/216 |