Developing Lexico-Semantic Relations of Saraiki Nouns: A Corpus-Based Study

Bibliographic Details
Main Authors: Musarat Nazeer, Musarrat Azher, Azhar Pervaiz, Iqra Yasmeen
Format: Article
Language: English
Published: Department of English, University of Chitral 2024-01-01
Series: University of Chitral Journal of Linguistics and Literature
Subjects: corpus-based study; saraiki nouns; lexico-semantic relations; wordnet; nlp
Online Access: https://jll.uoch.edu.pk/index.php/jll/article/view/270/216
author Musarat Nazeer
Musarrat Azher
Azhar Pervaiz
Iqra Yasmeen
collection DOAJ
description Saraiki is the fourth most widely spoken language in Pakistan and is also used in parts of India and Afghanistan, which gives it significant geographical, historical, and cultural importance. However, it remains neglected in terms of proper documentation and the identification of its unique linguistic features. The current study identifies the lexico-semantic categories of Saraiki nouns and then develops their hierarchical relationships (Miller et al., 1993). This quantitative research is designed to contribute to the development of a Saraiki WordNet (SWN) and is related to Natural Language Processing (NLP). A corpus of 3 million words was compiled from different genres of the Saraiki language, including newspapers, academic essays, literary texts, and religious books. Both the expansion and merge approaches were used to analyze the data. A wordlist of the 1,500 most frequent nouns was extracted from the corpus using AntConc 3.4.4.0, followed by manual tagging in Microsoft Excel 2010. As a result, the 39 most frequent nouns from the wordlist were used to develop 173 related synsets, and the lexico-semantic relationships among these nouns were identified with the help of 30 hierarchies (Miller et al., 1993). The study is limited to areas such as Bahawalpur, Multan, and Muzaffarabad. It is intended to be a milestone for Saraiki language learners, SWN development, Saraiki lexical resources, and online SL dictionaries, and a guide for researchers.
first_indexed 2024-04-24T10:56:39Z
format Article
id doaj.art-b3189654843c4ded82a885bc08fd9757
institution Directory Open Access Journal
issn 2617-3611
2663-1512
language English
last_indexed 2024-04-24T10:56:39Z
publishDate 2024-01-01
publisher Department of English, University of Chitral
record_format Article
series University of Chitral Journal of Linguistics and Literature
spelling doaj.art-b3189654843c4ded82a885bc08fd9757
2024-04-12T06:05:39Z
eng
Department of English, University of Chitral
University of Chitral Journal of Linguistics and Literature
2617-3611
2663-1512
2024-01-01
Volume 8, Issue I, pp. 162-182
Developing Lexico-Semantic Relations of Saraiki Nouns: A Corpus-Based Study
Musarat Nazeer, M.Phil. Scholar, Department of English, University of Sargodha, Sargodha, Punjab, Pakistan
Musarrat Azher, Associate Professor, Department of Linguistics and Language Studies, University of Sargodha, Pakistan
Azhar Pervaiz, Assistant Professor, Department of Linguistics and Language Studies, University of Sargodha, Pakistan
Iqra Yasmeen, M.Phil. Scholar, Department of English, University of Sargodha, Sargodha, Pakistan
https://jll.uoch.edu.pk/index.php/jll/article/view/270/216
corpus-based study; saraiki nouns; lexico-semantic relations; wordnet; nlp
title Developing Lexico-Semantic Relations of Saraiki Nouns: A Corpus-Based Study
topic corpus-based study
saraiki nouns
lexico-semantic relations
wordnet
nlp
url https://jll.uoch.edu.pk/index.php/jll/article/view/270/216