Enhancing text pre-processing for Swahili language: Datasets for common Swahili stop-words, slangs and typos with equivalent proper words

Natural Language Processing requires data to be pre-processed to guarantee quality models in different machine learning tasks. However, the Swahili language has been disadvantaged and is classified as a low-resource language because of inadequate data for NLP, especially basic textual datasets that are us...

Bibliographic Details
Main Authors:	Bernard Masua, Noel Masasi
Format:	Article
Language:	English
Published:	Elsevier 2020-12-01
Series:	Data in Brief
Subjects:	Natural language processing; Text pre-processing; Swahili language; Stop-words; Slangs; Typos
Online Access:	http://www.sciencedirect.com/science/article/pii/S2352340920313998

Similar Items