Named Entity Recognition Utilized to Enhance Text Classification While Preserving Privacy

Bibliographic details
Main author: Mohammed Kutbi
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Subjects:
Links: https://ieeexplore.ieee.org/document/10287940/
Description
Summary: Recent developments in Natural Language Processing (NLP) techniques have encouraged NLP-based applications in various fields, including business, law, and health. An important step in any NLP project is text preprocessing, which modifies text data before it is used in a machine learning model. Text preprocessing usually includes cleaning, filtering, removing, and replacing some text to increase model accuracy and robustness, reduce data size, or preserve privacy. A named entity recognizer (NER) is an NLP tool that finds named entities in text, such as names, organizations, addresses, numbers, and dates. In this work, we create a preprocessing approach that uses NER to find named entities and then replaces them with their type, i.e., location, person, or organization name, to improve accuracy and preserve privacy, instead of removing them or letting them become noise in the data. Experiments on a text classification task using this approach were conducted on several datasets, some of which were collected in-house. The experiments indicate that the approach enhances classifier accuracy and reduces the dimensionality of the feature representation while also preserving privacy.
ISSN:2169-3536
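
The preprocessing idea described in the summary (find named entities, then substitute each mention with its type) can be illustrated with a minimal sketch. This is not the author's implementation: the record does not say which NER tool was used, so the `toy_ner` function and its `TOY_GAZETTEER` dictionary below are hypothetical stand-ins for a trained recognizer, and the label names are assumptions chosen to match the types named in the summary.

```python
import re
from typing import Callable, List, Tuple

# One entity mention as a (start, end, label) character span.
Span = Tuple[int, int, str]

# Hypothetical toy gazetteer standing in for a trained NER model.
TOY_GAZETTEER = {
    "Alice Smith": "PERSON",
    "Acme Corp": "ORGANIZATION",
    "Stockholm": "LOCATION",
}


def toy_ner(text: str) -> List[Span]:
    """Return spans of gazetteer entries found in the text, sorted by position."""
    spans = []
    for name, label in TOY_GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


def replace_entities(text: str, ner: Callable[[str], List[Span]] = toy_ner) -> str:
    """Replace every entity mention with its type label."""
    pieces = []
    cursor = 0
    for start, end, label in ner(text):
        if start < cursor:
            # Skip a span that overlaps one already replaced.
            continue
        pieces.append(text[cursor:start])  # text before the entity
        pieces.append(label)               # the type replaces the mention
        cursor = end
    pieces.append(text[cursor:])           # trailing text after the last entity
    return "".join(pieces)


example = "Alice Smith joined Acme Corp in Stockholm."
masked = replace_entities(example)
print(masked)  # PERSON joined ORGANIZATION in LOCATION.
```

Because every person name collapses to the single token `PERSON` (and likewise for the other types), a bag-of-words vocabulary built on the masked text would contain fewer distinct terms, and the original names no longer appear in the classifier's input. Any recognizer that returns character spans with labels could be passed as the `ner` argument in place of the toy one.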