Ensemble classifier based approach for code-mixed cross-script question classification
Main Authors: | , |
---|---|
Other Authors: | |
Format: | Conference Paper |
Language: | English |
Published: | 2019 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/87322 http://hdl.handle.net/10220/49465 http://ceur-ws.org/Vol-1737/ |
Summary: | With the increasing popularity of social media, people post updates that help other users find answers to their questions. Most user-generated data on social media is in code-mixed or multi-script form, where words are represented phonetically in a non-native script. We address the problem of question classification on social media data. We propose an ensemble-classifier-based approach to question classification for questions written in mixed script, specifically Bengali written in the Roman script. We separately train Random Forest, One-vs-Rest and k-NN classifiers and then build an ensemble classifier that combines the best of the three. We achieve an accuracy of approximately 82%, suggesting that the method works well for this task. |
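The summary does not say how the three classifiers are combined, so the following is only a minimal sketch of one common option, hard majority voting, and not necessarily the authors' method. The function names (`majority_vote`, `ensemble_predict`), the tie-breaking rule, and the label strings are all assumptions made for illustration. The per-classifier predictions are hard-coded stand-ins for the output of trained Random Forest, One-vs-Rest and k-NN models.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(votes: Sequence[str], priority: Sequence[int] = (0, 1, 2)) -> str:
    """Return the label that most classifiers agree on.

    votes[i] is the label predicted by classifier i. On a tie, the tied
    label from the earliest classifier listed in `priority` wins
    (an assumed rule, not taken from the paper).
    """
    counts = Counter(votes)
    best = max(counts.values())
    tied = {label for label, n in counts.items() if n == best}
    for idx in priority:
        if votes[idx] in tied:
            return votes[idx]
    raise ValueError("priority must index at least one vote")


def ensemble_predict(per_classifier: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-classifier prediction lists question by question."""
    return [majority_vote(row) for row in zip(*per_classifier)]


# Hypothetical predictions for four questions; the labels are illustrative only.
rf_preds = ["PER", "LOC", "NUM", "ORG"]    # stand-in for Random Forest output
ovr_preds = ["PER", "LOC", "TEMP", "LOC"]  # stand-in for One-vs-Rest output
knn_preds = ["ORG", "LOC", "TEMP", "PER"]  # stand-in for k-NN output

combined = ensemble_predict([rf_preds, ovr_preds, knn_preds])
print(combined)
```

With three voters, a label needs two votes to win outright. When all three disagree, this sketch falls back to the first classifier in `priority`. Other schemes, such as weighting each vote by validation accuracy, would be equally plausible readings of the summary.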