Personality Detection on Reddit Using DistilBERT

Bibliographic Details
Main Authors: Alif Rahmat Julianda, Warih Maharani
Format: Article
Language: English
Published: Ikatan Ahli Informatika Indonesia 2023-10-01
Series: Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
Subjects: personality detection, reddit, distilbert
Online Access: http://jurnal.iaii.or.id/index.php/RESTI/article/view/5236
_version_ 1827359544583389184
author Alif Rahmat Julianda
Warih Maharani
collection DOAJ
description Personality is the unique set of motivations, feelings, and behaviors that a person possesses. Personality detection on social media is a common research topic in computer science. The personality models most often used in this research are the Big Five Indicator (BFI) and the Myers-Briggs Type Indicator (MBTI). Unlike the BFI, which classifies personality based on an individual’s traits, the MBTI classifies personality based on the individual’s type, and it performs better than the Big Five model in several scenarios. Many studies detect personality on social media with machine learning classifiers such as Logistic Regression, Naïve Bayes, and Support Vector Machine (SVM). With the recent popularity of deep learning, language models such as DistilBERT can also be used to classify personality on social media, because DistilBERT can process long sentences and its transformer architecture allows parallelization. This research therefore detects MBTI personality on Reddit using DistilBERT. The evaluation shows that removing stopwords during data preprocessing can reduce the model’s performance, and that DistilBERT performs worse with class imbalance handling than without it. DistilBERT also outperforms the other machine learning classifiers (Naïve Bayes, SVM, and Logistic Regression) in accuracy, precision, recall, and F1-score.
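The four metrics named in the description can be illustrated with a minimal sketch. It assumes macro averaging over MBTI type labels, which the record does not state; the function name `macro_scores` and the toy labels are hypothetical and are not taken from the article.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this label, but it was wrong
            fn[truth] += 1  # missed the true label

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels invented for illustration only (not data from the article).
y_true = ["INTP", "INTP", "INFJ", "ENTP"]
y_pred = ["INTP", "INFJ", "INFJ", "ENTP"]
scores = macro_scores(y_true, y_pred)
print(scores)
```

Macro averaging weights every MBTI type equally, so rare types count as much as common ones. A weighted average would be the alternative if the class distribution should influence the score.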
first_indexed 2024-03-08T06:30:45Z
format Article
id doaj.art-cea9066444f045e4bc39f3ebddd4de4c
institution Directory Open Access Journal
issn 2580-0760
language English
last_indexed 2024-03-08T06:30:45Z
publishDate 2023-10-01
publisher Ikatan Ahli Informatika Indonesia
record_format Article
series Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
spelling doaj.art-cea9066444f045e4bc39f3ebddd4de4c; 2024-02-03T11:49:42Z; eng; Ikatan Ahli Informatika Indonesia; Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi); ISSN 2580-0760; 2023-10-01; vol. 7, no. 5, pp. 1140-1146; DOI 10.29207/resti.v7i5.5236; Personality Detection on Reddit Using DistilBERT; Alif Rahmat Julianda (Telkom University), Warih Maharani (Telkom University); http://jurnal.iaii.or.id/index.php/RESTI/article/view/5236; personality detection, reddit, distilbert
title Personality Detection on Reddit Using DistilBERT
topic personality detection
reddit
distilbert
url http://jurnal.iaii.or.id/index.php/RESTI/article/view/5236