Job scam detection using classification algorithms

Bibliographic Details
Main Author: Sim, Keith Shi Jie
Other Authors: Josephine Chong Leng Leng
Format: Final Year Project (FYP)
Language: English
Published: Nanyang Technological University 2024
Subjects: Computer and Information Science
Online Access: https://hdl.handle.net/10356/181115
author Sim, Keith Shi Jie
author2 Josephine Chong Leng Leng
collection NTU
description Scams are the most common type of cybercrime in Singapore, and the majority of them are job scams. Applicant Tracking Systems (ATS) and their automation capabilities make it easy for scammers to post fraudulent job listings on online recruitment portals such as Monster, and to collect up to 1,000 resumes a day with little effort. The objective of this study is to build on the foundational knowledge established by past researchers and to identify the feature extraction techniques and classification models that are most effective at identifying fake job advertisements. The study applies modern Natural Language Processing (NLP) techniques, such as transformers and word embeddings, to the Employment Scam Aegean Dataset (EMSCAD) from the University of the Aegean to evaluate their effectiveness. The models that used these techniques achieved the highest F1 scores in the study, highlighting their effectiveness for this classification task. These results support prior research and show that feature selection improves performance regardless of the classification model chosen. Additionally, embedding features generally perform better than features derived from a custom ruleset. Although these results show that transformers and word embeddings are effective, they are subject to limitations arising from the class imbalance of the EMSCAD dataset and from the maximum sequence length of the transformer models used in this study. Future work in this area could therefore focus on creating a more robust, comprehensive and balanced dataset than EMSCAD, and on fine-tuning other transformer models, such as BigBird and Longformer, that can handle longer text sequences.
first_indexed 2025-02-19T04:05:23Z
format Final Year Project (FYP)
id ntu-10356/181115
institution Nanyang Technological University
language English
last_indexed 2025-02-19T04:05:23Z
publishDate 2024
publisher Nanyang Technological University
record_format dspace
spelling ntu-10356/181115 2024-11-14T12:31:38Z Job scam detection using classification algorithms Sim, Keith Shi Jie Josephine Chong Leng Leng College of Computing and Data Science josephine.chong@ntu.edu.sg Computer and Information Science
Bachelor's degree 2024-11-14T12:31:38Z 2024-11-14T12:31:38Z 2024 Final Year Project (FYP) Sim, K. S. J. (2024). Job scam detection using classification algorithms. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/181115 https://hdl.handle.net/10356/181115 en SCSE23-0928 application/pdf Nanyang Technological University
title Job scam detection using classification algorithms
topic Computer and Information Science
url https://hdl.handle.net/10356/181115
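The description reports results as F1 scores and notes that EMSCAD is imbalanced. The Python sketch below is an illustration only, not the author's evaluation code: it uses hypothetical labels (95 legitimate and 5 fraudulent postings, not EMSCAD's actual counts) and hand-written helpers `accuracy` and `f1_score` (hypothetical names, not a library API) to show why accuracy can look strong on an imbalanced dataset while the F1 score on the fraudulent class does not.

```python
# Illustrative sketch only: hypothetical labels, not EMSCAD data or the
# study's evaluation code. Label 1 = fraudulent posting, 0 = legitimate.

def f1_score(y_true, y_pred, positive=1):
    """Return the F1 score for the given positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision/recall are zero or undefined, so report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical imbalanced sample: 95 legitimate postings, then 5 fraudulent ones.
y_true = [0] * 95 + [1] * 5

# Degenerate classifier that labels every posting as legitimate.
always_legit = [0] * 100

# Classifier with 93 correct negatives, 2 false positives,
# 4 true positives and 1 false negative (in that order).
some_model = [0] * 93 + [1] * 2 + [1] * 4 + [0] * 1

print(accuracy(y_true, always_legit), f1_score(y_true, always_legit))
print(accuracy(y_true, some_model), round(f1_score(y_true, some_model), 3))
```

In this example the classifier that labels every posting as legitimate reaches 95% accuracy but an F1 score of 0, whereas the second classifier, which catches 4 of the 5 scams at the cost of 2 false alarms, reaches 97% accuracy and an F1 score of 8/11 (about 0.727). Accuracy barely separates the two models; F1 on the fraudulent class does.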