Job scam detection using classification algorithms

Scams are the most common type of cybercrime in Singapore, and the majority of them are job scams. Applicant Tracking Systems (ATS) and their automation capabilities make it easy for scammers to post fraudulent job listings on online recruitment portals such as Monster. These systems also allow them to easi...


Bibliographic Details
Main Author:	Sim, Keith Shi Jie
Other Authors:	Josephine Chong Leng Leng
Format:	Final Year Project (FYP)
Language:	English
Published:	Nanyang Technological University 2024
Subjects:	Computer and Information Science
Online Access:	https://hdl.handle.net/10356/181115

author	Sim, Keith Shi Jie
author2	Josephine Chong Leng Leng
collection	NTU
description	Scams are the most common type of cybercrime in Singapore, and the majority of them are job scams. Applicant Tracking Systems (ATS) and their automation capabilities make it easy for scammers to post fraudulent job listings on online recruitment portals such as Monster. These systems also allow scammers to easily collect up to 1000 resumes a day. The objective of this study is to build on the foundational knowledge established by past researchers and to identify the feature extraction techniques and classification models that are most effective at identifying fake job advertisements. This study applies modern Natural Language Processing (NLP) techniques, such as transformers and word embeddings, to the Employment Scam Aegean Dataset (EMSCAD) from the University of the Aegean to study their effectiveness. The models that used these techniques achieved the highest F1 scores in the study, highlighting their effectiveness in the classification task. These results support prior research and indicate that feature selection improves performance regardless of the classification model chosen. Additionally, embedding features generally perform better than a custom ruleset of features. Although these results show that transformers and word embeddings are effective, they are subject to limitations arising from the imbalanced EMSCAD dataset and the maximum sequence length of the transformer models used in this study. Hence, future work in this area can focus on creating a more robust, comprehensive and balanced dataset than EMSCAD, and on fine-tuning other transformer models, such as BigBird and Longformer, that are capable of handling longer text sequences.
first_indexed	2025-02-19T04:05:23Z
format	Final Year Project (FYP)
id	ntu-10356/181115
institution	Nanyang Technological University
language	English
last_indexed	2025-02-19T04:05:23Z
publishDate	2024
publisher	Nanyang Technological University
record_format	dspace
spelling	ntu-10356/181115 2024-11-14T12:31:38Z Job scam detection using classification algorithms Sim, Keith Shi Jie Josephine Chong Leng Leng College of Computing and Data Science josephine.chong@ntu.edu.sg Computer and Information Science Bachelor's degree 2024-11-14T12:31:38Z 2024-11-14T12:31:38Z 2024 Final Year Project (FYP) Sim, K. S. J. (2024). Job scam detection using classification algorithms. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/181115 https://hdl.handle.net/10356/181115 en SCSE23-0928 application/pdf Nanyang Technological University
title	Job scam detection using classification algorithms
topic	Computer and Information Science
url	https://hdl.handle.net/10356/181115

Similar Items