BDGOA: A bot detection approach for GitHub OAuth Apps
As various software bots are widely used in open source software repositories, some drawbacks are coming to light, such as giving newcomers non-positive feedback and misleading empirical studies of software engineering researchers. Several techniques have been proposed by researchers to per...
Main Authors: | Zhifang Liao, Xuechun Huang, Bolin Zhang, Jinsong Wu, Yu Cheng |
---|---|
Format: | Article |
Language: | English |
Published: | Tsinghua University Press, 2023-09-01 |
Series: | Intelligent and Converged Networks |
Subjects: | github, devbots, machine learning, text similarity |
Online Access: | https://www.sciopen.com/article/10.23919/ICN.2023.0006 |
_version_ | 1827783098223296512 |
---|---|
author | Zhifang Liao; Xuechun Huang; Bolin Zhang; Jinsong Wu; Yu Cheng |
author_facet | Zhifang Liao; Xuechun Huang; Bolin Zhang; Jinsong Wu; Yu Cheng |
author_sort | Zhifang Liao |
collection | DOAJ |
description | As various software bots are widely used in open source software repositories, some drawbacks are coming to light, such as giving newcomers non-positive feedback and misleading empirical studies of software engineering researchers. Researchers have proposed several bot detection techniques, but most of them are limited to identifying bots that perform specific activities and do not distinguish between GitHub Apps and OAuth Apps. In this paper, we propose a bot detection technique for OAuth Apps, named BDGOA. BDGOA uses 24 features, which fall into three dimensions: account information, account activity, and text similarity. To better explore the behavioral features, we define a fine-grained classification of behavioral events and introduce self-similarity to quantify the repeatability of behavioral sequences. We evaluate five machine learning classifiers on the benchmark dataset and choose random forest as the final classifier, which achieves the highest F1-score of 95.83%. Experimental comparisons with state-of-the-art approaches also demonstrate the superiority of BDGOA. |
first_indexed | 2024-03-11T15:42:03Z |
format | Article |
id | doaj.art-4207afeb4df14aafa8d22050a1952c97 |
institution | Directory Open Access Journal |
issn | 2708-6240 |
language | English |
last_indexed | 2024-03-11T15:42:03Z |
publishDate | 2023-09-01 |
publisher | Tsinghua University Press |
record_format | Article |
series | Intelligent and Converged Networks |
spelling | doaj.art-4207afeb4df14aafa8d22050a1952c97; 2023-10-26T10:10:51Z; eng; Tsinghua University Press; Intelligent and Converged Networks; ISSN 2708-6240; 2023-09-01; vol. 4, no. 3, pp. 181-197; 10.23919/ICN.2023.0006; BDGOA: A bot detection approach for GitHub OAuth Apps; Zhifang Liao (School of Computer Science and Engineering, Central South University, Changsha 410083, China); Xuechun Huang (School of Computer Science and Engineering, Central South University, Changsha 410083, China); Bolin Zhang (School of Computer Science and Engineering, Central South University, Changsha 410083, China); Jinsong Wu (School of Artificial Intelligence, Guilin University of Electronic Technology, Guilin 541004, China, and also with the Department of Electrical Engineering, University of Chile, Santiago 8320000, Chile); Yu Cheng (Hunan Glozeal Science and Technology Co., Ltd., Changsha 410083, China); https://www.sciopen.com/article/10.23919/ICN.2023.0006; github; devbots; machine learning; text similarity |
spellingShingle | Zhifang Liao Xuechun Huang Bolin Zhang Jinsong Wu Yu Cheng BDGOA: A bot detection approach for GitHub OAuth Apps Intelligent and Converged Networks github devbots machine learning text similarity |
title | BDGOA: A bot detection approach for GitHub OAuth Apps |
title_full | BDGOA: A bot detection approach for GitHub OAuth Apps |
title_fullStr | BDGOA: A bot detection approach for GitHub OAuth Apps |
title_full_unstemmed | BDGOA: A bot detection approach for GitHub OAuth Apps |
title_short | BDGOA: A bot detection approach for GitHub OAuth Apps |
title_sort | bdgoa a bot detection approach for github oauth apps |
topic | github; devbots; machine learning; text similarity |
url | https://www.sciopen.com/article/10.23919/ICN.2023.0006 |
work_keys_str_mv | AT zhifangliao bdgoaabotdetectionapproachforgithuboauthapps AT xuechunhuang bdgoaabotdetectionapproachforgithuboauthapps AT bolinzhang bdgoaabotdetectionapproachforgithuboauthapps AT jinsongwu bdgoaabotdetectionapproachforgithuboauthapps AT yucheng bdgoaabotdetectionapproachforgithuboauthapps |
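
The description above reports that the final classifier was chosen by F1-score among several candidates. As a rough illustration of that selection step only, the sketch below computes F1 for the positive class and picks the better of two candidates. The labels, the names `classifier_a` and `classifier_b`, and the helper `f1_score` are hypothetical and invented for this example; they are not data, models, or code from the paper.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Return F1 for the positive class (1 = bot, 0 = human)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration only (not data from the paper).
y_true = [1, 1, 1, 0, 0, 0]
predictions = {
    "classifier_a": [1, 1, 0, 0, 0, 0],  # misses one bot
    "classifier_b": [1, 1, 1, 1, 0, 0],  # flags one human as a bot
}

# Score every candidate and keep the one with the highest F1.
scores = {name: f1_score(y_true, pred) for name, pred in predictions.items()}
best = max(scores, key=scores.get)
```

F1 balances missed bots against false alarms, which is presumably why it suits a benchmark where both kinds of error matter; the paper's actual evaluation protocol is not given in this record.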