BDGOA: A bot detection approach for GitHub OAuth Apps

Bibliographic Details
Main Authors: Zhifang Liao, Xuechun Huang, Bolin Zhang, Jinsong Wu, Yu Cheng
Format: Article
Language: English
Published: Tsinghua University Press 2023-09-01
Series: Intelligent and Converged Networks
Subjects: github, devbots, machine learning, text similarity
Online Access: https://www.sciopen.com/article/10.23919/ICN.2023.0006
author Zhifang Liao
Xuechun Huang
Bolin Zhang
Jinsong Wu
Yu Cheng
collection DOAJ
description As software bots become widely used in open source software repositories, some drawbacks are coming to light, such as giving newcomers non-positive feedback and misleading the empirical studies of software engineering researchers. Researchers have proposed several bot detection techniques, but most are limited to identifying bots that perform specific activities, and they do not distinguish between GitHub Apps and OAuth Apps. In this paper, we propose a bot detection technique for OAuth Apps, named BDGOA. BDGOA uses 24 features, which fall into three dimensions: account information, account activity, and text similarity. To better explore the behavioral features, we define a fine-grained classification of behavioral events and introduce self-similarity to quantify the repeatability of a behavioral sequence. We apply five machine learning classifiers to the benchmark dataset for bot detection and finally choose random forest as the classifier, which achieves the highest F1-score of 95.83%. Experimental comparisons with state-of-the-art approaches also demonstrate the superiority of BDGOA.
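The abstract does not define how self-similarity is computed, so the following is only an illustrative sketch of one way to quantify the repeatability of a behavioral event sequence. It assumes a hypothetical measure (the share of length-n event windows that repeat an earlier window); the function name `self_similarity`, the parameter `n`, and the event labels are assumptions made for illustration and are not taken from the paper.

```python
from typing import Sequence


def self_similarity(events: Sequence[str], n: int = 2) -> float:
    """Hypothetical repeatability score in [0.0, 1.0).

    Slides a window of length n over the event sequence and returns the
    fraction of windows that duplicate some other window. This is an
    assumed stand-in, not the definition used by BDGOA.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Build every contiguous window of n events.
    windows = [tuple(events[i:i + n]) for i in range(len(events) - n + 1)]
    if not windows:
        # Sequence shorter than the window: nothing can repeat.
        return 0.0
    distinct = len(set(windows))
    return 1.0 - distinct / len(windows)


# Hypothetical event labels: a highly repetitive account versus a varied one.
bot_like = ["IssueComment", "Label"] * 4
human_like = ["Push", "IssueComment", "PullRequest", "Review",
              "Fork", "Watch", "Push", "Release"]

print(self_similarity(bot_like))
print(self_similarity(human_like))
```

Under this assumed measure, an account that cycles through the same few events scores higher than one whose activity varies, which is the kind of signal a repeatability feature is meant to capture.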
first_indexed 2024-03-11T15:42:03Z
format Article
id doaj.art-4207afeb4df14aafa8d22050a1952c97
institution Directory Open Access Journal
issn 2708-6240
language English
last_indexed 2024-03-11T15:42:03Z
publishDate 2023-09-01
publisher Tsinghua University Press
record_format Article
series Intelligent and Converged Networks
spelling doaj.art-4207afeb4df14aafa8d22050a1952c97 2023-10-26T10:10:51Z
Language: eng
Publisher: Tsinghua University Press
Journal: Intelligent and Converged Networks (ISSN 2708-6240)
Published: 2023-09-01, volume 4, issue 3, pages 181-197
DOI: 10.23919/ICN.2023.0006
Title: BDGOA: A bot detection approach for GitHub OAuth Apps
Author affiliations:
Zhifang Liao (0): School of Computer Science and Engineering, Central South University, Changsha 410083, China
Xuechun Huang (1): School of Computer Science and Engineering, Central South University, Changsha 410083, China
Bolin Zhang (2): School of Computer Science and Engineering, Central South University, Changsha 410083, China
Jinsong Wu (3): School of Artificial Intelligence, Guilin University of Electronic Technology, Guilin 541004, China, and also with the Department of Electrical Engineering, University of Chile, Santiago 8320000, Chile
Yu Cheng (4): Hunan Glozeal Science and Technology Co., Ltd., Changsha 410083, China
URL: https://www.sciopen.com/article/10.23919/ICN.2023.0006
Keywords: github, devbots, machine learning, text similarity
title BDGOA: A bot detection approach for GitHub OAuth Apps
topic github
devbots
machine learning
text similarity