Instance-Based Classification Through Hypothesis Testing

Bibliographic Details
Main Authors: Zengyou He, Chaohua Sheng, Yan Liu, Quan Zou
Format: Article
Language: English
Published: IEEE, 2021-01-01
Series: IEEE Access
Subjects: Classification; hypothesis testing; two-sample testing; machine learning
Online Access: https://ieeexplore.ieee.org/document/9333560/
Collection: DOAJ (Directory of Open Access Journals)

Description
Classification is a fundamental problem in machine learning and data mining. During the past decades, numerous classification methods have been presented based on different principles. However, most existing classifiers cast the classification problem as an optimization problem and do not address the issue of statistical significance. In this paper, we formulate the binary classification problem as a two-sample testing problem. More precisely, our classification model is a generic framework that is composed of two steps. In the first step, the distance between the test instance and each training instance is calculated to derive two distance sets. In the second step, the two-sample test is performed under the null hypothesis that the two sets of distances are drawn from the same cumulative distribution. After these two steps, we have two p-values for each test instance, and the test instance is assigned to the class associated with the smaller p-value. Essentially, the presented classification method can be regarded as an instance-based classifier based on hypothesis testing. The experimental results on 38 real data sets show that our method is able to achieve the same level of performance as several classic classifiers and has significantly better performance than existing testing-based classifiers. Furthermore, we can handle outlying instances and control the false discovery rate of test instances assigned to each class under the same framework.

Publication Details
IEEE Access, Volume 9, pp. 17485-17494, published 2021-01-01. DOI: 10.1109/ACCESS.2021.3053778. ISSN: 2169-3536. IEEE Xplore document number 9333560.

Author Affiliations
Zengyou He (ORCID: 0000-0001-9526-8816): Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, Dalian University of Technology, Dalian, China
Chaohua Sheng (ORCID: 0000-0002-6392-2411): School of Software, Dalian University of Technology, Dalian, China
Yan Liu (ORCID: 0000-0002-1386-812X): School of Software, Dalian University of Technology, Dalian, China
Quan Zou (ORCID: 0000-0001-6406-1142): Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
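
Illustrative Code Sketch
The description above outlines a concrete two-step procedure: compute the distances from a test instance to the training instances of each class, then compare the two distance sets with a two-sample test and assign the instance to the class with the smaller p-value. The record does not name the specific two-sample test, the distance measure, or exactly how the two per-class p-values are obtained, so the sketch below is an illustration rather than the authors' exact method: it assumes Euclidean distances and two one-sided Kolmogorov-Smirnov tests (one per class), which fit the "same cumulative distribution" null mentioned above. The function name and the optional abstention threshold alpha are likewise hypothetical.

    # Hypothetical sketch of the two-step, testing-based classifier summarized above.
    # Assumptions (not stated in the record): Euclidean distance, one-sided
    # Kolmogorov-Smirnov tests, and an optional abstention threshold `alpha`.
    import numpy as np
    from scipy.stats import ks_2samp

    def classify_by_two_sample_test(x, X_train, y_train, alpha=None):
        """Assign test instance x to the class with the smaller one-sided p-value.

        If `alpha` is given and neither p-value falls below it, return None as a
        crude stand-in for the outlier handling mentioned in the description.
        """
        classes = np.unique(y_train)
        assert len(classes) == 2, "this sketch covers the binary case only"

        # Step 1: distances from the test instance to each class's training instances.
        d0 = np.linalg.norm(X_train[y_train == classes[0]] - x, axis=1)
        d1 = np.linalg.norm(X_train[y_train == classes[1]] - x, axis=1)

        # Step 2: one-sided two-sample KS tests under the null that both distance
        # sets come from the same cumulative distribution. alternative="greater"
        # asks whether the first sample is stochastically smaller, i.e. whether
        # the test instance looks closer to that class.
        p0 = ks_2samp(d0, d1, alternative="greater").pvalue  # evidence for classes[0]
        p1 = ks_2samp(d1, d0, alternative="greater").pvalue  # evidence for classes[1]

        if alpha is not None and min(p0, p1) > alpha:
            return None  # neither class supported: possibly an outlying instance
        return classes[0] if p0 < p1 else classes[1]

    # Toy usage on two synthetic Gaussian blobs (illustrative data only).
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(3.0, 1.0, (50, 2))])
    y = np.array([0] * 50 + [1] * 50)
    print(classify_by_two_sample_test(np.array([2.8, 3.1]), X, y))  # expected: 1

Because the description calls the model a generic framework, the Kolmogorov-Smirnov test here could presumably be replaced by another two-sample test (for example, a Wilcoxon rank-sum test), and the per-class p-values gathered over many test instances could feed a false-discovery-rate procedure such as Benjamini-Hochberg, although the record does not specify which correction the authors use.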