On the Accuracy of Cross-Validation in the Classification Problem

Bibliographic Details
Main Author:	V. M. Nedel’ko
Format:	Article
Language:	English
Published:	Irkutsk State University, 2021-12-01
Series:	Известия Иркутского государственного университета: Серия "Математика" (The Bulletin of Irkutsk State University. Series Mathematics)
Subjects:	k-fold cross-validation; accuracy; statistical estimates; machine learning
Online Access:	http://mathizv.isu.ru/en/article/file?id=1395

description	In this work we study the accuracy of cross-validation estimates for decision functions. The main idea of the research is a statistical modeling scheme that uses real data to obtain statistical estimates that are usually obtainable only from model (synthetic) distributions. The studies confirm the well-known empirical recommendation to use 5 or more folds; using more than 10 folds does not significantly increase accuracy. Repeated cross-validation also gives no fundamental gain in precision. The experiments support an empirical fact: the accuracy of cross-validation estimates is approximately the same as that of estimates obtained from a test sample of half the size. This is readily explained by the fact that all objects of a test sample are independent, whereas the estimates built by cross-validation on different subsamples (folds) are not.
institution	Directory of Open Access Journals (DOAJ)
issn	1997-7670 2541-8785
doi	https://doi.org/10.26516/1997-7670.2021.38.84
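The abstract does not spell out the modeling scheme in detail. The Python sketch below shows one plausible reading of it. A finite dataset is treated as the "true" distribution, and training samples are drawn from it. The exact error of each trained rule is then computed over the whole dataset. Finally, the root-mean-square deviation of the k-fold cross-validation estimate is compared with that of an independent test sample. Everything here is an illustrative assumption and not taken from the paper: the nearest-centroid learner, the function names (`fit_nearest_centroid`, `kfold_cv_error`, `compare_rmse`), i.i.d. resampling with replacement, and the synthetic two-Gaussian placeholder data.

```python
import math
import random
import statistics


def fit_nearest_centroid(X, y):
    """Hypothetical stand-in learner: store the mean vector of each class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def predict(model, x):
    """Return the class whose centroid is nearest in squared Euclidean distance."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))


def error_rate(model, X, y):
    """Fraction of objects in (X, y) that the model misclassifies."""
    return sum(predict(model, x) != label for x, label in zip(X, y)) / len(y)


def kfold_cv_error(X, y, k, repeats=1, rng=random):
    """k-fold cross-validation error estimate, averaged over `repeats` random splits."""
    if k < 2:
        raise ValueError("k-fold cross-validation needs k >= 2")
    n = len(y)
    estimates = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        misclassified = 0
        for f in range(k):
            fold = idx[f::k]  # every k-th shuffled index forms one fold
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            model = fit_nearest_centroid([X[i] for i in train], [y[i] for i in train])
            misclassified += sum(predict(model, X[i]) != y[i] for i in fold)
        estimates.append(misclassified / n)  # pooled over all k folds
    return statistics.fmean(estimates)


def compare_rmse(pop_X, pop_y, n, n_folds, n_test, trials, rng):
    """Treat the finite dataset (pop_X, pop_y) as the true distribution (assumption).

    Each trial draws an i.i.d. training sample of size n (with replacement),
    computes the exact error of the trained rule on the whole dataset, and
    records the squared deviation of (a) the k-fold CV estimate and (b) the
    error on an independent test sample of size n_test. Returns both RMSEs.
    """
    N = len(pop_y)
    cv_sq, test_sq = [], []
    for _ in range(trials):
        tr = rng.choices(range(N), k=n)
        X, y = [pop_X[i] for i in tr], [pop_y[i] for i in tr]
        model = fit_nearest_centroid(X, y)
        true_err = error_rate(model, pop_X, pop_y)
        cv_sq.append((kfold_cv_error(X, y, n_folds, rng=rng) - true_err) ** 2)
        te = rng.choices(range(N), k=n_test)
        test_err = error_rate(model, [pop_X[i] for i in te], [pop_y[i] for i in te])
        test_sq.append((test_err - true_err) ** 2)
    return math.sqrt(statistics.fmean(cv_sq)), math.sqrt(statistics.fmean(test_sq))


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic placeholder for a real dataset: two overlapping Gaussian classes.
    pop_X = [[rng.gauss(float(c), 1.0), rng.gauss(0.0, 1.0)] for c in (0, 1) for _ in range(500)]
    pop_y = [c for c in (0, 1) for _ in range(500)]
    for n_folds in (2, 5, 10):
        cv, half = compare_rmse(pop_X, pop_y, n=60, n_folds=n_folds, n_test=30,
                                trials=100, rng=rng)
        print(f"k={n_folds:2d}  RMSE(CV)={cv:.3f}  RMSE(test of size n/2)={half:.3f}")
```

To mimic the setting described in the abstract, replace `pop_X` and `pop_y` with a real dataset. Repeated cross-validation corresponds to `repeats > 1` in `kfold_cv_error`. The sketch deliberately prints no reference figures, because any numbers depend on the data and the learner chosen.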

Similar Items