Finite sample corrections for parameters estimation and significance testing

Bibliographic Details
Main Authors:	Teh, Boon Kin; Tay, Darrell Jia Jie; Li, Sai Ping; Cheong, Siew Ann
Other Authors:	School of Physical and Mathematical Sciences
Format:	Journal Article
Language:	English
Published:	2021
Subjects:	Science::Physics; Significance Testing; Finite Sample Effects
Online Access:	https://hdl.handle.net/10356/146020

Institution:	Nanyang Technological University
Citation:	Teh, B. K., Tay, D. J. J., Li, S. P., & Cheong, S. A. (2018). Finite sample corrections for parameters estimation and significance testing. Frontiers in Applied Mathematics and Statistics, 4, 2. doi:10.3389/fams.2018.00002
ISSN:	2297-4687
Version:	Published version
Funding:	This research is supported by the Singapore Ministry of Education Academic Research Fund Tier 2 under Grant Number MOE2015-T2-2-012.
License:	© 2018 Teh, Tay, Li and Cheong. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Abstract
An increasingly important problem in the era of Big Data is fitting data to distributions. However, many practitioners stop at visually inspecting the fits or use the coefficient of determination as a measure of goodness of fit. In general, goodness-of-fit measures do not allow us to tell which of several distributions fits the data best. Also, the likelihood of drawing the data from a distribution can be low even when the fit is good. To overcome these limitations, Clauset et al. advocated a three-step procedure for fitting any distribution: (i) estimate the parameter(s) accurately, (ii) choose and calculate an appropriate goodness-of-fit measure, and (iii) test its significance to determine how likely this goodness of fit is to appear in samples drawn from the distribution. When we perform this significance testing on exponential distributions, we often obtain low significance values even though the fits look visually good. This led us to realize that most fitting methods do not account for two finite-sample effects: the finite number of elements and the finite largest element. The former makes the goodness of fit depend on sample size, and the latter introduces a bias in both the estimated parameter and the goodness of fit. We propose modifications that account for both effects and show that these corrections improve the significance of the fits to both real and simulated data. In addition, we use simulations and analytical approximations to verify that the rate at which the estimated parameters converge toward their true values depends on how fast the largest element grows to infinity, and we provide fast inversion formulas that obtain p-values directly from the adjusted test statistics, in place of running additional Monte Carlo simulations.
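
The three-step procedure attributed to Clauset et al. in the abstract can be sketched for an exponential tail as follows. This is a minimal illustration under stated assumptions, not the authors' corrected method: the lower cutoff xmin is taken as known (Clauset et al. also estimate it), the Kolmogorov-Smirnov (KS) distance is used as the goodness-of-fit measure, and the p-value comes from plain Monte Carlo resampling. It reproduces only the uncorrected baseline the abstract criticizes; all function names (`fit_exponential`, `ks_statistic`, `kolmogorov_asymptotic_p`, `mc_p_value`) are hypothetical. For comparison, it also reports the size-scaled statistic sqrt(n)*D with a closed-form p-value from the standard asymptotic Kolmogorov series, which is a textbook formula and not the inversion formulas proposed in the paper.

```python
# Hypothetical sketch of the baseline (uncorrected) three-step fitting
# procedure for an exponential tail; xmin is assumed known, not estimated.
import numpy as np


def fit_exponential(x, xmin):
    """Step (i): maximum-likelihood rate for an exponential tail x >= xmin.

    For p(x) = lam * exp(-lam * (x - xmin)), the MLE is 1 / mean(x - xmin).
    """
    tail = x[x >= xmin]
    return 1.0 / np.mean(tail - xmin)


def ks_statistic(x, xmin, lam):
    """Step (ii): KS distance between the empirical CDF of the tail and
    the fitted exponential CDF."""
    tail = np.sort(x[x >= xmin])
    n = tail.size
    cdf = 1.0 - np.exp(-lam * (tail - xmin))
    # The empirical CDF steps from (i-1)/n to i/n at the i-th order
    # statistic, so measure the gap on both sides of every step.
    above = np.arange(1, n + 1) / n - cdf
    below = cdf - np.arange(0, n) / n
    return float(max(above.max(), below.max()))


def kolmogorov_asymptotic_p(d, n, terms=100):
    """Closed-form p-value for sqrt(n) * D from the asymptotic Kolmogorov
    distribution, P = 2 * sum_k (-1)^(k-1) exp(-2 k^2 t^2). It assumes a
    fully specified null (no fitted parameters), so with a fitted rate it
    is only a conservative reference value."""
    t = np.sqrt(n) * d
    if t < 0.2:  # the Kolmogorov CDF is numerically zero here, so p = 1
        return 1.0
    k = np.arange(1, terms + 1)
    p = 2.0 * np.sum((-1.0) ** (k - 1) * np.exp(-2.0 * k**2 * t**2))
    return float(min(max(p, 0.0), 1.0))


def mc_p_value(x, xmin, n_sims=1000, seed=0):
    """Step (iii): Monte Carlo p-value. Simulate samples of the same size
    from the fitted exponential, refit each one, and report the fraction
    whose KS distance is at least the observed one."""
    rng = np.random.default_rng(seed)
    tail = x[x >= xmin]
    lam = fit_exponential(tail, xmin)
    d_obs = ks_statistic(tail, xmin, lam)
    hits = 0
    for _ in range(n_sims):
        synth = xmin + rng.exponential(scale=1.0 / lam, size=tail.size)
        if ks_statistic(synth, xmin, fit_exponential(synth, xmin)) >= d_obs:
            hits += 1
    return lam, d_obs, hits / n_sims


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    data = 1.0 + rng.exponential(scale=0.5, size=500)  # true rate 2, xmin 1
    lam, d, p_mc = mc_p_value(data, xmin=1.0, n_sims=500)
    print(f"lambda_hat={lam:.3f}  D={d:.4f}  sqrt(n)*D={np.sqrt(500) * d:.3f}")
    print(f"Monte Carlo p={p_mc:.3f}  asymptotic KS p={kolmogorov_asymptotic_p(d, 500):.3f}")
```

Because the rate is refit on every synthetic sample, the Monte Carlo p-value accounts for the parameter having been estimated from the data, which the asymptotic Kolmogorov series does not, so the two printed p-values can differ noticeably. The raw distance D shrinks roughly like 1/sqrt(n) under the null, which is why the scaled statistic sqrt(n)*D is the natural quantity to compare across sample sizes. The finite-sample corrections proposed in the paper are not reproduced here.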

Similar Items