Statistical model selection with “Big Data”

Big Data offer potential benefits for statistical modelling, but confront problems including an excess of false positives, mistaking correlations for causes, ignoring sampling biases and selecting by inappropriate methods. We consider the many important requirements when searching for a data-based r...


Bibliographic Details
Main Authors: Jurgen A. Doornik, David F. Hendry
Format: Article
Language: English
Published: Taylor & Francis Group 2015-12-01
Series: Cogent Economics & Finance
Subjects: Big Data; model selection; location shifts; Autometrics; computational problems
Online Access: http://dx.doi.org/10.1080/23322039.2015.1045216
_version_ 1818025497498484736
author Jurgen A. Doornik
David F. Hendry
author_facet Jurgen A. Doornik
David F. Hendry
author_sort Jurgen A. Doornik
collection DOAJ
description Big Data offer potential benefits for statistical modelling, but confront problems including an excess of false positives, mistaking correlations for causes, ignoring sampling biases and selecting by inappropriate methods. We consider the many important requirements when searching for a data-based relationship using Big Data, and the possible role of Autometrics in that context. Paramount considerations include embedding relationships in general initial models, possibly restricting the number of variables to be selected over by non-statistical criteria (the formulation problem), using good quality data on all variables, analyzed with tight significance levels by a powerful selection procedure, retaining available theory insights (the selection problem) while testing for relationships being well specified and invariant to shifts in explanatory variables (the evaluation problem), using a viable approach that resolves the computational problem of immense numbers of possible models.
first_indexed 2024-12-10T04:17:03Z
format Article
id doaj.art-31675d0732a14fbda0eccd21536555bd
institution Directory Open Access Journal
issn 2332-2039
language English
last_indexed 2024-12-10T04:17:03Z
publishDate 2015-12-01
publisher Taylor & Francis Group
record_format Article
series Cogent Economics & Finance
spelling doaj.art-31675d0732a14fbda0eccd21536555bd | 2022-12-22T02:02:33Z | eng | Taylor & Francis Group | Cogent Economics & Finance | 2332-2039 | 2015-12-01 | 3 | 1 | 10.1080/23322039.2015.1045216 | 1045216 | Statistical model selection with “Big Data” | Jurgen A. Doornik 0 | David F. Hendry 1 | Institute for New Economic Thinking, Oxford Martin School | Institute for New Economic Thinking, Oxford Martin School | Big Data offer potential benefits for statistical modelling, but confront problems including an excess of false positives, mistaking correlations for causes, ignoring sampling biases and selecting by inappropriate methods. We consider the many important requirements when searching for a data-based relationship using Big Data, and the possible role of Autometrics in that context. Paramount considerations include embedding relationships in general initial models, possibly restricting the number of variables to be selected over by non-statistical criteria (the formulation problem), using good quality data on all variables, analyzed with tight significance levels by a powerful selection procedure, retaining available theory insights (the selection problem) while testing for relationships being well specified and invariant to shifts in explanatory variables (the evaluation problem), using a viable approach that resolves the computational problem of immense numbers of possible models. | http://dx.doi.org/10.1080/23322039.2015.1045216 | Big Data | model selection | location shifts | Autometrics | computational problems
spellingShingle Jurgen A. Doornik
David F. Hendry
Statistical model selection with “Big Data”
Cogent Economics & Finance
Big Data
model selection
location shifts
Autometrics
computational problems
title Statistical model selection with “Big Data”
title_sort statistical model selection with big data
topic Big Data
model selection
location shifts
Autometrics
computational problems
url http://dx.doi.org/10.1080/23322039.2015.1045216
work_keys_str_mv AT jurgenadoornik statisticalmodelselectionwithbigdata
AT davidfhendry statisticalmodelselectionwithbigdata