fsdaSAS: A Package for Robust Regression for Very Large Datasets Including the Batch Forward Search

The forward search (FS) is a general method of robust data fitting that moves smoothly from very robust to maximum likelihood estimation. The regression procedures are included in the MATLAB toolbox FSDA. The work on a SAS version of the FS originates from the need for the analysis of large datasets...


Bibliographic Details
Main Authors: Francesca Torti, Aldo Corbellini, Anthony C. Atkinson
Format: Article
Language: English
Published: MDPI AG 2021-04-01
Series: Stats
Subjects: approximate analysis; big data; linked plots; monitoring; robust regression
Online Access: https://www.mdpi.com/2571-905X/4/2/22
collection DOAJ
description The forward search (FS) is a general method of robust data fitting that moves smoothly from very robust to maximum likelihood estimation. The regression procedures are included in the MATLAB toolbox FSDA. The work on a SAS version of the FS originates from the need for the analysis of large datasets expressed by law enforcement services operating in the European Union that use our SAS software for detecting data anomalies that may point to fraudulent customs returns. Specific to our SAS implementation, the fsdaSAS package, we describe the approximation used to provide fast analyses of large datasets using an FS which progresses through the inclusion of batches of observations, rather than progressing one observation at a time. We do, however, test for outliers one observation at a time. We demonstrate that our SAS implementation becomes appreciably faster than the MATLAB version as the sample size increases and is also able to analyse larger datasets. The series of fits provided by the FS leads to the adaptive data-dependent choice of maximally efficient robust estimates. This also allows the monitoring of residuals and parameter estimates for fits of differing robustness levels. We mention that our fsdaSAS also applies the idea of monitoring to several robust estimators for regression for a range of values of breakdown point or nominal efficiency, leading to adaptive values for these parameters. We have also provided a variety of plots linked through brushing. Further programmed analyses include the robust transformations of the response in regression. Our package also provides the SAS community with methods of monitoring robust estimators for multivariate data, including multivariate data transformations.
first_indexed 2024-03-10T12:12:32Z
id doaj.art-6708067102d04367b486038ca327399c
institution Directory Open Access Journal
issn 2571-905X
last_indexed 2024-03-10T12:12:32Z
spelling doaj.art-6708067102d04367b486038ca327399c 2023-11-21T16:05:47Z eng
Stats (MDPI AG, ISSN 2571-905X), 2021-04-01, vol. 4, no. 2, pp. 327-347, doi:10.3390/stats4020022
Francesca Torti: European Commission, Joint Research Centre (JRC), 21027 Ispra, Italy
Aldo Corbellini: Department of Economics and Management, University of Parma, 43125 Parma, Italy
Anthony C. Atkinson: Department of Statistics, The London School of Economics, London WC2A 2AE, UK
topic approximate analysis
big data
linked plots
monitoring
robust regression