Estimating the number of principal components via Split-Half Eigenvector Matching (SHEM)

Estimating the number of principal components to retain for dimension reduction is a critical step in many applications of principal component analysis. Common methods may not be optimal, however. The current paper presents an alternative procedure that aims to recover the true number of principal c...


Bibliographic details
Main author: Thomas E. Gladwin
Format: Article
Language: English
Published: Elsevier 2023-12-01
Series: MethodsX
Subjects: SHEM: Split-Half Eigenvector Matching
Online access: http://www.sciencedirect.com/science/article/pii/S2215016123002832
_version_ 1827596427961827328
author Thomas E. Gladwin
author_facet Thomas E. Gladwin
author_sort Thomas E. Gladwin
collection DOAJ
description Estimating the number of principal components to retain for dimension reduction is a critical step in many applications of principal component analysis. Common methods may not be optimal, however. The current paper presents an alternative procedure that aims to recover the true number of principal components, in the sense of the number of independent vectors involved in the generation of the data.
• Data are split into random halves repeatedly.
• For each split, the eigenvectors in one half are compared to those in the other.
• The split between high and low similarities is used to estimate the number of principal components.
The method is a proof of principle that similarity over split-halves of the data may provide a useful approach to estimating the number of components in dimension reduction, or of similar dimensions in other models.
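The three steps in the description can be illustrated with a minimal NumPy sketch. This is not the author's implementation: the function names, the use of the best absolute cosine as the similarity measure, the averaging over splits, and the "largest gap" rule for separating high from low similarities are all assumptions made for illustration, since the record does not specify them.

```python
import numpy as np


def sorted_eigenvectors(X):
    """Eigenvectors of the covariance of X (rows = observations),
    ordered from largest to smallest eigenvalue."""
    cov = np.cov(X, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    return eigvecs[:, ::-1]


def split_half_similarities(X, n_splits=50, rng=None):
    """Mean best-match similarity per eigenvector over random split-halves.

    Assumption: similarity is the largest absolute cosine between an
    eigenvector of one half and any eigenvector of the other half.
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape
    sims = np.zeros(p)
    for _ in range(n_splits):
        perm = rng.permutation(n)
        half_a = X[perm[: n // 2]]
        half_b = X[perm[n // 2:]]
        vecs_a = sorted_eigenvectors(half_a)
        vecs_b = sorted_eigenvectors(half_b)
        # Columns are unit vectors, so the matrix product holds cosines.
        sims += np.abs(vecs_a.T @ vecs_b).max(axis=1)
    return sims / n_splits


def estimate_n_components(sims):
    """Assumed decision rule: count the similarities above the largest gap
    in the sorted similarity values."""
    ordered = np.sort(sims)[::-1]
    gaps = ordered[:-1] - ordered[1:]
    return int(np.argmax(gaps)) + 1
```

On simulated data generated from a few orthogonal loading vectors plus isotropic noise, the eigenvectors tied to the generating vectors should reproduce across halves with similarities near 1, while noise eigenvectors match poorly; `estimate_n_components(split_half_similarities(X))` then returns the count on the high side of the gap. The largest-gap rule cannot return zero components, so it is only a stand-in for whatever thresholding the paper actually uses.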
first_indexed 2024-03-09T03:10:26Z
format Article
id doaj.art-e7f7eb6ffe7846d2b1666873c21ae922
institution Directory Open Access Journal
issn 2215-0161
language English
last_indexed 2024-03-09T03:10:26Z
publishDate 2023-12-01
publisher Elsevier
record_format Article
series MethodsX
spelling doaj.art-e7f7eb6ffe7846d2b1666873c21ae922 | 2023-12-04T05:22:11Z | eng | Elsevier | MethodsX | 2215-0161 | 2023-12-01 | 11 | 102286 | Estimating the number of principal components via Split-Half Eigenvector Matching (SHEM) | Thomas E. Gladwin (Experience Design Team, Institute for Globally Distributed Open Research and Education (IGDORE), Sopra Steria, 6th Floor, 1 Bartholomew Close, EC1A 7BL London, United Kingdom) | http://www.sciencedirect.com/science/article/pii/S2215016123002832 | SHEM: Split-Half Eigenvector Matching
title Estimating the number of principal components via Split-Half Eigenvector Matching (SHEM)
title_sort estimating the number of principal components via split half eigenvector matching shem
topic SHEM: Split-Half Eigenvector Matching
url http://www.sciencedirect.com/science/article/pii/S2215016123002832
work_keys_str_mv AT thomasegladwin estimatingthenumberofprincipalcomponentsviasplithalfeigenvectormatchingshem