Statistical Properties of Multivariate Distance Matrix Regression for High Dimensional Data Analysis

Bibliographic Details
Main Authors: Nicholas J Schork, Matthew A Zapala
Format: Article
Language: English
Published: Frontiers Media S.A. 2012-09-01
Series: Frontiers in Genetics
Subjects: simulation; multivariate analysis; Regression Analysis; Distance Matrix; High dimensional data
Online Access: http://journal.frontiersin.org/Journal/10.3389/fgene.2012.00190/full
author Nicholas J Schork
Matthew A Zapala
author_facet Nicholas J Schork
Matthew A Zapala
author_sort Nicholas J Schork
collection DOAJ
description Multivariate distance matrix regression (MDMR) analysis is a statistical technique that allows researchers to relate P variables to an additional M factors collected on N individuals, where P >> N. The technique can be applied in a number of research settings involving high-dimensional data types such as DNA sequence data, gene expression microarray data, and imaging data. MDMR analysis involves computing the distance between all pairs of individuals with respect to the P variables of interest and constructing an N × N matrix whose elements reflect these distances. Permutation tests can then be used to test linear hypotheses about whether the M additional factors collected on the individuals explain variation in the observed distances among the N individuals, as reflected in the matrix. MDMR analysis is an excellent complement to cluster analysis and other traditional multivariate analysis techniques. Despite its appeal and utility, the properties of the statistics used in MDMR analysis have not been explored in detail. In this paper, we consider the level accuracy and power of MDMR analysis under different distance measures and analysis settings. We also describe the utility of MDMR analysis in assessing hypotheses about the appropriate number of clusters arising from a cluster analysis.
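The procedure summarized in the description (pairwise distances, an N x N matrix, a permutation test on additional factors) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the record gives no formulas, so the Gower centering step and the pseudo-F statistic used here are assumptions taken from the general distance-based regression literature, the example is restricted to a single categorical factor (M = 1) with Euclidean distance, and all function names and the simulated data are hypothetical.

```python
import math
import random


def euclidean_distance_matrix(rows):
    """Return the N x N matrix of Euclidean distances between individuals."""
    n = len(rows)
    return [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]


def gower_center(dist):
    """Double-centre A = -0.5 * D**2 so every row and column sums to zero."""
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in a]  # A is symmetric: column means match
    grand_mean = sum(row_mean) / n
    return [[a[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]


def pseudo_f(g, labels):
    """Pseudo-F for one categorical factor (assumed form, see lead-in).

    For group indicators the hat matrix H has entries 1/n_g within group g
    and 0 elsewhere, so tr(HG) is a sum of within-group block averages.
    Assumes the residual trace is non-zero.
    """
    n = len(labels)
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)
    k = len(groups)
    total = sum(g[i][i] for i in range(n))  # tr(G)
    explained = sum(
        sum(g[i][j] for i in members for j in members) / len(members)
        for members in groups.values()
    )  # tr(HG)
    residual = total - explained  # tr((I - H) G)
    return (explained / (k - 1)) / (residual / (n - k))


def mdmr_permutation_test(rows, labels, n_perm=999, seed=0):
    """Permute the factor labels and compare each pseudo-F to the observed one."""
    rng = random.Random(seed)
    g = gower_center(euclidean_distance_matrix(rows))
    observed = pseudo_f(g, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(g, shuffled) >= observed:
            hits += 1
    # Count the observed labelling as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: N = 12 individuals, P = 50 variables (P >> N),
# with the second group of six shifted by 1.5 on every variable.
_rng = random.Random(1)
example_labels = ["a"] * 6 + ["b"] * 6
example_rows = [
    [_rng.gauss(1.5 if lab == "b" else 0.0, 1.0) for _ in range(50)]
    for lab in example_labels
]
f_stat, p_value = mdmr_permutation_test(example_rows, example_labels)
print(f"pseudo-F = {f_stat:.2f}, permutation p = {p_value:.3f}")
```

The distance matrix is computed once and only the factor labels are permuted, which keeps the test inexpensive even when P is large. Swapping in another distance measure only requires replacing `euclidean_distance_matrix`; continuous or multiple factors would need a general hat matrix rather than the group-average shortcut used here.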
first_indexed 2024-12-14T04:51:57Z
format Article
id doaj.art-f4742ae8eb32454782d4fdcf6aff157c
institution Directory of Open Access Journals
issn 1664-8021
language English
last_indexed 2024-12-14T04:51:57Z
publishDate 2012-09-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Genetics
spelling doaj.art-f4742ae8eb32454782d4fdcf6aff157c | 2022-12-21T23:16:31Z | eng | Frontiers Media S.A. | Frontiers in Genetics | 1664-8021 | 2012-09-01 | 3 | 10.3389/fgene.2012.00190 | 31294 | Statistical Properties of Multivariate Distance Matrix Regression for High Dimensional Data Analysis | Nicholas J Schork [0] | Nicholas J Schork [1] | Matthew A Zapala [2] | The Scripps Research Institute | The Scripps Translational Science Institute | University of California, San Diego | Multivariate distance matrix regression (MDMR) analysis is a statistical technique that allows researchers to relate P variables to an additional M factors collected on N individuals, where P>>N. The technique can be applied to a number of research settings involving high dimensional data types such as DNA sequence data, gene expression microarray data and imaging data. MDMR analysis involves computing the distance between all pairs of individuals with respect to P variables of interest and constructing an N x N matrix whose elements reflect these distances. Permutation tests can be used to test linear hypotheses that consider whether or not the M additional factors collected on the individuals can explain variation in the observed distances between and among the N individuals as reflected in the matrix. MDMR analysis is an excellent complement to cluster analysis and other traditional multivariate analysis techniques. Despite its appeal and utility, properties of the statistics used in MDMR analysis have not been explored in detail. In this paper we consider the level accuracy and power of MDMR analysis assuming different distance measures and analysis settings. We also describe the utility of MDMR analysis in assessing hypotheses about the appropriate number of clusters arising from a cluster analysis. | http://journal.frontiersin.org/Journal/10.3389/fgene.2012.00190/full | simulation | multivariate analysis | Regression Analysis | Distance Matrix | High dimensional data
title Statistical Properties of Multivariate Distance Matrix Regression for High Dimensional Data Analysis
topic simulation
multivariate analysis
Regression Analysis
Distance Matrix
High dimensional data
url http://journal.frontiersin.org/Journal/10.3389/fgene.2012.00190/full