A multistage mathematical approach to automated clustering of high-dimensional noisy data

A critical problem faced in many scientific fields is the adequate separation of data derived from individual sources. Often, such datasets require analysis of multiple features in a highly multidimensional space, with overlap of features and sources. The datasets generated by simultaneous recording...


Bibliographic Details
Main Authors:	Friedman, Alexander; Keselman, Michael D.; Gibb, Leif G.; Graybiel, Ann M.
Other Authors:	Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences
Format:	Article
Language:	en_US
Published:	National Academy of Sciences (U.S.) 2015
Online Access:	http://hdl.handle.net/1721.1/99117 https://orcid.org/0000-0002-4326-7720

_version_	1826216693550022656
author	Friedman, Alexander; Keselman, Michael D.; Gibb, Leif G.; Graybiel, Ann M.
author2	Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences
author_facet	Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences; Friedman, Alexander; Keselman, Michael D.; Gibb, Leif G.; Graybiel, Ann M.
author_sort	Friedman, Alexander
collection	MIT
description	A critical problem faced in many scientific fields is the adequate separation of data derived from individual sources. Often, such datasets require analysis of multiple features in a highly multidimensional space, with overlap of features and sources. The datasets generated by simultaneous recording from hundreds of neurons emitting phasic action potentials have produced the challenge of separating the recorded signals into independent data subsets (clusters) corresponding to individual signal-generating neurons. Mathematical methods have been developed over the past three decades to achieve such spike clustering, but a complete solution with fully automated cluster identification has not been achieved. We propose here a fully automated mathematical approach that identifies clusters in multidimensional space through recursion, which combats the multidimensionality of the data. Recursion is paired with an approach to dimensional evaluation, in which each dimension of a dataset is examined for its informational importance for clustering. The dimensions offering greater informational importance are given added weight during recursive clustering. To combat strong background activity, our algorithm takes an iterative approach of data filtering according to a signal-to-noise ratio metric. The algorithm finds cluster cores, which are thereafter expanded to include complete clusters. This mathematical approach can be extended from its prototype context of spike sorting to other datasets that suffer from high dimensionality and background activity.
first_indexed	2024-09-23T16:51:35Z
format	Article
id	mit-1721.1/99117
institution	Massachusetts Institute of Technology
language	en_US
last_indexed	2024-09-23T16:51:35Z
publishDate	2015
publisher	National Academy of Sciences (U.S.)
record_format	dspace
spelling	mit-1721.1/99117 2022-10-03T08:45:31Z McGovern Institute for Brain Research at MIT National Institutes of Health (U.S.) (Grant R01 MH060379) United States. Defense Advanced Research Projects Agency United States. Army Research Office (Grant W911NF-10-1-0059) Cure Huntington’s Disease Initiative, Inc. (Grant A-5552) 2015-10-01T12:37:33Z 2015-10-01T12:37:33Z 2015-04 2015-01 Article http://purl.org/eprint/type/JournalArticle 0027-8424 1091-6490 http://hdl.handle.net/1721.1/99117 Friedman, Alexander, Michael D. Keselman, Leif G. Gibb, and Ann M. Graybiel. “A Multistage Mathematical Approach to Automated Clustering of High-Dimensional Noisy Data.” Proc Natl Acad Sci USA 112, no. 14 (March 23, 2015): 4477–4482. https://orcid.org/0000-0002-4326-7720 en_US http://dx.doi.org/10.1073/pnas.1503940112 Proceedings of the National Academy of Sciences Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. application/pdf National Academy of Sciences (U.S.)
title	A multistage mathematical approach to automated clustering of high-dimensional noisy data
title_sort	multistage mathematical approach to automated clustering of high dimensional noisy data
url	http://hdl.handle.net/1721.1/99117 https://orcid.org/0000-0002-4326-7720

A multistage mathematical approach to automated clustering of high-dimensional noisy data
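
The description in this record says that each dimension is examined for its informational importance and that more informative dimensions are given added weight during clustering. The record does not state how that importance is measured, so the following is only a minimal sketch under stated assumptions: the largest-gap score, the function names `dimension_weights` and `weighted_distance`, and the sample points are all hypothetical stand-ins, not the authors' method.

```python
import math


def dimension_weights(points):
    """Return one weight per dimension, normalized to sum to 1.

    ASSUMPTION: importance is scored as the largest gap between
    consecutive sorted values divided by the value range. This is a
    crude separation measure chosen for illustration only; the record
    does not give the metric used in the paper.
    """
    n_dims = len(points[0])
    scores = []
    for d in range(n_dims):
        values = sorted(p[d] for p in points)
        span = values[-1] - values[0]
        if span == 0:
            # A constant dimension carries no information for clustering.
            scores.append(0.0)
            continue
        largest_gap = max(b - a for a, b in zip(values, values[1:]))
        scores.append(largest_gap / span)
    total = sum(scores)
    if total == 0:
        # Fall back to equal weights when no dimension is informative.
        return [1.0 / n_dims] * n_dims
    return [s / total for s in scores]


def weighted_distance(p, q, weights):
    """Euclidean distance with each squared difference scaled by its weight."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, p, q)))


# Hypothetical data: dimension 0 separates two groups, dimension 1 is spread evenly.
points = [
    (0.0, 0.1), (0.2, 0.5), (0.1, 0.9),
    (5.0, 0.3), (5.2, 0.7), (5.1, 0.0),
]
weights = dimension_weights(points)
print(weights)
```

With these sample points the first dimension receives the larger weight, so distances computed with `weighted_distance` are dominated by the dimension that actually separates the two groups.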

Similar Items