MCAM: Multiple Clustering Analysis Methodology for Deriving Hypotheses and Insights from High-Throughput Proteomic Datasets

Advances in proteomic technologies continue to substantially accelerate capability for generating experimental data on protein levels, states, and activities in biological samples. For example, studies on receptor tyrosine kinase signaling networks can now capture the phosphorylation state of hundre...


Bibliographic Details
Main Authors: Naegle, Kristen Marie, White, Forest M., Welsch, Roy E, Yaffe, Michael B, Lauffenburger, Douglas A
Other Authors: Massachusetts Institute of Technology. Department of Biological Engineering
Format: Article
Language: en_US
Published: Public Library of Science 2011
Online Access: http://hdl.handle.net/1721.1/66157
https://orcid.org/0000-0002-1545-1651
https://orcid.org/0000-0002-9038-1622
https://orcid.org/0000-0002-9547-3251
author Naegle, Kristen Marie
White, Forest M.
Welsch, Roy E
Yaffe, Michael B
Lauffenburger, Douglas A
author2 Massachusetts Institute of Technology. Department of Biological Engineering
collection MIT
description Advances in proteomic technologies continue to substantially accelerate capability for generating experimental data on protein levels, states, and activities in biological samples. For example, studies on receptor tyrosine kinase signaling networks can now capture the phosphorylation state of hundreds to thousands of proteins across multiple conditions. However, little is known about the function of many of these protein modifications, or the enzymes responsible for modifying them. To address this challenge, we have developed an approach that enhances the power of clustering techniques to infer functional and regulatory meaning of protein states in cell signaling networks. We have created a new computational framework for applying clustering to biological data in order to overcome the typical dependence on specific a priori assumptions and expert knowledge concerning the technical aspects of clustering. Multiple clustering analysis methodology (‘MCAM’) employs an array of diverse data transformations, distance metrics, set sizes, and clustering algorithms, in a combinatorial fashion, to create a suite of clustering sets. These sets are then evaluated based on their ability to produce biological insights through statistical enrichment of metadata relating to knowledge concerning protein functions, kinase substrates, and sequence motifs. We applied MCAM to a set of dynamic phosphorylation measurements of the ERBB network to explore the relationships between algorithmic parameters and the biological meaning that could be inferred, and we report on interesting biological predictions. Further, we applied MCAM to multiple phosphoproteomic datasets for the ERBB network, which allowed us to compare independent and incompletely overlapping measurements of phosphorylation sites in the network. We report specific and global differences of the ERBB network stimulated with different ligands and with changes in HER2 expression. Overall, we offer MCAM as a broadly applicable approach for analysis of proteomic data, which may help increase the current understanding of molecular networks in a variety of biological problems.
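The combinatorial idea in the description can be illustrated with a small sketch. This is not the authors' implementation: the toy measurements, the annotation set, the choice of single-linkage clustering, and the hypergeometric enrichment test are all assumptions made for illustration. The record only states that transformations, distance metrics, set sizes, and algorithms are combined and that the resulting sets are scored by statistical enrichment.

```python
import itertools
import math

# Hypothetical toy data: each key is a phosphosite, each list a time course.
DATA = {
    "siteA": [1.0, 2.0, 4.0, 8.0],
    "siteB": [1.0, 2.1, 3.9, 8.2],
    "siteC": [8.0, 4.0, 2.0, 1.0],
    "siteD": [7.9, 4.2, 2.1, 1.0],
    "siteE": [1.0, 1.9, 4.1, 7.8],
    "siteF": [8.1, 3.9, 1.9, 1.1],
}
# Hypothetical metadata: sites sharing some annotation (e.g. a kinase label).
ANNOTATED = {"siteA", "siteB", "siteE"}

# Data transformations (assumed examples).
TRANSFORMS = {
    "identity": lambda v: list(v),
    "log2": lambda v: [math.log2(x) for x in v],
    "max_norm": lambda v: [x / max(v) for x in v],
}

# Distance metrics (assumed examples).
METRICS = {
    "euclidean": lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))),
    "manhattan": lambda a, b: sum(abs(x - y) for x, y in zip(a, b)),
}

SET_SIZES = [2, 3]  # number of clusters per clustering set


def single_linkage(vectors, k, dist):
    """Merge the two closest clusters until only k clusters remain."""
    clusters = [{name} for name in vectors]
    while len(clusters) > k:
        best = None
        for i, j in itertools.combinations(range(len(clusters)), 2):
            d = min(dist(vectors[a], vectors[b])
                    for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] |= clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def hypergeom_tail(pop, annotated, drawn, hits):
    """P(X >= hits) when drawing `drawn` items from `pop`, `annotated` of which are labeled."""
    total = math.comb(pop, drawn)
    upper = min(annotated, drawn)
    return sum(math.comb(annotated, i) * math.comb(pop - annotated, drawn - i)
               for i in range(hits, upper + 1)) / total


def run_sketch():
    """Cluster under every parameter combination and keep the best enrichment p-value."""
    results = []
    for (t_name, t), (m_name, m), k in itertools.product(
            TRANSFORMS.items(), METRICS.items(), SET_SIZES):
        vectors = {name: t(values) for name, values in DATA.items()}
        clusters = single_linkage(vectors, k, m)
        best_p = min(
            hypergeom_tail(len(DATA), len(ANNOTATED), len(c), len(c & ANNOTATED))
            for c in clusters)
        results.append((t_name, m_name, k, best_p))
    return results


results = run_sketch()
for row in sorted(results, key=lambda r: r[3]):
    print(row)
```

Each tuple in `results` names one clustering set (transformation, metric, set size) together with its smallest per-cluster enrichment p-value, so the sets can be ranked by how strongly they group annotated sites. A real analysis would also need multiple-hypothesis correction, which this sketch omits.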
first_indexed 2024-09-23T12:50:16Z
format Article
id mit-1721.1/66157
institution Massachusetts Institute of Technology
language en_US
last_indexed 2024-09-23T12:50:16Z
publishDate 2011
publisher Public Library of Science
record_format dspace
spelling mit-1721.1/66157 2022-10-01T11:24:46Z
MCAM: Multiple Clustering Analysis Methodology for Deriving Hypotheses and Insights from High-Throughput Proteomic Datasets
Naegle, Kristen Marie
White, Forest M.
Welsch, Roy E
Yaffe, Michael B
Lauffenburger, Douglas A
Massachusetts Institute of Technology. Department of Biological Engineering
Massachusetts Institute of Technology. Department of Biology
Sloan School of Management
Koch Institute for Integrative Cancer Research at MIT
National Institutes of Health (U.S.) (NIH-U54-CA112967)
National Institutes of Health (U.S.) (NIH-R01-CA096504)
2011-10-03T16:05:12Z
2011-10-03T16:05:12Z
2011-07
2011-02
Article
http://purl.org/eprint/type/JournalArticle
1553-7358
1553-734X
http://hdl.handle.net/1721.1/66157
Naegle, Kristen M. et al. “MCAM: Multiple Clustering Analysis Methodology for Deriving Hypotheses and Insights from High-Throughput Proteomic Datasets.” Ed. Jason A. Papin. PLoS Computational Biology 7 (2011): e1002119.
https://orcid.org/0000-0002-1545-1651
https://orcid.org/0000-0002-9038-1622
https://orcid.org/0000-0002-9547-3251
en_US
http://dx.doi.org/10.1371/journal.pcbi.1002119
PLoS Computational Biology
Creative Commons Attribution
http://creativecommons.org/licenses/by/2.5/
application/pdf
Public Library of Science
PLoS
title MCAM: Multiple Clustering Analysis Methodology for Deriving Hypotheses and Insights from High-Throughput Proteomic Datasets
url http://hdl.handle.net/1721.1/66157
https://orcid.org/0000-0002-1545-1651
https://orcid.org/0000-0002-9038-1622
https://orcid.org/0000-0002-9547-3251