Towards definition of an ECM parts list: An advance on GO categories

Bibliographic Details
Main Authors: Naba, Alexandra, Hoersch, Sebastian, Hynes, Richard O.
Other Authors: Massachusetts Institute of Technology. Department of Biology
Format: Article
Language: en_US
Published: Elsevier B.V. 2017
Online Access: http://hdl.handle.net/1721.1/106801
https://orcid.org/0000-0001-7603-8396
Description
Summary: Those of us interested in the extracellular matrix (ECM) are faced with significant challenges of definition. ECM proteins are large, complex and assembled into crosslinked insoluble matrices. This has meant that defining the biochemical composition of ECMs has been difficult. Nonetheless, protein chemistry and molecular biology have defined many familiar ECM proteins: collagens, proteoglycans, laminins, thrombospondins, tenascins, fibronectins, etc. With the completion of many genomes it should now be possible to develop complete “parts lists” for the ECM. Such lists are needed for analyzing data from “omic” approaches such as expression arrays, latest-generation sequencing and proteomics. These approaches generate long lists and it is typically necessary to extract from those lists the genes/proteins of interest. Anyone who attempts to do this using the commonly used gene ontology (GO) categories soon discovers that they are largely useless for defining ECM proteins. Many ECM proteins are unannotated, and those that are annotated are sorted, with little evidence of logic or consistency, into diverse categories such as “extracellular matrix,” “basement membrane,” “cell surface” and many others. The human and mouse orthologs are often found in different categories, and attempts to use GO categories to extract a complete list of ECM genes or proteins from a data set are unsatisfactory at best.