A Unified View of Protein Low-complexity Regions (LCRs) Across Species

Bibliographic Details
Main Author: Lee, Byron
Other Authors: Calo, Eliezer
Format: Thesis
Published: Massachusetts Institute of Technology 2023
Online Access: https://hdl.handle.net/1721.1/150241
https://orcid.org/0000-0001-7132-2662
Description
Summary: Low-complexity regions (LCRs) in proteins play roles in a variety of important cellular processes studied across different fields of biology, such as transcription, extracellular structure, and stress response. LCRs have been shown to vary in amino acid composition and structure, and can act as interacting domains capable of forming phase-separated higher-order assemblies. However, we lack a unified view of LCRs that incorporates all of the information in their sequences, features, relationships, and functions. In this thesis, I present a unified view of LCRs by 1) co-developing a framework based on the features and relationships of LCRs that are important in their roles as versatile interacting and phase-separating domains and 2) testing whether this framework may provide a more general understanding of the functions of LCRs in proteins. Using the systematic dotplot matrix approach that we developed, we define LCR type/copy relationships for proteins across the proteome. Based on these definitions, we show that K-rich LCR copy number in the RNA polymerase I subunit RPA43 is important for both assembly in vitro and localization in cells, demonstrating how principles of LCR copy number can relate these two processes. Moreover, by mapping regions of LCR sequence space to higher-order assemblies, such as the nucleolus, the metazoan extracellular matrix, and the plant cell wall, we relate LCR functions across different fields and suggest that LCR functions may be unified in their roles in higher-order assemblies. Using this unified view, we uncover scaffold-client relationships among E-rich LCR-containing proteins in the nucleolus and discover TCOF1 as a self-assembling scaffold of the nucleolar fibrillar center. We go on to uncover previously undescribed regions of LCR sequence space with signatures of higher-order assemblies, including a teleost-specific T/H-rich sequence space.
Thus, this work provides a framework that can unify the disparate functions of LCRs and enables discovery of how LCRs encode higher-order assemblies of organisms.