Complet+: a computationally scalable method to improve completeness of large-scale protein sequence clustering
A major challenge for clustering algorithms is to balance the trade-off between homogeneity, i.e., the degree to which an individual cluster includes only related sequences, and completeness, i.e., the degree to which related sequences are kept together rather than broken up into multiple clusters. Most algorithms are conservative...
Main Authors: | Rachel Nguyen, Bahrad A. Sokhansanj, Robi Polikar, Gail L. Rosen |
---|---|
Format: | Article |
Language: | English |
Published: | PeerJ Inc. 2023-02-01 |
Series: | PeerJ |
Subjects: | |
Online Access: | https://peerj.com/articles/14779.pdf |
Similar Items
- Alignment-free clustering of large data sets of unannotated protein conserved regions using minhashing
  by: Armen Abnousi, et al.
  Published: (2018-03-01)
- Update on genome completion and annotations: Protein Information Resource
  by: Wu Cathy, et al.
  Published: (2004-03-01)
- Interpretable and Predictive Deep Neural Network Modeling of the SARS-CoV-2 Spike Protein Sequence to Predict COVID-19 Disease Severity
  by: Bahrad A. Sokhansanj, et al.
  Published: (2022-12-01)
- Predicting Institution Outcomes for Inter Partes Review (IPR) Proceedings at the United States Patent Trial & Appeal Board by Deep Learning of Patent Owner Preliminary Response Briefs
  by: Bahrad A. Sokhansanj, et al.
  Published: (2022-04-01)
- Improvements in viral gene annotation using large language models and soft alignments
  by: William L. Harrigan, et al.
  Published: (2024-04-01)