Constructing benchmark test sets for biological sequence analysis using independent set algorithms.

Biological sequence families contain many sequences that are very similar to each other because they are related by evolution, so the strategy for splitting data into separate training and test sets is a nontrivial choice in benchmarking sequence analysis methods. A random split is insufficient beca...

Bibliographic Details
Main Authors: Samantha Petti, Sean R Eddy
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2022-03-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1009492