Semi-Supervised Learning Using Hierarchical Mixture Models: Gene Essentiality Case Study

Integrating gene-level data is useful for predicting the role of genes in biological processes. This problem has typically focused on supervised classification, which requires large training sets of positive and negative examples. However, training data sets that are too small for supervised approac...


Bibliographic Details
Main Authors: Michael W. Daniels, Daniel Dvorkin, Rani K. Powers, Katerina Kechris
Format: Article
Language: English
Published: MDPI AG 2021-05-01
Series: Mathematical and Computational Applications
Subjects: semi-supervised; hierarchical mixture models; essential genes; genomic; integration
Online Access: https://www.mdpi.com/2297-8747/26/2/40
_version_ 1797533650393235456
author Michael W. Daniels
Daniel Dvorkin
Rani K. Powers
Katerina Kechris
author_facet Michael W. Daniels
Daniel Dvorkin
Rani K. Powers
Katerina Kechris
author_sort Michael W. Daniels
collection DOAJ
description Integrating gene-level data is useful for predicting the role of genes in biological processes. This problem has typically focused on supervised classification, which requires large training sets of positive and negative examples. However, training data sets that are too small for supervised approaches can still provide valuable information. We describe a hierarchical mixture model that uses limited positively labeled gene training data for semi-supervised learning. We focus on the problem of predicting essential genes, where a gene is required for the survival of an organism under particular conditions. We applied cross-validation and found that the inclusion of positively labeled samples in a semi-supervised learning framework with the hierarchical mixture model improves the detection of essential genes compared to unsupervised, supervised, and other semi-supervised approaches. Prediction performance also improved when genes were incorrectly assumed to be non-essential. Our comparisons indicate that the incorporation of even small amounts of existing knowledge improves the accuracy of prediction and decreases variability in predictions. Although we focused on gene essentiality, the hierarchical mixture model and semi-supervised framework are general and apply to problems focused on prediction of genes or other features, with multiple data types characterizing the feature, and a small set of positive labels.
first_indexed 2024-03-10T11:18:37Z
format Article
id doaj.art-f5ff1288fff2408ca294074f36b51fee
institution Directory Open Access Journal
issn 1300-686X
2297-8747
language English
last_indexed 2024-03-10T11:18:37Z
publishDate 2021-05-01
publisher MDPI AG
record_format Article
series Mathematical and Computational Applications
spelling doaj.art-f5ff1288fff2408ca294074f36b51fee
2023-11-21T20:16:28Z
eng
MDPI AG
Mathematical and Computational Applications
1300-686X
2297-8747
2021-05-01
26
2
40
10.3390/mca26020040
Semi-Supervised Learning Using Hierarchical Mixture Models: Gene Essentiality Case Study
Michael W. Daniels (0)
Daniel Dvorkin (1)
Rani K. Powers (2)
Katerina Kechris (3)
(0) Department of Bioinformatics and Biostatistics, School of Public Health and Information Sciences, University of Louisville, Louisville, KY 40202, USA
(1) The Bioinformatics CRO, Inc., Niceville, FL 32578, USA
(2) Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02155, USA
(3) Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO 80045, USA
Integrating gene-level data is useful for predicting the role of genes in biological processes. This problem has typically focused on supervised classification, which requires large training sets of positive and negative examples. However, training data sets that are too small for supervised approaches can still provide valuable information. We describe a hierarchical mixture model that uses limited positively labeled gene training data for semi-supervised learning. We focus on the problem of predicting essential genes, where a gene is required for the survival of an organism under particular conditions. We applied cross-validation and found that the inclusion of positively labeled samples in a semi-supervised learning framework with the hierarchical mixture model improves the detection of essential genes compared to unsupervised, supervised, and other semi-supervised approaches. Prediction performance also improved when genes were incorrectly assumed to be non-essential. Our comparisons indicate that the incorporation of even small amounts of existing knowledge improves the accuracy of prediction and decreases variability in predictions. Although we focused on gene essentiality, the hierarchical mixture model and semi-supervised framework are general and apply to problems focused on prediction of genes or other features, with multiple data types characterizing the feature, and a small set of positive labels.
https://www.mdpi.com/2297-8747/26/2/40
semi-supervised
hierarchical mixture models
essential genes
genomic
integration
spellingShingle Michael W. Daniels
Daniel Dvorkin
Rani K. Powers
Katerina Kechris
Semi-Supervised Learning Using Hierarchical Mixture Models: Gene Essentiality Case Study
Mathematical and Computational Applications
semi-supervised
hierarchical mixture models
essential genes
genomic
integration
title Semi-Supervised Learning Using Hierarchical Mixture Models: Gene Essentiality Case Study
title_full Semi-Supervised Learning Using Hierarchical Mixture Models: Gene Essentiality Case Study
title_fullStr Semi-Supervised Learning Using Hierarchical Mixture Models: Gene Essentiality Case Study
title_full_unstemmed Semi-Supervised Learning Using Hierarchical Mixture Models: Gene Essentiality Case Study
title_short Semi-Supervised Learning Using Hierarchical Mixture Models: Gene Essentiality Case Study
title_sort semi supervised learning using hierarchical mixture models gene essentiality case study
topic semi-supervised
hierarchical mixture models
essential genes
genomic
integration
url https://www.mdpi.com/2297-8747/26/2/40
work_keys_str_mv AT michaelwdaniels semisupervisedlearningusinghierarchicalmixturemodelsgeneessentialitycasestudy
AT danieldvorkin semisupervisedlearningusinghierarchicalmixturemodelsgeneessentialitycasestudy
AT ranikpowers semisupervisedlearningusinghierarchicalmixturemodelsgeneessentialitycasestudy
AT katerinakechris semisupervisedlearningusinghierarchicalmixturemodelsgeneessentialitycasestudy