Partial label learning for automated classification of single-cell transcriptomic profiles.

Bibliographic Details
Main Authors: Malek Senoussi, Thierry Artieres, Paul Villoutreix
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2024-04-01
Series: PLoS Computational Biology
Online Access: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1012006&type=printable
_version_ 1797200761917014016
author Malek Senoussi
Thierry Artieres
Paul Villoutreix
collection DOAJ
description Single-cell RNA sequencing (scRNASeq) data plays a major role in advancing our understanding of developmental biology. An important current question is how to classify transcriptomic profiles obtained from scRNASeq experiments into the various cell types and identify the lineage relationship for individual cells. Because of the fast accumulation of datasets and the high dimensionality of the data, it has become challenging to explore and annotate single-cell transcriptomic profiles by hand. To overcome this challenge, automated classification methods are needed. Classical approaches rely on supervised training datasets. However, due to the difficulty of obtaining data annotated at single-cell resolution, we propose instead to take advantage of partial annotations. The partial label learning framework assumes that, for each data point, we can obtain a set of candidate labels that contains the correct one, a simpler setting than requiring a fully supervised training dataset. We study and, where needed, extend state-of-the-art multi-class classification methods, such as SVM, kNN, prototype-based, logistic regression and ensemble methods, to the partial label learning framework. Moreover, we study the effect of incorporating the structure of the label set into the methods. We focus particularly on the hierarchical structure of the labels, as commonly observed in developmental processes. We show, on simulated and real datasets, that these extensions make it possible to learn from partially labeled data and to make predictions with high accuracy, particularly with a nonlinear prototype-based method. We demonstrate that our methods trained with partially annotated data reach the same performance as methods trained with fully supervised data. Finally, we study the level of uncertainty present in the partially annotated data, and derive some prescriptive results on the effect of this uncertainty on the accuracy of the partial label learning methods.
Overall, our findings show how hierarchical and non-hierarchical partial label learning strategies can help solve the problem of automated classification of single-cell transcriptomic profiles. Interestingly, these methods rely on a much less stringent type of annotated dataset than fully supervised learning methods.
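The partial label setting described above (each training point carries a set of candidate labels that contains the correct one) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple k-nearest-neighbour voting rule in which each neighbour spreads one vote uniformly over its candidate labels, and the function name `predict_partial_knn`, the toy coordinates and the cell-type names are all hypothetical.

```python
import math
from collections import Counter

def predict_partial_knn(train_points, candidate_sets, query, k=3):
    """Predict one label for `query` from partially labeled neighbours.

    Illustrative assumption, not the paper's method: each of the k nearest
    training points spreads a single vote uniformly over its candidate set.
    """
    # Rank training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter()
    for i in order[:k]:
        weight = 1.0 / len(candidate_sets[i])
        for label in candidate_sets[i]:
            votes[label] += weight
    # Highest vote wins; ties are broken alphabetically for determinism.
    return min(votes, key=lambda label: (-votes[label], label))

# Hypothetical toy data: 2-D "profiles" with candidate cell-type sets.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
candidates = [
    {"neuron", "glia"}, {"neuron"}, {"neuron", "muscle"},
    {"muscle", "glia"}, {"muscle"}, {"muscle", "neuron"},
]

label_a = predict_partial_knn(points, candidates, (0.2, 0.2))
label_b = predict_partial_knn(points, candidates, (5.5, 5.5))
print(label_a, label_b)
```

In this toy example the ambiguous candidate sets still agree on one label per cluster, so the vote recovers it even though only one point per cluster is unambiguously annotated.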
first_indexed 2024-04-24T07:36:48Z
format Article
id doaj.art-dfa1a00e5afd4346b4bbafaa59159d9b
institution Directory Open Access Journal
issn 1553-734X
1553-7358
language English
last_indexed 2024-04-24T07:36:48Z
publishDate 2024-04-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS Computational Biology
spelling doaj.art-dfa1a00e5afd4346b4bbafaa59159d9b 2024-04-20T05:31:09Z eng Public Library of Science (PLoS) PLoS Computational Biology 1553-734X 1553-7358 2024-04-01 20 4 e1012006 10.1371/journal.pcbi.1012006 Partial label learning for automated classification of single-cell transcriptomic profiles. Malek Senoussi Thierry Artieres Paul Villoutreix https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1012006&type=printable
title Partial label learning for automated classification of single-cell transcriptomic profiles.
url https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1012006&type=printable