Characterizing Human Cell Types and Tissue Origin Using the Benford Law


Bibliographic Details
Main Authors: Sne Morag, Mali Salmon-Divon
Format: Article
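Language: English
Published: MDPI AG 2019-08-01
Series: Cells
Subjects: single-cell RNA sequencing; Benford law; Benford distribution; cell classification; machine learning
Online Access: https://www.mdpi.com/2073-4409/8/9/1004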
author Sne Morag
Mali Salmon-Divon
collection DOAJ
description Processing massive transcriptomic datasets in a meaningful manner requires novel, possibly interdisciplinary, approaches. One principle that can address this challenge is the Benford law (BL), which posits that the occurrence probability of a leading digit in a large numerical dataset decreases as its value increases. Here, we analyzed large single-cell and bulk RNA-seq datasets to test whether cell types and tissue origins can be differentiated based on the adherence of specific genes to the BL. Then, we used the Benford adherence scores of these genes as inputs to machine-learning algorithms and tested their separation accuracy. We found that genes selected based on their first-digit distributions can distinguish between cell types and tissue origins. Moreover, despite the simplicity of this novel feature-selection method, its separation accuracy is higher than that of the mean-expression level approach and is similar to that of the differential expression approach. Thus, the BL can be used to obtain biological insights from massive amounts of numerical genomics data—a capability that could be utilized in various biomedical applications, e.g., to resolve samples of unknown primary origin, identify possible sample contaminations, and provide insights into the molecular basis of cancer subtypes.
first_indexed 2024-03-12T10:17:18Z
format Article
id doaj.art-fe0b26d6de0f44dca603aba3984373e4
institution Directory of Open Access Journals
issn 2073-4409
language English
last_indexed 2024-03-12T10:17:18Z
publishDate 2019-08-01
publisher MDPI AG
record_format Article
series Cells
doi 10.3390/cells8091004
volume 8
issue 9
article_number 1004
author_affiliation Sne Morag: Department of Molecular Biology, Faculty of Life Sciences, Ariel University, Ariel 40700, Israel
Mali Salmon-Divon: Department of Molecular Biology, Faculty of Life Sciences, Ariel University, Ariel 40700, Israel
title Characterizing Human Cell Types and Tissue Origin Using the Benford Law
topic single-cell RNA sequencing
Benford law
Benford distribution
cell classification
machine learning
url https://www.mdpi.com/2073-4409/8/9/1004
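
The description above refers to the Benford law and to "Benford adherence scores" without giving formulas. As a rough illustration only: the expected first-digit probability under the Benford law is the standard P(d) = log10(1 + 1/d) for d = 1..9. The record does not say how the authors compute adherence, so the score below (mean absolute deviation between observed and expected first-digit frequencies) is an assumed stand-in, and the function names `benford_expected`, `leading_digit`, and `benford_deviation` are hypothetical.

```python
import math
from collections import Counter


def benford_expected():
    """Expected first-digit probabilities under the Benford law: log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """Return the first significant digit of a positive number."""
    # Scientific notation puts the first significant digit first,
    # e.g. 0.042 -> '4.2000000000000000e-02'. 17 significant digits
    # avoid rounding a value such as 9.9999999 up to 10.
    return int(f"{x:.16e}"[0])


def benford_deviation(values):
    """Assumed adherence score (not necessarily the authors' metric):
    mean absolute deviation of observed first-digit frequencies from
    the Benford expectation. 0 means a perfect fit; larger is worse."""
    positives = [v for v in values if v > 0]  # zeros have no leading digit
    if not positives:
        raise ValueError("need at least one positive value")
    counts = Counter(leading_digit(v) for v in positives)
    expected = benford_expected()
    n = len(positives)
    return sum(abs(counts.get(d, 0) / n - expected[d]) for d in range(1, 10)) / 9


if __name__ == "__main__":
    # Powers of two are a classic sequence that follows the Benford law closely.
    powers_of_two = [2 ** k for k in range(1, 101)]
    # One value per leading digit gives a flat (non-Benford) distribution.
    flat_digits = list(range(1, 10))
    print("powers of two:", benford_deviation(powers_of_two))
    print("flat digits:  ", benford_deviation(flat_digits))
```

In a setting like the one the description outlines, such a score could be computed per gene across the expression values of a group of cells or samples and then used as a feature; that mapping is an interpretation of the abstract, not a detail stated in this record.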