Characterizing Human Cell Types and Tissue Origin Using the Benford Law
Processing massive transcriptomic datasets in a meaningful manner requires novel, possibly interdisciplinary, approaches. One principle that can address this challenge is the Benford law (BL), which posits that the occurrence probability of a leading digit in a large numerical dataset decreases as i...
Main Authors: | Sne Morag, Mali Salmon-Divon |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2019-08-01 |
Series: | Cells |
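Subjects: | single-cell RNA sequencing; Benford law; Benford distribution; cell classification; machine learning |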
Online Access: | https://www.mdpi.com/2073-4409/8/9/1004 |
_version_ | 1797724442075332608 |
---|---|
author | Sne Morag; Mali Salmon-Divon |
author_facet | Sne Morag; Mali Salmon-Divon |
author_sort | Sne Morag |
collection | DOAJ |
description | Processing massive transcriptomic datasets in a meaningful manner requires novel, possibly interdisciplinary, approaches. One principle that can address this challenge is the Benford law (BL), which posits that the occurrence probability of a leading digit in a large numerical dataset decreases as its value increases. Here, we analyzed large single-cell and bulk RNA-seq datasets to test whether cell types and tissue origins can be differentiated based on the adherence of specific genes to the BL. Then, we used the Benford adherence scores of these genes as inputs to machine-learning algorithms and tested their separation accuracy. We found that genes selected based on their first-digit distributions can distinguish between cell types and tissue origins. Moreover, despite the simplicity of this novel feature-selection method, its separation accuracy is higher than that of the mean-expression level approach and is similar to that of the differential expression approach. Thus, the BL can be used to obtain biological insights from massive amounts of numerical genomics data—a capability that could be utilized in various biomedical applications, e.g., to resolve samples of unknown primary origin, identify possible sample contaminations, and provide insights into the molecular basis of cancer subtypes. |
first_indexed | 2024-03-12T10:17:18Z |
format | Article |
id | doaj.art-fe0b26d6de0f44dca603aba3984373e4 |
institution | Directory of Open Access Journals |
issn | 2073-4409 |
language | English |
last_indexed | 2024-03-12T10:17:18Z |
publishDate | 2019-08-01 |
publisher | MDPI AG |
record_format | Article |
series | Cells |
spelling | doaj.art-fe0b26d6de0f44dca603aba3984373e4; 2023-09-02T10:24:43Z; eng; MDPI AG; Cells; ISSN 2073-4409; 2019-08-01; vol. 8, no. 9, article 1004; DOI 10.3390/cells8091004; Sne Morag and Mali Salmon-Divon (both: Department of Molecular Biology, Faculty of Life Sciences, Ariel University, Ariel 40700, Israel); https://www.mdpi.com/2073-4409/8/9/1004; keywords: single-cell RNA sequencing, Benford law, Benford distribution, cell classification, machine learning |
title | Characterizing Human Cell Types and Tissue Origin Using the Benford Law |
topic | single-cell RNA sequencing; Benford law; Benford distribution; cell classification; machine learning |
url | https://www.mdpi.com/2073-4409/8/9/1004 |
work_keys_str_mv | AT snemorag characterizinghumancelltypesandtissueoriginusingthebenfordlaw AT malisalmondivon characterizinghumancelltypesandtissueoriginusingthebenfordlaw |
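
The description above states that the Benford law assigns decreasing probability to larger leading digits and that per-gene "Benford adherence scores" were used as machine-learning inputs. The record does not say how the score is computed, so the following is only a minimal sketch under stated assumptions: the expected probability of leading digit d is the standard log10(1 + 1/d), and adherence is measured here by the mean absolute deviation between observed and expected first-digit frequencies. The function names (`leading_digit`, `first_digit_freqs`, `benford_mad`) and the choice of metric are illustrative, not the authors' method.

```python
import math
from collections import Counter

# Benford's law: expected probability of leading digit d (1..9).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """Return the first significant digit of a positive finite number, else None."""
    if x <= 0 or not math.isfinite(x):
        return None
    # Scientific notation puts the first significant digit first,
    # e.g. 0.032 -> '3.2000...e-02'. High precision avoids rounding 9.99... up to 1.
    return int(f"{x:.17e}"[0])


def first_digit_freqs(values):
    """Observed frequency of each leading digit 1..9; None if no positive values."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    if not digits:
        return None
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def benford_mad(values):
    """Mean absolute deviation from Benford's law (assumed metric; lower = closer)."""
    freqs = first_digit_freqs(values)
    if freqs is None:
        return None
    return sum(abs(freqs[d] - BENFORD[d]) for d in range(1, 10)) / 9


if __name__ == "__main__":
    # Hypothetical inputs: powers of two are known to track Benford's law closely,
    # while the digits 1..9 taken once each give a flat first-digit distribution.
    benford_like = [2 ** k for k in range(1, 200)]
    flat = list(range(1, 10))
    print("powers of two:", benford_mad(benford_like))
    print("flat digits:  ", benford_mad(flat))
```

In a setting like the one the description outlines, `values` would be one gene's expression values across the cells or samples of a group, and zero counts are skipped because they have no leading digit. Whether the original study handled zeros this way is not stated in this record.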