Simple method for cutoff point identification in descriptive high-throughput biological studies

Abstract Background Rapid development of high-throughput omics technologies generates an increasing interest in algorithms for cutoff point identification. Existing cutoff methods and tools identify cutoff points based on an association of continuous variables with another variable, such as phenotyp...


Bibliographic Details
Main Author: Alexander Suvorov
Format: Article
Language: English
Published: BMC 2022-03-01
Series: BMC Genomics
Subjects: Cutoff; Dichotomization; Descriptive genomics; Threshold; -omics
Online Access: https://doi.org/10.1186/s12864-022-08427-6
_version_ 1818673428141441024
author Alexander Suvorov
collection DOAJ
description Abstract. Background: Rapid development of high-throughput omics technologies has generated increasing interest in algorithms for cutoff point identification. Existing cutoff methods and tools identify cutoff points based on an association of continuous variables with another variable, such as phenotype, disease state, or treatment group. These approaches are not applicable to descriptive studies in which continuous variables are reported without a known association with any biologically meaningful variable. Results: The most common shape of the ranked distribution of continuous variables in high-throughput descriptive studies is a biphasic curve, in which the first phase includes a large number of variables with values that grow slowly with rank and the second phase includes a smaller number of variables with values that grow rapidly with rank. This study describes a simple algorithm that identifies the boundary between these phases for use as a cutoff point. Discussion: The major assumption of this approach is that a small number of variables with high values dominate the biological system and determine its major processes and functions. The approach was tested on three datasets: human genes and their expression values in the human cerebral cortex, mammalian genes and their values of sensitivity to chemical exposures, and human proteins and their expression values in the human heart. In every case, the described cutoff identification method produced shortlists of variables (genes, proteins) highly relevant to the dominant functions and pathways of the analyzed biological systems. Conclusions: The described method for cutoff identification may be used to prioritize variables in descriptive omics studies for focused functional analysis in situations where other methods of data dichotomization are unavailable.
first_indexed 2024-12-17T07:55:38Z
format Article
id doaj.art-b1bbad66170147bd9b21c4d48f2fb8b2
institution Directory of Open Access Journals
issn 1471-2164
language English
last_indexed 2024-12-17T07:55:38Z
publishDate 2022-03-01
publisher BMC
record_format Article
series BMC Genomics
spelling doaj.art-b1bbad66170147bd9b21c4d48f2fb8b2 | 2022-12-21T21:57:42Z | eng | BMC | BMC Genomics | 1471-2164 | 2022-03-01 | 23 | 1 | 1 | 16 | 10.1186/s12864-022-08427-6 | Simple method for cutoff point identification in descriptive high-throughput biological studies | Alexander Suvorov (Department of Environmental Health Sciences, School of Public Health and Health Sciences, University of Massachusetts) | https://doi.org/10.1186/s12864-022-08427-6 | Cutoff; Dichotomization; Descriptive genomics; Threshold; -omics
title Simple method for cutoff point identification in descriptive high-throughput biological studies
topic Cutoff
Dichotomization
Descriptive genomics
Threshold
-omics
url https://doi.org/10.1186/s12864-022-08427-6
work_keys_str_mv AT alexandersuvorov simplemethodforcutoffpointidentificationindescriptivehighthroughputbiologicalstudies
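
The abstract in this record describes a biphasic ranked curve (many variables whose values grow slowly with rank, followed by a few whose values grow rapidly) and a cutoff placed at the boundary between the two phases. The record does not state the article's algorithm, so the sketch below is not the author's method. It substitutes a generic knee heuristic, assumed here purely for illustration: rescale rank and value to the range 0 to 1, then pick the point lying farthest below the straight line that joins the first and last points. The function name `knee_cutoff` and the synthetic data are hypothetical.

```python
from typing import List, Tuple


def knee_cutoff(values: List[float]) -> Tuple[int, float]:
    """Return (rank index, value) of the knee of the ascending ranked curve.

    Generic "farthest below the chord" heuristic, assumed for illustration;
    it is not taken from the article described in this record.
    """
    ranked = sorted(values)
    n = len(ranked)
    if n < 3:
        raise ValueError("need at least three values")
    lo, hi = ranked[0], ranked[-1]
    if hi == lo:
        raise ValueError("all values are equal; no knee exists")

    best_index, best_gap = 0, float("-inf")
    for i, v in enumerate(ranked):
        x = i / (n - 1)              # rank rescaled to [0, 1]
        y = (v - lo) / (hi - lo)     # value rescaled to [0, 1]
        gap = x - y                  # proportional to distance below the chord y = x
        if gap > best_gap:           # strict '>' keeps the first maximum
            best_index, best_gap = i, gap
    return best_index, ranked[best_index]


# Synthetic biphasic data: 90 slowly growing values, then 10 rapidly growing ones.
slow_phase = [100 + i for i in range(90)]
fast_phase = [500 * k for k in range(1, 11)]
values = slow_phase + fast_phase

index, cutoff = knee_cutoff(values)
shortlist = [v for v in values if v > cutoff]
print(index, cutoff, len(shortlist))
```

Variables ranked above the returned cutoff form the shortlist. The heuristic assumes the ranked curve is roughly convex, as in the biphasic shape the abstract describes; on other shapes the selected point may not be meaningful.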