Identification of biomarkers predictive of metastasis development in early-stage colorectal cancer using network-based regularization
Main Authors: | Carolina Peixoto, Marta B. Lopes, Marta Martins, Sandra Casimiro, Daniel Sobral, Ana Rita Grosso, Catarina Abreu, Daniela Macedo, Ana Lúcia Costa, Helena Pais, Cecília Alvim, André Mansinho, Pedro Filipe, Pedro Marques da Costa, Afonso Fernandes, Paula Borralho, Cristina Ferreira, João Malaquias, António Quintela, Shannon Kaplan, Mahdi Golkaram, Michael Salmans, Nafeesa Khan, Raakhee Vijayaraghavan, Shile Zhang, Traci Pawlowski, Jim Godsey, Alex So, Li Liu, Luís Costa, Susana Vinga |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2023-01-01 |
Series: | BMC Bioinformatics |
Online Access: | https://doi.org/10.1186/s12859-022-05104-z |
collection | DOAJ |
description | Abstract Colorectal cancer (CRC) is the third most common cancer and the second deadliest worldwide. It is a highly heterogeneous disease that can develop via distinct pathways, and metastasis is the primary cause of death. Therefore, it is crucial to understand the molecular mechanisms underlying metastasis. RNA sequencing is an essential tool for studying the transcriptional landscape. However, the high dimensionality of gene expression data makes selecting novel metastatic biomarkers problematic. To distinguish early-stage CRC patients at risk of developing metastasis from those who are not, three types of binary classification approaches were used: (1) classification methods (decision trees, linear and radial kernel support vector machines, logistic regression, and random forest) using differentially expressed genes (DEGs) as input features; (2) regularized logistic regression based on the Elastic Net penalty and on the proposed iTwiner, a network-based regularizer accounting for gene correlation information; and (3) classification methods based on the genes pre-selected using regularized logistic regression. Classifiers using the DEGs as features showed similar results, with random forest showing the highest accuracy. Using regularized logistic regression on the full dataset yielded no improvement in the methods’ accuracy. Further classification using the genes pre-selected with the different penalty factors, instead of the DEGs, significantly improved the accuracy of the binary classifiers. Moreover, using network-based correlation information (iTwiner) for gene selection produced the best classification results and identified more stable and robust gene sets. Some of these genes are known to be tumor suppressor genes (OPCML-IT2), to be related to resistance to cancer therapies (RAC1P3), or to be involved in several cancer processes such as genome stability (XRCC6P2), tumor growth and metastasis (MIR602), and regulation of gene transcription (NME2P2). We show that classifying CRC patients based on features pre-selected by regularized logistic regression is a valuable alternative to using DEGs, significantly increasing the models’ predictive performance. Moreover, correlation-based penalization for biomarker selection stands as a promising strategy for predicting patient groups from RNA-seq data. |
first_indexed | 2024-04-10T20:58:27Z |
id | doaj.art-0d6a3817dc23495ca7e6cf3f2ada3dd4 |
institution | Directory of Open Access Journals |
issn | 1471-2105 |
last_indexed | 2024-04-10T20:58:27Z |
affiliation | INESC-ID, Instituto Superior Técnico, Universidade de Lisboa (Carolina Peixoto, Susana Vinga); NOVA Laboratory for Computer Science and Informatics (NOVA LINCS), NOVA School of Science and Technology (Marta B. Lopes); Instituto de Medicina Molecular - João Lobo Antunes, Faculdade de Medicina de Lisboa (Marta Martins, Sandra Casimiro, André Mansinho, Afonso Fernandes, Paula Borralho, Luís Costa); Associate Laboratory i4HB - Institute for Health and Bioeconomy, NOVA School of Science and Technology, Universidade NOVA de Lisboa (Daniel Sobral, Ana Rita Grosso); Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte (Catarina Abreu, Daniela Macedo, Ana Lúcia Costa, Helena Pais, Cecília Alvim, Pedro Filipe, Pedro Marques da Costa, Cristina Ferreira, João Malaquias, António Quintela); Illumina Inc. (Shannon Kaplan, Mahdi Golkaram, Michael Salmans, Nafeesa Khan, Raakhee Vijayaraghavan, Shile Zhang, Traci Pawlowski, Jim Godsey, Alex So, Li Liu) |
topic | Colorectal cancer; Classification; Biomarker selection; Regularization; iTwiner |
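
The abstract's second approach, regularized logistic regression with an Elastic Net penalty used to pre-select genes for downstream classifiers, can be illustrated with a minimal sketch. It assumes a pure-Python proximal-gradient solver, synthetic toy data rather than the study's RNA-seq data, and a hypothetical per-feature `penalty_factor` vector (similar in spirit to glmnet's `penalty.factor`) as one place where network-derived weights could enter. All function names are illustrative, and this is not the authors' iTwiner implementation.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(z, t):
    # Proximal operator of t*|w|: shrink z toward zero by t
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_elastic_net_logistic(X, y, lam=0.3, alpha=0.5, penalty_factor=None,
                             lr=0.5, n_iter=300):
    """Proximal gradient descent on
    mean logistic loss + lam * sum_j v_j * (alpha*|w_j| + (1 - alpha)/2 * w_j**2).

    `penalty_factor` (v_j) is a hypothetical per-gene weight vector; a
    network-based regularizer could supply it. The intercept is unpenalized.
    """
    n, p = len(X), len(X[0])
    v = penalty_factor if penalty_factor is not None else [1.0] * p
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        for j in range(p):
            # Gradient step on loss + ridge part, then the L1 proximal step
            g = grad_w[j] + lam * v[j] * (1.0 - alpha) * w[j]
            w[j] = soft_threshold(w[j] - lr * g, lr * lam * v[j] * alpha)
    return w, b


def select_features(w):
    # Indices of features (genes) whose coefficients survived the penalty
    return [j for j, wj in enumerate(w) if wj != 0.0]


def make_toy_data(n=400, p=6, seed=0):
    # Hypothetical synthetic data, NOT RNA-seq: only features 0 and 1 carry signal
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        score = 3.0 * x[0] - 3.0 * x[1] + rng.gauss(0.0, 0.5)
        X.append(x)
        y.append(1 if score > 0 else 0)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    w, b = fit_elastic_net_logistic(X, y)
    print("selected features:", select_features(w))
```

In this sketch, the indices returned by `select_features` would play the role of the pre-selected genes passed to the classifiers in the third approach. Raising a feature's penalty factor makes it harder for that feature to enter the model, which is the lever a correlation-based penalty would act on.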