Identification of biomarkers predictive of metastasis development in early-stage colorectal cancer using network-based regularization

Abstract Colorectal cancer (CRC) is the third most common cancer and the second deadliest worldwide. It is a very heterogeneous disease that can develop via distinct pathways, and metastasis is the primary cause of death. Therefore, it is crucial to understand the molecular mechanisms underlying...


Bibliographic Details
Main Authors: Carolina Peixoto, Marta B. Lopes, Marta Martins, Sandra Casimiro, Daniel Sobral, Ana Rita Grosso, Catarina Abreu, Daniela Macedo, Ana Lúcia Costa, Helena Pais, Cecília Alvim, André Mansinho, Pedro Filipe, Pedro Marques da Costa, Afonso Fernandes, Paula Borralho, Cristina Ferreira, João Malaquias, António Quintela, Shannon Kaplan, Mahdi Golkaram, Michael Salmans, Nafeesa Khan, Raakhee Vijayaraghavan, Shile Zhang, Traci Pawlowski, Jim Godsey, Alex So, Li Liu, Luís Costa, Susana Vinga
Format: Article
Language: English
Published: BMC 2023-01-01
Series: BMC Bioinformatics
Subjects: Colorectal cancer, Classification, Biomarker selection, Regularization, iTwiner
Online Access: https://doi.org/10.1186/s12859-022-05104-z
author Carolina Peixoto
Marta B. Lopes
Marta Martins
Sandra Casimiro
Daniel Sobral
Ana Rita Grosso
Catarina Abreu
Daniela Macedo
Ana Lúcia Costa
Helena Pais
Cecília Alvim
André Mansinho
Pedro Filipe
Pedro Marques da Costa
Afonso Fernandes
Paula Borralho
Cristina Ferreira
João Malaquias
António Quintela
Shannon Kaplan
Mahdi Golkaram
Michael Salmans
Nafeesa Khan
Raakhee Vijayaraghavan
Shile Zhang
Traci Pawlowski
Jim Godsey
Alex So
Li Liu
Luís Costa
Susana Vinga
collection DOAJ
description Abstract Colorectal cancer (CRC) is the third most common cancer and the second deadliest worldwide. It is a very heterogeneous disease that can develop via distinct pathways, and metastasis is the primary cause of death. Therefore, it is crucial to understand the molecular mechanisms underlying metastasis. RNA-sequencing is an essential tool for studying the transcriptional landscape. However, the high dimensionality of gene expression data makes selecting novel metastatic biomarkers problematic. To distinguish early-stage CRC patients at risk of developing metastasis from those who are not, three types of binary classification approaches were used: (1) classification methods (decision trees, linear and radial kernel support vector machines, logistic regression, and random forest) using differentially expressed genes (DEGs) as input features; (2) regularized logistic regression based on the Elastic Net penalty and the proposed iTwiner, a network-based regularizer accounting for gene correlation information; and (3) classification methods based on the genes pre-selected using regularized logistic regression. Classifiers using the DEGs as features showed similar results, with random forest showing the highest accuracy. Using regularized logistic regression on the full dataset yielded no improvement in the methods’ accuracy. Further classification using the pre-selected genes found by different penalty factors, instead of the DEGs, significantly improved the accuracy of the binary classifiers. Moreover, the use of network-based correlation information (iTwiner) for gene selection produced the best classification results and the identification of more stable and robust gene sets. Some of the selected genes are known to be tumor suppressor genes (OPCML-IT2), to be related to resistance to cancer therapies (RAC1P3), or to be involved in several cancer processes such as genome stability (XRCC6P2), tumor growth and metastasis (MIR602) and regulation of gene transcription (NME2P2). We show that the classification of CRC patients based on features pre-selected by regularized logistic regression is a valuable alternative to using DEGs, significantly increasing the models’ predictive performance. Moreover, the use of correlation-based penalization for biomarker selection stands as a promising strategy for predicting patient groups based on RNA-seq data.
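As a rough illustration of approach (2) in the abstract, the sketch below evaluates an Elastic Net penalized logistic regression objective with optional per-gene penalty factors. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the penalty parameterization (a common one, with `alpha` mixing the L1 and L2 terms), and the `weights` argument are all illustrative. How iTwiner derives its penalty factors from gene correlation networks is not specified in this record and is not modeled here.

```python
import math


def elastic_net_penalty(beta, lam, alpha, weights=None):
    """Return lam * sum_j w_j * (alpha*|b_j| + (1 - alpha)/2 * b_j**2).

    `weights` are hypothetical per-gene penalty factors; with weights=None
    every gene is penalized equally (plain Elastic Net).
    """
    if weights is None:
        weights = [1.0] * len(beta)
    return lam * sum(
        w * (alpha * abs(b) + 0.5 * (1.0 - alpha) * b * b)
        for w, b in zip(weights, beta)
    )


def logistic_loss(X, y, beta, intercept=0.0):
    """Mean negative log-likelihood for labels in {0, 1}."""
    total = 0.0
    for row, label in zip(X, y):
        z = intercept + sum(b * x for b, x in zip(beta, row))
        # log(1 + exp(z)) computed in a numerically stable way
        softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
        total += softplus - label * z
    return total / len(y)


def penalized_objective(X, y, beta, lam, alpha, weights=None):
    """Objective a regularized logistic regression fit would minimize."""
    return logistic_loss(X, y, beta) + elastic_net_penalty(beta, lam, alpha, weights)
```

Genes whose coefficients remain nonzero after minimizing such an objective would form the pre-selected feature set that approach (3) passes to the downstream classifiers; a smaller penalty factor makes a gene cheaper to keep in the model.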
first_indexed 2024-04-10T20:58:27Z
format Article
id doaj.art-0d6a3817dc23495ca7e6cf3f2ada3dd4
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-04-10T20:58:27Z
publishDate 2023-01-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-0d6a3817dc23495ca7e6cf3f2ada3dd4
2023-01-22T12:27:00Z
eng
BMC
BMC Bioinformatics
1471-2105
2023-01-01
Volume 24, issue 1, pages 1-23
10.1186/s12859-022-05104-z
Identification of biomarkers predictive of metastasis development in early-stage colorectal cancer using network-based regularization
Carolina Peixoto: INESC-ID, Instituto Superior Técnico, Universidade de Lisboa
Marta B. Lopes: NOVA Laboratory for Computer Science and Informatics (NOVA LINCS), NOVA School of Science and Technology
Marta Martins: Instituto de Medicina Molecular - João Lobo Antunes, Faculdade de Medicina de Lisboa
Sandra Casimiro: Instituto de Medicina Molecular - João Lobo Antunes, Faculdade de Medicina de Lisboa
Daniel Sobral: Associate Laboratory i4HB - Institute for Health and Bioeconomy, NOVA School of Science and Technology, Universidade NOVA de Lisboa
Ana Rita Grosso: Associate Laboratory i4HB - Institute for Health and Bioeconomy, NOVA School of Science and Technology, Universidade NOVA de Lisboa
Catarina Abreu: Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte
Daniela Macedo: Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte
Ana Lúcia Costa: Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte
Helena Pais: Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte
Cecília Alvim: Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte
André Mansinho: Instituto de Medicina Molecular - João Lobo Antunes, Faculdade de Medicina de Lisboa
Pedro Filipe: Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte
Pedro Marques da Costa: Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte
Afonso Fernandes: Instituto de Medicina Molecular - João Lobo Antunes, Faculdade de Medicina de Lisboa
Paula Borralho: Instituto de Medicina Molecular - João Lobo Antunes, Faculdade de Medicina de Lisboa
Cristina Ferreira: Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte
João Malaquias: Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte
António Quintela: Oncology Division, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte
Shannon Kaplan: Illumina Inc.
Mahdi Golkaram: Illumina Inc.
Michael Salmans: Illumina Inc.
Nafeesa Khan: Illumina Inc.
Raakhee Vijayaraghavan: Illumina Inc.
Shile Zhang: Illumina Inc.
Traci Pawlowski: Illumina Inc.
Jim Godsey: Illumina Inc.
Alex So: Illumina Inc.
Li Liu: Illumina Inc.
Luís Costa: Instituto de Medicina Molecular - João Lobo Antunes, Faculdade de Medicina de Lisboa
Susana Vinga: INESC-ID, Instituto Superior Técnico, Universidade de Lisboa
https://doi.org/10.1186/s12859-022-05104-z
Colorectal cancer
Classification
Biomarker selection
Regularization
iTwiner
title Identification of biomarkers predictive of metastasis development in early-stage colorectal cancer using network-based regularization
topic Colorectal cancer
Classification
Biomarker selection
Regularization
iTwiner
url https://doi.org/10.1186/s12859-022-05104-z