Exploration and validation of key genes associated with early lymph node metastasis in thyroid carcinoma using weighted gene co-expression network analysis and machine learning


Bibliographic Details
Main Authors: Yanyan Liu, Zhenglang Yin, Yao Wang, Haohao Chen
Format: Article
Language: English
Published: Frontiers Media S.A. 2023-12-01
Series: Frontiers in Endocrinology
Subjects: thyroid cancer; bioinformatics analysis; The Cancer Genome Atlas; nomogram; machine learning
Online Access: https://www.frontiersin.org/articles/10.3389/fendo.2023.1247709/full
collection DOAJ
description Background: Thyroid carcinoma (THCA), the most common endocrine neoplasm, is typically indolent. In some cases, however, lymph node metastasis (LNM) occurs at an early stage, and the underlying mechanisms are not yet fully understood.
Materials and methods: LNM potential was defined as the tumor’s capability to metastasize to lymph nodes at an early stage, even when the tumor volume is small. We performed differential expression analysis using the ‘limma’ R package and conducted enrichment analyses using the Metascape tool. Co-expression networks were established using the ‘WGCNA’ R package, with the soft threshold power determined by the ‘pickSoftThreshold’ algorithm. For unsupervised clustering, we used the ‘ConsensusClusterPlus’ R package. To determine the topological features and degree centrality of each node (protein) within the protein-protein interaction (PPI) network, we used the CytoNCA plugin for Cytoscape. Immune cell infiltration was assessed using the Immune Cell Abundance Identifier (ImmuCellAI) database. We applied the Least Absolute Shrinkage and Selection Operator (LASSO), Support Vector Machine (SVM), and Random Forest (RF) algorithms individually, using the ‘glmnet’, ‘e1071’, and ‘randomForest’ R packages, respectively. Ridge regression was performed using the ‘oncoPredict’ R package, and all predictions were based on data from the Genomics of Drug Sensitivity in Cancer (GDSC) database. To ascertain the protein expression levels and subcellular localization of genes, we consulted the Human Protein Atlas (HPA) database. Molecular docking was carried out using the online Mcule 1-Click Docking server. Gene and protein expression levels were validated experimentally by real-time quantitative PCR (RT-qPCR) and immunohistochemistry (IHC) assays.
Results: Through WGCNA and PPI network analysis, we identified 12 hub genes from two co-expression modules as the most relevant to LNM potential. These 12 hub genes were differentially expressed in THCA and correlated significantly with downregulated neutrophil infiltration, upregulated dendritic cell and macrophage infiltration, and activation of the EMT pathway in THCA. We propose a novel molecular classification approach and provide an online web-based nomogram for evaluating the LNM potential of THCA (http://www.empowerstats.net/pmodel/?m=17617_LNM). Machine learning algorithms identified ERBB3 as the most critical gene associated with LNM potential in THCA. ERBB3 is highly expressed in patients with THCA who have experienced LNM or have advanced-stage disease. Differential methylation levels partially explain this differential expression of ERBB3. ROC analysis identified ERBB3 as a diagnostic marker for THCA (AUC=0.89), THCA with high LNM potential (AUC=0.75), and lymph nodes with tumor metastasis (AUC=0.86). We also present a comprehensive review of endocrine-disrupting chemical (EDC) exposures, environmental toxins, and pharmacological agents that may affect LNM potential. Molecular docking yielded a docking score of -10.1 kcal/mol for Lapatinib and ERBB3, indicating a strong binding affinity.
Conclusion: Using bioinformatics analysis techniques, our study identified gene modules and hub genes influencing LNM potential in THCA patients. ERBB3 was identified as a key gene with therapeutic implications. We have also developed a novel molecular classification approach and a user-friendly web-based nomogram tool for assessing LNM potential. These findings pave the way for investigations into the mechanisms underlying differences in LNM potential and provide guidance for personalized clinical treatment plans.
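The abstract reports ROC AUC values for ERBB3 as a single-gene marker. As a minimal illustrative sketch (not the authors' code, which the record indicates was written in R), the AUC for one marker can be computed as the probability that a randomly chosen positive sample has a higher expression value than a randomly chosen negative sample, with ties counted as one half. The function name `roc_auc` and the toy expression values below are hypothetical and are not data from the study.

```python
from typing import Sequence


def roc_auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUC via the rank (Mann-Whitney) formulation: P(pos > neg) + 0.5 * P(pos == neg)."""
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0      # positive sample ranked above negative sample
            elif p == n:
                wins += 0.5      # ties contribute one half
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical toy expression values for illustration only (not study data).
high_lnm = [7.9, 8.4, 6.8, 9.1]
low_lnm = [6.2, 7.0, 5.5, 6.9]
auc = roc_auc(high_lnm, low_lnm)
```

An AUC of 0.5 corresponds to a marker that does not separate the two groups, and 1.0 to perfect separation, which is the scale on which the reported values of 0.75 to 0.89 should be read.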
first_indexed 2024-03-09T01:58:08Z
id doaj.art-d767d220c58b491caea6395a7297b4aa
institution Directory Open Access Journal
issn 1664-2392
last_indexed 2024-03-09T01:58:08Z
spelling doaj.art-d767d220c58b491caea6395a7297b4aa
timestamp 2023-12-08T12:27:40Z
volume 14
doi 10.3389/fendo.2023.1247709
article number 1247709
author affiliations
Yanyan Liu: Department of General Surgery, The Third Affiliated Hospital of Anhui Medical University (The First People’s Hospital of Hefei), Hefei, Anhui, China
Zhenglang Yin: Department of General Surgery, The Third Affiliated Hospital of Anhui Medical University (The First People’s Hospital of Hefei), Hefei, Anhui, China
Yao Wang: Digestive Endoscopy Department, Jiangsu Province Hospital, The First Affiliated Hospital with Nanjing Medical University, Nanjing, Jiangsu, China
Haohao Chen: Department of General Surgery, The Third Affiliated Hospital of Anhui Medical University (The First People’s Hospital of Hefei), Hefei, Anhui, China
topic thyroid cancer
bioinformatics analysis
The Cancer Genome Atlas
nomogram
machine learning