Summary: | In this work, we integrate prior knowledge from GWAS or gene signatures (curated gene sets from MSigDB and/or GeneSigDB), gene set enrichment analysis (GSEA), and gene/protein network modeling to identify gene network signatures from microarray data. We demonstrate how to apply this approach to discover gene network signatures for colorectal cancer (CRC) from microarray datasets at three levels: genome, transcriptome, and proteome. First, we use GSEA to analyze the microarray data by testing differentially expressed genes for enrichment in CRC-related gene sets from two publicly available, up-to-date gene set databases, the Molecular Signatures Database (MSigDB) and the Gene Signatures Database (GeneSigDB). Second, we compare the enriched gene sets by enrichment score (ES), false discovery rate (FDR), and nominal p-value. Third, we construct an integrated protein-protein interaction (PPI) network by connecting these enriched genes using the Human Annotated and Predicted Protein Interaction (HAPPI) database, with each interaction labeled with a confidence score (CS). Finally, we map differential expression values onto the constructed network to build a comprehensive network model that visualizes genome, transcriptome, and proteome data. The results show that although MSigDB is more suitable for GSEA than GeneSigDB, the integrated PPI network connecting the enriched genes from both MSigDB and GeneSigDB provides a more complete view for discovering gene signatures. We also find several important sub-network signatures for colorectal cancer, such as the TP53, PCNA, and IL8 sub-networks, corresponding to apoptosis, DNA repair, and immune response, respectively.
|
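The four steps summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in this work: the function names `enrichment_score` and `build_network`, the `min_cs` cutoff, and all numeric values are hypothetical, and the enrichment score follows the standard weighted running-sum definition from GSEA rather than any specific software release. HAPPI interactions are assumed to be available as `(gene_a, gene_b, confidence_score)` triples.

```python
from typing import Dict, List, Optional, Set, Tuple


def enrichment_score(ranked: List[Tuple[str, float]],
                     gene_set: Set[str],
                     p: float = 1.0) -> float:
    """Weighted running-sum enrichment score (ES).

    `ranked` is a list of (gene, metric) pairs sorted by the differential
    expression metric, highest first. The ES is the maximum deviation of
    the running sum from zero, keeping its sign.
    """
    weights = [abs(m) ** p if g in gene_set else 0.0 for g, m in ranked]
    hit_total = sum(weights)
    n_miss = sum(1 for g, _ in ranked if g not in gene_set)
    if hit_total == 0 or n_miss == 0:
        return 0.0  # the score is undefined here; return 0 as a convention
    running, best = 0.0, 0.0
    for (gene, _), w in zip(ranked, weights):
        # Step up on a hit (weighted by the metric), step down on a miss.
        running += w / hit_total if gene in gene_set else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def build_network(interactions: List[Tuple[str, str, float]],
                  enriched: Set[str],
                  expression: Dict[str, float],
                  min_cs: float = 0.0):
    """Connect enriched genes through PPI edges and attach expression values.

    Keeps an edge only if both endpoints are enriched genes and its
    confidence score is at least `min_cs` (a hypothetical cutoff).
    Returns (nodes, edges), where nodes maps gene -> expression or None.
    """
    edges = [(a, b, cs) for a, b, cs in interactions
             if a in enriched and b in enriched and cs >= min_cs]
    nodes: Dict[str, Optional[float]] = {}
    for a, b, _ in edges:
        nodes[a] = expression.get(a)
        nodes[b] = expression.get(b)
    return nodes, edges


if __name__ == "__main__":
    # Toy inputs for illustration only; these are not results from the study.
    ranked = [("TP53", 2.0), ("GENE_A", 1.0), ("GENE_B", -1.0), ("PCNA", -2.0)]
    print(enrichment_score(ranked, {"TP53"}))

    interactions = [("TP53", "PCNA", 0.9), ("TP53", "IL8", 0.3)]
    expression = {"TP53": 1.5, "PCNA": -0.7}
    print(build_network(interactions, {"TP53", "PCNA", "IL8"}, expression, min_cs=0.5))
```

Returning plain dictionaries and tuples keeps the sketch dependency-free; in practice the node and edge lists could be loaded into a graph library or a network viewer to extract and visualize sub-networks.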