SemanticCAP: Chromatin Accessibility Prediction Enhanced by Features Learning from a Language Model

A large number of inorganic and organic compounds are able to bind DNA and form complexes, among which drug-related molecules are important. Chromatin accessibility changes not only directly affect drug–DNA interactions, but they can promote or inhibit the expression of the critical genes associated...

Bibliographic Details
Main Authors: Yikang Zhang, Xiaomin Chu, Yelu Jiang, Hongjie Wu, Lijun Quan
Format: Article
Language: English
Published: MDPI AG 2022-03-01
Series: Genes
Subjects: chromatin accessibility, drug design, language model, transformer, feature fusion
Online Access: https://www.mdpi.com/2073-4425/13/4/568
author Yikang Zhang
Xiaomin Chu
Yelu Jiang
Hongjie Wu
Lijun Quan
collection DOAJ
description A large number of inorganic and organic compounds are able to bind DNA and form complexes, among which drug-related molecules are important. Chromatin accessibility changes not only directly affect drug–DNA interactions, but they can promote or inhibit the expression of the critical genes associated with drug resistance by affecting the DNA binding capacity of TFs and transcriptional regulators. However, the biological experimental techniques for measuring it are expensive and time-consuming. In recent years, several kinds of computational methods have been proposed to identify accessible regions of the genome. Existing computational models mostly ignore the contextual information provided by the bases in gene sequences. To address these issues, we proposed a new solution called SemanticCAP. It introduces a gene language model that models the context of gene sequences and is thus able to provide an effective representation of a certain site in a gene sequence. Basically, we merged the features provided by the gene language model into our chromatin accessibility model. During the process, we designed methods called SFA and SFC to make feature fusion smoother. Compared to DeepSEA, gkm-SVM, and k-mer using public benchmarks, our model proved to have better performance, showing a 1.25% maximum improvement in auROC and a 2.41% maximum improvement in auPRC.
first_indexed 2024-03-09T10:35:55Z
format Article
id doaj.art-27761d86800044b6bdc79dc686f67841
institution Directory Open Access Journal
issn 2073-4425
language English
last_indexed 2024-03-09T10:35:55Z
publishDate 2022-03-01
publisher MDPI AG
record_format Article
series Genes
spelling doaj.art-27761d86800044b6bdc79dc686f67841 2023-12-01T20:56:19Z eng MDPI AG Genes 2073-4425 2022-03-01 13 4 568 10.3390/genes13040568
SemanticCAP: Chromatin Accessibility Prediction Enhanced by Features Learning from a Language Model
Yikang Zhang [0], Xiaomin Chu [1], Yelu Jiang [2], Hongjie Wu [3], Lijun Quan [4]
[0, 1, 2, 4] School of Computer Science and Technology, Soochow University, Suzhou 215006, China
[3] School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
https://www.mdpi.com/2073-4425/13/4/568
chromatin accessibility; drug design; language model; transformer; feature fusion
title SemanticCAP: Chromatin Accessibility Prediction Enhanced by Features Learning from a Language Model
topic chromatin accessibility
drug design
language model
transformer
feature fusion
url https://www.mdpi.com/2073-4425/13/4/568