CN110853707A - Gene regulation and control network reconstruction method based on deep learning - Google Patents

Gene regulation and control network reconstruction method based on deep learning

Info

Publication number
CN110853707A
Authority
CN
China
Prior art keywords
network
dynamics
gene regulation
gene
messenger rna
Prior art date
Legal status
Pending
Application number
CN201911141752.9A
Other languages
Chinese (zh)
Inventor
张章 (Zhang Zhang)
王立飞 (Wang Lifei)
王硕 (Wang Shuo)
陶如意 (Tao Ruyi)
牟牧云 (Mou Muyun)
肖镜舒 (Xiao Jingshu)
张江 (Zhang Jiang)
Current Assignee
Jizhi Academy (Beijing) Technology Co Ltd
Beijing Normal University
Original Assignee
Jizhi Academy (Beijing) Technology Co Ltd
Beijing Normal University
Priority date
2019-11-20
Filing date
2019-11-20
Publication date
2020-02-28
Application filed by Jizhi Academy (Beijing) Technology Co Ltd and Beijing Normal University
Priority to CN201911141752.9A
Publication of CN110853707A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Genetics & Genomics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biotechnology (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a deep-learning-based method for reconstructing the network structure of a gene regulatory network, that is, the regulatory relationships between genes, from observed time-series data of messenger RNA (mRNA) concentration changes. The method provides a data-driven deep learning framework that simultaneously reconstructs the gene regulatory network and simulates gene regulation dynamics. It comprises two jointly trained modules: an adjacency matrix generator, which represents the connection structure of the gene regulatory network, and a dynamics predictor, which predicts the future concentration of each mRNA. The model can reconstruct gene regulatory networks with high accuracy, so that regulatory relationships between genes can be inferred from observational data, which may help make it possible to control biological traits.

Description

Gene regulation and control network reconstruction method based on deep learning
Technical Field
The invention lies at the intersection of deep learning and biology and can be used to reconstruct gene regulatory networks. The model combines multiple multilayer perceptrons through a Gumbel-Softmax mechanism. By simulating the evolution of the gene regulatory network forward in time and backpropagating the prediction error, it adjusts the weights of the network generator and of the multilayer perceptrons, thereby reconstructing the structure of the gene regulatory network and simulating its dynamics.
Background
Gene regulatory networks (GRNs) play an important role in cell development and in determining cellular characteristics. Transcription factors (TFs) interact to regulate large numbers of downstream genes, forming a regulatory network. Considerable effort has gone into mapping this network in order to understand basic principles of biology. A common way to reconstruct gene regulatory networks is through biochemical experiments. Alternatively, a gene regulatory network can be reconstructed by analysing gene expression time-series data, namely the time series of mRNA concentration changes during gene expression. The present method takes this approach using deep learning: a neural network simulates the gene regulation process forward in time, all parameters of this forward process are optimized by backpropagation, and the result is both an estimate of the network structure with high accuracy and a dynamics predictor that models gene regulation dynamics with high accuracy.
Disclosure of Invention
Purpose of the invention: to provide a data-driven method for reconstructing a gene regulatory network from time-series data of mRNA concentration changes. This allows the regulatory relationships between genes to be discovered, gives a clearer understanding of biological gene regulatory networks, and may further help in controlling biological traits.
To achieve this purpose, the invention provides a deep-learning-based network reconstruction model. The operating principle of a gene regulatory network can be summarized as follows. Gene expression consists of transcription of a gene into RNA and translation of that RNA into protein, and genes, mRNAs and proteins are taken to correspond one-to-one. A regulatory relationship between genes shows up as the influence of one gene's protein on the expression of other genes. Of the quantities involved, mRNA concentration is the easiest to measure, so the change in mRNA concentration is used to reconstruct the gene regulatory network.
Specifically, the gene regulatory relationships are modeled as a network in which each node is the mRNA transcribed from a gene and the node's state is that mRNA's concentration. If gene A regulates gene B, there is a directed edge from node A to node B.
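The node-and-edge convention can be made concrete with a small sketch. The gene names and regulation pairs below are hypothetical, and the layout is an assumption: row j is the regulator and column i the target, so that column i lists the in-neighbours (regulators) of gene i, which is the reading the column-based filtering in this description relies on.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical genes and regulation pairs (regulator, target), for illustration only.
genes = ["geneA", "geneB", "geneC"]
regulations = [("geneA", "geneB"), ("geneC", "geneB"), ("geneB", "geneA")]


def build_adjacency(genes: Sequence[str],
                    regulations: Sequence[Tuple[str, str]]) -> List[List[int]]:
    """Return an N x N 0/1 matrix where a[j][i] == 1 means gene j regulates gene i."""
    index: Dict[str, int] = {name: k for k, name in enumerate(genes)}
    n = len(genes)
    a = [[0] * n for _ in range(n)]
    for regulator, target in regulations:
        a[index[regulator]][index[target]] = 1
    return a


def regulators_of(a: List[List[int]], genes: Sequence[str], target: str) -> List[str]:
    """Read the target's column: the genes with an edge into it."""
    i = list(genes).index(target)
    return [genes[j] for j in range(len(genes)) if a[j][i] == 1]


A = build_adjacency(genes, regulations)
print(A)                                  # [[0, 1, 0], [1, 0, 0], [0, 1, 0]]
print(regulators_of(A, genes, "geneB"))   # ['geneA', 'geneC']
```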
The method can be summarized as follows. First, a network generator uses the Gumbel-Softmax sampling technique to generate an adjacency matrix, which represents the connection pattern of the gene regulatory network. The internal parameters of the adjacency matrix generator are randomly initialized, so at the start of training the generated adjacency matrix does not accurately represent the real structure of the gene regulatory network. Each element of the adjacency matrix is generated by the following formula:
[Formula image not reproduced: Gumbel-Softmax sampling of each adjacency-matrix element]
where $\alpha_{ij}$ is the probability that the element in row i, column j of the adjacency matrix equals 1; $\xi_{ij}$ is Gumbel noise, obtained by drawing u from the uniform distribution on (0, 1) and applying the negated logarithm twice, $\xi_{ij} = -\log(-\log u)$; and $\tau$ is the temperature parameter.
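Because the formula image is not reproduced, the sketch below uses the common two-class form of Gumbel-Softmax, which is an assumption and may differ in detail from the original figure: $a_{ij} = \exp((\log\alpha_{ij}+\xi_{ij})/\tau) / [\exp((\log\alpha_{ij}+\xi_{ij})/\tau) + \exp((\log(1-\alpha_{ij})+\xi'_{ij})/\tau)]$, with $\xi_{ij}$ and $\xi'_{ij}$ independent Gumbel samples. The generator is parameterized directly by the probabilities alpha here; the function names and example values are hypothetical.

```python
import math
import random
from typing import List, Sequence


def gumbel_noise(rng: random.Random) -> float:
    """Standard Gumbel sample: xi = -log(-log(u)) with u ~ Uniform(0, 1)."""
    u = rng.random()
    while u == 0.0:  # random() may return exactly 0.0, and log(0) is undefined
        u = rng.random()
    return -math.log(-math.log(u))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sample_edge(alpha_ij: float, tau: float, rng: random.Random) -> float:
    """Relaxed sample of one adjacency element, in [0, 1].

    A softmax over the two perturbed logits ('edge', 'no edge') equals the
    sigmoid of their difference, which avoids overflow for small tau.
    """
    s_edge = (math.log(alpha_ij) + gumbel_noise(rng)) / tau
    s_none = (math.log(1.0 - alpha_ij) + gumbel_noise(rng)) / tau
    return sigmoid(s_edge - s_none)


def sample_adjacency(alpha: Sequence[Sequence[float]], tau: float,
                     rng: random.Random, hard: bool = False) -> List[List[float]]:
    """Sample every element; hard=True rounds to 0/1 (forward value only)."""
    soft = [[sample_edge(a, tau, rng) for a in row] for row in alpha]
    if not hard:
        return soft
    return [[1.0 if x > 0.5 else 0.0 for x in row] for row in soft]


alpha = [[0.9, 0.1], [0.5, 0.5]]  # hypothetical edge probabilities alpha_ij
print(sample_adjacency(alpha, tau=0.5, rng=random.Random(0)))
```

Rounding the soft sample at 0.5 picks "edge" with probability exactly $\alpha_{ij}$ (the Gumbel-max property), whatever the value of $\tau$. Smaller $\tau$ pushes the soft values toward 0 or 1; when hard samples are needed during training, implementations commonly pair them with a straight-through gradient, which the rounding above does not provide.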
Furthermore, gene regulation dynamics are heterogeneous: different nodes follow different regulatory rules. Each node is therefore equipped with its own multilayer perceptron (MLP) as its dynamics learner. Because a column of the adjacency matrix encodes which nodes have edges into a given gene (its in-neighbours), that column is used to filter, by element-wise multiplication, the vector of all mRNA concentrations, and the resulting neighbour information is fed into that gene's MLP. The output of the MLP is the concentration of this mRNA at the next time step.
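A minimal sketch of one per-gene dynamics learner follows. The single tanh hidden layer, the hidden width and the Gaussian initialization are illustrative assumptions (the description leaves the layer count and sizes open); only the input size N, the output size 1 and the column filtering come from the text. `NodeMLP` and `predict_node` are hypothetical names.

```python
import math
import random
from typing import Sequence


class NodeMLP:
    """Dynamics learner for one gene: N masked concentrations in, 1 concentration out."""

    def __init__(self, n_nodes: int, hidden: int, rng: random.Random) -> None:
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(n_nodes)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.gauss(0.0, 0.5) for _ in range(hidden)]
        self.b2 = 0.0

    def __call__(self, x: Sequence[float]) -> float:
        hidden = [math.tanh(sum(w * xj for w, xj in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return sum(w * h for w, h in zip(self.w2, hidden)) + self.b2


def predict_node(adj: Sequence[Sequence[float]], x_t: Sequence[float],
                 i: int, learner: NodeMLP) -> float:
    """Filter X_t with column i of the adjacency matrix, then apply gene i's MLP."""
    masked = [adj[j][i] * x_t[j] for j in range(len(x_t))]
    return learner(masked)


adj = [[0, 1, 0], [1, 0, 0], [0, 1, 0]]  # column 1: genes 0 and 2 regulate gene 1
learner_1 = NodeMLP(n_nodes=3, hidden=8, rng=random.Random(0))
print(predict_node(adj, [1.0, 2.0, 3.0], 1, learner_1))
```

Since gene 1 is not among its own regulators in this matrix, changing its own concentration `x_t[1]` leaves the prediction unchanged: the mask zeroes that input before the MLP sees it.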
Once all mRNA concentrations for the next time step have been obtained, a loss is computed between the predicted values and the corresponding observed values in the data. The loss is backpropagated and gradient descent adjusts all parameters involved, including those of the network structure generator and those of the dynamics predictors. The loss function of this process can be expressed as follows:
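$$L = \left\lVert \hat{X}_{t+1} - X_{t+1} \right\rVert_1 = \sum_{i=1}^{N} \left| \hat{x}_i^{t+1} - x_i^{t+1} \right|$$
where $\hat{x}_i^{t+1}$ and $x_i^{t+1}$ are the predicted and observed concentrations of node i at time t+1, and N is the number of genes.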
These steps are repeated until the loss converges. At that point the network generator samples adjacency matrices that accurately represent the real connection pattern of the network, and each dynamics predictor accurately represents how its gene is regulated by its neighbours.
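The prediction loss can be sketched as follows. The L1 form follows the steps given later in this description; averaging the one-step loss over all consecutive time points of a series (rather than summing) is an assumption, and `l1_loss` and `series_loss` are hypothetical names.

```python
from typing import Callable, List, Sequence


def l1_loss(x_pred: Sequence[float], x_true: Sequence[float]) -> float:
    """Sum over nodes of |predicted - observed| concentration at time t+1."""
    if len(x_pred) != len(x_true):
        raise ValueError("prediction and observation must have the same length")
    return sum(abs(p - q) for p, q in zip(x_pred, x_true))


def series_loss(step: Callable[[Sequence[float]], List[float]],
                series: Sequence[Sequence[float]]) -> float:
    """Mean one-step L1 loss over consecutive pairs (X_t, X_{t+1}) of a time series."""
    pairs = list(zip(series[:-1], series[1:]))
    if not pairs:
        raise ValueError("need at least two time points")
    return sum(l1_loss(step(x_t), x_next) for x_t, x_next in pairs) / len(pairs)


print(l1_loss([0.5, 1.0, 2.0], [0.4, 1.5, 2.0]))  # approximately 0.6
```

Here `step` stands for the full model (generator sample plus all dynamics predictors) mapping $X_t$ to a predicted $X_{t+1}$.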
In addition, a structure loss is introduced that acts on the network generator alone. Because real gene regulatory networks are known to be sparse, the number of 1s in the adjacency matrix generated by the network generator is used as a penalty term: the more 1s the matrix contains, the denser the generated network and the larger the structure loss, which pushes the generator toward sparse networks. This term matters most when the number of genes is large. The structure loss can be expressed as follows:
$$L_s = \alpha \sum_{i=1}^{N} \sum_{j=1}^{N} a_{ij}$$
where $L_s$ is the structure loss and $\alpha$ is the structure loss parameter (distinct from the edge probabilities $\alpha_{ij}$); the larger $\alpha$, the stronger the structure loss. Furthermore, during gradient descent the network generator and the dynamics predictors correspond respectively to the network structure and to the regulation dynamics of the gene regulatory network. Because they represent two different kinds of variables, each is optimized with its own learning rate.
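A small sketch of the structure loss and of gradient descent with separate learning rates for the two parameter groups. The parameter is called `alpha_s` to avoid a clash with $\alpha_{ij}$. For relaxed (soft) samples, "the number of 1s" is taken to be the sum of all entries, which is an assumption. The learning-rate values and parameter/gradient lists are hypothetical placeholders, not values from the patent.

```python
from typing import List, Sequence


def structure_loss(adj: Sequence[Sequence[float]], alpha_s: float) -> float:
    """L_s = alpha_s * sum of all adjacency entries (the count of 1s for a 0/1 matrix)."""
    return alpha_s * sum(sum(row) for row in adj)


def sgd_step(params: Sequence[float], grads: Sequence[float], lr: float) -> List[float]:
    """One plain gradient-descent update for a single parameter group."""
    return [p - lr * g for p, g in zip(params, grads)]


# Hypothetical learning rates: one for the generator, one for the dynamics predictors.
LR_GENERATOR = 0.1
LR_DYNAMICS = 0.001

adj = [[0.0, 1.0], [1.0, 1.0]]
print(structure_loss(adj, alpha_s=0.1))  # approximately 0.3

# Hypothetical parameters and gradients, updated group by group.
generator_params = sgd_step([0.2, -0.4], [0.5, -1.0], LR_GENERATOR)
dynamics_params = sgd_step([1.0, 2.0], [0.5, -1.0], LR_DYNAMICS)
print(generator_params, dynamics_params)
```

Because $L_s$ depends only on the sampled adjacency matrix, its gradient reaches only the generator's parameters; the prediction loss reaches both groups.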
Advantageous effects
1) The invention reconstructs gene regulatory networks, giving a clearer picture of the regulatory relationships between genes and potentially enabling further control of biological traits.
2) Besides reconstructing the gene regulatory network, the invention models the regulation dynamics of each gene separately and accurately predicts the future concentration of a given mRNA.
3) The invention uses deep learning to simulate gene regulation dynamics and can therefore capture highly nonlinear activation or inhibition between genes.
4) In addition to these advantages, the invention achieves the highest accuracy among current gene regulatory network reconstruction techniques.
Drawings
Fig. 1 is a schematic diagram of the framework, which consists of two parts: a network structure generator and a set of dynamics predictors, each of which models the regulation dynamics of one specific gene.
Fig. 2 shows the network reconstruction performance: the ROC curve computed from the trained network generator lies above the diagonal and the AUC is greater than 0.5, indicating that the network generator recovers the network structure better than chance.
Fig. 3 shows the dynamics prediction performance: starting from a given initial concentration, it plots the observed change in mRNA concentration alongside the change predicted by the method. The method accurately captures the trend of the mRNA concentration even in the presence of noise.
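The ROC/AUC evaluation behind Fig. 2 can be sketched by treating each adjacency-matrix entry as a binary classification, with the generator's edge probability as the score and the true network as the label. The pairwise (Mann-Whitney) formulation below is one standard way to compute AUC; the patent does not say how its curve was computed, and the matrices here are made up.

```python
from typing import List, Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random true edge outscores a random non-edge.

    Ties count as one half. Quadratic in the number of entries, but simple.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one true edge and one non-edge")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def flatten(m: Sequence[Sequence[float]]) -> List[float]:
    """Row-major list of all matrix entries."""
    return [v for row in m for v in row]


true_adj = [[0, 1, 0], [1, 0, 0], [0, 1, 0]]                      # hypothetical truth
edge_prob = [[0.1, 0.9, 0.2], [0.7, 0.1, 0.3], [0.2, 0.8, 0.4]]   # hypothetical alpha_ij
print(roc_auc(flatten(edge_prob), flatten(true_adj)))  # 1.0
```

An AUC of 0.5 corresponds to the diagonal (uninformative scores); 1.0 means every true edge is ranked above every non-edge.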
Detailed Description
The gene regulatory network reconstruction process is explained further below with reference to the accompanying drawings.
The problem solved by the invention is reconstructing a gene regulatory network from time-series data of mRNA concentration changes with a deep-learning-based method. To this end, two submodules are built, a network generator and a dynamics predictor, and all model parameters are adjusted by backpropagation and gradient descent. The overall model architecture is shown in FIG. 1.
The overall goal is to use the full vector of mRNA concentrations at time t, $X_t$, to predict the full vector at time t+1, $X_{t+1}$. Through repeated prediction and backpropagation, the model learns the network structure together with dynamics learners that accurately fit the gene regulation dynamics. The model consists of two parts: 1) a network generator, which samples an adjacency matrix representing the connection structure of the network; and 2) a set of dynamics predictors, each a multilayer perceptron that learns how one particular gene is dynamically affected by its regulators. The two parts work together as follows. The network generator produces an adjacency matrix; each dynamics predictor takes as input one specific column of the adjacency matrix (the regulators of one particular gene) together with all node states at time t, and outputs a scalar, the concentration of that gene's mRNA at the next time step. The outputs of all dynamics predictors are concatenated into the vector of all mRNA concentrations at the next time step; the loss against the observed concentration vector is computed and backpropagated, and the parameters of the dynamics predictors and the network generator are adjusted. Eventually the network generator produces an adjacency matrix close to the real one, and the dynamics predictors accurately learn how each gene is regulated by its neighbours.
In a gene regulatory network, genes do not all follow exactly the same kinetic rules when regulated by their neighbours, and the kinetics of gene regulation are highly nonlinear. Rather than trying to learn many different dynamics rules with a single neural network, the method therefore creates a dedicated dynamics learner for each gene. Each dynamics learner is a multilayer neural network whose number of hidden layers and hidden units can be varied. However, because its task is to receive neighbour information and predict its own gene's concentration at the next time step, its input and output dimensions are fixed at the number of nodes N and 1, respectively. A specific column of the adjacency matrix is multiplied element-wise by the vector of all mRNA concentrations at time t (which removes the concentrations of genes that the network generator does not consider regulators of this gene), the result is fed into the MLP, and the MLP's output is taken as the node's concentration at the next time step. The outputs of all dynamics predictors are combined into a vector of length N, the concentrations of all mRNAs at time t+1.
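The assembly of $X_{t+1}$ from N separate predictors can be sketched as follows. The predictors here are hypothetical stand-ins (simple functions rather than trained MLPs), chosen only to show that each gene can follow a different rule while seeing only its own masked input.

```python
from typing import Callable, List, Sequence

Predictor = Callable[[List[float]], float]


def predict_next(adj: Sequence[Sequence[float]], x_t: Sequence[float],
                 predictors: Sequence[Predictor]) -> List[float]:
    """Build X_{t+1}: predictor i sees X_t filtered by column i of the adjacency matrix."""
    n = len(x_t)
    if len(adj) != n or len(predictors) != n:
        raise ValueError("adjacency, state and predictors must all have size N")
    x_next: List[float] = []
    for i in range(n):
        masked = [adj[j][i] * x_t[j] for j in range(n)]  # keep only gene i's regulators
        x_next.append(predictors[i](masked))
    return x_next


# Hypothetical stand-ins for trained MLPs, deliberately different per gene.
predictors: List[Predictor] = [
    lambda m: sum(m),              # gene 0: additive activation
    lambda m: max(m),              # gene 1: follows its strongest regulator
    lambda m: 1.0 - 0.5 * sum(m),  # gene 2: repressed by its regulators
]
adj = [[0, 1, 0], [1, 0, 0], [0, 1, 0]]
print(predict_next(adj, [1.0, 2.0, 3.0], predictors))  # [2.0, 3.0, 1.0]
```

Gene 2 has no regulators in this matrix, so its masked input is all zeros and its prediction is constant.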
The specific embodiment can be stated as the following steps:
1) Randomly initialize the internal parameters of the network generator and the dynamics predictors.
2) Sample from the internal parameters of the network generator with the Gumbel-Softmax technique to obtain an adjacency matrix.
3) Apply a sparsity penalty to the adjacency matrix, i.e. use the number of 1s in the adjacency matrix as a penalty term. The penalty term can be formulated as:
$$L_s = \alpha \sum_{i=1}^{N} \sum_{j=1}^{N} a_{ij}$$
where $L_s$ is the structure loss and $\alpha$ is the structure loss parameter; the larger $\alpha$, the stronger the structure loss.
4) Set i = 1 and input the ith column of the adjacency matrix, together with the concentrations $X_t$ of all nodes at time t, into the ith dynamics predictor, which outputs the concentration of node i at time t+1. Applying a trained dynamics predictor iteratively to its own outputs yields a predicted curve that can be compared with the observed curve, as shown in FIG. 3.
5) Compare the predicted concentration of node i at time t+1 with its observed concentration at time t+1 and compute the loss function, which uses the L1 norm and can be written as:
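$$L_i = \left| \hat{x}_i^{t+1} - x_i^{t+1} \right|$$
where $\hat{x}_i^{t+1}$ and $x_i^{t+1}$ are the predicted and observed concentrations of node i at time t+1.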
6) Repeat steps 4) and 5), incrementing i by 1 each time, until all dynamics predictors have been trained.
7) Run multiple rounds of training, repeating steps 2) to 6) in each round.
8) Stop training when the loss function converges. At that point the network generator samples adjacency matrices with high accuracy, as shown in FIG. 2.
When training according to these steps has converged, the network generator produces a fairly accurate gene regulatory network structure, and each dynamics predictor accurately represents the dynamics of its gene.
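The control flow of steps 2) to 8) can be sketched as a loop. This is a skeleton under stated assumptions, not the patented implementation: the per-node losses of all time steps are accumulated into one loss per round before a single update (the text could also be read as updating after each node), and `update` is a hypothetical hook that, with an autodiff framework, would backpropagate the loss and apply gradient descent to both modules. The convergence test (change in loss below `tol`) is likewise an assumption.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]
Predictor = Callable[[List[float]], float]


def train(sample_adjacency: Callable[[], Matrix],
          predictors: Sequence[Predictor],
          series: Sequence[Sequence[float]],
          alpha_s: float,
          update: Callable[[float], None],
          max_rounds: int = 1000,
          tol: float = 1e-6) -> List[float]:
    """Run steps 2)-8) and return the total loss recorded in each round."""
    n = len(predictors)
    history: List[float] = []
    for _ in range(max_rounds):                                  # step 7: rounds
        adj = sample_adjacency()                                 # step 2: sample matrix
        loss = alpha_s * sum(sum(row) for row in adj)            # step 3: sparsity term
        for x_t, x_next in zip(series[:-1], series[1:]):
            for i in range(n):                                   # steps 4-6: node by node
                masked = [adj[j][i] * x_t[j] for j in range(n)]
                loss += abs(predictors[i](masked) - x_next[i])   # step 5: L1 term
        update(loss)                                             # hypothetical gradient step
        history.append(loss)
        if len(history) > 1 and abs(history[-2] - history[-1]) < tol:
            break                                                # step 8: converged
    return history


history = train(lambda: [[0.0, 1.0], [1.0, 0.0]],
                [lambda m: sum(m), lambda m: sum(m)],
                [[1.0, 2.0], [2.0, 1.0]], alpha_s=0.1, update=lambda loss: None)
print(len(history))  # 2: the loss did not change, so training stopped
```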

Claims (1)

1. A deep-learning-based gene regulatory network reconstruction method, characterized in that a gene regulatory network is reconstructed from time-series data of mRNA concentration changes; two submodules, a network generator and a dynamics predictor, are built, and all parameters of the model are adjusted by backpropagation and gradient descent; the goal is to use the full mRNA concentration data at time t, $X_t$, to predict the full mRNA concentration data at time t+1, $X_{t+1}$; and the network structure is learned through repeated prediction and backpropagation, together with a dynamics learner that accurately fits the gene regulation dynamics;
the method comprises the following specific steps:
1) randomly initializing the internal parameters of the network generator and the dynamics predictor;
2) sampling from the internal parameters of the network generator with the Gumbel-Softmax technique to obtain an adjacency matrix, where Gumbel-Softmax is a differentiable sampling technique whose computation approximates ordinary discrete sampling;
3) applying a sparsity penalty to the adjacency matrix, i.e. using the number of 1s in the adjacency matrix as a penalty term, expressed by the formula:
$$L_s = \alpha \sum_{i=1}^{N} \sum_{j=1}^{N} a_{ij}$$
where $L_s$ is the structure loss and $\alpha$ is the structure loss parameter; the larger $\alpha$, the stronger the structure loss;
4) setting i = 1 and inputting the ith column of the adjacency matrix and the concentrations $X_t$ of all nodes at time t into the ith dynamics predictor, the dynamics predictor outputting the concentration of node i at time t+1;
5) comparing the predicted concentration of node i at time t+1 with its observed concentration at time t+1 and calculating a loss function based on the L1 norm, i.e. the absolute value of the difference between the predicted value and the observed value, expressed by the formula:
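$$L_i = \left| \hat{x}_i^{t+1} - x_i^{t+1} \right|$$
where $\hat{x}_i^{t+1}$ and $x_i^{t+1}$ are the predicted and observed concentrations of node i at time t+1;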
6) repeating steps 4) and 5), incrementing i by 1 each time, until all dynamics predictors have been trained;
7) performing multiple rounds of training, each round repeating steps 2) to 6);
8) stopping training when the loss function converges.
CN201911141752.9A 2019-11-20 2019-11-20 Gene regulation and control network reconstruction method based on deep learning Pending CN110853707A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911141752.9A CN110853707A (en) 2019-11-20 2019-11-20 Gene regulation and control network reconstruction method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911141752.9A CN110853707A (en) 2019-11-20 2019-11-20 Gene regulation and control network reconstruction method based on deep learning

Publications (1)

Publication Number Publication Date
CN110853707A (en) 2020-02-28

Family

ID=69602916

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911141752.9A Pending CN110853707A (en) 2019-11-20 2019-11-20 Gene regulation and control network reconstruction method based on deep learning

Country Status (1)

Country Link
CN (1) CN110853707A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109215735A (en) * 2018-09-21 2019-01-15 Southwest Minzu University A method of building gene regulatory network
CN110223785A (en) * 2019-05-28 2019-09-10 Beijing Normal University A kind of infectious disease transmission network reconstruction method based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
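ZHANG ZHANG ET AL.: "A General Deep Learning Framework for Network Reconstruction and Dynamics Learning", HTTPS://ARXIV.ORG/ABS/1812.11482 *
YANG BIN (杨斌): "Research on Gene Regulatory Network Modeling Based on Computational Intelligence", China Doctoral Dissertations Full-text Database, Information Science and Technology Series *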

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111445944A (en) * 2020-03-27 2020-07-24 Jiangnan University RNA binding protein recognition based on multi-view depth features and multi-label learning
CN111445944B (en) * 2020-03-27 2023-04-18 Jiangnan University RNA binding protein recognition based on multi-view depth features and multi-label learning
CN112992267A (en) * 2021-04-13 2021-06-18 Academy of Military Medical Sciences, Academy of Military Sciences of the Chinese People's Liberation Army Single-cell transcription factor regulation network prediction method and device
CN112992267B (en) * 2021-04-13 2024-02-09 Academy of Military Medical Sciences, Academy of Military Sciences of the Chinese People's Liberation Army Single-cell transcription factor regulation network prediction method and device

Similar Documents

Publication Publication Date Title
Alaloul et al. Data processing using artificial neural networks
Munakata Fundamentals of the new artificial intelligence: neural, evolutionary, fuzzy and more
Moustafa et al. Performance evaluation of artificial neural networks for spatial data analysis
KR20170031695A (en) Decomposing convolution operation in neural networks
Carpenter et al. A comparison of polynomial approximations and artificial neural nets as response surfaces
Bai et al. Prediction of SARS epidemic by BP neural networks with online prediction strategy
CN110223785A (en) A kind of infectious disease transmission network reconstruction method based on deep learning
JP2016536664A (en) An automated method for correcting neural dynamics
Lun et al. The modified sufficient conditions for echo state property and parameter optimization of leaky integrator echo state network
CN111382840B (en) HTM design method based on cyclic learning unit and oriented to natural language processing
CN110853707A (en) Gene regulation and control network reconstruction method based on deep learning
Kozlova et al. The use of neural networks for planning the behavior of complex systems
CN117786286A (en) Fluid mechanics equation solving method based on physical information neural network
Kuang et al. Digital implementation of the spiking neural network and its digit recognition
Stromatias Developing a supervised training algorithm for limited precision feed-forward spiking neural networks
Pupezescu Pulsating Multilayer Perceptron
Bakumenko et al. Synthesis method of robust neural network models of systems and processes
Park et al. Development of compositional and contextual communication of robots by using the multiple timescales dynamic neural network
CN115952838B (en) Self-adaptive learning recommendation system-based generation method and system
Gruodis Realizations of the Artificial Neural Network for Process Modeling. Overview of Current Implementations
Shi et al. Temporal coding in recurrent spiking neural networks with synaptic delay-weight plasticity
Nouh et al. Artificial Neural Network Approach versus Analytical Solutions for Relativistic Polytropes
Zairi DeepLearning for Computer Vision Problems: Litterature Review
Nourafza et al. Design of a Cellular Sugarscape Environment to Increase the Learning Speed in a Stochastic Multi-agent Network
Messineo et al. FEED-FORWARD NEURAL NETWORKS: AN APPLICATION TO THE PREDICTION OF STUDENTS'PERFORMANCE

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Zhang Zhang

Inventor after: Wang Lifei

Inventor after: Wang Shuo

Inventor after: Tao Ruyi

Inventor after: Mou Muyun

Inventor after: Xiao Jingshu

Inventor after: Zhang Jiang

Inventor after: Cai Jun

Inventor before: Zhang Zhang

Inventor before: Wang Lifei

Inventor before: Wang Shuo

Inventor before: Tao Ruyi

Inventor before: Mou Muyun

Inventor before: Xiao Jingshu

Inventor before: Zhang Jiang

WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200228