CN110277173A - Smi2Vec-based BiGRU drug toxicity prediction system and prediction method - Google Patents
- Publication number: CN110277173A
- Authority: CN (China)
- Legal status: Pending (the status is an assumption, not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H70/00—ICT specially adapted for the handling or processing of medical references
- G16H70/40—ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
Abstract
The present invention provides a Smi2Vec-based BiGRU drug toxicity prediction system and prediction method, comprising: a Smi2Vec module, which converts molecular features into atom vectors; a BiGRU drug toxicity classification model, arranged at the output of the Smi2Vec module and trained on the atom vectors, which comprises 1 embedding layer, 1 BiGRU layer, 2 pooling layers and 2 dense layers; and a classifier, arranged at the output of the BiGRU drug toxicity classification model, which generates the output labels for the classification task. Compared with the related art, the Smi2Vec-based BiGRU drug toxicity prediction system and prediction method provided by the invention achieve high stability and high accuracy.
Description
[Technical field]
The present invention relates to the field of drug property prediction, and in particular to a Smi2Vec-based BiGRU drug toxicity prediction system and prediction method.
[Background art]
Drug design and development consume large amounts of human, material and financial resources. Even after biological or chemical studies have shown that a particular molecule achieves a certain therapeutic effect, newly discovered molecules often fail to become new drugs because of problems such as toxicity, low activity or low solubility, so that all the previous effort is wasted.
Traditional neural networks were once widely used to predict drug properties such as bioactivity, toxicity and water solubility, but these methods suffer from drawbacks such as inefficient algorithms, difficulty with batch training and a tendency to overfit.
In the related art, computational methods are used to help screen out molecular structures that are obviously unsuitable. Because computer-based virtual screening is not limited by the availability of physical samples, performing virtual screening early in drug research and development and only then carrying out pharmacological tests makes the R&D process more scientific and rational than conventional practice, greatly shortening the development cycle of new drugs and reducing R&D costs.
The main approach to lead compound discovery is the study of quantitative structure-activity relationships (QSAR) of molecules. Common QSAR methods are mainly two-dimensional (2D-QSAR), three-dimensional (3D-QSAR) and four-dimensional (4D-QSAR) quantitative structure-activity relationship methods, and all three are constrained by their own characteristics. Methods based on big-data analysis and machine learning require large amounts of data and place high demands on the distribution of positive and negative samples; conventional machine learning methods spend a great deal of time on sample collection, classification and training; and the supervised and unsupervised machine learning algorithms above not only need large amounts of data but also require molecular features to be computed with chemometrics software, which again takes considerable time.
Therefore, it is necessary to provide a new Smi2Vec-based BiGRU drug toxicity prediction system and prediction method to solve the above problems.
[Summary of the invention]
The technical problem to be solved by the present invention is that existing methods for predicting drug properties are all constrained by their own characteristics: methods based on big-data analysis and machine learning require large amounts of data and place high demands on the distribution of positive and negative samples; conventional machine learning methods take a great deal of time for sample collection, classification and training; and supervised and unsupervised machine learning algorithms not only require large amounts of data but also need molecular features to be computed with chemometrics software, which likewise takes considerable time.
The present invention solves the above technical problem through the following technical solutions:
The present invention provides a Smi2Vec-based BiGRU drug toxicity prediction system, comprising:
a Smi2Vec module, which converts molecular features into atom vectors;
a BiGRU drug toxicity classification model, arranged at the output of the Smi2Vec module and trained on the atom vectors, the BiGRU drug toxicity classification model comprising 1 embedding layer, 1 BiGRU layer, 2 pooling layers and 2 dense layers;
and a classifier, arranged at the output of the BiGRU drug toxicity classification model, which generates the output labels for the classification task.
Preferably, the embedding layer is arranged at the output of the Smi2Vec module, and the classifier is arranged at the output of the dense layers.
The present invention also provides a Smi2Vec-based BiGRU drug toxicity prediction method, comprising:
Step S1: construct a data set comprising a training set, a test set and a development set;
Step S2: Smi2Vec conversion: the Smi2Vec module converts the molecular features of the training set, given in SMILES format, into atom vectors;
Step S3: construct the BiGRU drug toxicity classification model, which comprises 1 embedding layer, 1 BiGRU layer, 2 pooling layers and 2 dense layers;
Step S4: input the atom vectors into the BiGRU drug toxicity classification model to train it;
Step S5: send the training result of the BiGRU drug toxicity classification model to the classifier; after optimizing the loss function, the classifier feeds the training result back into the BiGRU drug toxicity classification model for further training;
Step S6: after multiple iterations, training of the BiGRU drug toxicity classification model is complete;
Step S7: apply the Smi2Vec conversion to the data in the test set and input the result into the BiGRU drug toxicity classification model to obtain test results;
Step S8: analyze and discuss the test results.
Preferably, the data set consists of the training set (80%) and the test set (20%).
Preferably, the data set consists of the training set (80%), the test set (10%) and the development set (10%).
Preferably, step S2 specifically comprises the following steps:
Step S21: split each SMILES-format molecule into individual atoms and extract the features of those atoms;
Step S22: encode the split atoms one by one with one-hot encoding, converting the SMILES molecule into atom vectors;
Step S23: construct a mapping function: pre-train on the SMILES molecules in the training set with the open-source Word2Vec tool to generate a dictionary, and look up the corresponding sample vector in the dictionary; if the dictionary lacks the corresponding sample vector, randomly generate a vector as its match.
Preferably, in step S4, the atom vectors pass in turn through the embedding layer, the BiGRU layer, the pooling layers and the dense layers, thereby training the BiGRU drug toxicity classification model.
Preferably, in step S5, the training result of the dense layers is sent to the classifier; after optimizing the loss function, the classifier feeds the training result back into the embedding layer to continue training the BiGRU drug toxicity classification model.
Preferably, in step S6, training of the BiGRU drug toxicity classification model is complete when 100 iterations have been performed or when the result has not changed for 5 consecutive iterations.
Compared with the related art, the present invention provides a Smi2Vec-based BiGRU drug toxicity prediction system and prediction method that uses the Smi2Vec module to convert SMILES molecular features into atom vectors, offering a molecular featurization approach with short conversion time and high conversion efficiency. Furthermore, compared with several common conventional machine learning models, the Smi2Vec-based BiGRU drug toxicity prediction system provided by the invention outperforms the conventional machine learning models on the Tox21 data set and achieves high stability and high accuracy. In addition, the system places low demands on the distribution of positive and negative samples and requires little time for sample collection, classification and training.
[Description of the drawings]
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort. In the drawings:
Fig. 1 is a framework diagram of the Smi2Vec-based BiGRU drug toxicity prediction system provided by the invention;
Fig. 2 is a flow chart of the Smi2Vec-based BiGRU drug toxicity prediction method of Fig. 1;
Fig. 3 is a computing block diagram of Smi2Vec;
Fig. 4 is a working principle diagram of the atom vectors;
Fig. 5 is the main framework diagram of the BiGRU drug toxicity classification model;
Fig. 6 compares the effect of the Smi2Vec-based BiGRU drug toxicity prediction method provided by the invention with that of the conventional molecular featurization method ECFP.
[Detailed description of embodiments]
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, which is a framework diagram of the Smi2Vec-based BiGRU drug toxicity prediction system provided by the invention, the invention provides a Smi2Vec-based BiGRU drug toxicity prediction system comprising a Smi2Vec module, a BiGRU (bidirectional gated recurrent unit network) drug toxicity classification model and a classifier, wherein:
The Smi2Vec module converts molecular features into atom vectors; specifically, Smi2Vec converts molecular features in SMILES (Simplified Molecular Input Line Entry System) format into atom vectors;
The BiGRU drug toxicity classification model, arranged at the output of the Smi2Vec module, is trained on the atom vectors and comprises 1 embedding layer, 1 BiGRU layer, 2 pooling layers and 2 dense layers;
The classifier, arranged at the output of the BiGRU drug toxicity classification model, generates the output labels for the classification task.
Specifically, the embedding layer is arranged at the output of the Smi2Vec module, and the classifier is arranged at the output of the dense layers.
Referring to Figs. 2-5: Fig. 2 is a flow chart of the Smi2Vec-based BiGRU drug toxicity prediction method of Fig. 1; Fig. 3 is a computing block diagram of Smi2Vec; Fig. 4 is a working principle diagram of the atom vectors; Fig. 5 is the main framework diagram of the BiGRU drug toxicity classification model.
The present invention also provides a Smi2Vec-based BiGRU drug toxicity prediction method, characterized by comprising:
Step S1: construct the data set: the data set consists of a training set (80%) and a test set (20%); alternatively, it may consist of a training set (80%), a development set (10%) and a test set (10%). Drugs in the data set that affect the human body are labeled 1 as positive samples, and drugs without such an effect are labeled as negative samples; the negative samples are removed to eliminate interfering data and reduce the influence of noise in the data set;
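The 80/20 and 80/10/10 splits described for the data set can be sketched as a seeded shuffle followed by slicing. This is a minimal illustration, not the inventors' code: the function name `split_dataset`, the fixed seed and the use of a plain random (non-stratified) split are assumptions.

```python
import random


def split_dataset(samples, fractions=(0.8, 0.1, 0.1), seed=42):
    """Shuffle `samples` and cut them into consecutive parts by `fractions`.

    Hypothetical helper: (0.8, 0.2) gives train/test, (0.8, 0.1, 0.1) gives
    train/development/test. The split is random, not stratified by label.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n = len(shuffled)

    parts, start, cumulative = [], 0, 0.0
    for i, frac in enumerate(fractions):
        cumulative += frac
        # The last part takes everything left, so no sample is lost to rounding.
        end = n if i == len(fractions) - 1 else round(cumulative * n)
        parts.append(shuffled[start:end])
        start = end
    return parts
```

For example, `train, dev, test = split_dataset(range(100))` yields parts of 80, 10 and 10 samples. With heavily imbalanced toxicity labels, a stratified split would usually be preferable; that is a design choice the text does not specify.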
Step S2: Smi2Vec conversion: the Smi2Vec module converts the SMILES-format molecules in the training set into vectors;
The Smi2Vec conversion proceeds as follows:
Step 2.1: split each SMILES-format molecule in the training set into individual atoms; atomic groups are extracted after comparison and lookup and treated as single atoms. Statistics are then collected on the split atoms, giving the following features: ['c', 'C', '(', ')', 'O', '=', 'N', '[', ']', 'n', 'H', '/', '-', 'S', 'Cl', '@@', '@', 'F', '+', ' ', 's', '#', 'o', 'Br', 'P', ' ', 'I', 'Si', '%', 'Sn', 'As', 'Se', '*', 'Hg', 'B', 'Pt', 'e', 'Au', 'Ge', 'Cu', 'Na', 'Fe', 'Sb', 'T', 'R', 'Co', 'i', 'Pd', 'Zn', 'Pb', 'M', 'a', 'Cd', 'Ni', 'A', 'V', 'd', 'Ag', 'K', 'G', 'r', 'Al', 'p', 'L', 'u', 'Ca', 't', 'Cr', 'Mn', 'h', 'Li', 'Mg', 'Tl', 'Ti', 'W', 'In', 'Zr', 'b']. These features comprise common elements and symbols representing special bonds, brackets, special molecules, ions, etc.; digits and decimal points are ignored. The result is a dictionary containing all statistical features of the molecules, whose values are the occurrence frequencies of the molecules or characters;
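A minimal sketch of step 2.1, assuming a greedy tokenizer: multi-character symbols such as `Cl`, `Br` and `@@` are tried before single characters, digits and `.` are skipped as the text describes, and a `Counter` plays the role of the frequency dictionary. The names `MULTI_CHAR_TOKENS`, `tokenize_smiles` and `token_frequencies` are hypothetical, and the token list is a subset of the feature list above, not the patent's actual matching table.

```python
from collections import Counter

# Hypothetical subset of the multi-character features listed above. Each is
# tried before falling back to a single character, so "@@" is not split
# into "@", "@" and "Cl" is not split into "C", "l".
MULTI_CHAR_TOKENS = [
    "@@", "Cl", "Br", "Si", "Sn", "As", "Se", "Hg", "Pt", "Au", "Na",
    "Fe", "Zn", "Ca", "Mg", "Li", "Al", "Cu", "Ag", "Mn", "Cr",
]
IGNORED = set("0123456789.")  # ring-closure digits and the dot are skipped


def tokenize_smiles(smiles):
    """Split a SMILES string into atom/symbol tokens by greedy matching.

    Greedy matching is a simplification: for example, "Sn" outside
    brackets could also mean S bonded to an aromatic n.
    """
    tokens, i = [], 0
    while i < len(smiles):
        if smiles[i] in IGNORED:
            i += 1
            continue
        for tok in MULTI_CHAR_TOKENS:
            if smiles.startswith(tok, i):
                tokens.append(tok)
                i += len(tok)
                break
        else:  # no multi-character token matched at position i
            tokens.append(smiles[i])
            i += 1
    return tokens


def token_frequencies(smiles_list):
    """Frequency dictionary: token -> number of occurrences in the corpus."""
    counts = Counter()
    for smi in smiles_list:
        counts.update(tokenize_smiles(smi))
    return counts
```

For aspirin, `tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O")` returns 19 tokens, with both ring-closure digits dropped.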
Step 2.2: encode the split atoms one by one with one-hot encoding, converting the SMILES molecule into atom vectors;
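The one-hot encoding step can be illustrated with a plain-Python encoder over a token vocabulary. The helper names `build_vocab` and `one_hot` are hypothetical, tokens are assumed to be already split into atoms as in step 2.1, and an unseen token raises `KeyError` here (the text only describes a fallback for unknown entries at the Word2Vec lookup stage).

```python
def build_vocab(token_lists):
    """Map each distinct token to an index, in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def one_hot(tokens, vocab):
    """Encode each token as a 0/1 list with a single 1 at its vocab index."""
    vectors = []
    for tok in tokens:
        vec = [0] * len(vocab)
        vec[vocab[tok]] = 1
        vectors.append(vec)
    return vectors
```

With `vocab = build_vocab([["C", "C", "O"], ["C", "N"]])`, the vocabulary is `{"C": 0, "O": 1, "N": 2}` and `one_hot(["N", "C"], vocab)` gives `[[0, 0, 1], [1, 0, 0]]`.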
Step 2.3: construct the mapping function: pre-train on the molecules in the training set, given as SMILES strings, with the open-source Word2Vec tool to generate a dictionary, and look up the corresponding atom vector in the dictionary; if the dictionary lacks the corresponding atom vector, randomly generate an atom vector as its match;
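The dictionary lookup with a random fallback could look like the sketch below. It assumes the pre-trained embeddings are available as a mapping from token to vector; with gensim 4.x, for example, a model trained with `Word2Vec(sentences=token_lists, vector_size=dim, min_count=1)` exposes such a mapping as `model.wv`. The function name, the uniform range [-0.25, 0.25] for random vectors and the caching of each unknown token's random vector (so repeated occurrences stay consistent) are assumptions, not details from the patent.

```python
import random


def lookup_vectors(tokens, embeddings, dim, seed=0):
    """Return one vector per token from `embeddings`, with random fallback.

    `embeddings` is any mapping token -> sequence of floats (for example a
    gensim KeyedVectors). Tokens missing from it get a random vector drawn
    once per distinct token and reused for later occurrences.
    """
    rng = random.Random(seed)
    unknown = {}  # cache of random vectors for out-of-dictionary tokens
    vectors = []
    for tok in tokens:
        if tok in embeddings:
            vectors.append([float(x) for x in embeddings[tok]])
        else:
            if tok not in unknown:
                unknown[tok] = [rng.uniform(-0.25, 0.25) for _ in range(dim)]
            vectors.append(unknown[tok])
    return vectors
```

Caching matters because a fresh random vector on every occurrence would give the same unknown atom a different representation each time it appears.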
Step S3: construct the BiGRU drug toxicity classification model, which comprises 1 embedding layer, 1 BiGRU layer, 2 pooling layers and 2 dense layers;
Step S4: input the vectors obtained in step S2 into the BiGRU drug toxicity classification model for training;
Specifically, the atom vectors pass in turn through the embedding layer, the BiGRU layer, the pooling layers and the dense layers, thereby training the BiGRU drug toxicity classification model.
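The text does not say which pooling operations the two pooling layers use or how the two dense layers are sized. The plain-Python sketch below assumes one common arrangement: global max pooling and global average pooling over the BiGRU output sequence, concatenated, followed by a ReLU dense layer and a sigmoid output layer. All names and weights are illustrative.

```python
import math


def global_max_pool(seq):
    """Max over time for each feature; `seq` is a list of per-step vectors."""
    return [max(col) for col in zip(*seq)]


def global_avg_pool(seq):
    """Mean over time for each feature."""
    return [sum(col) / len(seq) for col in zip(*seq)]


def dense(x, weights, bias, activation):
    """Fully connected layer: activation(W x + b), one row of W per output."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def relu(v):
    return max(0.0, v)


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def classification_head(bigru_outputs, w1, b1, w2, b2):
    """Two pooling layers -> concatenation -> two dense layers."""
    pooled = global_max_pool(bigru_outputs) + global_avg_pool(bigru_outputs)
    hidden = dense(pooled, w1, b1, relu)
    return dense(hidden, w2, b2, sigmoid)
```

Pooling over time turns a variable-length molecule (one vector per atom) into a fixed-length vector, which is what lets the dense layers follow the recurrent layer.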
The training process is as follows:
Step S41: the input x is the atom vector of a drug;
Step S42: the ground truth of the output y encodes 0 as [1, 0] and 1 as [0, 1]; each training or test pass yields two probability values a and b with a + b = 1, forming the pair [a, b];
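The label convention above (0 → [1, 0], 1 → [0, 1], output [a, b] with a + b = 1) can be written out directly. As a sketch, assuming the model emits one logit z per task: b = sigmoid(z) is the probability of class 1 and a = 1 - b, which guarantees a + b = 1. A two-way softmax over logits (l0, l1) gives the same pair with z = l1 - l0. The function names are hypothetical.

```python
import math


def encode_label(y):
    """Ground truth encoding: 0 -> [1, 0], 1 -> [0, 1]."""
    if y not in (0, 1):
        raise ValueError("label must be 0 or 1")
    return [1, 0] if y == 0 else [0, 1]


def probability_pair(z):
    """Map a single logit z to [a, b] with b = sigmoid(z) and a + b = 1."""
    b = 1.0 / (1.0 + math.exp(-z))
    return [1.0 - b, b]


def predicted_label(pair):
    """Pick the class with the larger probability (ties go to class 0)."""
    return 0 if pair[0] >= pair[1] else 1
```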
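Step S43: the key component of the BiGRU drug toxicity classification model is the BiGRU layer. For an input sequence $X = (x_1, x_2, \ldots, x_t)$, the hidden state $h_t$ at time $t$ is determined jointly by three parts: the current input $x_t$, the forward hidden state output at time $t-1$, $\overrightarrow{h}_{t-1}$, and the backward hidden state output $\overleftarrow{h}_{t+1}$ (the backward GRU reads the sequence from the end). Since a BiGRU can be regarded as two unidirectional GRUs, its hidden state at time $t$ is the weighted sum of the forward hidden state $\overrightarrow{h}_t$ and the backward hidden state $\overleftarrow{h}_t$:
$$\overrightarrow{h}_t = \mathrm{GRU}(x_t, \overrightarrow{h}_{t-1}), \quad \overleftarrow{h}_t = \mathrm{GRU}(x_t, \overleftarrow{h}_{t+1}), \quad h_t = w_t \overrightarrow{h}_t + v_t \overleftarrow{h}_t$$
Inside each unidirectional GRU, with $h_{t-1}$ denoting the state of the previously processed step:
$$z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)$$
$$r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)$$
$$\tilde{h}_t = \Phi\big(W x_t + U (r_t \odot h_{t-1})\big)$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$
Here $\Phi$ and $\sigma$ denote different activation functions; $W$, $W_z$, $W_r$ and $U$, $U_z$, $U_r$ are the corresponding weight matrices; $w_t$ and $v_t$ weight the forward and backward states; and $b_z$ and $b_r$ are the biases of the update gate and the reset gate, respectively. The update gate $z_t$ controls how the recurrent unit computes its hidden state. When the reset gate $r_t$ is 0, the unit performs a reset and forgets the previously computed state.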
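To make the BiGRU recurrence concrete, here is a plain-Python sketch of one GRU step and a bidirectional pass, following the standard GRU update (update gate z, reset gate r, candidate state, and an interpolation between the previous and candidate states), with Φ taken as tanh and σ as the logistic sigmoid. Assumptions: the parameter dictionary layout, the gate convention h_t = (1 - z_t)·h_{t-1} + z_t·h̃_t (some libraries swap the roles of z_t and 1 - z_t), no bias on the candidate state (the text names only b_z and b_r), and fixed scalar combination weights w and v in place of the patent's w_t and v_t.

```python
import math


def _matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h_prev, p):
    """One GRU step. `p` holds W_z, U_z, b_z, W_r, U_r, b_r, W, U (lists)."""
    z = [_sigmoid(a + b + c) for a, b, c in
         zip(_matvec(p["W_z"], x), _matvec(p["U_z"], h_prev), p["b_z"])]
    r = [_sigmoid(a + b + c) for a, b, c in
         zip(_matvec(p["W_r"], x), _matvec(p["U_r"], h_prev), p["b_r"])]
    reset_h = [ri * hi for ri, hi in zip(r, h_prev)]
    # Candidate state; only the gates carry biases in this sketch.
    h_tilde = [math.tanh(a + b) for a, b in
               zip(_matvec(p["W"], x), _matvec(p["U"], reset_h))]
    return [(1.0 - zi) * hp + zi * ht for zi, hp, ht in zip(z, h_prev, h_tilde)]


def bigru(xs, p_fwd, p_bwd, hidden_size, w=0.5, v=0.5):
    """Run forward and backward GRUs and combine them as w*fwd + v*bwd."""
    h = [0.0] * hidden_size
    forward = []
    for x in xs:
        h = gru_step(x, h, p_fwd)
        forward.append(h)
    h = [0.0] * hidden_size
    backward = []
    for x in reversed(xs):
        h = gru_step(x, h, p_bwd)
        backward.append(h)
    backward.reverse()  # align backward states with time steps 1..T
    return [[w * f + v * b for f, b in zip(hf, hb)]
            for hf, hb in zip(forward, backward)]
```

Because the backward pass is reversed before combining, every output position sees context from both the atoms before it and the atoms after it in the SMILES string.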
Step S5: the result of training the BiGRU drug toxicity classification model is sent to the classifier; after optimizing the loss function, the classifier feeds it back into the BiGRU drug toxicity classification model for further training.
Specifically, the training result of the dense layers is sent to the classifier; after optimizing the loss function, the classifier feeds the training result back into the embedding layer to continue training the BiGRU drug toxicity classification model.
Preferably, a sigmoid function is used here to compute the classification probability $y_i$, which is compared with the original label $\hat{y}_i$ to obtain the objective function LOSS, where
$$y_i = \mathrm{sigmoid}(W_i h_t + b_i)$$
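The text states that the sigmoid output y_i is compared with the label ŷ_i to form LOSS, but does not show the formula. A common choice for this setup is binary cross-entropy, LOSS = -(1/N) Σ_i [ŷ_i log y_i + (1 - ŷ_i) log(1 - y_i)]; the sketch below assumes that choice (averaged over samples), so treat it as an illustration rather than the patent's exact objective. The function names are hypothetical.

```python
import math


def sigmoid_output(w, h, b):
    """y_i = sigmoid(W_i h_t + b_i) for one output unit (w, h are lists)."""
    return 1.0 / (1.0 + math.exp(-(sum(wi * hi for wi, hi in zip(w, h)) + b)))


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("need equally long, non-empty inputs")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)
```

A prediction of 0.5 for a positive sample costs log 2 ≈ 0.693, and the loss shrinks as the predicted probability moves toward the true label.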
Step S6: after multiple iterations, the fully trained model is obtained; specifically, training of the BiGRU drug toxicity classification model is complete when 100 iterations have been performed or when the result has not changed for 5 consecutive iterations;
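The stopping rule (at most 100 iterations, or stop once the result has stayed the same for 5 consecutive iterations) can be sketched as a small loop. Here `train_step` is a hypothetical callable that runs one training iteration and returns the monitored quantity (for example, the loss), and "unchanged" is interpreted as a change no larger than a tolerance `tol`; both are assumptions.

```python
def train_until_stable(train_step, max_iters=100, patience=5, tol=1e-6):
    """Call `train_step` until `max_iters` or `patience` unchanged results.

    Returns the number of iterations actually run.
    """
    previous = None
    unchanged = 0
    for iteration in range(1, max_iters + 1):
        value = train_step()
        if previous is not None and abs(value - previous) <= tol:
            unchanged += 1
            if unchanged >= patience:
                return iteration
        else:
            unchanged = 0  # result changed, restart the count
        previous = value
    return max_iters
```

A constant result stops the loop at iteration 6 (the first value plus five unchanged ones), while a result that keeps changing runs the full 100 iterations.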
Step S7: apply the Smi2Vec conversion to the data in the test set (or in the test set and the development set) and input the result into the trained BiGRU drug toxicity classification model to obtain test results;
Step S8: analyze and discuss the test results obtained in step S7.
The performance of the Smi2Vec-based BiGRU drug toxicity prediction system and prediction method proposed by the invention is evaluated below.
It should be noted that this embodiment uses the Tox21 data set (Tox21 Data Challenge) to evaluate the performance of the Smi2Vec-based BiGRU drug toxicity prediction system and prediction method. The data set contains 8013 compounds with data on their potential effects on 12 human targets (nuclear receptor and stress response assays: NR-AR, NR-AR-LBD, NR-AhR, NR-Aromatase, NR-ER, NR-ER-LBD, NR-PPAR-gamma, SR-ARE, SR-ATAD5, SR-HSE, SR-MMP, SR-p53).
First, the performance of the Smi2Vec-based BiGRU drug toxicity prediction system provided by the invention is evaluated on each of the 12 tasks of the Tox21 data set. This group of experiments mainly presents results for the conventional machine learning models Random Forest and SVM, because these two models perform better on the Tox21 data set than other conventional methods. As can be seen from the following table, the proposed Smi2Vec-based BiGRU drug toxicity prediction system achieves the best overall performance on the Tox21 data set. Specifically, across all tasks, it improves on the Random Forest and SVM models by 12.74%-32.75% on the validation set and by 5%-40.4% on the test set, achieving a high standard of classification.
Next, referring also to Fig. 6, which compares the Smi2Vec-based BiGRU drug toxicity prediction method provided by the invention with the conventional molecular featurization method ECFP on the RF (Random Forest), LR (Logistic Regression), DT (Decision Tree) and KN (K-Nearest Neighbor) models. This comparison shows how each featurization method performs when trained in the same machine learning model: the comparative experiments on the Tox21 data set show that the ROC-AUC scores of the Smi2Vec-based BiGRU drug toxicity prediction method provided by the invention are better than those of the conventional method on all 4 models.
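The ROC-AUC scores used in this comparison can be computed without any library through the rank (Mann-Whitney) formulation: the AUC equals the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one, counting ties as one half. The sketch also skips samples whose label is `None`, on the assumption that missing labels in a Tox21 task are excluded from that task's score; the function name and that handling are illustrative.

```python
def roc_auc(labels, scores):
    """ROC-AUC for one task via pairwise comparison (O(P*N), fine for a sketch).

    `labels` holds 1, 0 or None (missing); `scores` holds predicted
    probabilities for class 1.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))
```

For labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four positive/negative pairs are ordered correctly, giving 0.75.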
The above are only embodiments of the present invention. It should be noted that a person of ordinary skill in the art may make improvements without departing from the inventive concept of the present invention, and all such improvements fall within the protection scope of the present invention.
Claims (9)
1. A Smi2Vec-based BiGRU drug toxicity prediction system, characterized by comprising:
a Smi2Vec module, which converts molecular features into atom vectors;
a BiGRU drug toxicity classification model, arranged at the output of the Smi2Vec module and trained on the atom vectors, the BiGRU drug toxicity classification model comprising, in sequence, 1 embedding layer, 1 BiGRU layer, 2 pooling layers and 2 dense layers; and
a classifier, arranged at the output of the BiGRU drug toxicity classification model, which generates the output labels for the classification task.
2. The Smi2Vec-based BiGRU drug toxicity prediction system according to claim 1, characterized in that the embedding layer is arranged at the output of the Smi2Vec module, and the classifier is arranged at the output of the dense layers.
3. A Smi2Vec-based BiGRU drug toxicity prediction method, characterized by comprising:
Step S1: constructing a data set, the data set comprising a training set, a test set and a development set;
Step S2: Smi2Vec conversion: converting, by a Smi2Vec module, the molecular features of the training set in SMILES format into atom vectors;
Step S3: constructing a BiGRU drug toxicity classification model, the BiGRU drug toxicity classification model comprising, in sequence, 1 embedding layer, 1 BiGRU layer, 2 pooling layers and 2 dense layers;
Step S4: inputting the atom vectors into the BiGRU drug toxicity classification model to train the BiGRU drug toxicity classification model;
Step S5: sending the training result of the BiGRU drug toxicity classification model to a classifier, the classifier, after optimizing the loss function, feeding the training result back into the BiGRU drug toxicity classification model for further training;
Step S6: completing the training of the BiGRU drug toxicity classification model through multiple iterations;
Step S7: applying the Smi2Vec conversion to the data in the test set and inputting the result into the BiGRU drug toxicity classification model to obtain test results;
Step S8: analyzing and discussing the test results.
4. The Smi2Vec-based BiGRU drug toxicity prediction method according to claim 3, characterized in that the data set consists of the training set (80%) and the test set (20%).
5. The Smi2Vec-based BiGRU drug toxicity prediction method according to claim 3, characterized in that the data set consists of the training set (80%), the test set (10%) and the development set (10%).
6. The Smi2Vec-based BiGRU drug toxicity prediction method according to claim 3, characterized in that step S2 comprises the following steps:
Step S21: splitting each SMILES-format molecule into individual atoms and extracting the features of the atoms;
Step S22: encoding the split atoms one by one with one-hot encoding, converting the SMILES molecule into atom vectors;
Step S23: constructing a mapping function: pre-training on the SMILES molecules in the training set with the open-source Word2Vec tool to generate a dictionary, and looking up the corresponding sample vector in the dictionary; if the dictionary lacks the corresponding sample vector, randomly generating a vector as its match.
7. The Smi2Vec-based BiGRU drug toxicity prediction method according to claim 3, characterized in that, in step S4, the atom vectors pass in turn through the embedding layer, the BiGRU layer, the pooling layers and the dense layers, thereby training the BiGRU drug toxicity classification model.
8. The Smi2Vec-based BiGRU drug toxicity prediction method according to claim 3, characterized in that, in step S5, the training result of the dense layers is sent to the classifier, and after optimizing the loss function the classifier feeds the training result back into the embedding layer to continue training the BiGRU drug toxicity classification model.
9. The Smi2Vec-based BiGRU drug toxicity prediction method according to claim 3, characterized in that, in step S6, training of the BiGRU drug toxicity classification model is complete when 100 iterations have been performed or when the result has not changed for 5 consecutive iterations.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910423330.4A CN110277173A (en) | 2019-05-21 | 2019-05-21 | BiGRU drug toxicity forecasting system and prediction technique based on Smi2Vec |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910423330.4A (CN110277173A) | 2019-05-21 | 2019-05-21 | BiGRU drug toxicity forecasting system and prediction technique based on Smi2Vec |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110277173A | 2019-09-24 |
Family ID: 67960147
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201910423330.4A (CN110277173A) | Pending | 2019-05-21 | 2019-05-21 | BiGRU drug toxicity forecasting system and prediction technique based on Smi2Vec |
Country Status (1)
Country | Publications |
---|---|
CN (1) | CN110277173A |
2019
- 2019-05-21: CN201910423330.4A (CN110277173A) filed in CN; status: active, pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109391602A * | 2017-08-11 | 2019-02-26 | Beijing Jinjing Yunhua Technology Co., Ltd. (北京金睛云华科技有限公司) | A kind of zombie host detection method |
CN108875722A * | 2017-12-27 | 2018-11-23 | Beijing Megvii Technology Co., Ltd. (北京旷视科技有限公司) | Character recognition and identification model training method, device and system and storage medium |
CN108536999A * | 2018-03-21 | 2018-09-14 | Nanjing University of Posts and Telecommunications (南京邮电大学) | A kind of ligand small molecule key minor structure screening technique and device |
CN108830334A * | 2018-06-25 | 2018-11-16 | Jiangxi Normal University (江西师范大学) | A kind of fine granularity target-recognition method based on confrontation type transfer learning |
CN109033738A * | 2018-07-09 | 2018-12-18 | Hunan University (湖南大学) | A kind of pharmaceutical activity prediction technique based on deep learning |
Non-Patent Citations (2)
Citation |
---|
A. Chakrabarty et al., "Context sensitive lemmatization using two successive bidirectional gated recurrent networks", Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 4 August 2017, pp. 1481-1491 * |
Zhe Quan et al., "A System for Learning Atoms Based on Long Short-Term Memory Recurrent Neural Networks", 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 24 January 2019, pp. 728-733 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111210878A * | 2020-01-06 | 2020-05-29 | Hunan University (湖南大学) | Medicine prediction method based on deep learning |
CN111243682A * | 2020-01-10 | 2020-06-05 | BOE Technology Group Co., Ltd. (京东方科技集团股份有限公司) | Method, device, medium and apparatus for predicting toxicity of drug |
CN112185477A * | 2020-09-25 | 2021-01-05 | Beijing Wangshi Zhihui Technology Co., Ltd. (北京望石智慧科技有限公司) | Method and device for extracting molecular characteristics and calculating three-dimensional quantitative structure-activity relationship |
CN112185477B * | 2020-09-25 | 2024-04-16 | Beijing Wangshi Zhihui Technology Co., Ltd. (北京望石智慧科技有限公司) | Method and device for extracting molecular characteristics and calculating three-dimensional quantitative structure-activity relationship |
CN113378168A * | 2021-07-04 | 2021-09-10 | Kunming University of Science and Technology (昆明理工大学) | Method for realizing DDoS attack detection in SDN environment based on Renyi entropy and BiGRU algorithm |
CN113378168B * | 2021-07-04 | 2022-05-31 | Kunming University of Science and Technology (昆明理工大学) | Method for realizing DDoS attack detection in SDN environment based on Renyi entropy and BiGRU algorithm |
CN115691703A * | 2022-10-15 | 2023-02-03 | Suzhou Chuangteng Software Co., Ltd. (苏州创腾软件有限公司) | Drug property prediction method and system based on pharmacokinetic model |
Similar Documents
Publication | Title |
---|---|
CN110277173A | BiGRU drug toxicity forecasting system and prediction technique based on Smi2Vec |
Li et al. | A survey of convolutional neural networks: analysis, applications, and prospects |
Cui et al. | Efficient human motion prediction using temporal convolutional generative adversarial network |
Wu et al. | Weight-adapted convolution neural network for facial expression recognition in human–robot interaction |
CN109920501A | Electronic health record classification method and system based on convolutional neural networks and Active Learning |
Wei et al. | Multi-modal facial expression feature based on deep-neural networks |
CN109086886A | A kind of convolutional neural networks learning algorithm based on extreme learning machine |
CN112732921B | False user comment detection method and system |
CN109815920A | Gesture identification method based on convolutional neural networks and confrontation convolutional neural networks |
Guo et al. | Facial expressions recognition with multi-region divided attention networks for smart education cloud applications |
US20210264300A1 | Systems and methods for labeling data |
CN110009108A | A kind of completely new quantum transfinites learning machine |
Zhou et al. | Enhance the recognition ability to occlusions and small objects with Robust Faster R-CNN |
Zhao et al. | Crop pest recognition in real agricultural environment using convolutional neural networks by a parallel attention mechanism |
Shang et al. | Image spam classification based on convolutional neural network |
Menaga et al. | Deep learning: a recent computing platform for multimedia information retrieval |
CN114398485B | Expert portrait construction method and device based on multi-view fusion |
Hattori et al. | A deep bidirectional long short-term memory approach applied to the protein secondary structure prediction problem |
Zeng et al. | Flower image classification based on an improved lightweight neural network with multi-scale feature fusion and attention mechanism |
Serpa et al. | Milestones and new frontiers in deep learning |
Pei et al. | Financial trading decisions based on deep fuzzy self-organizing map |
Lim et al. | Predicting drug-target interaction using 3D structure-embedded graph representations from graph neural networks |
CN110176279A | Lead compound virtual screening method and device based on small sample |
Tiwari et al. | SketchGPT: Autoregressive Modeling for Sketch Generation and Recognition |
Zhou et al. | Deep learning model and its application in big data |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190924 |