CN117594157A - Method and device for generating molecules of single system based on reinforcement learning - Google Patents

Method and device for generating molecules of single system based on reinforcement learning

Info

Publication number
CN117594157A
CN117594157A · CN202410077808.3A · CN117594157B
Authority: CN (China)
Prior art keywords: model, training, molecular, SMILES, training model
Prior art date
Legal status: Granted
Application number
CN202410077808.3A
Other languages: Chinese (zh)
Other versions: CN117594157B (en)
Inventor
李中伟
谢爱峰
柳彦宏
鲍雨
Current Assignee
Yantai Guogong Intelligent Technology Co ltd
Original Assignee
Yantai Guogong Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Yantai Guogong Intelligent Technology Co ltd filed Critical Yantai Guogong Intelligent Technology Co ltd
Priority to CN202410077808.3A
Publication of CN117594157A
Application granted
Publication of CN117594157B
Legal status: Active

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/50 Molecular design, e.g. of drugs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/092 Reinforcement learning
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics


Abstract

A method and device for generating molecules of a single system based on reinforcement learning belong to the technical field of molecular generation and prediction. The method deduplicates collected molecular expressions to obtain a molecular data set; expands the molecular data set by atom substitution to obtain an expanded data set and deduplicates it; pre-trains a Transformer model on the deduplicated expanded data set to obtain a pre-training model V1; performs reinforcement learning on the pre-training model V1 to obtain a pre-training model V2; and fine-tunes the pre-training model V2, quantitatively selecting molecules meeting the conditions during fine-tuning to participate in its training, obtaining a pre-training model V3 after fine-tuning, and generating new molecules of a single system through the pre-training model V3. The invention markedly improves the discovery efficiency of new molecules meeting production requirements and greatly shortens the cycle of new-molecule research and development in chemical laboratories.

Description

Method and device for generating molecules of single system based on reinforcement learning
Technical Field
The invention relates to a method and a device for generating molecules of a single system based on reinforcement learning, and belongs to the technical field of molecular generation and prediction.
Background
At present, in the field of chemical molecule research and development, designing and generating new molecules is a time-consuming and difficult task; researchers often expend a great deal of effort searching a huge chemical space through traditional methods such as chemical reaction paths.
In recent years, with the vigorous development of AI technology such as deep learning, the development of AI assisted chemical molecules is receiving more and more attention, and novel innovative ideas and solutions are provided for developers. The introduction of AI technology accelerates the molecular development process, shortens the development period, reduces the development cost, provides more choices for the research personnel, and brings breakthrough progress to the chemical molecular development.
However, AI-based molecule generation is bottlenecked by the limits of public data sets, so the diversity of the generated molecules rarely escapes the constraints of those data sets. Yet generating new molecules that depart from existing patent protection is the core task of molecular discovery. Escaping the bottleneck of insufficient data and generating diverse new molecules that meet the target conditions is therefore a technical problem to be solved in the field of molecular discovery.
Disclosure of Invention
Therefore, the invention provides a method and a device for generating molecules of a single system based on reinforcement learning, which solve the problems that traditional approaches cannot escape the bottleneck of insufficient data and cannot generate diverse new molecules meeting target conditions, leading to long research and development cycles for new molecules.
In order to achieve the above object, the present invention provides the following technical solution: a method for generating molecules of a single system based on reinforcement learning, comprising:
collecting molecular expressions from a public database, and deduplicating the collected molecular expressions to obtain a molecular data set;
expanding the molecular data set by atom substitution to obtain an expanded data set, and deduplicating the expanded data set;
pre-training a Transformer model on the deduplicated expanded data set to obtain a pre-training model V1; performing reinforcement learning on the pre-training model V1 to obtain a pre-training model V2;
and fine-tuning the pre-training model V2, quantitatively selecting molecules meeting the conditions during fine-tuning to participate in the training of the pre-training model V2, obtaining a pre-training model V3 after fine-tuning, and generating new molecules of a single system through the pre-training model V3.
As a preferred scheme of the method for generating molecules of a single system based on reinforcement learning, during the expansion of the molecular data set by atom substitution, Br atoms are used to replace H atoms bonded to C atoms in the SMILES molecular expressions in the molecular data set.
As a preferred scheme of the method for generating molecules of a single system based on reinforcement learning, the step of pre-training the Transformer model on the deduplicated expanded data set to obtain a pre-training model V1 includes:
encoding the SMILES molecular expressions in the molecular data set as a matrix;
inputting the encoded matrix into the Transformer model to obtain a molecular encoding output;
calculating a loss value between the molecular encoding output and the correct SMILES molecular expression using cross-entropy loss, and updating the parameters of the Transformer model by back-propagation;
saving the current Transformer model as the pre-training model V1 when the training loss value stabilizes after several rounds of training.
As a preferred scheme of the method for generating molecules of a single system based on reinforcement learning, the step of performing reinforcement learning on the pre-training model V1 to obtain a pre-training model V2 includes:
generating the SMILES expressions of the molecules of the current batch with the pre-training model V1;
evaluating and scoring the SMILES expressions of the current batch according to the set scoring standard;
training the weights of the pre-training model V1 using the evaluation score as the reward of the pre-training model V1;
after several rounds of iterative training, saving the last-round pre-training model V1 as the pre-training model V2.
As a preferred scheme of the method for generating molecules of a single system based on reinforcement learning, the set scoring standard score is:

score = { similarity, if the SMILES expression is valid; 0, if it is invalid }

where similarity denotes the similarity between the SMILES expression of a generated molecule and the molecules of the single system;

the cross-entropy loss used to calculate the loss value Loss between a molecular encoding output and the correct SMILES molecular expression is:

Loss = -Σ_i y_i log(p_i)

where y_i is the one-hot indicator of the correct token and p_i is the predicted probability.
As a preferred scheme of the method for generating molecules of a single system based on reinforcement learning, the step of fine-tuning the pre-training model V2 includes:
assigning the parameters of the pre-training model V2 to an Agent model and a Prior model respectively, so that the Agent model participates in training and updates the parameters of the pre-training model V2, while the gradients of the Prior model are frozen and do not participate in parameter updating;
generating SMILES expressions of molecules with the Agent model, screening out the SMILES expressions meeting a set condition, and stopping generation when their number reaches a set threshold; generating the same number of SMILES expressions with the Prior model;
pooling all generated SMILES expressions and inputting them into the Agent model and the Prior model to obtain the output Loss_Agent of the Agent model and the output Loss_Prior of the Prior model respectively, and constructing a loss function from Loss_Agent and Loss_Prior;
averaging the loss values, updating the parameters of the pre-training model V2 by back-propagation, and saving the current model as the pre-training model V3 when the training loss value of the pre-training model V2 stabilizes.
As a preferred scheme of the method for generating molecules of a single system based on reinforcement learning, the loss function constructed from the output Loss_Agent of the Agent model and the output Loss_Prior of the Prior model is:

Loss = (Loss_Agent - Loss_Prior)²

where Loss_Agent is the loss value of a SMILES expression calculated by the Agent model, and Loss_Prior is the loss value of the same SMILES expression calculated by the Prior model.
The invention also provides a device for generating molecules of a single system based on reinforcement learning, comprising:
a raw data acquisition module for collecting molecular expressions from a public database and deduplicating the collected molecular expressions to obtain a molecular data set;
a data expansion module for expanding the molecular data set by atom substitution to obtain an expanded data set and deduplicating the expanded data set;
a first model training module for pre-training a Transformer model on the deduplicated expanded data set to obtain a pre-training model V1;
a second model training module for performing reinforcement learning on the pre-training model V1 to obtain a pre-training model V2;
a third model training module for fine-tuning the pre-training model V2, quantitatively selecting molecules meeting the conditions during fine-tuning to participate in the training of the pre-training model V2, and obtaining a pre-training model V3 after fine-tuning;
a molecule generation module for generating new molecules of a single system through the pre-training model V3.
As a preferred scheme of the device for generating molecules of a single system based on reinforcement learning, in the data expansion module, Br atoms are used to replace H atoms bonded to C atoms in the SMILES molecular expressions in the molecular data set;
the first model training module includes:
an encoding processing sub-module for encoding the SMILES molecular expressions in the molecular data set as a matrix;
an encoding output sub-module for inputting the encoded matrix into the Transformer model to obtain a molecular encoding output;
a loss value calculation sub-module for calculating a loss value between the molecular encoding output and the correct SMILES molecular expression using cross-entropy loss, and updating the parameters of the Transformer model by back-propagation;
a first model storage sub-module for saving the current Transformer model as the pre-training model V1 when the training loss value stabilizes after several rounds of training.
As a preferred scheme of the device for generating molecules of a single system based on reinforcement learning, the second model training module includes:
an expression generation sub-module for generating the SMILES expressions of the molecules of the current batch with the pre-training model V1;
an expression scoring sub-module for evaluating and scoring the generated SMILES expressions of the current batch according to the set scoring standard;
a reward training sub-module for training the weights of the pre-training model V1 using the evaluation score as the reward of the pre-training model V1;
a second model storage sub-module for saving the last-round pre-training model V1 as the pre-training model V2 after several rounds of iterative training;
in the expression scoring sub-module, the set scoring standard score is:

score = { similarity, if the SMILES expression is valid; 0, if it is invalid }

where similarity denotes the similarity between the SMILES expression of a generated molecule and the molecules of the single system;

the cross-entropy loss used to calculate the loss value Loss between a molecular encoding output and the correct SMILES molecular expression is:

Loss = -Σ_i y_i log(p_i)
As a preferred scheme of the device for generating molecules of a single system based on reinforcement learning, the third model training module includes:
a model parameterization sub-module for assigning the parameters of the pre-training model V2 to an Agent model and a Prior model respectively, so that the Agent model participates in training and updates the parameters of the pre-training model V2, while the gradients of the Prior model are frozen and do not participate in parameter updating;
an intermediate generation sub-module for generating SMILES expressions of molecules with the Agent model, screening out the SMILES expressions meeting a set condition, stopping generation when their number reaches a set threshold, and generating the same number of SMILES expressions with the Prior model;
a loss construction sub-module for pooling all generated SMILES expressions, inputting them into the Agent model and the Prior model to obtain the output Loss_Agent of the Agent model and the output Loss_Prior of the Prior model respectively, and constructing a loss function from Loss_Agent and Loss_Prior;
a parameter updating sub-module for averaging the loss values and updating the parameters of the pre-training model V2 by back-propagation;
a third model storage sub-module for saving the current model as the pre-training model V3 when the training loss value of the pre-training model V2 stabilizes;
in the loss construction sub-module, the loss function constructed from the output Loss_Agent of the Agent model and the output Loss_Prior of the Prior model is:

Loss = (Loss_Agent - Loss_Prior)²

where Loss_Agent is the loss value of a SMILES expression calculated by the Agent model, and Loss_Prior is the loss value of the same SMILES expression calculated by the Prior model.
The invention has the following advantages: molecular expressions are collected from a public database and deduplicated to obtain a molecular data set; the molecular data set is expanded by atom substitution to obtain an expanded data set, which is deduplicated; a Transformer model is pre-trained on the deduplicated expanded data set to obtain a pre-training model V1; reinforcement learning is performed on the pre-training model V1 to obtain a pre-training model V2; the pre-training model V2 is fine-tuned, molecules meeting the conditions are quantitatively selected during fine-tuning to participate in its training, a pre-training model V3 is obtained after fine-tuning, and new molecules of a single system are generated through the pre-training model V3. The invention markedly improves the discovery efficiency of new molecules meeting production requirements and greatly shortens the cycle of new-molecule research and development in chemical laboratories.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It will be apparent to those skilled in the art that the drawings described below are merely exemplary, and that other drawings may be derived from them without inventive effort.
FIG. 1 is a schematic flow chart of a molecular generation method of a single system based on reinforcement learning provided in an embodiment of the present invention;
FIG. 2 is a schematic diagram of a reinforcement learning flow in a molecular generation method of a reinforcement learning-based single system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a fine tuning flow in a molecular generation method of a single system based on reinforcement learning according to an embodiment of the present invention;
FIG. 4 is a diagram of a molecular generating device architecture for a reinforcement learning-based single system provided in an embodiment of the present invention.
Detailed Description
Other aspects and advantages of the present invention will become apparent to those skilled in the art from the following detailed description, which illustrates the invention by way of certain specific embodiments, but not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Example 1
Referring to fig. 1, 2 and 3, an embodiment of the present invention provides a method for generating molecules of a single system based on reinforcement learning, including the steps of:
S1, collecting molecular expressions from a public database, and deduplicating the collected molecular expressions to obtain a molecular data set;
S2, expanding the molecular data set by atom substitution to obtain an expanded data set, and deduplicating the expanded data set;
S3, pre-training a Transformer model on the deduplicated expanded data set to obtain a pre-training model V1;
S4, performing reinforcement learning on the pre-training model V1 to obtain a pre-training model V2;
S5, fine-tuning the pre-training model V2, quantitatively selecting molecules meeting the conditions during fine-tuning to participate in the training of the pre-training model V2, and obtaining a pre-training model V3 after fine-tuning;
s6, generating new molecules of a single system through the pre-training model V3.
In this embodiment, in step S1, molecular SMILES expressions are collected from the public QM9 dataset (containing the composition, spatial information, and corresponding properties of about 130,000 organic molecules, widely used in experiments and comparisons of data-driven molecular property prediction methods) and deduplicated, yielding on the order of one hundred thousand SMILES expressions. Next, in step S2, during the expansion of the molecular data set by atom substitution, Br atoms are used to replace H atoms bonded to C atoms in the SMILES molecular expressions in the molecular data set. Deduplicating the expanded data set yields about one million non-duplicate molecular SMILES expressions, which are divided into a training set and a validation set at a 9:1 ratio.
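The deduplication and 9:1 split in the step above can be sketched in plain Python. This is a minimal sketch: the SMILES strings below are illustrative placeholders, and in practice canonicalization with a cheminformatics toolkit such as RDKit would precede deduplication so that different spellings of the same molecule collapse to one entry.

```python
import random

def dedup_and_split(smiles_list, train_ratio=0.9, seed=42):
    """Remove duplicate SMILES strings, then split 9:1 into train/validation."""
    unique = list(dict.fromkeys(smiles_list))  # order-preserving deduplication
    rng = random.Random(seed)
    rng.shuffle(unique)
    cut = int(len(unique) * train_ratio)
    return unique[:cut], unique[cut:]

# Illustrative toy data (real inputs would come from QM9)
raw = ["CCO", "CCO", "c1ccccc1", "CC(=O)O", "CCN", "CCBr", "CC(=O)O",
       "CCCl", "CCC", "CO"]
train, val = dedup_and_split(raw)
print(len(train), len(val))  # 8 unique molecules -> 7 train, 1 validation
```

The fixed seed keeps the split reproducible between runs, which matters when the validation set is used to compare training checkpoints.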
In this embodiment, in step S3, the step of pre-training the Transformer model on the deduplicated expanded data set to obtain the pre-training model V1 includes:
encoding the SMILES molecular expressions in the molecular data set as a matrix;
inputting the encoded matrix into the Transformer model to obtain a molecular encoding output;
calculating a loss value between the molecular encoding output and the correct SMILES molecular expression using cross-entropy loss, and updating the parameters of the Transformer model by back-propagation;
saving the current Transformer model as the pre-training model V1 when the training loss value stabilizes after several rounds of training.
Specifically, a Transformer model is adopted for the pre-training task, with parameters set as follows: batch_size 256; 8 heads in the Multi-Head Attention; maximum sequence length 140; the adaptive Adam optimizer; dropout 0.1; 500 warm-up steps; and a cross-entropy loss function.
Using the settings above, the Transformer model performs the pre-training task: the SMILES expressions of the molecules are first encoded into a matrix, the encoded matrix is input into the encoder part of the Transformer model and processed by the multi-head attention layers, and an encoded representation is obtained. The encoded representation is input to the decoder part to predict the molecular SMILES; cross-entropy loss is used to calculate the loss value between the predicted and correct molecular SMILES, and back-propagation is used to update the Transformer model parameters. Finally, when the loss value stabilizes after several rounds of training, the current Transformer model is saved and named V1.
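The "encode SMILES expressions into a matrix" step above can be illustrated with a minimal character-level tokenizer. The regular expression, batch-local vocabulary, and padding scheme here are assumptions for illustration rather than the patent's actual tokenizer; the maximum sequence length of 140 is taken from the parameters above, and multi-character element symbols such as `Br` are kept as single tokens.

```python
import re

# Minimal SMILES tokenizer: match two-character elements first, then single characters.
TOKEN_RE = re.compile(r"Br|Cl|[A-Za-z0-9@+\-\[\]\(\)=#/\\%]")

def encode_batch(smiles_batch, max_len=140):
    """Map each SMILES string to a fixed-length row of integer token ids.
    Id 0 is reserved for padding; ids are assigned from the batch vocabulary."""
    tokenized = [TOKEN_RE.findall(s) for s in smiles_batch]
    vocab = {tok: i + 1
             for i, tok in enumerate(sorted({t for ts in tokenized for t in ts}))}
    matrix = [[vocab[t] for t in ts][:max_len] + [0] * (max_len - len(ts))
              for ts in tokenized]
    return matrix, vocab

matrix, vocab = encode_batch(["CCBr", "CC(=O)O"])
print(len(matrix), len(matrix[0]))  # 2 rows, each padded to length 140
print("Br" in vocab)                # True: Br is kept as a single token
```

In a real pipeline the vocabulary would be fixed over the whole training set rather than rebuilt per batch, so that token ids are stable across batches.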
In this embodiment, in step S4, the step of performing reinforcement learning on the pre-training model V1 to obtain the pre-training model V2 includes:
generating the SMILES expressions of the molecules of the current batch with the pre-training model V1;
evaluating and scoring the SMILES expressions of the current batch according to the set scoring standard;
training the weights of the pre-training model V1 using the evaluation score as the reward of the pre-training model V1;
after several rounds of iterative training, saving the last-round pre-training model V1 as the pre-training model V2.
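The reward step above, which uses each evaluation score as the model's reward, can be sketched as a reward-weighted loss. This is a hedged stand-in: `reinforcement_loss` and the toy numbers are hypothetical, and in practice the per-sequence negative log-likelihoods would come from the Transformer, with the weighted loss minimized by back-propagation.

```python
def reinforcement_loss(nlls, rewards):
    """Average each per-sequence negative log-likelihood weighted by its reward:
    sequences that score higher (more valid/similar molecules) contribute more
    to the gradient, pushing the model toward generating molecules like them."""
    if len(nlls) != len(rewards):
        raise ValueError("one reward per generated sequence")
    return sum(nll * r for nll, r in zip(nlls, rewards)) / len(nlls)

# Toy batch: two valid molecules (reward = similarity) and one invalid (reward 0)
nlls = [2.0, 1.5, 3.0]
rewards = [0.8, 0.6, 0.0]
print(reinforcement_loss(nlls, rewards))  # (1.6 + 0.9 + 0.0) / 3
```

Invalid SMILES receive reward 0 and therefore contribute nothing to the update, which matches the piecewise scoring standard described below.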
Specifically, the parameters of the pre-training model V1 are imported into a Transformer model architecture with the following settings: batch_size 100; 8 heads in the Multi-Head Attention; maximum sequence length 140; the adaptive Adam optimizer; dropout 0.1; 500 warm-up steps.
A batch of molecular SMILES expressions is generated with the pre-training model V1 and then scored according to the set scoring standard score:

score = { similarity, if the SMILES expression is valid; 0, if it is invalid }

where similarity denotes the similarity between the SMILES expression of a generated molecule and the molecules of the single system; each SMILES expression is thus measured in terms of both molecular validity and molecular similarity. The model weights are then trained using the evaluation score as the model's reward, and cross-entropy loss is used to calculate the loss value Loss between the molecular encoding output and the correct SMILES molecular expression:

Loss = -Σ_i y_i log(p_i)

where y_i is the one-hot indicator of the correct token and p_i is the predicted probability.
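The piecewise scoring standard can be illustrated as follows. This is a stand-in sketch: `bigrams`, `tanimoto`, and the toy validity check are assumptions for illustration, since in practice validity would be checked by parsing the SMILES with a cheminformatics toolkit such as RDKit and similarity computed as Tanimoto similarity over molecular fingerprints.

```python
def bigrams(s):
    """Character-bigram set: a crude stand-in for a molecular fingerprint."""
    return {s[i:i + 2] for i in range(len(s) - 1)}

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two fingerprint sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def score(smiles, reference, is_valid):
    """score = similarity to the single-system reference when valid, else 0."""
    if not is_valid(smiles):
        return 0.0
    return tanimoto(bigrams(smiles), bigrams(reference))

valid = lambda s: s.count("(") == s.count(")")  # toy validity check only
print(score("CC(=O)O", "CC(=O)N", valid) > 0)   # similar valid molecule
print(score("CC(=O", "CC(=O)N", valid))         # unbalanced parentheses -> 0.0
```

The zero score for invalid strings is what gives the reward signal its two-sided character: the model is pushed toward chemically parseable SMILES and, among those, toward molecules close to the target single system.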
in this embodiment, in step S5, the parameters of the pretrained model V2 are imported by using the transducer model, wherein the parameters of the transducer are modified as follows: the epoch setting is 100, the batch_size is 50, and the other parameter settings are the same as those of the training process of the pre-training model V1.
The step of fine-tuning the pre-training model V2 includes:
assigning the parameters of the pre-training model V2 to an Agent model and a Prior model respectively, so that the Agent model participates in training and updates the parameters of the pre-training model V2, while the gradients of the Prior model are frozen and do not participate in parameter updating;
generating SMILES expressions of molecules with the Agent model, screening out the SMILES expressions meeting the set condition according to the scoring standard formula, and stopping generation when their number reaches the set threshold of 30; generating the same number of SMILES expressions with the Prior model;
pooling all generated SMILES expressions and inputting them into the Agent model and the Prior model to obtain the output Loss_Agent of the Agent model and the output Loss_Prior of the Prior model respectively, and constructing a loss function from Loss_Agent and Loss_Prior;
averaging the loss values, updating the parameters of the pre-training model V2 by back-propagation, and saving the current model as the pre-training model V3 when the training loss value of the pre-training model V2 stabilizes.
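The Agent/Prior loss construction and batch averaging can be sketched as below. This is a hedged sketch: `finetune_loss` is a hypothetical helper name, the inputs stand for the per-expression loss values produced by the two models over the pooled batch, and the squared-difference form of the combined loss is an assumption rather than the patent's exact formula.

```python
def finetune_loss(agent_losses, prior_losses):
    """Combine per-SMILES Agent and Prior loss values and average over the batch.
    The Prior is frozen, so only the Agent's parameters would receive gradients;
    the squared difference (assumed form) keeps the Agent close to the Prior
    while the screened, high-scoring molecules steer what it generates."""
    terms = [(la - lp) ** 2 for la, lp in zip(agent_losses, prior_losses)]
    return sum(terms) / len(terms)

# Toy per-expression loss values from the two models over a pooled batch
agent = [1.2, 0.9, 1.5]
prior = [1.0, 1.0, 1.0]
print(round(finetune_loss(agent, prior), 4))
```

Freezing the Prior serves as a regularizer: it anchors the fine-tuned Agent to the distribution learned during pre-training so that fine-tuning on a small screened set does not collapse diversity.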
The loss function constructed from the output Loss_Agent of the Agent model and the output Loss_Prior of the Prior model is:

Loss = (Loss_Agent - Loss_Prior)²

where Loss_Agent is the loss value of a SMILES expression calculated by the Agent model, and Loss_Prior is the loss value of the same SMILES expression calculated by the Prior model.
Finally, new molecules of a single system are generated with the pre-training model V3, and the generated single-system molecules are handed to experimenters, who screen out the molecules meeting the requirements through experiments.
In summary, the invention collects molecular expressions from a public database and deduplicates them to obtain a molecular data set, then expands the molecular data set by atom substitution to obtain an expanded data set and deduplicates it. Pre-training the Transformer model on the deduplicated expanded data set to obtain the pre-training model V1 includes: encoding the SMILES molecular expressions in the molecular data set as a matrix; inputting the encoded matrix into the Transformer model to obtain a molecular encoding output; calculating a loss value between the molecular encoding output and the correct SMILES molecular expression using cross-entropy loss and updating the parameters of the Transformer model by back-propagation; and saving the current Transformer model as the pre-training model V1 when the training loss value stabilizes after several rounds of training. Performing reinforcement learning on the pre-training model V1 to obtain the pre-training model V2 includes: generating the SMILES expressions of the molecules of the current batch with the pre-training model V1; evaluating and scoring the SMILES expressions of the current batch according to the set scoring standard; training the weights of the pre-training model V1 using the evaluation score as its reward; and saving the last-round pre-training model V1 as the pre-training model V2 after several rounds of iterative training.
Fine-tuning the pre-training model V2 includes: assigning the parameters of the pre-training model V2 to an Agent model and a Prior model respectively, so that the Agent model participates in training and updates the parameters of the pre-training model V2 while the gradients of the Prior model are frozen; generating SMILES expressions of molecules with the Agent model, screening out those meeting the set condition, and stopping generation when their number reaches the set threshold; generating the same number of SMILES expressions with the Prior model; pooling all generated SMILES expressions, inputting them into the Agent model and the Prior model to obtain the outputs Loss_Agent and Loss_Prior respectively, and constructing a loss function from them; averaging the loss values, updating the parameters of the pre-training model V2 by back-propagation, and saving the current model as the pre-training model V3 when the training loss value stabilizes. The invention markedly improves the discovery efficiency of new molecules meeting production requirements and greatly shortens the cycle of new-molecule research and development in chemical laboratories.
It should be noted that the method of the embodiments of the present disclosure may be performed by a single device, such as a computer or a server. The method of this embodiment may also be applied in a distributed scenario and completed by a plurality of devices cooperating with each other. In such a distributed scenario, one of the devices may perform only one or more steps of the method of the embodiments of the present disclosure, and the devices interact with each other to complete the method.
It should be noted that the foregoing describes some embodiments of the present disclosure. In some cases, the acts or steps recited in the present disclosure may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Example 2
Referring to fig. 4, embodiment 2 of the present invention further provides a molecular generating device of a single system based on reinforcement learning, including:
the original data acquisition module 001 is used for collecting the molecular expressions from the public database, and performing de-duplication processing on the collected molecular expressions to obtain a molecular data set;
the data expansion module 002 is configured to expand the molecular data set by means of atomic replacement to obtain an expanded data set, and perform deduplication processing on the expanded data set;
the first model training module 003 is configured to pre-train the Transformer model on the deduplicated expanded data set, so as to obtain a pre-training model V1;
the second model training module 004 is used for performing reinforcement learning processing on the pre-training model V1 to obtain a pre-training model V2;
the third model training module 005 is configured to perform fine tuning on the pre-training model V2, and quantitatively select molecules meeting the conditions to participate in training of the pre-training model V2 during the fine tuning process, so as to obtain a pre-training model V3 after the fine tuning process;
a molecular generation module 006 for generating new molecules of a single system by the pre-training model V3.
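The data expansion module's atomic-substitution step can be illustrated at the SMILES-string level. The sketch below is a deliberately simplified stand-in: it appends a Br branch after each carbon token rather than operating on the molecular graph (a real pipeline would use a cheminformatics toolkit such as RDKit), and it also performs the deduplication that follows expansion:

```python
def brominate_variants(smiles):
    """Return deduplicated SMILES variants with one C-H replaced by C-Br.

    Character-level stand-in only: assumes simple SMILES in which every
    'C' token still carries at least one implicit hydrogen.
    """
    variants = []
    for i, ch in enumerate(smiles):
        if ch == "C":
            # Replace one implicit H on this carbon with a Br branch.
            variants.append(smiles[: i + 1] + "(Br)" + smiles[i + 1 :])
    # Deduplication step, mirroring the module's second responsibility.
    return sorted(set(variants))

expanded = brominate_variants("CCO")  # ethanol -> two brominated variants
```

Each input molecule thus yields several singly-brominated variants, which is how the expanded data set grows before deduplication.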
In this embodiment, in the raw data acquisition module 001, Br atoms are used to replace H atoms bonded to C atoms in the SMILES molecular expressions in the molecular data set;
the first model training module 003 includes:
a coding processing sub-module 301, configured to encode the SMILES molecular expressions in the molecular data set into a matrix;
a coding output sub-module 302, configured to input the encoding matrix into the Transformer model and obtain the molecular encoding output;
a loss value calculation sub-module 303, configured to calculate the cross-entropy loss value between the molecular encoding output and the correct SMILES molecular expression, and to update the parameters of the Transformer model by back propagation;
a first model saving sub-module 304, configured to save the current Transformer model as the pre-training model V1 when its loss value stabilizes after several rounds of training.
In this embodiment, the second model training module 004 includes:
an expression generation sub-module 401, configured to generate the SMILES expressions of the current batch of molecules using the pre-training model V1;
an expression scoring sub-module 402, configured to evaluate and score the generated SMILES expressions of the current batch against the set scoring standard;
a reward training sub-module 403, configured to train the weights of the pre-training model V1 with the evaluation score as the reward of the pre-training model V1;
a second model saving sub-module 404, configured to save the final round's pre-training model V1 as the pre-training model V2 after several rounds of iterative training;
in the expression scoring sub-module 402, the set scoring standard score is:
score = similarity, when the generated SMILES expression is valid; score = 0, when it is invalid;
wherein similarity represents the similarity of the generated molecule's SMILES expression to the molecules in the single system;
the cross-entropy loss value L between the molecular encoding output and the correct SMILES molecular expression is calculated as: L = -Σ_i y_i·log(p_i), where y_i is the one-hot encoding of the correct token i and p_i is the model's predicted probability for that token.
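The scoring standard above (score equals the similarity for a valid SMILES, and 0 for an invalid one) can be sketched as follows; the validity check and similarity measure here are toy stand-ins, since the patent does not specify how validity is tested or which similarity metric is used (a real pipeline would parse with RDKit and use, e.g., Tanimoto similarity on fingerprints):

```python
def tanimoto_like(a, b):
    """Toy character-set similarity standing in for fingerprint similarity."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def is_valid(smiles):
    """Toy validity check: non-empty string with balanced parentheses."""
    depth = 0
    for ch in smiles:
        depth += ch == "("
        depth -= ch == ")"
        if depth < 0:
            return False
    return depth == 0 and bool(smiles)

def score(generated, reference):
    """score = similarity when the SMILES is valid, 0 when invalid."""
    return tanimoto_like(generated, reference) if is_valid(generated) else 0.0
```

Invalid strings are thus driven to zero reward, while valid ones are rewarded in proportion to their closeness to the target single system.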
In this embodiment, the third model training module 005 includes:
a model assignment sub-module 501, configured to assign the parameters of the pre-training model V2 to an Agent model and a Prior model respectively, so that the Agent model participates in training and updates the parameters of the pre-training model V2 while the Prior model is gradient-frozen and does not participate in parameter updates;
an intermediate generation sub-module 502, configured to generate SMILES expressions of molecules with the Agent model, screen those satisfying a set condition, and stop generation when their number reaches a set threshold; the same number of SMILES expressions is then generated with the Prior model;
a loss construction sub-module 503, configured to pool all generated SMILES expressions, feed them to the Agent model and the Prior model to obtain the output of each, and construct a loss function from the Agent model's output and the Prior model's output;
a parameter updating sub-module 504, configured to average the loss values and update the parameters of the pre-training model V2 by back propagation;
a third model saving sub-module 505, configured to save the current model as the pre-training model V3 when the loss value of the pre-training model V2 stabilizes during training;
in the loss construction sub-module 503, the loss function loss is constructed from the output of the Agent model and the output of the Prior model, its two terms being the loss value of the SMILES expression calculated by the Agent model and the loss value of the SMILES expression calculated by the Prior model, respectively.
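The parameter handling of this module (both models initialized from the pre-training model V2, the Prior gradient-frozen, and the Agent updated by back propagation on the averaged loss) can be sketched as follows; the parameter vector, gradient stand-in, and learning rate are hypothetical illustrations, not the patent's implementation:

```python
import numpy as np

# Both models start from the pre-training model V2's parameters.
v2_params = np.array([0.5, -1.2, 3.0])
agent_params = v2_params.copy()   # participates in training
prior_params = v2_params.copy()   # gradient-frozen: never updated below

def update_agent(params, per_sample_losses, grads, lr=0.1):
    """Average the per-sample loss values, then take one toy gradient step.

    `grads` stands in for the back-propagated gradient of the mean loss;
    a real system would compute it automatically (e.g. with autograd).
    """
    mean_loss = np.mean(per_sample_losses)
    return params - lr * mean_loss * grads, mean_loss

agent_params, mean_loss = update_agent(
    agent_params, per_sample_losses=[2.0, 4.0], grads=np.array([1.0, 0.0, -1.0])
)
```

Only `agent_params` moves; `prior_params` remains the unchanged reference distribution, which is what keeps the fine-tuned model anchored to the pre-trained one.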
It should be noted that, because the content of information interaction and execution process between the modules of the above-mentioned apparatus is based on the same concept as the method embodiment in embodiment 1 of the present application, the technical effects brought by the content are the same as the method embodiment of the present application, and specific content can be referred to the description in the foregoing illustrated method embodiment of the present application, which is not repeated herein.
Example 3
Embodiment 3 of the present invention provides a non-transitory computer-readable storage medium having stored therein program code of a reinforcement learning-based single-system molecule generation method, the program code including instructions for performing the reinforcement learning-based single-system molecule generation method of embodiment 1 or any possible implementation thereof.
Computer readable storage media can be any available media that can be accessed by a computer, or data storage devices such as servers and data centers that contain an integration of one or more available media. The usable medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid-state disk (SSD)), etc.
Example 4
Embodiment 4 of the present invention provides an electronic device, including: a memory and a processor;
the processor and the memory communicate with each other through a bus; the memory stores program instructions executable by the processor, and the processor invokes the program instructions to perform the reinforcement learning based single-system molecular generation method of embodiment 1 or any possible implementation thereof.
Specifically, the processor may be implemented by hardware or software, and when implemented by hardware, the processor may be a logic circuit, an integrated circuit, or the like; when implemented in software, the processor may be a general-purpose processor, implemented by reading software code stored in a memory, which may be integrated in the processor, or may reside outside the processor, and which may reside separately.
In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave).
It will be appreciated by those skilled in the art that the modules or steps of the invention described above may be implemented with a general purpose computing device; they may be concentrated on a single computing device or distributed across a network of computing devices. Alternatively, they may be implemented in program code executable by computing devices, so that they may be stored in a memory device for execution by the computing devices; in some cases the steps shown or described may be performed in a different order than shown or described. They may also be separately fabricated into individual integrated circuit modules, or multiple modules or steps among them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
While the invention has been described in detail in the foregoing general description and specific examples, it will be apparent to those skilled in the art that modifications and improvements can be made thereto. Accordingly, such modifications or improvements may be made without departing from the spirit of the invention and are intended to be within the scope of the invention as claimed.

Claims (10)

1. A method for generating a single-system molecule based on reinforcement learning, comprising:
collecting molecular expressions from a public database, and performing deduplication processing on the collected molecular expressions to obtain a molecular data set;
expanding the molecular data set in an atomic substitution mode to obtain an expanded data set, and performing deduplication processing on the expanded data set;
pre-training a Transformer model on the deduplicated expanded data set to obtain a pre-training model V1; and performing reinforcement learning processing on the pre-training model V1 to obtain a pre-training model V2;
and carrying out fine adjustment treatment on the pre-training model V2, quantitatively selecting molecules meeting the conditions in the fine adjustment treatment process to participate in the training of the pre-training model V2, obtaining a pre-training model V3 after the fine adjustment treatment, and carrying out new molecule generation of a single system through the pre-training model V3.
2. The reinforcement learning-based single-system molecular generation method according to claim 1, wherein, in the process of expanding the molecular data set by atomic substitution, Br atoms are used to replace H atoms bonded to C atoms in the SMILES molecular expressions in the molecular data set.
3. The method for generating a single-system molecule based on reinforcement learning of claim 1, wherein the step of pre-training the Transformer model on the deduplicated expanded data set to obtain the pre-training model V1 comprises:
encoding the SMILES molecular expressions in the molecular data set as a matrix;
inputting the encoding matrix into the Transformer model to obtain the molecular encoding output;
calculating the cross-entropy loss value between the molecular encoding output and the correct SMILES molecular expression, and updating the parameters of the Transformer model by back propagation;
when the loss value of the Transformer model stabilizes after several rounds of training, saving the current Transformer model as the pre-training model V1.
4. The method for generating a single-system molecule based on reinforcement learning as claimed in claim 3, wherein the step of performing reinforcement learning processing on said pre-training model V1 to obtain a pre-training model V2 comprises:
generating the SMILES expressions of the current batch of molecules using the pre-training model V1;
evaluating and scoring the SMILES expressions of the current batch against a set scoring standard;
training the weights of the pre-training model V1 with the evaluation score as the reward of the pre-training model V1;
after several rounds of iterative training, saving the final round's pre-training model V1 as the pre-training model V2.
5. The method for molecular generation of a reinforcement learning-based single system according to claim 4, wherein the set scoring standard score is:
score = similarity, when the generated SMILES expression is valid; score = 0, when it is invalid;
wherein similarity represents the similarity of the generated molecule's SMILES expression to the molecules in the single system;
the cross-entropy loss value L between the molecular encoding output and the correct SMILES molecular expression is calculated as: L = -Σ_i y_i·log(p_i), where y_i is the one-hot encoding of the correct token i and p_i is the model's predicted probability for that token.
6. The reinforcement learning-based single-system molecular generation method according to claim 4, wherein the step of performing fine tuning processing on the pre-training model V2 comprises:
assigning the parameters of the pre-training model V2 to an Agent model and a Prior model respectively, so that the Agent model participates in training and updates the parameters of the pre-training model V2 while the Prior model is gradient-frozen and does not participate in parameter updates;
generating SMILES expressions of molecules with the Agent model, screening those satisfying a set condition, and stopping generation when their number reaches a set threshold; generating the same number of SMILES expressions with the Prior model;
pooling all generated SMILES expressions and feeding them to the Agent model and the Prior model to obtain the output of each, and constructing a loss function from the Agent model's output and the Prior model's output;
averaging the loss values, updating the parameters of the pre-training model V2 by back propagation, and saving the current model as the pre-training model V3 when the loss value of the pre-training model V2 stabilizes during training.
7. The method for generating a single-system molecule based on reinforcement learning of claim 6, wherein the loss function loss is constructed from the output of the Agent model and the output of the Prior model,
its two terms being the loss value of the SMILES expression calculated by the Agent model and the loss value of the SMILES expression calculated by the Prior model, respectively.
8. A single-system molecular generation device based on reinforcement learning, comprising:
the original data acquisition module is used for collecting the molecular expressions from the public database and carrying out de-duplication processing on the collected molecular expressions to obtain a molecular data set;
the data expansion module is used for expanding the molecular data set in an atomic replacement mode to obtain an expanded data set, and performing deduplication processing on the expanded data set;
the first model training module is used for pre-training the Transformer model on the deduplicated expanded data set to obtain a pre-training model V1;
the second model training module is used for performing reinforcement learning treatment on the pre-training model V1 to obtain a pre-training model V2;
the third model training module is used for carrying out fine adjustment processing on the pre-training model V2, and quantitatively selecting molecules meeting the conditions to participate in the training of the pre-training model V2 in the fine adjustment processing process to obtain a pre-training model V3 after the fine adjustment processing;
and the molecule generation module is used for generating new molecules of a single system through the pre-training model V3.
9. The reinforcement learning based single system molecular generation device of claim 8, wherein, in the raw data acquisition module, Br atoms are used to replace H atoms bonded to C atoms in the SMILES molecular expressions in the molecular data set;
the first model training module includes:
the encoding processing sub-module is used for encoding the SMILES molecular expressions in the molecular data set into a matrix;
the encoding output sub-module is used for inputting the encoding matrix into the Transformer model and obtaining the molecular encoding output;
the loss value calculation sub-module is used for calculating the cross-entropy loss value between the molecular encoding output and the correct SMILES molecular expression, and for updating the parameters of the Transformer model by back propagation;
the first model saving sub-module is used for saving the current Transformer model as the pre-training model V1 when its loss value stabilizes after several rounds of training;
the second model training module includes:
the expression generation sub-module is used for generating the SMILES expressions of the current batch of molecules using the pre-training model V1;
the expression scoring sub-module is used for evaluating and scoring the generated SMILES expressions of the current batch against the set scoring standard;
the reward training sub-module is used for training the weights of the pre-training model V1 with the evaluation score as the reward of the pre-training model V1;
the second model saving sub-module is used for saving the final round's pre-training model V1 as the pre-training model V2 after several rounds of iterative training;
in the expression scoring sub-module, the set scoring standard score is:
score = similarity, when the generated SMILES expression is valid; score = 0, when it is invalid;
wherein similarity represents the similarity of the generated molecule's SMILES expression to the molecules in the single system;
the cross-entropy loss value L between the molecular encoding output and the correct SMILES molecular expression is calculated as: L = -Σ_i y_i·log(p_i), where y_i is the one-hot encoding of the correct token i and p_i is the model's predicted probability for that token.
10. The reinforcement learning based single-system molecular generation device of claim 9, wherein the third model training module comprises:
the model assignment sub-module is used for assigning the parameters of the pre-training model V2 to an Agent model and a Prior model respectively, so that the Agent model participates in training and updates the parameters of the pre-training model V2 while the Prior model is gradient-frozen and does not participate in parameter updates;
the intermediate generation sub-module is used for generating SMILES expressions of molecules with the Agent model, screening those satisfying a set condition, and stopping generation when their number reaches a set threshold; the same number of SMILES expressions is then generated with the Prior model;
the loss construction sub-module is used for pooling all generated SMILES expressions, feeding them to the Agent model and the Prior model to obtain the output of each, and constructing a loss function from the Agent model's output and the Prior model's output;
the parameter updating sub-module is used for averaging the loss values and updating the parameters of the pre-training model V2 by back propagation;
the third model saving sub-module is used for saving the current model as the pre-training model V3 when the loss value of the pre-training model V2 stabilizes during training;
in the loss construction sub-module, the loss function loss is constructed from the output of the Agent model and the output of the Prior model, its two terms being the loss value of the SMILES expression calculated by the Agent model and the loss value of the SMILES expression calculated by the Prior model, respectively.
CN202410077808.3A 2024-01-19 2024-01-19 Method and device for generating molecules of single system based on reinforcement learning Active CN117594157B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410077808.3A CN117594157B (en) 2024-01-19 2024-01-19 Method and device for generating molecules of single system based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410077808.3A CN117594157B (en) 2024-01-19 2024-01-19 Method and device for generating molecules of single system based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN117594157A true CN117594157A (en) 2024-02-23
CN117594157B CN117594157B (en) 2024-04-09

Family

ID=89920519

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410077808.3A Active CN117594157B (en) 2024-01-19 2024-01-19 Method and device for generating molecules of single system based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN117594157B (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111508568A (en) * 2020-04-20 2020-08-07 腾讯科技(深圳)有限公司 Molecule generation method and device, computer readable storage medium and terminal equipment
CN113707233A (en) * 2021-07-16 2021-11-26 内蒙合成化工研究所 Energetic compound molecular structure generation method based on deep reinforcement learning
US11263534B1 (en) * 2020-12-16 2022-03-01 Ro5 Inc. System and method for molecular reconstruction and probability distributions using a 3D variational-conditioned generative adversarial network
WO2022047677A1 (en) * 2020-09-02 2022-03-10 深圳晶泰科技有限公司 Drug molecule screening method and system
CN114171125A (en) * 2021-12-02 2022-03-11 中山大学 Protein degradation targeting chimera conjugate generation method based on deep reinforcement learning
CN114187978A (en) * 2021-11-24 2022-03-15 中山大学 Compound optimization method based on deep learning connection fragment
CN114627981A (en) * 2020-12-14 2022-06-14 阿里巴巴集团控股有限公司 Method and apparatus for generating molecular structure of compound, and nonvolatile storage medium
US20220351808A1 (en) * 2021-04-29 2022-11-03 Uchicago Argonne, Llc Systems and methods for reinforcement learning molecular modeling
CN115565622A (en) * 2022-09-06 2023-01-03 中国海洋大学 Marine compound molecule generation method based on deep learning and chemical reaction rules
CN115831261A (en) * 2022-11-14 2023-03-21 浙江大学杭州国际科创中心 Three-dimensional space molecule generation method and device based on multi-task pre-training inverse reinforcement learning
EP4152336A1 (en) * 2021-09-17 2023-03-22 TotalEnergies OneTech Method and computing system for molecular design via multi-task reinforcement learning
US20230290114A1 (en) * 2020-12-16 2023-09-14 Ro5 Inc. System and method for pharmacophore-conditioned generation of molecules
CN117153294A (en) * 2023-10-31 2023-12-01 烟台国工智能科技有限公司 Molecular generation method of single system
WO2023246834A1 (en) * 2022-06-24 2023-12-28 King Abdullah University Of Science And Technology Reinforcement learning (rl) for protein design
CN117334271A (en) * 2023-09-25 2024-01-02 江苏运动健康研究院 Method for generating molecules based on specified attributes
US20240021275A1 (en) * 2020-11-13 2024-01-18 Osmo Labs, Pbc Machine-learned models for sensory property prediction

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JIAYI FAN et al.: "Validity Improvement in MolGAN-Based Molecular Generation", IEEE Access, 2 June 2023 (2023-06-02), pages 58359-58366 *
XUHAN LIU et al.: "DrugEx v3: scaffold-constrained drug design with graph transformer-based reinforcement learning", Journal of Cheminformatics, 20 February 2023 (2023-02-20), pages 1-14 *
CHENG Kaiyang et al.: "Design of SOS1 Inhibitor Derivatives Based on a Molecular Generation Model", Computer Era, no. 11, 30 November 2023 (2023-11-30), pages 94-99 *

Also Published As

Publication number Publication date
CN117594157B (en) 2024-04-09

Similar Documents

Publication Publication Date Title
US11544474B2 (en) Generation of text from structured data
CN111105029B (en) Neural network generation method, generation device and electronic equipment
CN113535984A (en) Attention mechanism-based knowledge graph relation prediction method and device
US11544542B2 (en) Computing device and method
CN113312505B (en) Cross-modal retrieval method and system based on discrete online hash learning
CN116959613B (en) Compound inverse synthesis method and device based on quantum mechanical descriptor information
US20220392585A1 (en) Method for training compound property prediction model, device and storage medium
WO2024067373A1 (en) Data processing method and related apparatus
Wiegrebe et al. Deep learning for survival analysis: a review
CN110009048B (en) Method and equipment for constructing neural network model
CN115982480A (en) Sequence recommendation method and system based on cooperative attention network and comparative learning
CN117153294A (en) Molecular generation method of single system
CN114613450A (en) Method and device for predicting property of drug molecule, storage medium and computer equipment
Zhao et al. KuaiSim: A comprehensive simulator for recommender systems
Huai et al. Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization
CN114463596A (en) Small sample image identification method, device and equipment of hypergraph neural network
CN117594157B (en) Method and device for generating molecules of single system based on reinforcement learning
CN117193772A (en) Deep learning code-free application layout optimization method and system based on user feedback
CN115881209B (en) RNA secondary structure prediction processing method and device
CN115080587B (en) Electronic component replacement method, device and medium based on knowledge graph
CN113779994B (en) Element extraction method, element extraction device, computer equipment and storage medium
Ma et al. CRBP-HFEF: prediction of RBP-Binding sites on circRNAs based on hierarchical feature expansion and fusion
CN114595641A (en) Method and system for solving combined optimization problem
CN111259176B (en) Cross-modal Hash retrieval method based on matrix decomposition and integrated with supervision information
Ouyang et al. Grarep++: flexible learning graph representations with weighted global structural information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant