CN117877608B

CN117877608B - Monte Carlo tree search inverse synthesis planning method and device based on experience network

Info

Publication number: CN117877608B
Application number: CN202410281805.1A
Authority: CN
Inventors: 李中伟; 肖瑞; 祝艺玮; 柳彦宏
Original assignee: Yantai Guogong Intelligent Technology Co ltd
Current assignee: Yantai Guogong Intelligent Technology Co ltd
Priority date: 2024-03-13
Filing date: 2024-03-13
Publication date: 2024-05-28
Anticipated expiration: 2044-03-13
Also published as: CN117877608A

Abstract

The method preprocesses the collected experience data of an experience network RVN and an experience network QVN, trains the experience network RVN with its collected experience data, and trains the experience network QVN with its collected experience data; the reactions on successful routes in the experience data of RVN and QVN form an optimized dataset, which is used to further train and optimize the single-step inverse synthesis model previously trained on the single-step dataset; a target molecule is input into a Monte Carlo tree search algorithm in which the trained RVN replaces a first set constant value and the trained QVN replaces a second set constant value, and the optimized single-step inverse synthesis model is used to output a synthesis route. The invention achieves a high success rate and efficiency, better interpretability and strong practical applicability.

Description

Monte Carlo tree search inverse synthesis planning method and device based on experience network

Technical Field

The invention relates to a Monte Carlo tree search inverse synthesis planning method and device based on an empirical network, and belongs to the technical field of inverse synthesis analysis of organic reaction products.

Background

The goal of inverse synthesis (retrosynthesis) of organic reaction products is to design a reaction route that starts from a set of commercially purchasable molecules and produces the target molecule through a series of chemical reactions. Inverse synthesis mainly involves two components: a single-step inverse synthesis model, which predicts the reactants that can produce a given product in one reaction, and a multi-step search algorithm, which quickly finds a feasible reaction route among the large number of reactions predicted by the single-step model.

The large number of reactions predicted by a single-step inverse synthesis model creates a huge search space, and quickly finding a feasible reaction route in it is a very challenging problem. With the development of deep learning, multi-step search algorithms based on deep neural networks, such as Retro* and EG-MCTS, have achieved success. These methods use deep neural networks to learn limited chemical knowledge and experience data that guide the search for multi-step routes. However, their search efficiency and success rate still need improvement, and they suffer from poor interpretability.

Disclosure of Invention

Therefore, the invention provides a Monte Carlo tree search inverse synthesis planning method and device based on an empirical network, so as to improve the success rate and search efficiency of a multi-step search algorithm and solve the technical problem of poor interpretability of the existing scheme.

In order to achieve the above object, the present invention provides the following technical solutions: the Monte Carlo tree search inverse synthesis planning method based on the experience network comprises the following steps:

Constructing a single-step inverse synthesis model, which takes a specified molecule as input and predicts the reactants that synthesize the specified molecule in a one-step reaction, and training the single-step inverse synthesis model on the preprocessed single-step dataset;

Constructing an empirical network RVN and an empirical network QVN, wherein the empirical network RVN predicts whether a given reaction leads to a successfully planned route, and the empirical network QVN predicts the length of the reaction route needed to synthesize the reactants of a given reaction;

Constructing a Monte Carlo tree search algorithm, which starts from the root node, descends the tree structure by selecting among the reaction nodes predicted by the single-step inverse synthesis model according to the prediction scores of the empirical networks RVN and QVN until it reaches a leaf node formed by a reactant, expands the leaf node with the single-step inverse synthesis model, and then backtracks to the root node; the search is repeated until a route whose leaf nodes are all commercially available molecules is found or the limit on the number of searches is reached, and the routes whose leaf nodes are all commercially available molecules are taken as synthesis routes of the target molecule;

Replacing the predicted value of the empirical network RVN with a first set constant value and the predicted value of the empirical network QVN with a second set constant value; inputting the target molecules of the preprocessed multi-step dataset into the Monte Carlo tree search algorithm one by one to search for their synthesis routes, and collecting the experience data of RVN and QVN, respectively, from the search trees;

Preprocessing the collected empirical data of the empirical network RVN and the empirical network QVN, training the empirical network RVN using the collected empirical data of the empirical network RVN, and training the empirical network QVN using the collected empirical data of the empirical network QVN;

Using the reactions on successful routes in the experience data of the empirical networks RVN and QVN to form an optimized dataset, and using the optimized dataset to further train and optimize the single-step inverse synthesis model that has been trained on the single-step dataset;

Inputting a target molecule into the Monte Carlo tree search algorithm, wherein the Monte Carlo tree search algorithm uses the trained empirical network RVN to replace the first set constant value, uses the trained empirical network QVN to replace the second set constant value, uses the optimized single-step inverse synthesis model, and outputs a synthesis route composed of a group of reactions.
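The role of the two set constant values can be illustrated with a minimal Python sketch, assuming the search asks a single function for the pair (R, Q) of each new reaction node. During experience collection that function returns the constants; in the final search it calls the trained networks. The function names, the toy stand-in networks and the constants 0.5 and 1.0 are hypothetical; the text does not state the constant values.

```python
from typing import Callable, Sequence, Tuple

# A value function maps a reaction feature vector to the pair (R, Q).
ValueFn = Callable[[Sequence[float]], Tuple[float, float]]


def constant_values(r_const: float, q_const: float) -> ValueFn:
    """Collection phase: ignore the reaction and return fixed (R, Q) values."""
    def values(_x: Sequence[float]) -> Tuple[float, float]:
        return r_const, q_const
    return values


def network_values(rvn: Callable[[Sequence[float]], float],
                   qvn: Callable[[Sequence[float]], float]) -> ValueFn:
    """Planning phase: return the predictions of the trained RVN and QVN."""
    def values(x: Sequence[float]) -> Tuple[float, float]:
        return rvn(x), qvn(x)
    return values


# Hypothetical constants used only for this illustration.
collect = constant_values(0.5, 1.0)
# Toy stand-ins for trained networks (not real models).
plan = network_values(lambda x: min(1.0, sum(x)), lambda x: float(len(x)))
```

Because the search only sees a `ValueFn`, the same tree-search code can serve both the data-collection run and the final planning run.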

As a preferred scheme of the Monte Carlo tree search inverse synthesis planning method based on an empirical network, the method further comprises the step of processing the single-step dataset, wherein the single-step dataset uses USPTO as the original dataset and consists of chemical reactions expressed as SMILES;

Filtering out the reactions in the single-step dataset that do not conform to the preset rules with the RDKit toolkit, atom-mapping the filtered reactions with the RXNMapper toolkit, and randomly dividing the single-step dataset into a training set, a validation set and a test set according to a set ratio; extracting reverse reaction templates from the single-step dataset with RDChiral to obtain a reverse reaction template library, and matching the reverse reaction templates with the reactions of the single-step dataset to obtain training labels.

As a preferred scheme of the Monte Carlo tree search inverse synthesis planning method based on an empirical network, the method further comprises the step of processing the multi-step dataset, wherein the multi-step dataset uses USPTO as the original dataset, and each multi-step reaction route consists of a group of chemical reactions expressed as SMILES;

Filtering out the reactions that do not conform to the preset rules with the RDKit toolkit, and discarding any reaction route that contains such a reaction; traversing all terminal reactant molecules of each reaction route and discarding the route if any of these reactants is not a commercially available molecule; and extracting the target molecule at the terminal of each remaining reaction route to form a target molecule library.

As a preferred scheme of the Monte Carlo tree search inverse synthesis planning method based on an empirical network, the single-step inverse synthesis model is expressed as:

$$\hat{y} = \mathrm{Softmax}\left(W_2\,\sigma(W_1 x_p + b_1) + b_2\right)$$

where $\sigma$ and $\mathrm{Softmax}$ denote nonlinear activation functions, $W_1$ and $W_2$ are parameter matrices of the model, $b_1$ and $b_2$ are bias vectors, and $x_p$ is the feature vector converted from the Morgan fingerprint of the product generated with the RDKit toolkit.
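A minimal pure-Python sketch of this forward pass follows, assuming ReLU for the unspecified hidden activation $\sigma$ and toy weights in place of trained parameters. The helper `top_k_templates` is hypothetical; it shows how the $k$ highest-scoring template indices could be picked for expansion.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def linear(W: Matrix, x: Vector, b: Vector) -> Vector:
    """Compute W x + b for a row-major weight matrix W."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def softmax(v: Vector) -> Vector:
    m = max(v)                       # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    s = sum(exps)
    return [e / s for e in exps]


def single_step_forward(x_p: Vector, W1: Matrix, b1: Vector,
                        W2: Matrix, b2: Vector) -> Vector:
    """y_hat = Softmax(W2 * relu(W1 x_p + b1) + b2): one probability per template."""
    return softmax(linear(W2, relu(linear(W1, x_p, b1)), b2))


def top_k_templates(y_hat: Vector, k: int) -> List[int]:
    """Indices of the k most probable templates, best first."""
    return sorted(range(len(y_hat)), key=lambda i: y_hat[i], reverse=True)[:k]


# Toy example: 2-dimensional fingerprint, 3 templates.
y_hat = single_step_forward(
    x_p=[1.0, 0.0],
    W1=[[1.0, 0.0], [0.0, 1.0]], b1=[0.0, 0.0],
    W2=[[2.0, 0.0], [0.0, 0.0], [-1.0, 0.0]], b2=[0.0, 0.0, 0.0],
)
```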

As a preferred scheme of the Monte Carlo tree search inverse synthesis planning method based on an empirical network, the input of the empirical networks RVN and QVN is the feature vector $x_r$ converted from the Morgan fingerprint of the reaction. The RDKit toolkit is used to generate the Morgan fingerprint feature vector $x_p$ of the product and the Morgan fingerprint feature vectors $\{x_{r_1}, \dots, x_{r_n}\}$ of the reactants, and the Morgan fingerprint of the reaction is the Morgan fingerprint of the product minus the Morgan fingerprints of the reactants:

$$x_r = x_p - \sum_{i=1}^{n} x_{r_i}$$

where $n$ is the number of reactants;
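Once the product and reactant fingerprints are available as equal-length vectors (for example, bit vectors produced by RDKit's Morgan fingerprint functions), the reaction feature is a plain element-wise difference. A minimal sketch with toy 4-bit vectors standing in for real fingerprints:

```python
from typing import List, Sequence


def reaction_fingerprint(x_product: Sequence[float],
                         x_reactants: Sequence[Sequence[float]]) -> List[float]:
    """x_r = x_p - sum_i x_{r_i}, computed element-wise over the n reactants."""
    out = list(x_product)
    for x_ri in x_reactants:
        if len(x_ri) != len(out):
            raise ValueError("all fingerprints must have the same dimension")
        out = [a - b for a, b in zip(out, x_ri)]
    return out


# Toy example: two reactants whose bits explain all but the last product bit.
x_r = reaction_fingerprint([1, 1, 0, 1], [[1, 0, 0, 0], [0, 1, 0, 0]])
```

Bits shared by product and reactants cancel, so $x_r$ mainly encodes what the reaction changes.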

The feature vector $x_r$ converted from the Morgan fingerprint of the reaction is input into the empirical network RVN, and a scalar value $R(r)$ is obtained through a two-layer MLP and a Sigmoid activation function; $R(r)$ is the probability that the reaction leads to a successfully planned reaction route and is expressed as:

$$R(r) = \mathrm{Sigmoid}\left(W_2^{R}\,\sigma(W_1^{R} x_r + b_1^{R}) + b_2^{R}\right)$$

where $r$ denotes a reaction node in the search process, $\sigma$ is the hidden-layer activation function, $W_1^{R}$ and $W_2^{R}$ are parameter matrices of the model, and $b_1^{R}$ and $b_2^{R}$ are bias vectors;

The feature vector $x_r$ converted from the Morgan fingerprint of the reaction is input into the empirical network QVN, and a scalar value $Q(r)$ is obtained through a two-layer MLP and a Softplus activation function; $Q(r)$ is the length of the reaction route needed to synthesize the reactants of the reaction and is expressed as:

$$Q(r) = \mathrm{Softplus}\left(W_2^{Q}\,\sigma(W_1^{Q} x_r + b_1^{Q}) + b_2^{Q}\right)$$

where $W_1^{Q}$ and $W_2^{Q}$ are parameter matrices of the model and $b_1^{Q}$ and $b_2^{Q}$ are bias vectors.
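The two heads differ only in the output activation: Sigmoid bounds $R(r)$ to (0, 1) so it can be read as a probability, while Softplus keeps $Q(r)$ positive, as a route length should be. A minimal pure-Python sketch, assuming ReLU for the hidden activation $\sigma$ and a single output unit per network; the weights are toy values, not trained parameters.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def linear(W: Matrix, x: Vector, b: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def softplus(z: float) -> float:
    # log(1 + e^z) in a numerically stable form
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def rvn(x_r: Vector, W1: Matrix, b1: Vector, W2: Matrix, b2: Vector) -> float:
    """R(r) in (0, 1): probability that reaction r leads to a solved route."""
    return sigmoid(linear(W2, relu(linear(W1, x_r, b1)), b2)[0])


def qvn(x_r: Vector, W1: Matrix, b1: Vector, W2: Matrix, b2: Vector) -> float:
    """Q(r) > 0: predicted route length needed to synthesize the reactants of r."""
    return softplus(linear(W2, relu(linear(W1, x_r, b1)), b2)[0])
```

With all output weights at zero, RVN returns 0.5 and QVN returns $\ln 2$, the midpoints of the two activations.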

As a preferred scheme of the Monte Carlo tree search inverse synthesis planning method based on an empirical network, the search result of the Monte Carlo tree search algorithm is represented by an AND-OR tree structure comprising AND nodes and OR nodes, wherein AND nodes represent reaction nodes and OR nodes represent molecular nodes; the root node is the OR node of the target molecule; the parent node of an AND node is the OR node of the corresponding reaction product, and its child nodes are the OR nodes of the corresponding reactants;

During the search of the Monte Carlo tree search algorithm, an OR node contains a molecule $m$, a value $R(m)$ representing the synthesizability of the molecule, and a value $Q(m)$ representing the cost of synthesizing the molecule; an AND node contains a reaction template, the probability value $R(r)$ that the reaction leads to a successfully planned reaction route, the length $Q(r)$ of the reaction route needed to synthesize the reactants of the reaction, and the predicted value $P(r)$ of the single-step inverse synthesis model;

An AND node is marked successful when all of its child nodes succeed, and an OR node is marked successful when at least one of its child nodes succeeds; an OR node whose molecule is commercially available is marked successful, and a successful inverse synthesis route is found when the reactants at all leaf nodes of a reaction route are commercially available molecules;

If the single-step inverse synthesis model cannot predict a reasonable reaction for the molecule of an OR node, that OR node is marked as failed; an AND node is marked as failed if any of its child nodes fails, and an OR node is marked as failed if all of its child nodes fail.
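The success and failure rules above are mutually recursive and can be sketched as follows, assuming a node stays "open" (neither successful nor failed) until the rules decide otherwise. The class and field names are illustrative only, and SMILES are replaced by placeholder strings.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OrNode:                       # molecule node
    smiles: str
    purchasable: bool = False
    expanded: bool = False
    children: List["AndNode"] = field(default_factory=list)


@dataclass
class AndNode:                      # reaction node
    template: str
    children: List[OrNode] = field(default_factory=list)


def or_status(m: OrNode) -> Optional[bool]:
    """True = success, False = failure, None = still open."""
    if m.purchasable:
        return True                 # commercially available molecule
    if m.expanded and not m.children:
        return False                # no reasonable reaction was predicted
    statuses = [and_status(r) for r in m.children]
    if any(s is True for s in statuses):
        return True                 # one successful reaction is enough
    if statuses and all(s is False for s in statuses):
        return False                # every reaction failed
    return None


def and_status(r: AndNode) -> Optional[bool]:
    statuses = [or_status(m) for m in r.children]
    if any(s is False for s in statuses):
        return False                # one failed reactant sinks the reaction
    if all(s is True for s in statuses):
        return True                 # every reactant is solved
    return None
```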

As a preferred scheme of the Monte Carlo tree search inverse synthesis planning method based on the empirical network, in the Monte Carlo tree search algorithm, starting from the root node, the best reaction node is selected from the child nodes of the current molecular node: a composite value $U(r)$ is calculated for each candidate reaction node from its experience values $R(r)$ and $Q(r)$, and the reaction node with the largest $U(r)$ is selected:

$$r^{*} = \arg\max_{r \in \mathcal{C}(m)} U(r)$$

where $r^{*}$ is the selected reaction node and $\mathcal{C}(m)$ is the set of child reaction nodes of the molecular node $m$. Besides $R(r)$ and $Q(r)$, $U(r)$ involves: $c$, a hyperparameter controlling the weight of the single-step inverse synthesis model's prediction in the selection stage; $N(r)$, the number of visits of the reaction node $r$; $r_p$, the parent node of the molecular node $m$, whose number of visits is $N(r_p)$; and $\lambda$, a hyperparameter representing the penalty for planning failure.
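Only the ingredients of $U(r)$ are given in this text, not the formula itself. The sketch below therefore uses an ASSUMED, PUCT-style combination purely to show how $c$, $P(r)$, $N(r)$, $N(r_p)$ and a failure penalty $\lambda$ can enter a max-$U$ selection; it should not be read as the patent's formula. Here `n_parent` stands for $N(r_p)$.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    R: float            # RVN value of the reaction node
    Q: float            # QVN value of the reaction node
    P: float            # single-step model score of its template
    N: int = 0          # visit count N(r)
    failed: bool = False


def assumed_u(r: Candidate, n_parent: int, c: float, lam: float) -> float:
    """ASSUMED PUCT-style score for illustration; not the patent's formula."""
    exploit = r.R / (1.0 + r.Q)                              # likely to succeed and short
    explore = c * r.P * math.sqrt(n_parent) / (1.0 + r.N)    # prior-weighted exploration
    penalty = lam if r.failed else 0.0                        # planning-failure penalty
    return exploit + explore - penalty


def select(children: List[Candidate], n_parent: int,
           c: float = 1.0, lam: float = 1.0) -> Candidate:
    """r* = argmax of U(r) over the child reaction nodes of a molecule."""
    return max(children, key=lambda r: assumed_u(r, n_parent, c, lam))
```

Raising `c` shifts the choice toward templates the single-step model likes; setting it to 0 leaves only the experience values and the penalty.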

As a preferred scheme of the Monte Carlo tree search inverse synthesis planning method based on the empirical network, in the Monte Carlo tree search algorithm, the single-step inverse synthesis model predicts the $k$ templates most likely to match the selected molecule $m$; the $k$ templates are applied to obtain the corresponding reactants, and the $k$ templates and the reactant molecules are used to construct AND nodes and OR nodes, which are added to the AND-OR tree;

The two experience values $R(r)$ and $Q(r)$ of each new reaction node are calculated with the empirical networks RVN and QVN; if all child nodes of a reaction node are commercially available molecules, $R(r) = 1$ and $Q(r) = 0$. The $U(r)$ value of each reaction node is calculated from $R(r)$ and $Q(r)$, the reaction node $r^{*}$ with the largest $U(r)$ is selected, and its $R$ and $Q$ values are passed to the parent node $m$: $R(m) = R(r^{*})$ and $Q(m) = Q(r^{*}) + \gamma$, where $\gamma$ is a hyperparameter representing the cost of one reaction; $R(m)$ is the probability of successfully planning a reaction route for the molecule $m$; $Q(m)$ is the length of the reaction route for synthesizing the molecule $m$; $R(r^{*})$ is the probability that the reaction $r^{*}$ leads to a successfully planned reaction route; and $Q(r^{*})$ is the length of the reaction route for synthesizing the reactants of the reaction $r^{*}$.
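A minimal sketch of how a newly expanded reaction node could receive its values and how the chosen node's values are passed to the parent molecule, assuming the reading $R(r) = 1$, $Q(r) = 0$ for reactions whose reactants are all commercially available, and $R(m) = R(r^{*})$, $Q(m) = Q(r^{*}) + \gamma$. The RVN/QVN outputs are passed in as plain numbers, and $r^{*}$ is assumed to have been chosen already by maximum $U(r)$.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Mol:
    smiles: str
    purchasable: bool = False
    R: float = 0.0   # probability that a route for this molecule can be planned
    Q: float = 0.0   # estimated route length to synthesize this molecule


@dataclass
class Rxn:
    template: str
    reactants: List[Mol] = field(default_factory=list)
    R: float = 0.0   # probability that this reaction leads to a solved route
    Q: float = 0.0   # route length needed to make its reactants


def init_reaction_values(rxn: Rxn, rvn_value: float, qvn_value: float) -> None:
    """Give a new reaction node its experience values."""
    if all(m.purchasable for m in rxn.reactants):
        rxn.R, rxn.Q = 1.0, 0.0      # every reactant can be bought: already solved
    else:
        rxn.R, rxn.Q = rvn_value, qvn_value


def pass_to_parent(parent: Mol, best: Rxn, gamma: float) -> None:
    """R(m) = R(r*), Q(m) = Q(r*) + gamma, gamma being the cost of one reaction."""
    parent.R = best.R
    parent.Q = best.Q + gamma
```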

As a preferred scheme of the Monte Carlo tree search inverse synthesis planning method based on the empirical network, in the Monte Carlo tree search algorithm, a reaction node $r$ obtains new $R$ and $Q$ values from the updated $R(m)$ and $Q(m)$ of its child nodes: $R(r)$ is updated to the product of $R(m)$ over all child nodes, $Q(r)$ is updated to the sum of $Q(m)$ over all child nodes, and both are averaged according to the number of visits $N(r)$ to obtain the updated $R(r)$ and $Q(r)$; the number of visits $N(r)$ of the reaction node is then increased by 1;

For the molecular node $m$ whose child node $r$ has been updated, the reaction node is reselected: the $U(r)$ values of all child reaction nodes are recalculated from their experience values $R(r)$ and $Q(r)$ with the formula for $U$, the reaction node $r^{*}$ with the largest $U(r)$ is selected, and $R(m)$ and $Q(m)$ are updated from the values of $r^{*}$;

Starting from the selected molecular node, this update process is repeated up the tree structure to the root node. When the root node is marked as successful, all leaf nodes on the route found in this search are commercially available molecules, and the corresponding inverse synthesis route is determined.
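The update walk from a freshly expanded molecule back to the root can be sketched as below. The text only says that the fresh product and sum are "averaged according to the number of visits"; the running mean used here is an assumption, and `score` stands for whatever $U(r)$ the selection stage uses.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)
class Mol:                                   # OR node
    R: float = 0.0
    Q: float = 0.0
    parent: Optional["Rxn"] = field(default=None, repr=False)
    children: List["Rxn"] = field(default_factory=list)


@dataclass(eq=False)
class Rxn:                                   # AND node
    R: float = 0.0
    Q: float = 0.0
    N: int = 0                               # visit count N(r)
    parent: Optional[Mol] = field(default=None, repr=False)
    children: List[Mol] = field(default_factory=list)


def backup(mol: Mol, gamma: float, score: Callable[[Rxn], float]) -> Mol:
    """Propagate values from `mol` up to the root and return the root node."""
    while mol.parent is not None:
        rxn = mol.parent
        fresh_r = math.prod(m.R for m in rxn.children)   # all reactants must succeed
        fresh_q = sum(m.Q for m in rxn.children)         # reactant route lengths add up
        # Assumed averaging rule: running mean over the node's visits.
        rxn.R = (rxn.N * rxn.R + fresh_r) / (rxn.N + 1)
        rxn.Q = (rxn.N * rxn.Q + fresh_q) / (rxn.N + 1)
        rxn.N += 1
        mol = rxn.parent                                 # product molecule of rxn
        best = max(mol.children, key=score)              # reselect r* by max U(r)
        mol.R, mol.Q = best.R, best.Q + gamma
    return mol
```

In a well-formed AND-OR tree every reaction node has a parent molecule, so the loop always ends at the root OR node.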

The invention also provides a Monte Carlo tree search inverse synthesis planning device based on an experience network, which adopts the above Monte Carlo tree search inverse synthesis planning method based on the experience network and comprises:

A single-step inverse synthesis model construction module for constructing a single-step inverse synthesis model, which takes a specified molecule as input and predicts the reactants that synthesize the specified molecule in a one-step reaction, and for training the single-step inverse synthesis model on the preprocessed single-step dataset;

An empirical network construction module for constructing an empirical network RVN and an empirical network QVN, wherein the empirical network RVN predicts whether a specified reaction leads to a successfully planned route, and the empirical network QVN predicts the length of the reaction route needed to synthesize the reactants of a given reaction;

A Monte Carlo tree search module for constructing a Monte Carlo tree search algorithm, which starts from the root node, descends the tree structure by selecting among the reaction nodes predicted by the single-step inverse synthesis model according to the prediction scores of the empirical networks RVN and QVN until it reaches a leaf node formed by a reactant, expands the leaf node with the single-step inverse synthesis model, and backtracks to the root node; the search is repeated until a route whose leaf nodes are all commercially available molecules is found or the limit on the number of searches is reached, and the routes whose leaf nodes are all commercially available molecules are taken as synthesis routes of the target molecule;

An empirical data collection module for replacing the predicted value of the empirical network RVN with a first set constant value and the predicted value of the empirical network QVN with a second set constant value, inputting the target molecules of the preprocessed multi-step dataset into the Monte Carlo tree search algorithm one by one to search for their synthesis routes, and collecting the experience data of RVN and QVN, respectively, from the search trees;

An empirical network training module for preprocessing the collected empirical data of the empirical network RVN and the empirical network QVN, training the empirical network RVN using the collected empirical data of the empirical network RVN, and training the empirical network QVN using the collected empirical data of the empirical network QVN;

A single-step inverse synthesis model training module for forming an optimized dataset from the reactions on successful routes in the experience data of the empirical networks RVN and QVN, and for using the optimized dataset to further train and optimize the single-step inverse synthesis model already trained on the single-step dataset;

A synthesis route generation module for inputting a target molecule into the Monte Carlo tree search algorithm, which uses the trained empirical network RVN in place of the first set constant value and the trained empirical network QVN in place of the second set constant value, uses the optimized single-step inverse synthesis model, and outputs a synthesis route composed of a set of reactions.

The invention has the following advantages: a single-step inverse synthesis model is constructed that takes a specified molecule as input and predicts the reactants that synthesize it in a one-step reaction, and it is trained on the preprocessed single-step dataset; an empirical network RVN is constructed to predict whether a given reaction leads to a successfully planned route, and an empirical network QVN is constructed to predict the length of the reaction route needed to synthesize the reactants of a given reaction; a Monte Carlo tree search algorithm is constructed that starts from the root node, descends the tree by selecting reaction nodes predicted by the single-step inverse synthesis model according to the prediction scores of RVN and QVN until it reaches a leaf node formed by a reactant, expands the leaf node with the single-step inverse synthesis model, and backtracks to the root node, repeating until a route whose leaf nodes are all commercially available molecules is found or the limit on the number of searches is reached, such routes being taken as synthesis routes of the target molecule; with the RVN predicted value replaced by a first set constant value and the QVN predicted value replaced by a second set constant value, the target molecules of the preprocessed multi-step dataset are input into the search algorithm one by one, and experience data for RVN and QVN are collected from the search trees; the collected experience data are preprocessed and used to train RVN and QVN, respectively; the reactions on successful routes in the experience data form an optimized dataset that is used to further train and optimize the single-step inverse synthesis model already trained on the single-step dataset; finally, a target molecule is input into the Monte Carlo tree search algorithm, which uses the trained RVN and QVN in place of the first and second set constant values together with the optimized single-step inverse synthesis model, and outputs a synthesis route composed of a group of reactions. The invention improves the success rate and efficiency of inverse synthesis planning, outperforms EG-MCTS, currently the best-performing inverse synthesis planning algorithm, and offers better interpretability than EG-MCTS; moreover, optimizing the single-step inverse synthesis model with experience data lets it learn the characteristics of real reactions and the preferences reflected in the experience, improving its practical applicability.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It will be apparent to those skilled in the art from this disclosure that the drawings described below are merely exemplary and that other embodiments may be derived from the drawings provided without undue effort.

Fig. 1 is a schematic flow chart of the Monte Carlo tree search inverse synthesis planning method based on an empirical network provided in an embodiment of the present invention;

Fig. 2 is a schematic flow chart of the Monte Carlo tree search algorithm in the Monte Carlo tree search inverse synthesis planning method based on an empirical network provided in an embodiment of the present invention;

Fig. 3 is a framework diagram of the Monte Carlo tree search algorithm in the Monte Carlo tree search inverse synthesis planning method based on an empirical network provided in an embodiment of the present invention;

Fig. 4 is a diagram of the Monte Carlo tree search inverse synthesis planning device provided in an embodiment of the present invention.

Detailed Description

Other advantages and benefits of the present invention will become apparent to those skilled in the art from the following detailed description, which describes the invention by way of specific embodiments; the embodiments described are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of protection of the invention.

Example 1

Referring to Figs. 1, 2 and 3, Embodiment 1 of the present invention provides a Monte Carlo tree search inverse synthesis planning method based on an empirical network, comprising the following steps:

T1, processing the single-step dataset, which uses USPTO as the original dataset and consists of chemical reactions expressed as SMILES; the single-step dataset is preprocessed, divided into a training set, a validation set and a test set, and used to extract a reverse reaction template library for training the single-step inverse synthesis model;

T2, processing the multi-step dataset, which uses USPTO as the original dataset and consists of multi-step reaction routes, each formed by a group of chemical reactions expressed as SMILES; the multi-step dataset is preprocessed by filtering out the routes whose terminal reactants are not commercially purchasable molecules, and the terminal (target) molecules of the remaining routes are extracted to form a target molecule library, which is used for collecting experience data;

T3, constructing a single-step inverse synthesis model, which takes a specified molecule as input and predicts the reactants that synthesize the specified molecule in a one-step reaction, and training the single-step inverse synthesis model on the preprocessed single-step dataset;

T4, constructing an empirical network RVN and an empirical network QVN, wherein the empirical network RVN predicts whether a specified reaction leads to a successfully planned route, and the empirical network QVN predicts the length of the reaction route needed to synthesize the reactants of a given reaction;

T5, constructing a Monte Carlo tree search algorithm, which starts from the root node, descends the tree structure by selecting among the reaction nodes predicted by the single-step inverse synthesis model according to the prediction scores of the empirical networks RVN and QVN until it reaches a leaf node formed by a reactant, expands the leaf node with the single-step inverse synthesis model, and then backtracks to the root node; the search is repeated until a route whose leaf nodes are all commercially available molecules is found or the limit on the number of searches is reached, and the routes whose leaf nodes are all commercially available molecules are taken as synthesis routes of the target molecule;

T6, replacing the empirical network RVN predicted value with a first set constant value, and replacing the empirical network QVN predicted value with a second set constant value; inputting target molecules in the preprocessed multi-step dataset into the Monte Carlo tree search algorithm in sequence to search a synthetic route of the target molecules, and respectively collecting experience data of the experience network RVN and the experience network QVN from a search tree;

T7, preprocessing the collected empirical data of the empirical network RVN and the empirical network QVN, training the empirical network RVN using the collected empirical data of the empirical network RVN, and training the empirical network QVN using the collected empirical data of the empirical network QVN;

T8, using the reactions on successful routes in the experience data of the empirical networks RVN and QVN to form an optimized dataset, and using the optimized dataset to further train and optimize the single-step inverse synthesis model already trained on the single-step dataset;

T9, inputting a target molecule into the Monte Carlo tree search algorithm, which uses the trained empirical network RVN in place of the first set constant value and the trained empirical network QVN in place of the second set constant value, uses the optimized single-step inverse synthesis model, and outputs a synthesis route composed of a group of reactions.
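Step T8 can be sketched as a filter over the collected experience records, assuming each record stores the product SMILES, the applied template, and a flag saying whether the reaction lies on a successful route. This record format and the de-duplication of repeated (product, template) pairs are assumptions, not details stated in the text.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical record: (product SMILES, template, lies on a successful route?)
Record = Tuple[str, str, bool]


def build_optimized_dataset(records: List[Record],
                            template_index: Dict[str, int]) -> List[Tuple[str, int]]:
    """Keep reactions from successful routes as (product, template label) pairs."""
    dataset: List[Tuple[str, int]] = []
    seen: Set[Tuple[str, str]] = set()
    for product, template, on_success in records:
        if not on_success or template not in template_index:
            continue                      # off-route, or template has no class label
        if (product, template) in seen:
            continue                      # assumed: keep each pair only once
        seen.add((product, template))
        dataset.append((product, template_index[template]))
    return dataset
```

The resulting pairs have the same shape as the single-step training data, so the existing training loop can be reused for fine-tuning.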

In this embodiment, step T1 specifically comprises: filtering out the reactions in the single-step dataset that do not conform to the preset rules with the RDKit toolkit, atom-mapping the filtered reactions with the RXNMapper toolkit, and randomly dividing the single-step dataset into a training set, a validation set and a test set in a ratio of 6:2:2; and extracting reverse reaction templates from the single-step dataset with RDChiral to obtain a reverse reaction template library, and matching the reverse reaction templates with the reactions of the single-step dataset to obtain training labels.
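A minimal sketch of the 6:2:2 random split and of turning extracted templates into class labels, assuming the reactions are already filtered and atom-mapped (the RDKit, RXNMapper and RDChiral steps are not reproduced here). The fixed seed and the first-appearance label order are illustrative choices.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_622(items: Sequence, seed: int = 0) -> Tuple[List, List, List]:
    """Shuffle and cut into train/validation/test parts in a 6:2:2 ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 6 // 10
    n_val = n * 2 // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def template_labels(templates: Sequence[str]) -> Dict[str, int]:
    """Map each distinct template to a class index (order of first appearance)."""
    index: Dict[str, int] = {}
    for t in templates:
        index.setdefault(t, len(index))
    return index
```

Integer arithmetic keeps the split sizes exact; any remainder goes to the test set.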

In this embodiment, step T2 specifically comprises: filtering out the reactions in the multi-step dataset that do not conform to the preset rules with the RDKit toolkit and discarding every reaction route that contains such a reaction; traversing all terminal reactant molecules of each reaction route and discarding the route if any reactant is not a commercially available molecule; and extracting the target molecule at the terminal of each remaining route to form the target molecule library.
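Treating a route as a list of (product, reactants) pairs, the terminal reactants are the reactants that no reaction in the route produces, and the target is the product that no reaction consumes. A minimal sketch of the purchasability filter under that assumed representation (the RDKit rule checks are omitted, and SMILES are replaced by placeholder strings):

```python
from typing import List, Optional, Set, Tuple

Reaction = Tuple[str, List[str]]   # (product SMILES, reactant SMILES list)


def terminal_reactants(route: List[Reaction]) -> Set[str]:
    """Reactants that are not produced by any reaction of the route."""
    produced = {p for p, _ in route}
    return {r for _, rs in route for r in rs if r not in produced}


def target_of(route: List[Reaction]) -> Optional[str]:
    """The single product that no reaction of the route consumes."""
    consumed = {r for _, rs in route for r in rs}
    roots = [p for p, _ in route if p not in consumed]
    return roots[0] if len(roots) == 1 else None


def build_target_library(routes: List[List[Reaction]], purchasable: Set[str]) -> List[str]:
    targets = []
    for route in routes:
        if not terminal_reactants(route) <= purchasable:
            continue               # some starting material cannot be bought
        target = target_of(route)
        if target is not None:
            targets.append(target)
    return targets
```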

In this embodiment, in step T3, the single-step inverse synthesis model consists of a two-layer MLP (multi-layer perceptron) whose prediction layer uses the Softmax activation function for the multi-class classification task.

The Morgan fingerprint of the product generated with the RDKit toolkit is converted into a feature vector $x_p$ of dimension $d$, which serves as the input of the single-step inverse synthesis model; passing it through the two-layer MLP and the Softmax activation function yields a prediction vector $\hat{y}$ whose length equals the number of templates. The single-step inverse synthesis model is:

$$\hat{y} = \mathrm{Softmax}\left(W_2\,\sigma(W_1 x_p + b_1) + b_2\right)$$

The parameters of the single-step inverse synthesis model are initialized, and the training set is fed into the model in batches of 1024 to obtain the prediction $\hat{y}$; the cross-entropy loss between $\hat{y}$ and the true label $y$ is calculated and back-propagated to optimize the model parameters with a learning rate of 0.001. The validation set is then fed into the model in batches of 1024, and the validation loss is calculated from the predictions. Training is iterated until the validation loss has not decreased for 5 consecutive evaluations, at which point it stops. Finally, the test set is input into the single-step inverse synthesis model to evaluate its performance.
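Reading the stopping rule as early stopping with a patience of 5 (training stops once the validation loss has failed to improve for 5 consecutive evaluations), the bookkeeping can be sketched as follows. The loss sequence is a made-up toy example and says nothing about the model's actual behaviour.

```python
from typing import Iterable


def epochs_until_stop(val_losses: Iterable[float], patience: int = 5) -> int:
    """Return how many epochs run before early stopping triggers."""
    best = float("inf")
    bad = 0                      # consecutive epochs without improvement
    epochs = 0
    for loss in val_losses:
        epochs += 1
        if loss < best:
            best, bad = loss, 0
        else:
            bad += 1
            if bad >= patience:
                break
    return epochs


# Toy losses: best value at epoch 3, then five epochs without improvement.
print(epochs_until_stop([1.0, 0.9, 0.8, 0.85, 0.86, 0.87, 0.88, 0.89, 0.5]))  # 8
```

In a real training loop, the parameters saved at the best epoch would typically be restored before testing.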

In this embodiment, in step T4, the input of the empirical networks RVN and QVN is the feature vector $x_r$ converted from the Morgan fingerprint of the reaction. The RDKit toolkit is used to generate the Morgan fingerprint feature vector $x_p$ of the product and the Morgan fingerprint feature vectors $\{x_{r_1}, \dots, x_{r_n}\}$ of the reactants, and the Morgan fingerprint of the reaction is the Morgan fingerprint of the product minus the Morgan fingerprints of the reactants:

$$x_r = x_p - \sum_{i=1}^{n} x_{r_i}$$

where $n$ is the number of reactants.

Specifically, the empirical network RVN consists of a two-layer MLP, and the prediction layer uses a Sigmoid activation function to realize a regression task; feature vector by transforming reacted Morgan fingerprintInputting the experience network RVN, and obtaining a constant value/>, through two layers of MLP and Sigmoid activation functionsRepresenting the probability value that the reaction produced a successfully planned reaction route, the probability value/>, that the reaction produced a successfully planned reaction routeThe expression formula of (2) is:

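$$R(a) = \mathrm{Sigmoid}\left(W_2\,\sigma(W_1 f_r + b_1) + b_2\right)$$

where $a$ denotes a reaction node in the search process, $W_1$ and $W_2$ are parameter matrices of the model, $b_1$ and $b_2$ are bias vectors, and $\sigma(\cdot)$ is the activation of the hidden layer.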

Specifically, the empirical network QVN consists of a two-layer MLP whose prediction layer uses a Softplus activation function to perform regression. The reaction feature vector $f_r$ is input into QVN and passed through the two MLP layers and the Softplus activation to obtain a scalar value $Q(a)$, the length of the reaction route needed to synthesize the reactants of the reaction. $Q(a)$ is expressed as:

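$$Q(a) = \mathrm{Softplus}\left(W_2\,\sigma(W_1 f_r + b_1) + b_2\right)$$
where $W_1$, $W_2$, $b_1$, $b_2$ denote the parameter matrices and bias vectors of QVN and $\sigma(\cdot)$ is the activation of the hidden layer;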
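The two heads share the same structure and differ only in the output activation: Sigmoid keeps $R(a)$ in (0, 1) as a probability, and Softplus keeps $Q(a)$ non-negative as a length. A dependency-free sketch of the forward pass, assuming a ReLU hidden layer and a single output unit (neither is specified above); the weights would come from training:

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def linear(W: Matrix, x: Sequence[float], b: Sequence[float]) -> List[float]:
    # y = W x + b, one output per row of W
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, t) for t in v]


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def softplus(t: float) -> float:
    # numerically stable log(1 + exp(t))
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def two_layer_head(f_r, W1, b1, W2, b2) -> float:
    hidden = relu(linear(W1, f_r, b1))  # hidden activation assumed to be ReLU
    return linear(W2, hidden, b2)[0]    # single output unit, before activation


def rvn(f_r, params) -> float:
    """R(a): probability that reaction a leads to a successful route."""
    return sigmoid(two_layer_head(f_r, *params))


def qvn(f_r, params) -> float:
    """Q(a): predicted route length for synthesizing a's reactants (>= 0)."""
    return softplus(two_layer_head(f_r, *params))
```

Here `params` is a tuple `(W1, b1, W2, b2)`; in practice RVN and QVN each have their own parameters.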

In this embodiment, in step T5, constructing an empirical network based Monte Carlo tree search algorithm (RQ-MCTS) includes the steps of:

The search result of the Monte Carlo tree search algorithm is represented by an AND-OR tree containing AND nodes and OR nodes: AND nodes represent reactions (reaction nodes), OR nodes represent molecules (molecular nodes), and the root node is the OR node of the target molecule. The parent of an AND node is the OR node of the corresponding reaction product, and its children are the OR nodes of the corresponding reactants;
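A minimal sketch of the AND-OR tree bookkeeping, assuming molecules are identified by SMILES strings and templates by opaque strings; the class and function names (`OrNode`, `AndNode`, `add_reaction`) are hypothetical. Back-references to parents are excluded from comparison and `repr` so the cyclic links do not cause infinite recursion:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OrNode:
    """Molecular node: one molecule, identified here by a SMILES string."""
    smiles: str
    # Reaction that consumes this molecule (None for the root/target).
    parent: Optional["AndNode"] = field(default=None, repr=False, compare=False)
    # Candidate reactions that could produce this molecule.
    children: List["AndNode"] = field(default_factory=list)


@dataclass
class AndNode:
    """Reaction node: one inverse reaction template applied to its parent."""
    template: str
    parent: OrNode = field(repr=False, compare=False)  # OR node of the product
    children: List[OrNode] = field(default_factory=list)  # reactant OR nodes


def add_reaction(product: OrNode, template: str, reactants: List[str]) -> AndNode:
    """Attach one expanded reaction (AND node) and its reactant OR nodes."""
    rxn = AndNode(template=template, parent=product)
    rxn.children = [OrNode(smiles=s, parent=rxn) for s in reactants]
    product.children.append(rxn)
    return rxn
```

For example, `add_reaction(OrNode("CC(=O)OC"), "tpl-ester", ["CC(=O)O", "CO"])` attaches one esterification step whose two reactant OR nodes start out as unexpanded leaves.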

Referring to Figs. 2 and 3, in one possible embodiment, the Monte Carlo tree search algorithm (RQ-MCTS) comprises the following steps:

S1, selection: starting from the root node, select the best reaction node among the root's child nodes. If the child nodes of the selected reaction node include a leaf node (an unexpanded molecular node), enter the expansion stage; otherwise, continue selecting downward until a leaf node is reached;

S2, expansion: apply the single-step inverse synthesis model to the selected molecular node to obtain the $k$ highest-ranked inverse reaction templates, apply these templates to the selected molecule to obtain reactants, build the templates and reactants into AND and OR nodes, add them to the AND-OR tree, and compute the empirical values of the selected molecule and of the newly added reaction nodes;

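S3, update: starting from the selected molecular node, update the success probability $V(m)$ and synthesis cost $C(m)$ of the molecular nodes and the empirical values $R(a)$ and $Q(a)$ of the reaction nodes along the tree up to the root node;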

S4, termination check: check whether the tree contains a route whose leaf nodes are all purchasable molecules; such a route is a successfully planned inverse synthesis route. If one exists, stop searching; otherwise repeat the three stages until the iteration limit is reached.
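The four stages form a simple control loop. A sketch with the stage logic passed in as hypothetical callables, returning whether a route was found and how many iterations were used; the default limit of 500 mirrors the iteration limit used elsewhere in this embodiment:

```python
from typing import Callable, Tuple, TypeVar

Node = TypeVar("Node")


def rq_mcts_loop(root: Node,
                 select: Callable[[Node], Node],
                 expand: Callable[[Node], None],
                 update: Callable[[Node], None],
                 solved: Callable[[Node], bool],
                 max_iters: int = 500) -> Tuple[bool, int]:
    """Repeat S1-S3 and check S4 until a route is found or the limit is hit.

    Returns (route_found, iterations_used). The four callables are
    hypothetical hooks for the stages described in the text.
    """
    for it in range(1, max_iters + 1):
        leaf = select(root)   # S1: descend to a leaf (unexpanded molecular node)
        expand(leaf)          # S2: apply the single-step model's top templates
        update(leaf)          # S3: propagate values from the leaf to the root
        if solved(root):      # S4: a route whose leaves are all purchasable exists
            return True, it
    return False, max_iters
```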

Specifically, in step S1, the goal of the selection stage is to find a leaf node that is more likely to lead to a successful route and expand it. Starting from the root node, the best reaction node is selected among the root's child nodes: an integrated value is computed for each reaction node from its empirical values $R(a)$ and $Q(a)$, and the reaction node with the largest integrated value is selected:

；

where $a^*$ is the selected reaction node and $a$ ranges over the reaction nodes in the child set of molecular node $m$, the parent of $a$; one hyperparameter controls the weight of the single-step inverse synthesis model in the selection stage; $N(a)$ is the visit count of reaction node $a$; and a further hyperparameter serves as the penalty term for planning failure.

Next, among the child molecular nodes of the selected reaction node: if any are unexpanded, one unexpanded molecular node is chosen at random; if none are unexpanded, one expanded molecular node that has not yet succeeded (note that "not yet successful" is not the same as "failed") is chosen at random as the leaf node. From that node, reaction nodes are again selected downward by the procedure above, and the process repeats until a leaf node is reached.
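The descent rule below a selected reaction node can be sketched as follows, assuming each child molecular node is a dict with boolean `expanded` and `solved` flags and an optional `failed` flag (a hypothetical representation). Excluding failed nodes from the random choice is an assumption consistent with the note that "not yet successful" differs from "failed":

```python
import random
from typing import Optional, Sequence


def pick_child_molecule(children: Sequence[dict],
                        rng: random.Random) -> Optional[dict]:
    """Choose which reactant OR node to descend into below a selected reaction.

    Unexpanded children are preferred. Otherwise a random expanded child that
    is neither solved nor failed is returned; None means nothing is left to explore.
    """
    unexpanded = [c for c in children if not c["expanded"]]
    if unexpanded:
        return rng.choice(unexpanded)
    open_nodes = [c for c in children
                  if not c["solved"] and not c.get("failed", False)]
    return rng.choice(open_nodes) if open_nodes else None
```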

Specifically, in step S2, the expansion stage uses the single-step inverse synthesis model to predict the $k$ templates most likely to match the selected molecule $m$ and applies these $k$ templates to obtain reactants; the templates and reactant molecules are built into AND nodes and OR nodes, respectively, and added to the AND-OR tree. The two empirical values $R(a)$ and $Q(a)$ of each new reaction node are computed by the empirical networks RVN and QVN; if all child nodes of a reaction node are commercially available molecules, $R(a)$ and $Q(a)$ are instead set to fixed values. The integrated value of each reaction node is computed from $R(a)$ and $Q(a)$, the reaction node $a^*$ with the largest integrated value is selected, and its $R$ and $Q$ values are passed to the parent node $m$: $V(m)$ is set from $R(a^*)$, and $C(m)$ from $Q(a^*)$ together with a hyperparameter $c$ representing the cost of the reaction. Here $V(m)$ is the probability of successfully planning a reaction route for molecule $m$; $C(m)$ is the length of the reaction route for synthesizing $m$; $R(a)$ is the probability that reaction $a$ leads to a successfully planned reaction route; and $Q(a)$ is the length of the reaction route for synthesizing the reactants of reaction $a$.

Specifically, in step S3, for a reaction node $a$, new values of $R(a)$ and $Q(a)$ are obtained from the updated $V(m)$ and $C(m)$ of its child nodes: $R(a)$ is updated with the product of $V(m)$ over all child nodes, $\prod_{m \in \mathrm{ch}(a)} V(m)$, and $Q(a)$ with the sum of $C(m)$ over all child nodes, $\sum_{m \in \mathrm{ch}(a)} C(m)$, where $\mathrm{ch}(a)$ is the set of child molecular nodes of $a$. These are averaged according to the visit count $N(a)$ to obtain the updated $R(a)$ and $Q(a)$, and $N(a)$ is incremented by 1.

For a molecular node $m$, once its child nodes have been updated, the reaction node is reselected: the integrated value of every child reaction node is computed from its updated empirical values $R(a)$ and $Q(a)$ using the selection formula, the reaction node $a^*$ with the largest value is chosen, and $V(m)$ and $C(m)$ are updated from it as in the expansion stage. Starting from the selected molecular node, this update is repeated up the tree to the root node. When the root node succeeds, the search ends: all leaf nodes on the route are commercially available molecules, and the corresponding inverse synthesis route is obtained.
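The backup for a reaction node combines its children multiplicatively for $R(a)$ (every reactant must succeed) and additively for $Q(a)$ (route lengths accumulate). A sketch under the assumption that "averaging according to the visit count" means an incremental mean over visits; the exact averaging formula is not reproduced here, and the `Reaction` class is hypothetical:

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Reaction:
    R: float    # probability of leading to a successful route
    Q: float    # route length needed for the reactants
    N: int = 0  # visit count


def update_reaction(rxn: Reaction, child_V: List[float], child_C: List[float]) -> None:
    """Step S3 for one reaction node.

    New estimates are the product of the children's V(m) and the sum of their
    C(m); they are folded into R(a), Q(a) with an incremental mean (assumed).
    """
    r_new = math.prod(child_V)   # all reactants must be solvable
    q_new = sum(child_C)         # costs of the reactants add up
    rxn.R = (rxn.R * rxn.N + r_new) / (rxn.N + 1)
    rxn.Q = (rxn.Q * rxn.N + q_new) / (rxn.N + 1)
    rxn.N += 1
```

Under this assumption a node with `N == 0` simply takes the first backed-up values.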

In this embodiment, in step T6, the experience data collection of the experience network RVN and the experience network QVN includes:

The RVN prediction $R(a)$ and the QVN prediction $Q(a)$ are replaced by fixed values, each set to 10;

The USPTO multi-step dataset is traversed and each target molecule is input in turn into the multi-step search algorithm with the iteration limit set to 500, yielding a search tree (AND-OR tree) for each target molecule;

For the empirical data of RVN, all reaction nodes in the AND-OR tree are selected, and for each one the parent molecule, the child molecules, and the $V(m)$ values of the child molecules are collected. The complete reaction formed by the parent and child nodes is the input of the RVN network, and the value derived from the children's $V(m)$ is its training label. Depending on a condition on this value, the label is either kept unchanged or set using a hyperparameter that acts as a penalty term;

For the empirical data of QVN, the reaction nodes on successful routes in the AND-OR tree are selected; the parent and child molecules of each such reaction node form the input of the QVN network, and the synthesis route length of the reactants of the selected reaction is computed as the QVN training label.
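A sketch of how QVN labels could be computed from a solved route, assuming a route is stored as a mapping from each product to the reactants of the one reaction that makes it, and that route length is counted as the number of reactions (both are assumptions). Molecules that are never produced on the route are the purchasable leaves and contribute 0:

```python
from typing import Dict, List

Route = Dict[str, List[str]]  # product SMILES -> reactant SMILES of the reaction used


def synthesis_length(mol: str, route: Route) -> int:
    """Number of reactions needed to make `mol` on this route (0 for a leaf)."""
    if mol not in route:
        return 0
    return 1 + sum(synthesis_length(r, route) for r in route[mol])


def qvn_labels(route: Route) -> Dict[str, int]:
    """QVN label per reaction (keyed by its product): route length of its reactants."""
    return {prod: sum(synthesis_length(r, route) for r in reactants)
            for prod, reactants in route.items()}
```

For a hypothetical route `{"T": ["A", "B"], "A": ["C", "D"], "C": ["E"]}`, the reaction making `T` gets label 2, since `A` needs two reactions and `B` is a leaf.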

In this embodiment, in step T7, in the training process of the empirical network RVN and the empirical network QVN:

For the empirical network RVN, its empirical data are divided into a training set and a validation set in a fixed ratio of 8:2, and the RVN parameters are initialized. The training set is fed into the model in batches of 1024 to obtain predicted values, and the mean-squared-error loss with the true labels is computed and back-propagated to optimize the RVN parameters, with the learning rate set to 0.001. The validation set is then fed into the model in batches and the loss is computed. Training is iterated until the validation loss has failed to decrease for 5 consecutive iterations, yielding the optimal empirical network RVN;

For the empirical network QVN, its empirical data are divided into a training set and a validation set in a fixed ratio of 8:2, and the QVN parameters are initialized. The training set is fed into the model in batches of 1024 to obtain predicted values, and the mean-squared-error loss with the true labels is computed and back-propagated to optimize the QVN parameters, with the learning rate set to 0.001. The validation set is then fed into the model in batches and the loss is computed. Training is iterated until the validation loss has failed to decrease for 5 consecutive iterations, yielding the optimal empirical network QVN.
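The 8:2 split and the mean-squared-error loss used for both networks can be sketched in plain Python; the fixed shuffle seed is an assumption added for reproducibility:

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split(data: Sequence[T], ratios: Sequence[float], seed: int = 0) -> List[List[T]]:
    """Shuffle once and cut into consecutive parts by the given ratios (e.g. 8:2)."""
    items = list(data)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    parts: List[List[T]] = []
    start = 0
    for i, r in enumerate(ratios):
        # The last part takes whatever remains, so no item is lost to rounding.
        end = len(items) if i == len(ratios) - 1 else start + round(len(items) * r / total)
        parts.append(items[start:end])
        start = end
    return parts


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predictions and labels."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
```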

In this embodiment, in step T8, to optimize the single-step inverse synthesis model with the empirical data of RVN and QVN, the reactions on successful routes in the empirical data and their corresponding inverse reaction templates are extracted to generate a dataset, which is divided into training, validation, and test sets in a fixed ratio of 6:2:2; the labels are obtained from the inverse reaction templates. The parameters of the single-step inverse synthesis model trained on the USPTO single-step dataset are loaded, and the training set is fed into the model in batches of 1024 to obtain predictions. The cross-entropy loss with the true labels is computed and back-propagated to optimize the model parameters, with the learning rate set to 0.001. The validation set is then fed into the model in batches and the validation loss is computed from the predictions. Training is iterated until the validation loss has failed to decrease for 5 consecutive iterations, and finally the test set is fed into the single-step inverse synthesis model to evaluate its performance.

Further, in step T9, the target molecule is input into the Monte Carlo tree search algorithm, which uses the trained RVN in place of the first set constant value, the trained QVN in place of the second set constant value, and the optimized single-step inverse synthesis model, and outputs a synthesis route composed of a set of reactions.
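A sketch of building the fine-tuning dataset, assuming the successful-route reactions are available as (reaction SMILES, inverse template) pairs and that the label is the template's index in the template library, i.e. the output position of the single-step model. Skipping templates absent from the library and the shuffle seed are assumptions:

```python
import random
from typing import Dict, List, Sequence, Tuple

Example = Tuple[str, int]  # (reaction SMILES, template index)


def build_finetune_set(route_reactions: Sequence[Tuple[str, str]],
                       template_index: Dict[str, int],
                       seed: int = 0) -> Tuple[List[Example], List[Example], List[Example]]:
    """Label successful-route reactions by template index and split them 6:2:2."""
    examples = [(rxn, template_index[tpl]) for rxn, tpl in route_reactions
                if tpl in template_index]  # unknown templates are skipped (assumed)
    random.Random(seed).shuffle(examples)
    n = len(examples)
    n_train, n_val = n * 6 // 10, n * 2 // 10  # integer arithmetic avoids float rounding
    return (examples[:n_train],
            examples[n_train:n_train + n_val],
            examples[n_train + n_val:])
```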

An example of a performance comparison is given below:

RQ-MCTS was compared with Retro* and EG-MCTS on the USPTO-190 test set, and the results are shown in Table 1. The success rate of inverse synthesis planning was measured with iteration limits of 100 and 500, together with the average number of iterations under the 500-iteration limit. The results show that the success rate of the invention (RQ-MCTS) is clearly higher than that of EG-MCTS, the current best method, and that RQ-MCTS needs fewer iterations on average, indicating higher efficiency.

Table 1 performance comparison

In summary, the invention processes a single-step dataset that takes USPTO as its raw dataset and consists of chemical reactions expressed in SMILES; the single-step dataset is preprocessed, divided into training, validation, and test sets, and used to extract an inverse reaction template library for training the single-step inverse synthesis model. It processes a multi-step dataset that also takes USPTO as its raw dataset, in which each multi-step reaction route is formed by a set of chemical reactions expressed in SMILES; the multi-step dataset is preprocessed by filtering out routes whose terminal reactants are not purchasable in reality, and the target molecules of the remaining routes are extracted to form a target molecule library used to collect empirical data. A single-step inverse synthesis model is constructed that takes a specified molecule as input and predicts the reactants that synthesize it in a one-step reaction, and it is trained on the preprocessed single-step dataset. The empirical networks RVN and QVN are constructed: RVN predicts whether a given reaction leads to a successfully planned route, and QVN predicts the length of the reaction route needed to synthesize the reactants of a given reaction. A Monte Carlo tree search algorithm is constructed that starts from the root node, selects reaction nodes predicted by the single-step inverse synthesis model downward along the tree according to the prediction scores of RVN and QVN until reaching a leaf node formed by reactants, expands the leaf node with the single-step inverse synthesis model, and then backtracks to the root node; this is repeated until a route whose leaf nodes are all commercially available molecules is found or the search limit is reached, and such a route is taken as the synthesis route of the target molecule. For data collection, the RVN prediction is replaced with a first set constant value and the QVN prediction with a second set constant value; the target molecules in the preprocessed multi-step dataset are input into the Monte Carlo tree search algorithm in turn to search for their synthesis routes, and empirical data for RVN and QVN are collected from the search trees. The collected empirical data are preprocessed and used to train RVN and QVN, respectively. The reactions on successful routes in the empirical data form an optimization dataset, which is used to fine-tune the single-step inverse synthesis model previously trained on the single-step dataset. Finally, a target molecule is input into the Monte Carlo tree search algorithm, which uses the trained RVN in place of the first set constant value, the trained QVN in place of the second set constant value, and the optimized single-step inverse synthesis model, and outputs a synthesis route composed of a set of reactions.
The invention uses the empirical networks RVN and QVN to predict, respectively, the probability that a given reaction leads to successful planning and the cost (reaction route length) of synthesizing its reactants, and uses these predictions to guide the Monte Carlo tree search for inverse synthesis planning. This improves the success rate and efficiency of inverse synthesis planning beyond those of EG-MCTS, the best existing inverse synthesis planning algorithm, and provides better interpretability than EG-MCTS. In addition, the single-step inverse synthesis model trained on USPTO is further optimized with the empirical data, so that it learns the characteristics and empirical preferences of real reactions, improving its applicability in practice.

It should be noted that the method of the embodiments of the present disclosure may be performed by a single device, such as a computer or a server. The method of the embodiment can also be applied to a distributed scene, and is completed by mutually matching a plurality of devices. In the case of such a distributed scenario, one of the devices may perform only one or more steps of the methods of embodiments of the present disclosure, the devices interacting with each other to accomplish the methods.

It should be noted that the foregoing describes some embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

Example 2

Referring to Fig. 3, Embodiment 2 of the present invention further provides an empirical-network-based Monte Carlo tree search inverse synthesis planning apparatus, which adopts the method of Embodiment 1 above and comprises:

A single-step inverse synthesis model construction module 001 for constructing a single-step inverse synthesis model, which takes a specified molecule as input and predicts the reactants that synthesize it in a one-step reaction, and for training the model on the preprocessed single-step dataset;

An empirical network construction module 002 for constructing the empirical networks RVN and QVN, where RVN predicts whether a specified reaction leads to a successfully planned route and QVN predicts the length of the reaction route needed to synthesize the reactants of a given reaction;

The Monte Carlo tree search module 003 is configured to construct a Monte Carlo tree search algorithm that starts from the root node, selects reaction nodes predicted by the single-step inverse synthesis model downward along the tree according to the prediction scores of the empirical networks RVN and QVN until reaching a leaf node formed by reactants, expands the leaf node with the single-step inverse synthesis model, and then backtracks to the root node; this is repeated until a route whose leaf nodes are all commercially available molecules is found or the search limit is reached, and that route is taken as the synthesis route of the target molecule;

An empirical data collection module 004 for replacing the prediction of the empirical network RVN with a first set constant value and the prediction of the empirical network QVN with a second set constant value, inputting the target molecules of the preprocessed multi-step dataset into the Monte Carlo tree search algorithm in turn to search for their synthesis routes, and collecting the empirical data of RVN and QVN from the search trees;

An empirical network training module 005 for preprocessing the collected empirical data of the empirical network RVN and the empirical network QVN, training the empirical network RVN using the collected empirical data of the empirical network RVN, and training the empirical network QVN using the collected empirical data of the empirical network QVN;

The single-step inverse synthesis model training module 006 is configured to form an optimization dataset from the reactions on successful routes in the empirical data of RVN and QVN, and to use it to fine-tune the single-step inverse synthesis model previously trained on the single-step dataset;

The synthetic route generation module 007 is configured to input a target molecule into the monte carlo tree search algorithm, wherein the monte carlo tree search algorithm uses the trained empirical network RVN to replace the first set constant value, uses the trained empirical network QVN to replace the second set constant value, uses the optimized single-step inverse synthetic model, and outputs a synthetic route composed of a set of reactions.

In a possible embodiment, the apparatus further includes a first dataset processing module 008 for processing the single-step dataset, which takes USPTO as its raw dataset and consists of chemical reactions expressed in SMILES;

In the first dataset processing module 008, the RDKit toolkit is used to filter out reactions in the single-step dataset that do not meet preset rules, the RXNMapper toolkit is used to atom-map the remaining reactions, and the single-step dataset is randomly divided into training, validation, and test sets in a set proportion; RDChiral is used to extract inverse reaction templates from the single-step dataset to obtain an inverse reaction template library, and each reaction in the single-step dataset is matched to its inverse reaction template to obtain the training label.

In a possible embodiment, the apparatus further includes a second dataset processing module 009 for processing the multi-step dataset, which takes USPTO as its raw dataset and in which each multi-step reaction route is formed by a set of chemical reactions expressed in SMILES;

In the second dataset processing module 009, the RDKit toolkit is used to check the reactions of the multi-step dataset against preset rules, and any reaction route containing a reaction that does not meet the rules is filtered out. All terminal reactant molecules of each remaining route are then traversed, and if any reactant is not a commercially available molecule, the route is filtered out. The target molecules at the ends of the remaining routes are extracted to form a target molecule library.
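The route-level filter and target extraction can be sketched as set operations, assuming each route is a mapping from product to reactants and that a set of purchasable SMILES is available (its source is not specified here); the reaction-level RDKit rule checks are omitted, and all names are hypothetical:

```python
from typing import Dict, Iterable, List, Set

Route = Dict[str, List[str]]  # product -> reactants, one reaction per product


def terminal_reactants(route: Route) -> Set[str]:
    """Molecules that appear as reactants but are never made on the route."""
    reactants = {r for rs in route.values() for r in rs}
    return reactants - set(route)


def target_of(route: Route) -> str:
    """The route's target: the only product that is not a reactant of another step."""
    reactants = {r for rs in route.values() for r in rs}
    (target,) = [p for p in route if p not in reactants]  # ValueError if not unique
    return target


def build_target_library(routes: Iterable[Route], purchasable: Set[str]) -> List[str]:
    """Keep routes whose terminal reactants are all purchasable; collect targets."""
    return [target_of(r) for r in routes if terminal_reactants(r) <= purchasable]
```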

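In one possible embodiment, in the single-step inverse synthesis model construction module 001, the single-step inverse synthesis model is expressed as:

$$\hat{y} = \mathrm{Softmax}\left(W_2\,\sigma(W_1 f_p + b_1) + b_2\right)$$
where $f_p$ is the Morgan-fingerprint feature vector of the product, $W_1$ and $W_2$ are parameter matrices, $b_1$ and $b_2$ are bias vectors, and $\sigma(\cdot)$ is the activation of the hidden layer;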

In one possible embodiment, in the empirical network construction module 002, the input of the empirical networks RVN and QVN is the feature vector $f_r$ converted from the Morgan fingerprint of the reaction. The RDKit toolkit is used to generate the Morgan-fingerprint feature vector $f_p$ of the product and those of the reactants, $\{f_{m_1}, \dots, f_{m_s}\}$; the Morgan fingerprint of the reaction is the product's Morgan fingerprint minus the sum of the reactants' Morgan fingerprints:

$$f_r = f_p - \sum_{i=1}^{s} f_{m_i}$$
where $s$ is the number of reactants;

In the empirical network construction module 002, the reaction feature vector $f_r$ is input into the empirical network RVN and passed through a two-layer MLP and a Sigmoid activation to obtain a scalar value $R(a)$, the probability that the reaction leads to a successfully planned reaction route; $R(a)$ is expressed as:

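$$R(a) = \mathrm{Sigmoid}\left(W_2\,\sigma(W_1 f_r + b_1) + b_2\right)$$
with $\sigma(\cdot)$ the hidden-layer activation;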

In the empirical network construction module 002, the reaction feature vector $f_r$ is input into the empirical network QVN and passed through a two-layer MLP and a Softplus activation to obtain a scalar value $Q(a)$, the length of the reaction route for synthesizing the reactants of the reaction; $Q(a)$ is expressed as:

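$$Q(a) = \mathrm{Softplus}\left(W_2\,\sigma(W_1 f_r + b_1) + b_2\right)$$
with $\sigma(\cdot)$ the hidden-layer activation;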

where $W_1$ and $W_2$ are parameter matrices of the model and $b_1$ and $b_2$ are bias vectors.

In a possible embodiment, in the Monte Carlo tree search module 003, the search result of the Monte Carlo tree search algorithm is represented by an AND-OR tree containing AND nodes and OR nodes: AND nodes represent reaction nodes, OR nodes represent molecular nodes, and the root node is the OR node of the target molecule. The parent of an AND node is the OR node of the corresponding reaction product, and its children are the OR nodes of the corresponding reactants;

In the Monte Carlo tree search module 003, an OR node contains one molecule, a value $V(m)$ indicating the synthesizability of the molecule, and a value $C(m)$ representing the cost of synthesizing it; an AND node contains a reaction template, the probability $R(a)$ that the reaction leads to a successfully planned reaction route, the length $Q(a)$ of the reaction route for synthesizing the reactants of the reaction, and the prediction value of the single-step inverse synthesis model;

In the Monte Carlo tree search module 003, an AND node is marked successful when all of its child nodes succeed, and an OR node is marked successful when at least one of its child nodes succeeds; an OR node is also marked successful when its molecule is commercially available. A successful inverse synthesis route is found when the reactants at all leaf nodes of a reaction route are commercially available molecules;

In the Monte Carlo tree search module 003, if the single-step inverse synthesis model cannot predict any reasonable reaction for the molecule of an OR node, that OR node is marked as failed; an AND node is marked as failed if any of its child nodes fails, and an OR node is marked as failed if all of its child nodes fail.
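The success and failure rules above are two mutually recursive checks over the AND-OR tree. A sketch with hypothetical `Mol` and `Rxn` classes; recomputing the flags recursively is chosen for clarity, whereas a real search would more likely cache them on the nodes:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Mol:
    purchasable: bool = False
    expanded: bool = False
    reactions: List["Rxn"] = field(default_factory=list)


@dataclass
class Rxn:
    reactants: List[Mol] = field(default_factory=list)


def mol_solved(m: Mol) -> bool:
    # OR node: purchasable, or at least one child reaction is solved
    return m.purchasable or any(rxn_solved(a) for a in m.reactions)


def rxn_solved(a: Rxn) -> bool:
    # AND node: every reactant must be solved
    return all(mol_solved(m) for m in a.reactants)


def mol_failed(m: Mol) -> bool:
    # OR node: expanded with no usable reaction, or every child reaction failed
    if m.purchasable or not m.expanded:
        return False
    return all(rxn_failed(a) for a in m.reactions)  # True for an empty list


def rxn_failed(a: Rxn) -> bool:
    # AND node: one failed reactant makes the whole reaction fail
    return any(mol_failed(m) for m in a.reactants)
```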

In a possible embodiment, in the Monte Carlo tree search module 003, the Monte Carlo tree search algorithm starts from the root node and selects the best reaction node among its child nodes: an integrated value is computed for each reaction node from its empirical values $R(a)$ and $Q(a)$, and the reaction node with the largest integrated value is selected:

；

In one possible embodiment, in the Monte Carlo tree search module 003, the single-step inverse synthesis model predicts the $k$ templates most likely to match the selected molecule $m$, these $k$ templates are applied to obtain reactants, and the templates and reactant molecules are built into AND nodes and OR nodes, respectively, and added to the AND-OR tree;

In a possible embodiment, in the Monte Carlo tree search module 003, the two empirical values $R(a)$ and $Q(a)$ of each new reaction node are computed by the empirical networks RVN and QVN; if all child nodes of a reaction node are commercially available molecules, $R(a)$ and $Q(a)$ are instead set to fixed values. The integrated value of each reaction node is computed from $R(a)$ and $Q(a)$, the reaction node $a^*$ with the largest integrated value is selected, and its $R$ and $Q$ values are passed to the parent node $m$: $V(m)$ is set from $R(a^*)$, and $C(m)$ from $Q(a^*)$ together with a hyperparameter representing the cost of the reaction. Here $V(m)$ is the probability of successfully planning a reaction route for molecule $m$, $C(m)$ is the length of the reaction route for synthesizing $m$, $R(a)$ is the probability that reaction $a$ leads to a successfully planned reaction route, and $Q(a)$ is the length of the reaction route for synthesizing the reactants of reaction $a$.

In a possible embodiment, in the Monte Carlo tree search module 003, for a reaction node $a$, new values of $R(a)$ and $Q(a)$ are obtained from the updated $V(m)$ and $C(m)$ of its child nodes $m$: $R(a)$ is updated with the product of $V(m)$ over all child nodes and $Q(a)$ with the sum of $C(m)$ over all child nodes; these are averaged according to the visit count $N(a)$ to obtain the updated $R(a)$ and $Q(a)$, and $N(a)$ is incremented by 1.

It should be noted that, because the content of information interaction and execution process between the modules of the above-mentioned device is based on the same concept as the method embodiment in the embodiment 1 of the present application, the technical effects brought by the content are the same as the method embodiment of the present application, and the specific content can be referred to the description in the foregoing illustrated method embodiment of the present application, which is not repeated herein.

Example 3

Embodiment 3 of the present invention provides a non-transitory computer-readable storage medium having stored therein program code of an empirical network based monte carlo tree search inverse synthetic planning method, the program code comprising instructions for performing the empirical network based monte carlo tree search inverse synthetic planning method of embodiment 1 or any possible implementation thereof.

A computer-readable storage medium may be any available medium accessible by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (Solid State Disk, SSD)), etc.

Example 4

Embodiment 4 of the present invention provides an electronic device, including: a memory and a processor;

The processor and the memory communicate with each other through a bus; the memory stores program instructions executable by the processor, and the processor invokes the program instructions to perform the empirical-network-based Monte Carlo tree search inverse synthesis planning method of Embodiment 1 or any possible implementation thereof.

Specifically, the processor may be implemented by hardware or software. When implemented by hardware, the processor may be a logic circuit, an integrated circuit, or the like; when implemented by software, the processor may be a general-purpose processor that operates by reading software code stored in a memory, and the memory may be integrated in the processor or reside separately outside it.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave).

It will be appreciated by those skilled in the art that the modules or steps of the invention described above may be implemented in a general purpose computing device, they may be concentrated on a single computing device, or distributed across a network of computing devices, they may alternatively be implemented in program code executable by computing devices, so that they may be stored in a memory device for execution by computing devices, and in some cases, the steps shown or described may be performed in a different order than that shown or described, or they may be separately fabricated into individual integrated circuit modules, or multiple modules or steps within them may be fabricated into a single integrated circuit module for implementation. Thus, the present invention is not limited to any specific combination of hardware and software.

While the invention has been described in detail in the foregoing general description and specific examples, it will be apparent to those skilled in the art that modifications and improvements can be made thereto. Accordingly, such modifications or improvements made without departing from the spirit of the invention are intended to fall within the scope of the invention as claimed.

Claims

1. A Monte Carlo tree search inverse synthesis planning method based on an empirical network, characterized by comprising the following steps:

Inputting a target molecule into the Monte Carlo tree search algorithm. The algorithm uses the trained empirical network RVN in place of the first set constant value and the trained empirical network QVN in place of the second set constant value. It applies the optimized single-step inverse synthesis model and outputs a synthesis route consisting of a set of reactions;

The input to the empirical network RVN and the empirical network QVN is the feature vector f_r converted from the Morgan fingerprint of the reaction. The feature vector f_p' converted from the Morgan fingerprint of the product, and the feature vectors f_{m_i} converted from the Morgan fingerprints of the reactants, are generated using the RDKit toolkit. The Morgan fingerprint of the reaction is expressed as the Morgan fingerprint of the product minus the sum of the Morgan fingerprints of the reactants:

$$f_r = f_{p'} - \sum_{i=1}^{s} f_{m_i}$$

wherein s represents the number of reactants;
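The fingerprint arithmetic above can be sketched in plain Python. This is a minimal sketch. It assumes the product and reactant Morgan fingerprints have already been computed (for example with RDKit) as equal-length count vectors. The fingerprint radius and length are not specified in the text, and the toy vectors below are made up.

```python
from typing import List, Sequence


def reaction_fingerprint(product_fp: Sequence[int],
                         reactant_fps: Sequence[Sequence[int]]) -> List[int]:
    """Reaction fingerprint = product fingerprint - sum of the s reactant fingerprints.

    Fingerprints are assumed to be precomputed, equal-length count vectors;
    no RDKit call is made here.
    """
    if not reactant_fps:
        raise ValueError("a reaction needs at least one reactant")
    n = len(product_fp)
    if any(len(fp) != n for fp in reactant_fps):
        raise ValueError("all fingerprints must have the same length")
    # Sum the reactant fingerprints position by position.
    reactant_sum = [sum(fp[i] for fp in reactant_fps) for i in range(n)]
    # Subtract the summed reactants from the product fingerprint.
    return [p - r for p, r in zip(product_fp, reactant_sum)]


# Toy 4-position example (made-up vectors, not real Morgan fingerprints):
print(reaction_fingerprint([1, 1, 0, 2], [[1, 0, 0, 1], [0, 1, 1, 0]]))  # [0, 0, -1, 1]
```

Because the reactant bits are subtracted, the reaction vector can contain negative entries. It roughly encodes which substructures the reaction creates or consumes.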

inputting the feature vector f_r converted from the Morgan fingerprint of the reaction into the empirical network RVN. A scalar value R(a) is obtained through a two-layer MLP with a Sigmoid activation function. R(a) represents the probability that the reaction yields a successfully planned reaction route:

$$R(a) = \mathrm{Sigmoid}\left(W_2^{R}\,\mathrm{ReLU}\left(W_1^{R} f_r + b_1^{R}\right) + b_2^{R}\right)$$

wherein a represents a reaction node in the search process, W_1^R and W_2^R are parameter matrices of the model, and b_1^R and b_2^R are bias vectors;
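A minimal forward pass of the RVN can be written in plain Python, without a deep-learning framework. The ReLU hidden activation is an assumption: the text names only the two-layer MLP and the Sigmoid output. The weights in the usage line are arbitrary toy values, not trained parameters.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def linear(W: Matrix, x: Sequence[float], b: Sequence[float]) -> List[float]:
    """y = W x + b, with W stored row by row."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, x) for x in v]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (no overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def rvn_forward(f_r: Sequence[float], W1: Matrix, b1: Sequence[float],
                W2: Matrix, b2: Sequence[float]) -> float:
    """R(a) = Sigmoid(W2 ReLU(W1 f_r + b1) + b2); the ReLU layer is an assumption."""
    hidden = relu(linear(W1, f_r, b1))
    logit = linear(W2, hidden, b2)[0]  # W2 has a single output row
    return sigmoid(logit)


# Toy weights: the hidden layer is [1, 0] and the logit is 0, so R(a) = 0.5.
print(rvn_forward([0.0, 0.0, -1.0, 1.0],
                  [[0, 0, 0, 1], [0, 0, 1, 0]], [0.0, 0.0],
                  [[2.0, 1.0]], [-2.0]))  # 0.5
```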

inputting the feature vector f_r converted from the Morgan fingerprint of the reaction into the empirical network QVN. A scalar value Q(a) is obtained through a two-layer MLP with a Softplus activation function. Q(a) represents the length of the reaction route for synthesizing the reactants of the reaction:

$$Q(a) = \mathrm{Softplus}\left(W_2^{Q}\,\mathrm{ReLU}\left(W_1^{Q} f_r + b_1^{Q}\right) + b_2^{Q}\right)$$

wherein W_1^Q and W_2^Q are parameter matrices of the model, and b_1^Q and b_2^Q are bias vectors;
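The QVN head differs from the RVN only in its output activation. The sketch below again assumes a ReLU hidden layer and uses toy weights. It shows why Softplus suits a length estimate: its output is never negative. The stable form used here also avoids overflow for large logits.

```python
import math
from typing import List, Sequence


def softplus(z: float) -> float:
    # Stable form of log(1 + e^z): max(z, 0) + log(1 + e^-|z|).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def qvn_forward(f_r: Sequence[float], W1: List[List[float]], b1: Sequence[float],
                w2: Sequence[float], b2: float) -> float:
    """Q(a) = Softplus(w2 . ReLU(W1 f_r + b1) + b2); the ReLU layer is an assumption."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, f_r)) + b)
              for row, b in zip(W1, b1)]
    return softplus(sum(w * h for w, h in zip(w2, hidden)) + b2)


print(qvn_forward([1.0], [[1.0]], [0.0], [1000.0], 0.0))  # 1000.0, no overflow
```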

The search result of the Monte Carlo tree search algorithm is represented by an AND-OR tree comprising AND nodes and OR nodes. The AND nodes represent reaction nodes, and the OR nodes represent molecule nodes. The root node T is the OR node of the target molecule. The parent of an AND node is the OR node of the corresponding reaction product, and its child nodes are the OR nodes of the corresponding reactants;

During the search of the Monte Carlo tree search algorithm, each OR node contains:
- a molecule;
- a value V^r representing the synthesizability of the molecule;
- a value V^q representing the cost of synthesizing the molecule.

Each AND node contains:
- a reaction template;
- the probability value R(a) that the reaction yields a successfully planned reaction route;
- the length Q(a) of the reaction route for synthesizing the reactants of the specified reaction;
- the predicted value P(a) of the single-step inverse synthesis model;
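The node contents listed above map naturally onto two small record types. The sketch below is one possible layout. The class names and the helper `add_reaction` are hypothetical, and parent links are excluded from `repr` so the cyclic parent/child references do not cause infinite recursion when printed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class OrNode:
    """OR node: one molecule (the root T holds the target molecule)."""
    smiles: str
    v_r: float = 0.0  # V^r: synthesizability of the molecule
    v_q: float = 0.0  # V^q: cost of synthesizing the molecule
    parent: Optional["AndNode"] = field(default=None, repr=False)
    children: List["AndNode"] = field(default_factory=list)


@dataclass(eq=False)
class AndNode:
    """AND node: one reaction; parent = product, children = reactants."""
    template: str
    r: float  # R(a) from the empirical network RVN
    q: float  # Q(a) from the empirical network QVN
    p: float  # P(a) from the single-step inverse synthesis model
    parent: Optional[OrNode] = field(default=None, repr=False)
    children: List[OrNode] = field(default_factory=list)


def add_reaction(product: OrNode, template: str, reactants: List[str],
                 r: float, q: float, p: float) -> AndNode:
    """Attach a reaction under `product`, with one OR child per reactant."""
    a = AndNode(template, r, q, p, parent=product)
    a.children = [OrNode(s, parent=a) for s in reactants]
    product.children.append(a)
    return a
```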

2. The empirical network based Monte Carlo tree search inverse synthesis planning method of claim 1, further comprising processing a single-step dataset, wherein the single-step dataset uses USPTO as the raw dataset and consists of chemical reactions expressed in SMILES;

3. The empirical network based Monte Carlo tree search inverse synthesis planning method of claim 2, further comprising processing a multi-step dataset, wherein the multi-step dataset uses USPTO as the raw dataset, and each multi-step reaction route is formed by a set of chemical reactions expressed in SMILES;
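The dataset processing in claims 2 and 3 is not spelled out in detail. As a hedged illustration, assume each record uses the common reaction SMILES form `reactants>agents>products` with a single product. A single-step record can then be split into its product and reactants, and a multi-step route can be treated as a list of such records. The function names are hypothetical.

```python
from typing import List, Tuple


def split_reaction_smiles(rxn: str) -> Tuple[str, List[str]]:
    """Split 'reactants>agents>product' into (product, [reactants]).

    Assumes exactly one product, as in a single-step retrosynthesis record.
    """
    parts = rxn.split(">")
    if len(parts) != 3:
        raise ValueError(f"not a reaction SMILES: {rxn!r}")
    reactants, _agents, product = parts
    if not product or "." in product:
        raise ValueError("expected exactly one product")
    return product, reactants.split(".")


def parse_route(route: List[str]) -> List[Tuple[str, List[str]]]:
    """A multi-step route, taken here as a list of reaction SMILES, parsed step by step."""
    return [split_reaction_smiles(r) for r in route]


print(split_reaction_smiles("CC(=O)O.OCC>>CC(=O)OCC"))
# ('CC(=O)OCC', ['CC(=O)O', 'OCC'])
```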

4. The empirical network based Monte Carlo tree search inverse synthesis planning method of claim 1, wherein the single-step inverse synthesis model is expressed as:

$$P = \mathrm{Softmax}\left(W_2\,\mathrm{ReLU}\left(W_1 f_p + b_1\right) + b_2\right)$$

where Softmax and ReLU are nonlinear activation functions, W_1 and W_2 are parameter matrices of the model, b_1 and b_2 are bias vectors, and f_p is the feature vector converted from the Morgan fingerprint of the product, generated using the RDKit toolkit.
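The single-step model of claim 4, together with the top-k template selection used during expansion, can be sketched as follows. Plain Python lists stand in for trained weight matrices. `top_k` is a hypothetical helper that simply ranks templates by predicted probability.

```python
import math
from typing import List, Sequence, Tuple


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def single_step(f_p: Sequence[float], W1: List[List[float]], b1: Sequence[float],
                W2: List[List[float]], b2: Sequence[float]) -> List[float]:
    """P = Softmax(W2 ReLU(W1 f_p + b1) + b2): one probability per template."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, f_p)) + b)
              for row, b in zip(W1, b1)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(W2, b2)]
    return softmax(logits)


def top_k(probs: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """(template index, probability) pairs for the k most probable templates."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in order[:k]]


print(top_k([0.1, 0.6, 0.3], 2))  # [(1, 0.6), (2, 0.3)]
```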

5. The empirical network based Monte Carlo tree search inverse synthesis planning method of claim 1, wherein, in the Monte Carlo tree search algorithm, the search starts from the root node and selects an optimal reaction node among its child nodes. The selection computes a composite value U(a) from the empirical values R(a) and Q(a), and the reaction node with the largest composite value U(a) is selected:

$$U(a) = -\left(R(a)\,Q(a) + \left(1 - R(a)\right) C_{fail}\right)$$

wherein a^* is the selected reaction node; a' represents a reaction node in the child-node set of molecule node m; C is a hyperparameter controlling the weight of the single-step inverse synthesis model in the selection stage; N(a') represents the visit count of reaction node a'; C_fail is a hyperparameter representing the penalty for planning failure; and a further term represents the number of visits of a reaction node.
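U(a) can be read as a negated expected cost. With probability R(a) the reaction leads to a route of length Q(a), and otherwise it incurs the failure penalty C_fail. For example, R(a) = 0.8, Q(a) = 3 and C_fail = 10 give U(a) = -(2.4 + 2.0) = -4.4.

The sketch below computes U(a) and then selects a child. The selection formula itself is not legible in the text. The exploration bonus here is an assumed PUCT-style form, weighted by C, the single-step prediction P(a') and the visit counts, and it may differ from the one actually claimed.

```python
import math
from typing import List, Sequence


def composite_value(r: float, q: float, c_fail: float) -> float:
    """U(a) = -(R(a) * Q(a) + (1 - R(a)) * C_fail)."""
    return -(r * q + (1.0 - r) * c_fail)


def select(rs: Sequence[float], qs: Sequence[float], ps: Sequence[float],
           ns: Sequence[int], c: float, c_fail: float) -> int:
    """Index of a* among the child reactions of a molecule node m.

    ASSUMPTION: the bonus c * P(a') * sqrt(sum of N) / (1 + N(a')) is a
    PUCT-style guess, not the formula from the source.
    """
    total = sum(ns)
    scores: List[float] = [
        composite_value(r, q, c_fail) + c * p * math.sqrt(total) / (1 + n)
        for r, q, p, n in zip(rs, qs, ps, ns)
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

With c = 0 the rule reduces to picking the child with the largest U(a), as the claim text states. A positive c lets nodes the single-step model favours, but which have few visits, be tried as well.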

6. The empirical network based Monte Carlo tree search inverse synthesis planning method of claim 5, wherein, in the Monte Carlo tree search algorithm, the single-step inverse synthesis model predicts the top-k templates that may match the selected molecule m. These templates are applied to obtain reactants. The top-k templates and the reactant molecules are then constructed into AND nodes and OR nodes respectively and added to the AND-OR tree;

Calculating the two empirical values R(a) and Q(a) of each new reaction node through the empirical network RVN and the empirical network QVN. If the child nodes of a reaction node are all commercially available molecules, R(a) = 1 and Q(a) = 0. The U(a) value of each reaction node is calculated from its empirical values R(a) and Q(a), and the reaction node a' with the largest U(a) is selected. The values R(a') and Q(a') of reaction node a' are then passed to the parent node m as

$$V^r_m = R(a'), \qquad V^q_m = Q(a') + C_{rc}$$

where C_rc is a hyperparameter representing the cost of carrying out the reaction; V^r_m is the probability of successfully planning a reaction route for molecule m; V^q_m is the length of the reaction route for synthesizing molecule m; R(a') is the probability that reaction a' yields a successfully planned reaction route; and Q(a') is the length of the reaction route for synthesizing the reactants of the specified reaction a'.
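The expansion-side bookkeeping of claim 6 is sketched below. The stock check and the RVN/QVN call are passed in as hypothetical callables. Write V^r_m and V^q_m for the synthesizability and cost of the parent molecule m. The parent update V^r_m = R(a'), V^q_m = Q(a') + C_rc is an assumed reading of how C_rc enters.

```python
from typing import Callable, Iterable, List, Tuple


def evaluate_reaction(reactants: List[str],
                      is_commercial: Callable[[str], bool],
                      predict: Callable[[], Tuple[float, float]]) -> Tuple[float, float]:
    """(R(a), Q(a)) for a newly expanded reaction node."""
    if all(is_commercial(m) for m in reactants):
        return 1.0, 0.0  # every reactant can be purchased
    return predict()     # otherwise ask the RVN/QVN (hypothetical callable)


def molecule_values(reaction_values: Iterable[Tuple[float, float]],
                    c_fail: float, c_rc: float) -> Tuple[float, float]:
    """Pick a' = argmax U(a) and return (V^r_m, V^q_m) for the parent m.

    ASSUMPTION: V^r_m = R(a') and V^q_m = Q(a') + C_rc.
    """
    r_best, q_best = max(
        reaction_values,
        key=lambda rq: -(rq[0] * rq[1] + (1.0 - rq[0]) * c_fail))  # U(a)
    return r_best, q_best + c_rc
```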

7. The empirical network based Monte Carlo tree search inverse synthesis planning method of claim 6, wherein, in the Monte Carlo tree search algorithm, for a reaction node a, new values $\hat{R}(a)$ and $\hat{Q}(a)$ are obtained from the updated values V^r and V^q of its child nodes. $\hat{R}(a)$ is the product of V^r over all child nodes, and $\hat{Q}(a)$ is the sum of V^q over all child nodes:

$$\hat{R}(a) = \prod_{m \in \mathrm{ch}(a)} V^r_m, \qquad \hat{Q}(a) = \sum_{m \in \mathrm{ch}(a)} V^q_m$$

where ch(a) denotes the child nodes of a. The updated R(a) and Q(a) are obtained by averaging according to the visit count N(a), and the visit count N(a) of the reaction node is increased by 1;

For the molecule node m, the U(a) values of all its child reaction nodes are recalculated from their updated empirical values R(a) and Q(a) using the formula for U(a). The reaction node a' with the largest U(a) is reselected, and the values V^r_m and V^q_m of node m are updated from the empirical values R(a') and Q(a') of that reaction node;

Starting from the selected molecule node, the updating process is repeated up the tree structure to the root node. When V^r = 1 at the root node, all leaf nodes on the route found in this search are commercially available molecules, and the corresponding inverse synthesis route is determined.
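The update procedure of claim 7 can be sketched as a loop from the expanded molecule node up to the root. Three points are assumptions:
- the averaging over N(a) is implemented as a running mean;
- the molecule update adds C_rc, as in the expansion step;
- C_fail and C_rc take arbitrary example values.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional

C_FAIL = 10.0  # hypothetical value of C_fail (planning-failure penalty)
C_RC = 1.0     # hypothetical value of C_rc (cost of one reaction)


def u_value(r: float, q: float) -> float:
    """Composite value U(a) = -(R(a) * Q(a) + (1 - R(a)) * C_fail)."""
    return -(r * q + (1.0 - r) * C_FAIL)


@dataclass(eq=False)
class Mol:
    """OR node: a molecule with V^r and V^q."""
    v_r: float = 0.0
    v_q: float = 0.0
    parent: Optional["Rxn"] = field(default=None, repr=False)
    children: List["Rxn"] = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Rxn:
    """AND node: a reaction with R(a), Q(a) and visit count N(a)."""
    r: float
    q: float
    visits: int = 0
    parent: Optional[Mol] = field(default=None, repr=False)
    children: List[Mol] = field(default_factory=list, repr=False)


def update_reaction(a: Rxn) -> None:
    # New estimates from the children: product of V^r, sum of V^q.
    r_new = math.prod(m.v_r for m in a.children)
    q_new = sum(m.v_q for m in a.children)
    # ASSUMPTION: "average according to N(a)" is read as a running mean.
    a.r = (a.r * a.visits + r_new) / (a.visits + 1)
    a.q = (a.q * a.visits + q_new) / (a.visits + 1)
    a.visits += 1  # N(a) += 1


def update_molecule(m: Mol) -> None:
    # Reselect the child reaction with the largest U(a) and pass its values up.
    best = max(m.children, key=lambda a: u_value(a.r, a.q))
    m.v_r, m.v_q = best.r, best.q + C_RC  # ASSUMPTION: + C_rc


def backup(m: Mol) -> bool:
    """Propagate from an expanded molecule node to the root T.

    Returns True when the root's V^r equals 1, i.e. a complete route is found.
    """
    node = m
    while True:
        update_molecule(node)
        a = node.parent
        if a is None:  # reached the root node
            return node.v_r == 1.0
        update_reaction(a)
        assert a.parent is not None, "every AND node has a product OR node"
        node = a.parent
```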

8. An empirical network based Monte Carlo tree search inverse synthesis planning apparatus employing the empirical network based Monte Carlo tree search inverse synthesis planning method according to any one of claims 1 to 7, comprising: