CN112786108A - Molecular understanding model training method, device, equipment and medium - Google Patents

Molecular understanding model training method, device, equipment and medium

Info

Publication number
CN112786108A
CN112786108A (application CN202110082654.3A)
Authority
CN
China
Prior art keywords
molecular
molecule
output
sequence
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110082654.3A
Other languages
Chinese (zh)
Other versions
CN112786108B (en)
Inventor
李宇琨
张涵
肖东凌
孙宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110082654.3A priority Critical patent/CN112786108B/en
Publication of CN112786108A publication Critical patent/CN112786108A/en
Application granted granted Critical
Publication of CN112786108B publication Critical patent/CN112786108B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B 30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B 40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biotechnology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Chemical & Material Sciences (AREA)
  • Public Health (AREA)
  • Analytical Chemistry (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Bioethics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Machine Translation (AREA)

Abstract

The disclosure discloses a training method, apparatus, device, and medium for a molecular understanding model, and relates to the field of computer technology, in particular to artificial intelligence technologies such as natural language processing and deep learning. The training method comprises the following steps: obtaining pre-training data, the pre-training data comprising a first molecular representation sequence sample and a second molecular representation sequence sample, the two being different molecular representation sequence samples of the same molecule; processing the first molecular representation sequence sample by using the molecular understanding model to obtain a pre-training output; and calculating a pre-training loss function according to the pre-training output and the second molecular representation sequence sample, and updating parameters of the molecular understanding model according to the pre-training loss function. The present disclosure can improve the molecular understanding effect of the molecular understanding model.

Description

Molecular understanding model training method, device, equipment and medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to artificial intelligence technologies such as natural language processing and deep learning, and more particularly, to a method, an apparatus, a device, and a medium for training a molecular understanding model.
Background
Artificial Intelligence (AI) is the discipline that studies how to make computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning); it spans both the hardware level and the software level. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technologies.
The Simplified Molecular Input Line Entry Specification (SMILES) is a specification that explicitly describes molecules using American Standard Code for Information Interchange (ASCII) strings. Based on SMILES, a molecule can be represented as one or more SMILES sequences. With the development of deep learning, deep learning techniques can be applied to the field of physical chemistry.
In the related art, molecular understanding is performed based on a single SMILES sequence of a molecule: a bidirectional Transformer encoder (BERT) model is used and is trained on a Masked Language Model (MLM) task.
Disclosure of Invention
The present disclosure provides a method, apparatus, device, and medium for training a molecular understanding model.
According to an aspect of the present disclosure, there is provided a method for training a molecular understanding model, including: obtaining pre-training data, the pre-training data comprising: a first molecular representation sequence sample and a second molecular representation sequence sample, the first molecular representation sequence sample and the second molecular representation sequence sample being two different molecular representation sequence samples of the same molecule; processing the first molecular representation sequence sample by using the molecular understanding model to obtain a pre-training output; and calculating a pre-training loss function according to the pre-training output and the second molecular representation sequence sample, and updating parameters of the molecular understanding model according to the pre-training loss function.
According to another aspect of the present disclosure, there is provided a molecular processing method based on a molecular model, the molecular model including a molecular understanding model and an output network, the molecular understanding model being obtained by training with two different molecular representation sequence samples of the same molecule, the molecular processing method including: processing a molecular application input by using the molecular understanding model to obtain a hidden layer output, wherein the molecular application input comprises a fixed identifier when the output network is a molecular generation network; and processing the hidden layer output by using the output network to obtain a molecular application output.
According to another aspect of the present disclosure, there is provided a training apparatus for a molecular understanding model, including: an obtaining module, configured to obtain pre-training data, where the pre-training data includes: a first molecular representation sequence sample and a second molecular representation sequence sample, the first molecular representation sequence sample and the second molecular representation sequence sample being two different molecular representation sequence samples of the same molecule; a processing module, configured to process the first molecular representation sequence sample by using the molecular understanding model to obtain a pre-training output; and an updating module, configured to calculate a pre-training loss function according to the pre-training output and the second molecular representation sequence sample, and update the parameters of the molecular understanding model according to the pre-training loss function.
According to another aspect of the present disclosure, there is provided a molecular processing apparatus based on a molecular model including a molecular understanding model and an output network, the molecular processing apparatus including: a first processing module, configured to process a molecular application input by using the molecular understanding model to obtain a hidden layer output, where the molecular application input includes a fixed identifier when the output network is a molecular generation network; and the second processing module is used for processing the hidden layer output by adopting the output network so as to obtain the molecular application output.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the above aspects.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method according to any one of the above aspects.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of the above aspects.
According to the technical solutions of the present disclosure, the molecular understanding effect of the molecular understanding model can be improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 3 is a schematic diagram according to a third embodiment of the present disclosure;
FIG. 4 is a schematic diagram according to a fourth embodiment of the present disclosure;
FIG. 5 is a schematic diagram according to a fifth embodiment of the present disclosure;
FIG. 6 is a schematic diagram according to a sixth embodiment of the present disclosure;
FIG. 7 is a schematic diagram according to a seventh embodiment of the present disclosure;
FIG. 8 is a schematic diagram according to an eighth embodiment of the present disclosure;
FIG. 9 is a schematic diagram according to a ninth embodiment of the present disclosure;
FIG. 10 is a schematic diagram according to a tenth embodiment of the present disclosure;
FIG. 11 is a schematic diagram according to an eleventh embodiment of the present disclosure;
FIG. 12 is a schematic diagram of an electronic device for implementing the molecular understanding model training method or the molecular processing method of embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
When deep learning is applied to the field of physical chemistry, the molecules of a large number of compounds can be converted into SMILES sequences, and the SMILES sequences are input into a BERT model like text for pre-training to obtain a pre-trained model; the pre-trained model can then be fine-tuned based on downstream molecular tasks.
In the related art, a single SMILES sequence of a molecule is input into a BERT model and pre-trained on an MLM task to obtain a pre-trained model for molecular understanding. Using only a single SMILES sequence does not fully exploit the characteristics of SMILES sequences, so the molecular understanding effect of the pre-trained molecular understanding model is poor.
In order to solve the problem of poor molecular understanding effect of the molecular understanding model existing in the related art, the present disclosure provides some examples as follows.
Fig. 1 is a schematic diagram according to a first embodiment of the present disclosure. The embodiment provides a training method of a molecular understanding model, which comprises the following steps:
101. Obtain pre-training data, the pre-training data comprising a first molecular representation sequence sample and a second molecular representation sequence sample, where the first molecular representation sequence sample and the second molecular representation sequence sample are two different molecular representation sequence samples of the same molecule.
102. Process the first molecular representation sequence sample using the molecular understanding model to obtain a pre-training output.
103. Calculate a pre-training loss function according to the pre-training output and the second molecular representation sequence sample, and update the parameters of the molecular understanding model according to the pre-training loss function.
In physical chemistry, a molecule is the smallest unit of a substance that can exist independently, is relatively stable, and retains the physicochemical properties of the substance. Molecules are composed of atoms, which combine into a molecule in a certain order and arrangement through certain forces; this order and arrangement may be referred to as the molecular structure. A molecule can therefore be characterized by its atoms and its molecular structure, and the physicochemical properties of a molecule depend not only on the type and number of the atoms that make up the molecule, but also on the molecular structure.
Natural Language Understanding (NLU) is an important component of Natural Language Processing (NLP), and the core task of NLU is to convert Natural Language into a formal Language that can be processed by a machine, and establish connection between Natural Language and the machine.
Similar to natural language understanding, molecular understanding means converting a molecular representation sequence into a molecular understanding representation, i.e., a representation that can be processed by a machine. For example, the molecular understanding representation may specifically be the probability distribution vector corresponding to each time step, where the i-th element (i = 1, ..., n) of the probability distribution vector is the probability of the i-th word in a vocabulary and n is the vocabulary size.
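For illustration only (the sizes below are arbitrary assumptions, not part of the disclosed method), the following minimal sketch shows what such a per-time-step probability distribution over an n-word vocabulary looks like:

```python
import torch

# 3 time steps, vocabulary of size n = 5; each row of `probs` is the probability
# distribution vector for one time step, and its i-th element is the probability
# of the i-th word in the vocabulary.
logits = torch.randn(3, 5)
probs = torch.softmax(logits, dim=-1)
print(probs)
print(probs.sum(dim=-1))  # the probabilities for each time step sum to 1
```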
In some embodiments, the molecular representation sequence is a SMILES sequence. Using SMILES sequences makes full use of the fact that the same molecule corresponds to multiple SMILES sequences; compared with understanding a molecule from a single SMILES sequence, different SMILES sequences of the same molecule allow the molecule to be understood better, which improves the molecular understanding effect of the molecular understanding model.
Based on SMILES, different SMILES sequences of the same molecule can be obtained. For example, referring to fig. 2, a plurality of SMILES sequences 202 corresponding to the same molecule 201 can be obtained.
Further, two different SMILES sequences can be randomly selected from the plurality of SMILES sequences. For example, the two SMILES sequences selected for the molecule 201 shown in fig. 2 may be the first and the third, i.e.: CC(Oc1ccccc1C(=O)O)=O and C(c1c(cccc1)OC(=O)C)(=O)O.
To distinguish from the application stage, the data used in the training stage may be referred to as samples; for example, in the application stage the data is called a molecular representation sequence, while in the training stage it is called a molecular representation sequence sample. Therefore, in the training stage, the above manner may be used to obtain two different molecular representation sequence samples corresponding to the same molecule, which may be referred to as the first molecular representation sequence sample and the second molecular representation sequence sample.
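As an illustration of obtaining such a pair, the following sketch uses RDKit's randomized SMILES output; RDKit is an assumption made for the example, since the disclosure does not prescribe any particular toolkit:

```python
from rdkit import Chem

# Build one pre-training pair: two different SMILES strings that encode the same
# molecule. doRandom=True asks RDKit for a randomized atom ordering instead of the
# canonical one, so repeated calls yield different but equivalent SMILES sequences.
def smiles_pair(canonical_smiles, max_tries=20):
    mol = Chem.MolFromSmiles(canonical_smiles)
    first = Chem.MolToSmiles(mol, canonical=False, doRandom=True)
    second = first
    for _ in range(max_tries):  # re-draw until the two representations differ
        second = Chem.MolToSmiles(mol, canonical=False, doRandom=True)
        if second != first:
            break
    return first, second  # first and second molecular representation sequence samples

print(smiles_pair("CC(=O)Oc1ccccc1C(=O)O"))  # e.g. aspirin
```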
After the first molecular representation sequence sample and the second molecular representation sequence sample are obtained, the first molecular representation sequence sample can be input into the molecular understanding model. Initially, the molecular understanding model processes the first molecular representation sequence sample with its initial parameters, and the output of the molecular understanding model is called the pre-training output. A pre-training loss function can then be calculated based on the pre-training output and the second molecular representation sequence sample, and the parameters of the molecular understanding model are updated based on the pre-training loss function until the pre-training loss function converges; the parameters at convergence are used as the final parameters of the molecular understanding model. The pre-training loss function is not limited and may be, for example, a negative log-likelihood (NLL) function.
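A minimal sketch of one such update step is given below. It assumes a PyTorch model that maps the tokenized first SMILES sample plus the shifted target tokens to per-step vocabulary logits; the model signature, token ids, and pad id are illustrative assumptions rather than the disclosure's exact implementation:

```python
import torch
import torch.nn.functional as F

def pretrain_step(model, optimizer, first_ids, second_ids, pad_id=0):
    # first_ids:  tokenized first molecular representation sequence sample
    # second_ids: tokenized second molecular representation sequence sample (expected output)
    logits = model(first_ids, second_ids[:, :-1])          # (batch, steps, vocab)
    loss = F.nll_loss(                                      # negative log-likelihood (NLL)
        F.log_softmax(logits, dim=-1).transpose(1, 2),      # (batch, vocab, steps)
        second_ids[:, 1:],                                  # expected output at each time step
        ignore_index=pad_id,
    )
    optimizer.zero_grad()
    loss.backward()                                         # gradients of the pre-training loss
    optimizer.step()                                        # update the model parameters
    return loss.item()
```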
In some embodiments, as shown in fig. 3, the molecular understanding model may include an input layer and a hidden layer. The input layer may be an embedding layer that converts an input sequence into an input vector, and the hidden layer may specifically include an encoder 301 and a decoder 302. Taking the molecular representation sequence as a SMILES sequence as an example, when training the molecular understanding model, the first SMILES sequence sample is converted into an input vector by the embedding layer and then input into the encoder; the input vector is processed by the encoder and the decoder to obtain the pre-training output. The pre-training output is a probability distribution vector for each time step, and the pre-training loss function can then be calculated based on the pre-training output and the expected output sequence sample corresponding to each time step, namely the second SMILES sequence sample, so that the parameters of the molecular understanding model are updated based on the pre-training loss function.
In some embodiments, the encoder includes a first self-attention (self-attention) layer that employs a bi-directional self-attention mechanism; and/or the decoder comprises a second self-attention layer, wherein the second self-attention layer adopts a unidirectional self-attention mechanism.
With the encoder using a bidirectional self-attention mechanism and the decoder using a unidirectional self-attention mechanism, different self-attention mechanisms can be applied to different inputs, which provides more flexibility and can improve the molecular understanding effect of the molecular understanding model.
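The difference between the two mechanisms can be sketched with attention masks (illustrative only):

```python
import torch

def bidirectional_mask(seq_len):
    # Encoder side: every position may attend to every position.
    return torch.ones(seq_len, seq_len, dtype=torch.bool)

def unidirectional_mask(seq_len):
    # Decoder side: position i may attend only to positions 0..i (no future positions).
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
```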
In some embodiments, the encoder further comprises a first shared network, and the decoder further comprises a second shared network, the first shared network and the second shared network having the same network structure and network parameters.
By having the encoder and the decoder share a network, the same features can be better utilized in the encoding and decoding processes, which improves the molecular understanding effect of the molecular understanding model.
For example, referring to fig. 4, the encoder and the decoder may be implemented based on a Transformer network. In fig. 4, the encoder and the decoder are each shown as comprising a plurality of Transformer layers; the structure of each Transformer layer on the encoder side is, for example, the structure of an encoder layer of the Transformer network. The Transformer layers have the same structure, and within each Transformer layer the decoder and the encoder are structurally similar: each may include a self-attention layer and a shared network. For distinction, the networks in the encoder may be called the first self-attention layer and the first shared network, and those in the decoder the second self-attention layer and the second shared network. The difference, as shown in fig. 4, is that the first self-attention layer 401 in the encoder is a bidirectional self-attention layer, while the second self-attention layer 402 in the decoder is a unidirectional self-attention layer; the shared networks of the two, i.e., the first shared network and the second shared network, may both be the feed-forward layer of a Transformer encoder layer.
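A hedged sketch of such parameter sharing follows; the sizes are arbitrary, and the point is only that the encoder branch and the decoder branch call the very same feed-forward module, so its structure and parameters are shared:

```python
import torch.nn as nn

hidden, ffn, heads = 256, 1024, 8

# One shared feed-forward network, reused by both the encoder layer and the decoder layer.
shared_ffn = nn.Sequential(nn.Linear(hidden, ffn), nn.ReLU(), nn.Linear(ffn, hidden))

# Separate self-attention modules for the encoder side and the decoder side; the decoder
# one would be called with a causal (unidirectional) attention mask at run time.
encoder_self_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
decoder_self_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
```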
A sequence is a combination of multiple sequence units, and what counts as a sequence unit may differ across application scenarios; for example, in the field of Chinese NLP, a sequence unit may refer to each word in Chinese.
In the physicochemical field to which embodiments of the present disclosure relate, sequence units may be the characters that characterize a molecule; for example, for a SMILES sequence, the sequence units are ASCII characters, such as the C and O shown in fig. 2.
When outputting a sequence, the output may be produced sequence unit by sequence unit. For example, for the three characters A, B, and C, A may be output at the first time step, B at the second time step, and C at the third time step. In embodiments of the present disclosure, the characters already output may be used when outputting the current character: for example, character B may be output based on the already output character A, and character C may be output based on characters A and B.
Accordingly, in some embodiments, processing the first molecular representation sequence samples using the molecular understanding model to obtain a pre-training output comprises: performing bi-directional self-attention processing on the first molecular representation sequence samples by using the first self-attention layer of the encoder to obtain a bi-directional self-attention processing result; processing the bi-directional self-attention processing result with the first shared network portion of the encoder to obtain an encoded output; performing unidirectional self-attention processing on the encoded output and the generated output using the second self-attention layer of the decoder to obtain a unidirectional self-attention processing result; processing the one-way self-attention processing result using the second shared network portion of the decoder to obtain the pre-training output.
For example, referring to fig. 5, an embedding layer 501 converts the first SMILES sequence sample into a first input vector; the first input vector is processed in turn by the first self-attention layer and the first shared network of the encoder 502, which outputs an encoded vector to the decoder. The other input of the decoder is a second input vector obtained by converting the generated output sequence through the embedding layer 501. The encoded vector and the second input vector are processed in turn by the second self-attention layer and the second shared network of the decoder 503, which outputs the pre-training output; the pre-training output may specifically be a probability distribution vector. The currently generated sequence unit can then be determined based on the probability distribution vector and used as the generated output for the subsequent time step, passed through the embedding layer and input to the decoder, and so on, generating sequence units one by one until a terminator is generated.
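A simplified sketch of this step-by-step generation is shown below, assuming the model returns, for each time step, a probability distribution over the vocabulary; greedy selection is used here purely for illustration:

```python
import torch

def generate(model, encoder_ids, start_id, end_id, max_steps=128):
    generated = [start_id]
    for _ in range(max_steps):
        probs = model(encoder_ids, torch.tensor([generated]))[0, -1]  # distribution for the next unit
        next_id = int(probs.argmax())        # choose the most probable sequence unit
        if next_id == end_id:                # stop once the terminator is generated
            break
        generated.append(next_id)            # feed it back in at the next time step
    return generated[1:]
```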
Through the generation process of the pre-training output, the accuracy of the pre-training output can be improved, and the molecular understanding effect of the molecular understanding model is further improved.
In this embodiment, two different molecular representation sequence samples of the same molecule are used to train the molecular understanding model, so the characteristics of molecular representation sequences can be fully utilized; compared with training the model with a single molecular representation sequence sample, the molecular understanding effect of the molecular understanding model can be improved.
Any of the above embodiments, or combinations of them, describes the pre-training process of the molecular understanding model, so the molecular understanding model can be used as a pre-trained model. The pre-trained model can then be fine-tuned to obtain a fine-tuned model, and the fine-tuned model can be used for downstream molecular processing tasks. The fine-tuned model may be called a molecular model; the training process of the molecular model is explained below.
Fig. 6 is a schematic diagram according to a sixth embodiment of the present disclosure. The present embodiment provides a training method of a molecular model, as shown in fig. 6, the training method includes:
601. Obtain fine-tuning training data.
602. Fine-tune the molecular understanding model using the fine-tuning training data to obtain the molecular model.
The molecular understanding model can be obtained by training according to any of the above embodiments.
Based on the differences in the molecular processing tasks, the fine-tuning training data can be selected accordingly.
In some embodiments, the molecular processing tasks may include: a molecular prediction task, and/or a molecular generation task. Further, molecular prediction tasks may include: a molecular classification task, and/or a molecular regression task. Further, the molecular generation task may include: generating new molecules, generating new molecules with specific properties, generating optimized molecules.
For molecular prediction tasks:
The corresponding fine-tuning training data may be referred to as first fine-tuning training data, which includes a first input sample and a first output sample. The first input sample is a molecular representation sequence sample, and the first output sample is the label data corresponding to the molecular representation sequence sample: if the prediction is a classification, the label data is a classification label; and/or, if the prediction is a regression, the label data is a regression label.
Classification labels can be annotated manually according to actual requirements, for example by labeling DNA sequences or proteins; labeled proteins may include seed storage proteins, isozymes, allozymes, and the like, where isozymes are different molecular forms of enzymes encoded at multiple gene loci, and allozymes are different molecular forms of enzymes encoded by different alleles at the same gene locus. Regression labels can likewise be annotated manually according to actual requirements.
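A minimal sketch of the prediction-task setup is given below; the encode() interface, hidden size, and use of the first position's representation are assumptions made for illustration, not the disclosure's exact design:

```python
import torch.nn as nn

class MolecularPredictionModel(nn.Module):
    def __init__(self, understanding_model, hidden_size=256, num_classes=2):
        super().__init__()
        self.understanding_model = understanding_model      # pre-trained, then fine-tuned
        self.head = nn.Linear(hidden_size, num_classes)     # use nn.Linear(hidden_size, 1) for regression

    def forward(self, smiles_ids):
        hidden = self.understanding_model.encode(smiles_ids)  # assumed hidden-layer output, (batch, steps, hidden)
        return self.head(hidden[:, 0])                        # predicted classification/regression value
```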
For the molecular generation task:
The corresponding fine-tuning training data may be referred to as second fine-tuning training data, and the second fine-tuning training data differs for each of the three molecular generation tasks.
Corresponding to the task of molecular generation of new molecules:
The second fine-tuning training data includes multiple sets of sample pairs, each set of sample pairs comprising a second input sample and a second output sample, where the second input sample includes a fixed identifier, the second output sample is a molecular representation sequence sample, and the molecular representation sequence samples in each set of sample pairs are molecular representation sequence samples of similar molecules that meet a preset similarity condition. The similarity condition may be set according to actual requirements, for example by treating molecules with similar atomic compositions and similar molecular structures as similar molecules; the criteria for judging similar atomic composition and/or similar molecular structure can likewise be set according to actual requirements.
Corresponding to the task of generating molecules with specific properties:
The second fine-tuning training data includes multiple sets of sample pairs, each set of sample pairs comprising a second input sample and a second output sample, where the second input sample includes a fixed identifier and an attribute sample, the second output sample is a molecular representation sequence sample, and the molecular representation sequence samples in each set of sample pairs are molecular representation sequence samples of similar molecules that have the attribute and meet the preset similarity condition. The similarity condition is as described for the new-molecule generation task above. Unlike the task of generating new molecules, this task also requires the new molecules to have specific attributes; therefore, the input sample also includes an attribute sample, and the molecule corresponding to the selected output sample must have the attribute corresponding to that attribute sample. Attributes are biological, physical, chemical, and other properties that a molecule possesses, such as toxicity or activity. In practice, the attribute values can be configured in advance; one attribute is then selected as the attribute sample, the molecular representation sequences of similar molecules having the selected attribute are used as second output samples, the vector corresponding to the fixed identifier carrying the attribute sample information is used as the input vector of the molecular understanding model, and the molecular model for the corresponding task is trained with this input vector and the second output samples. The vector corresponding to the fixed identifier carrying the attribute sample information can be obtained according to the principle described for the application stage below.
Corresponding to the molecular generation task for generating optimized molecules:
The second fine-tuning training data includes multiple sets of sample pairs, each set of sample pairs comprising a second input sample and a second output sample, where the second input sample includes an input molecular representation sequence sample, the second output sample is an output molecular representation sequence sample, and the molecule corresponding to the output molecular representation sequence sample is an optimized molecule of the molecule corresponding to the input molecular representation sequence sample. The optimized molecule may be selected according to requirements, for example by taking a molecule with a certain property as the optimized molecule of the molecule to be optimized.
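Purely as an illustration of the three kinds of fine-tuning sample pairs described above (all strings, identifier names, and attribute names are placeholder assumptions, not data from the disclosure):

```python
# Generating new molecules: fixed identifier in, similar-molecule SMILES out.
new_molecule_pair = {"input": ["[FIXED]"], "output": "CC(=O)Oc1ccccc1C(=O)O"}

# Generating new molecules with a specific attribute: fixed identifier plus attribute sample in.
attribute_pair = {"input": ["[FIXED]", {"activity": 0.8}], "output": "CC(=O)Oc1ccccc1C(=O)O"}

# Generating optimized molecules: input molecule SMILES in, optimized molecule SMILES out.
optimization_pair = {"input": ["CC(=O)Oc1ccccc1C(=O)O"], "output": "O=C(O)c1ccccc1O"}
```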
In this embodiment, the molecular understanding model is fine-tuned to obtain a molecular model, which can be applied to various downstream molecular tasks, reducing the training workload and improving training efficiency.
The above embodiments illustrate the training process of the molecular model based on which molecules can be processed in the application phase to accomplish various molecular processing tasks.
Fig. 7 is a schematic diagram according to a seventh embodiment of the present disclosure, which provides a molecular processing method based on a molecular model, where the molecular model includes a molecular understanding model and an output network, the molecular understanding model is obtained by using two different molecular representation sequence samples of the same molecule, and the processing method includes:
701. Process a molecular application input using the molecular understanding model to obtain a hidden layer output, where the molecular application input includes a fixed identifier when the output network is a molecular generation network.
702. Process the hidden layer output using the output network to obtain a molecular application output.
The output network differs depending on the molecular processing task.
For example, the output network is a molecular prediction network corresponding to the molecular prediction task, and the output network is a molecular generation network corresponding to the molecular generation task.
Further, the molecular prediction network and/or the molecular generation network may also be different depending on the specific molecular prediction task and/or the specific molecular generation task.
In addition, the molecular application inputs and molecular application outputs are different based on the difference in molecular processing tasks.
For molecular prediction tasks:
Referring to fig. 8, the molecular application input is the molecular representation sequence to be predicted, exemplified by the SMILES sequence in fig. 8, and the molecular application output is a predicted value. The molecular representation sequence to be predicted may be a single molecular representation sequence or a concatenation of multiple molecular representation sequences.
After the SMILES sequence is input into the molecular understanding model 801, a predicted value corresponding to the SMILES sequence is output through the molecular prediction network 802 as an output network, and the predicted value may be a classification value and/or a regression value.
For the molecular generation task:
the output network is a molecular generation network; the molecular application input comprises: a fixed identifier; the molecular application output comprises: the molecules represent sequences.
Corresponding to the task of molecular generation of new molecules:
Referring to the left panel of fig. 9, the molecular application input is a fixed identifier, and the molecular application output is the molecular representation sequence of a new molecule, exemplified by a SMILES sequence in fig. 9. The fixed identifier may be [CLS], or it may be a start identifier, or the like; in addition, there may be one or more fixed identifiers, which may include, for example, a start identifier and a stop identifier.
After the fixed identifier is input into the molecule understanding model 901, the SMILES sequence of the new molecule is output through the molecule generating network 902 as an output network.
Corresponding to the task of generating molecules with specific properties:
Referring to the middle panel of fig. 9, the molecular application input is a fixed identifier and information of the specific attribute, and the molecular application output is the molecular representation sequence of a new molecule having the specific attribute.
After the fixed identifier and the specific attribute information are input into the molecular understanding model 901, the embedding layer may convert them into a vector corresponding to the fixed identifier carrying the specific attribute information, and the SMILES sequence of a new molecule having the specific attribute is then output through the molecular generation network 902 serving as the output network.
In fig. 9, the fixed identifier and the vector corresponding to the fixed identifier carrying the specific attribute information are represented with different fill patterns. The vector corresponding to the fixed identifier carrying the specific attribute information may be obtained by multiplying the attribute value of the specific attribute information by the value corresponding to the fixed identifier and converting the product into a vector with an embedding layer; alternatively, the embedding layer may include a character embedding layer and an attribute embedding layer, where the character embedding layer converts the fixed identifier into a fixed identifier vector, the attribute embedding layer converts the attribute value of the specific attribute information into an attribute vector, and the fixed identifier vector and the attribute vector are added to obtain the desired vector.
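The second alternative can be sketched as follows (vocabulary size, attribute-value binning, and hidden size are illustrative assumptions):

```python
import torch
import torch.nn as nn

class IdentifierWithAttributeEmbedding(nn.Module):
    def __init__(self, vocab_size=64, num_attr_values=16, hidden_size=256):
        super().__init__()
        self.char_embedding = nn.Embedding(vocab_size, hidden_size)       # embeds the fixed identifier
        self.attr_embedding = nn.Embedding(num_attr_values, hidden_size)  # embeds the attribute value

    def forward(self, fixed_id, attr_value_id):
        # The sum of the fixed identifier vector and the attribute vector gives the input
        # vector corresponding to the fixed identifier carrying the attribute information.
        return self.char_embedding(fixed_id) + self.attr_embedding(attr_value_id)
```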
Corresponding to the molecular generation task for generating optimized molecules:
Referring to the right panel of fig. 9, the molecular application input is a fixed identifier and the molecular representation sequence to be optimized, and the molecular application output is the optimized molecular representation sequence.
After the fixed identifier and the SMILES sequence to be optimized are input into the molecular understanding model 901, the optimized SMILES sequence is output through the molecular generation network 902 serving as an output network.
In some embodiments, processing the hidden layer output with the output network to obtain the molecular application output includes: searching for the molecular application output corresponding to the hidden layer output using the output network, where the searching comprises a random sampling search or a beam search.
Further, as shown in the left and middle panels of fig. 9, random sampling search may be used when generating a new molecule or a new molecule with a specific attribute, so that a wider range of new molecules can be obtained; as shown in the right panel of fig. 9, beam search may be used when generating optimized molecules, so that more targeted and accurate optimized molecules can be obtained.
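The two strategies can be contrasted with a minimal sketch; beam search is reduced here to its beam-width-1, greedy form purely for brevity:

```python
import torch

def sample_next(probs):
    # Random sampling search: draw the next sequence unit from the predicted
    # distribution, giving a wider variety of generated molecules.
    return int(torch.multinomial(probs, num_samples=1))

def greedy_next(probs):
    # Beam search with beam width 1 (greedy): keep the most probable continuation,
    # giving more targeted, accurate optimized molecules.
    return int(probs.argmax())
```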
In this embodiment, the molecular model is obtained by fine-tuning the molecular understanding model and is applicable to various downstream molecular tasks; performing the molecular generation task based on the fixed identifier can reduce the complexity of molecular generation. In addition, different molecular tasks, such as molecular prediction tasks and/or molecular generation tasks, can be accomplished through different output networks and different molecular application inputs.
Fig. 10 is a schematic diagram according to a tenth embodiment of the present disclosure. The present embodiment provides a training apparatus for a molecular understanding model, as shown in fig. 10, the apparatus 1000 includes: an acquisition module 1001, a processing module 1002 and an update module 1003.
The obtaining module 1001 is configured to obtain pre-training data, where the pre-training data includes: a first molecular representation sequence sample and a second molecular representation sequence sample, the first molecular representation sequence sample and the second molecular representation sequence sample being two different molecular representation sequence samples of the same molecule; the processing module 1002 is configured to process the first molecular representation sequence sample by using the molecular understanding model to obtain a pre-training output; and the updating module 1003 is configured to calculate a pre-training loss function according to the pre-training output and the second molecular representation sequence sample, and update the parameters of the molecular understanding model according to the pre-training loss function.
In some embodiments, the molecular understanding model comprises an encoder and a decoder; the encoder comprises a first self-attention layer, wherein the first self-attention layer adopts a bidirectional self-attention mechanism; and/or the decoder comprises a second self-attention layer, wherein the second self-attention layer adopts a unidirectional self-attention mechanism.
In some embodiments, the encoder further comprises a first shared network portion, and the decoder further comprises a second shared network portion, the first shared network portion and the second shared network portion having the same network structure and network parameters.
In some embodiments, the processing module 1002 is specifically configured to: performing bi-directional self-attention processing on the first molecular representation sequence samples by using the first self-attention layer of the encoder to obtain a bi-directional self-attention processing result; processing the bi-directional self-attention processing result with the first shared network portion of the encoder to obtain an encoded output; performing unidirectional self-attention processing on the encoded output and the generated output using the second self-attention layer of the decoder to obtain a unidirectional self-attention processing result; processing the one-way self-attention processing result using the second shared network portion of the decoder to obtain the pre-training output.
In some embodiments, the first molecular representation sequence sample is a SMILES sequence sample; and/or the second molecular representation sequence sample is a SMILES sequence sample.
In this embodiment, two different molecular representation sequence samples of the same molecule are used to train the molecular understanding model, so the characteristics of molecular representation sequences can be fully utilized and the molecular understanding effect of the molecular understanding model can be improved.
Fig. 11 is a schematic diagram according to an eleventh embodiment of the present disclosure. The present embodiment provides a molecular processing apparatus based on a molecular model, where the molecular model includes a molecular understanding model and an output network, the molecular understanding model is obtained by using two different molecular representation sequence samples of the same molecule, and the molecular processing apparatus 1100 includes: a first processing module 1101 and a second processing module 1102.
The first processing module 1101 is configured to process a molecular application input by using the molecular understanding model to obtain a hidden layer output, where the molecular application input includes a fixed identifier when the output network is a molecular generation network; the second processing module 1102 is configured to process the hidden layer output by using the output network to obtain a molecular application output.
In some embodiments, when the output network is a molecule generating network, the molecule application output comprises a molecule representation sequence, wherein, if the molecule generating network is used to generate a new molecule, the molecule representation sequence is a molecule representation sequence of the new molecule; or, if the molecule generation network is used to generate a new molecule with a specific property, the molecule application input further comprises: information of the specific attribute; the molecule representation sequence is a molecule representation sequence of a new molecule having the specific property; or, if the molecule generation network is used to generate optimized molecules, the molecule application input further comprises: the molecule to be optimized represents the sequence; the molecular representation sequence is an optimized molecular representation sequence.
In some embodiments, the output network is a molecular prediction network; the molecular application input includes a molecular representation sequence to be predicted, and the molecular application output includes a predicted value corresponding to the molecular representation sequence to be predicted.
In this embodiment, the molecular model is obtained by fine-tuning the molecular understanding model and is applicable to various downstream molecular tasks; performing the molecular generation task based on the fixed identifier can reduce the complexity of molecular generation.
It is understood that the same or corresponding contents in different embodiments of the present disclosure may be mutually referred, and the contents not described in detail in the embodiments may be referred to the related contents in other embodiments.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 12 shows a schematic block diagram of an example electronic device 1200, which can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 12, the electronic device 1200 includes a computing unit 1201, which can perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 1202 or a computer program loaded from a storage unit 1208 into a Random Access Memory (RAM) 1203. In the RAM 1203, various programs and data necessary for the operation of the electronic device 1200 may also be stored. The computing unit 1201, the ROM 1202, and the RAM 1203 are connected to each other by a bus 1204. An input/output (I/O) interface 1205 is also connected to the bus 1204.
Various components in the electronic device 1200 are connected to the I/O interface 1205, including: an input unit 1206 such as a keyboard, a mouse, or the like; an output unit 1207 such as various types of displays, speakers, and the like; a storage unit 1208, such as a magnetic disk, optical disk, or the like; and a communication unit 1209 such as a network card, modem, wireless communication transceiver, etc. The communication unit 1209 allows the electronic device 1200 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 1201 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1201 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 1201 performs the various methods and processes described above, such as the training method of the molecular understanding model or the molecular processing method. For example, in some embodiments, the training method of the molecular understanding model or the molecular processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1208. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 1200 via the ROM 1202 and/or the communication unit 1209. When the computer program is loaded into the RAM 1203 and executed by the computing unit 1201, one or more steps of the molecular understanding model training method or the molecular processing method described above may be performed. Alternatively, in other embodiments, the computing unit 1201 may be configured by any other suitable means (e.g., by means of firmware) to perform the molecular processing method or the training method of the molecular understanding model.
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), load programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the defects of difficult management and weak service scalability in traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (19)

1. A method of training a molecular understanding model, comprising:
obtaining pre-training data, the pre-training data comprising: a first molecular representation sequence sample and a second molecular representation sequence sample, the first molecular representation sequence sample and the second molecular representation sequence sample being two different molecular representation sequence samples of the same molecule;
processing the first molecular representation sequence sample by using the molecular understanding model to obtain a pre-training output;
and calculating a pre-training loss function according to the pre-training output and the second molecular representation sequence sample, and updating parameters of the molecular understanding model according to the pre-training loss function.
2. The method of claim 1, wherein,
the molecular understanding model comprises an encoder and a decoder;
the encoder comprises a first self-attention layer, wherein the first self-attention layer adopts a bidirectional self-attention mechanism; and/or the decoder comprises a second self-attention layer, wherein the second self-attention layer adopts a unidirectional self-attention mechanism.
3. The method of claim 2, wherein,
the encoder further comprises a first shared network portion and the decoder further comprises a second shared network portion, the first shared network portion and the second shared network portion having the same network structure and network parameters.
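As a minimal illustration of claim 3, sharing a single module instance between the encoder and the decoder guarantees that the two shared network portions have the same network structure and network parameters. The module below is a hypothetical feed-forward block in PyTorch; the claims do not prescribe its internal form.

from torch import nn

class SharedFeedForward(nn.Module):
    # A hypothetical feed-forward block used as the shared network portion.
    def __init__(self, d_model=256, d_hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x):
        return self.net(x)

shared_portion = SharedFeedForward()
encoder_shared_part = shared_portion   # first shared network portion
decoder_shared_part = shared_portion   # second shared network portion

# Both names refer to the same instance, so structure and parameters coincide.
assert all(p1 is p2 for p1, p2 in zip(encoder_shared_part.parameters(),
                                      decoder_shared_part.parameters()))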
4. The method of claim 3, wherein said processing the first molecular representation sequence samples using the molecular understanding model to obtain a pre-training output comprises:
performing bi-directional self-attention processing on the first molecular representation sequence samples by using the first self-attention layer of the encoder to obtain a bi-directional self-attention processing result;
processing the bi-directional self-attention processing result with the first shared network portion of the encoder to obtain an encoded output;
performing unidirectional self-attention processing on the encoded output and the generated output using the second self-attention layer of the decoder to obtain a unidirectional self-attention processing result;
processing the one-way self-attention processing result using the second shared network portion of the decoder to obtain the pre-training output.
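The processing flow of claim 4 can be illustrated, again only as a non-limiting sketch, by a single-layer PyTorch module; the embedding size, head count and the use of nn.MultiheadAttention are assumptions made for the example.

import torch
from torch import nn

class MolecularUnderstandingSketch(nn.Module):
    def __init__(self, vocab_size=128, d_model=256, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.enc_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.dec_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # One instance used on both sides: the shared network portion.
        self.shared = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids):
        src = self.embed(src_ids)   # first molecular representation sequence sample
        tgt = self.embed(tgt_ids)   # generated output so far (teacher forced)

        # 1) Bidirectional self-attention: every token attends to all tokens.
        enc, _ = self.enc_attn(src, src, src)
        # 2) First shared network portion -> encoded output.
        enc = self.shared(enc)

        # 3) Unidirectional self-attention over the encoded output and the
        #    generated output: target position i may attend to every encoder
        #    position but only to generated positions <= i.
        mem = torch.cat([enc, tgt], dim=1)
        s, t = src_ids.size(1), tgt_ids.size(1)
        causal = torch.triu(
            torch.ones(t, t, device=src_ids.device), diagonal=1
        ).bool()
        mask = torch.cat(
            [torch.zeros(t, s, dtype=torch.bool, device=src_ids.device), causal],
            dim=1,
        )
        dec, _ = self.dec_attn(tgt, mem, mem, attn_mask=mask)

        # 4) Second shared network portion -> pre-training output logits.
        return self.lm_head(self.shared(dec))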
5. The method of any one of claims 1-4,
the first molecular representation sequence sample is a SMILES sequence sample; and/or,
the second molecular representation sequence sample is a SMILES sequence sample.
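Two different SMILES sequence samples of the same molecule can be obtained, for example, by enumerating non-canonical SMILES. The snippet below uses RDKit, which is an assumption made for illustration; the claims do not name any toolkit.

from rdkit import Chem

# Aspirin as an example molecule: a canonical SMILES and a randomized SMILES
# are two different molecular representation sequences of the same molecule,
# usable as the first and second sequence samples respectively.
mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")

first_sample = Chem.MolToSmiles(mol, canonical=True)
second_sample = Chem.MolToSmiles(mol, canonical=False, doRandom=True)

print(first_sample)   # CC(=O)Oc1ccccc1C(=O)O
print(second_sample)  # one of many equivalent, randomly ordered forms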
6. A molecular processing method based on a molecular model, wherein the molecular model comprises a molecular understanding model and an output network, the molecular understanding model being obtained by training with two different molecular representation sequence samples of the same molecule, the molecular processing method comprising:
processing a molecular application input by using the molecular understanding model to obtain a hidden layer output, wherein the molecular application input comprises a fixed identifier when the output network is a molecular generation network;
and processing the hidden layer output by adopting the output network to obtain the molecular application output.
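An illustrative (non-limiting) composition of the molecular model in claim 6: the pre-trained molecular understanding model produces the hidden layer output, and a task-specific output network consumes it. The class and variable names are hypothetical, and the understanding model is assumed to return hidden states of shape (batch, seq_len, d_model).

import torch
from torch import nn

class MolecularModel(nn.Module):
    def __init__(self, understanding_model, output_network):
        super().__init__()
        self.understanding_model = understanding_model
        self.output_network = output_network

    def forward(self, molecular_application_input):
        # Hidden layer output from the molecular understanding model.
        hidden = self.understanding_model(molecular_application_input)
        # Molecular application output from the output network.
        return self.output_network(hidden)

# For a molecule generation network, the molecular application input starts
# from a fixed identifier (a reserved start-token id, assumed here to be 1).
FIXED_ID = 1
generation_input = torch.full((1, 1), FIXED_ID, dtype=torch.long)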
7. The method of claim 6, wherein the molecular application output comprises a molecular representation sequence when the output network is a molecular generation network, wherein,
if the molecule generation network is used to generate a new molecule, the molecular representation sequence is a molecular representation sequence of the new molecule; or,
if the molecule generation network is used to generate a new molecule having a specific property, the molecular application input further comprises: information of the specific property; and the molecular representation sequence is a molecular representation sequence of a new molecule having the specific property; or,
if the molecule generation network is used to generate an optimized molecule, the molecular application input further comprises: a molecular representation sequence of the molecule to be optimized; and the molecular representation sequence is a molecular representation sequence of the optimized molecule.
8. The method of claim 6, wherein,
when the output network is a molecular prediction network, the molecular application input comprises: a molecular representation sequence to be predicted; and the molecular application output comprises: a predicted value corresponding to the molecular representation sequence to be predicted.
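For the prediction case of claim 8, one possible output network is a small regression head over the hidden layer output; mean pooling and a single linear layer are assumptions made for this sketch.

from torch import nn

class MolecularPredictionNetwork(nn.Module):
    def __init__(self, d_model=256):
        super().__init__()
        self.head = nn.Linear(d_model, 1)

    def forward(self, hidden):                  # hidden: (batch, seq_len, d_model)
        pooled = hidden.mean(dim=1)             # simple mean pooling (assumption)
        return self.head(pooled).squeeze(-1)    # (batch,) predicted values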
9. A training apparatus for a molecular understanding model, comprising:
an obtaining module, configured to obtain pre-training data, wherein the pre-training data comprises: a first molecular representation sequence sample and a second molecular representation sequence sample, the first molecular representation sequence sample and the second molecular representation sequence sample being two different molecular representation sequences of the same molecule;
the processing module is used for processing the first molecular representation sequence sample by adopting the molecular understanding model to obtain pre-training output;
and the updating module is used for calculating a pre-training loss function according to the pre-training output and the second molecular representation sequence sample, and updating the parameters of the molecular understanding model according to the pre-training loss function.
10. The apparatus of claim 9, wherein,
the molecular understanding model comprises an encoder and a decoder;
the encoder comprises a first self-attention layer, wherein the first self-attention layer adopts a bidirectional self-attention mechanism; and/or the decoder comprises a second self-attention layer, wherein the second self-attention layer adopts a unidirectional self-attention mechanism.
11. The apparatus of claim 10, wherein,
the encoder further comprises a first shared network portion and the decoder further comprises a second shared network portion, the first shared network portion and the second shared network portion having the same network structure and network parameters.
12. The apparatus of claim 11, wherein the processing module is specifically configured to:
performing bi-directional self-attention processing on the first molecular representation sequence samples by using the first self-attention layer of the encoder to obtain a bi-directional self-attention processing result;
processing the bi-directional self-attention processing result with the first shared network portion of the encoder to obtain an encoded output;
performing unidirectional self-attention processing on the encoded output and the generated output using the second self-attention layer of the decoder to obtain a unidirectional self-attention processing result;
processing the one-way self-attention processing result using the second shared network portion of the decoder to obtain the pre-training output.
13. The apparatus of any one of claims 9-12,
the first molecular representation sequence sample is a SMILES sequence sample; and/or,
the second molecular representation sequence sample is a SMILES sequence sample.
14. A molecular processing apparatus based on a molecular model, wherein the molecular model comprises a molecular understanding model and an output network, the molecular understanding model being obtained by training with two different molecular representation sequence samples of the same molecule, the molecular processing apparatus comprising:
a first processing module, configured to process a molecular application input by using the molecular understanding model to obtain a hidden layer output, where the molecular application input includes a fixed identifier when the output network is a molecular generation network;
and the second processing module is used for processing the hidden layer output by adopting the output network so as to obtain the molecular application output.
15. The apparatus of claim 14, wherein the molecular application output comprises a molecular representation sequence when the output network is a molecular generation network, wherein,
if the molecule generation network is used to generate a new molecule, the molecular representation sequence is a molecular representation sequence of the new molecule; or,
if the molecule generation network is used to generate a new molecule having a specific property, the molecular application input further comprises: information of the specific property; and the molecular representation sequence is a molecular representation sequence of a new molecule having the specific property; or,
if the molecule generation network is used to generate an optimized molecule, the molecular application input further comprises: a molecular representation sequence of the molecule to be optimized; and the molecular representation sequence is a molecular representation sequence of the optimized molecule.
16. The apparatus of claim 14, wherein,
when the output network is a molecular prediction network, the molecular application input comprises: a molecular representation sequence to be predicted; and the molecular application output comprises: a predicted value corresponding to the molecular representation sequence to be predicted.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the training method of any one of claims 1-5 or the processing method of any one of claims 6-8.
18. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the training method of any one of claims 1-5 or the processing method of any one of claims 6-8.
19. A computer program product comprising a computer program which, when executed by a processor, implements a training method according to any one of claims 1-5, or a processing method according to any one of claims 6-8.
CN202110082654.3A 2021-01-21 2021-01-21 Training method, device, equipment and medium of molecular understanding model Active CN112786108B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110082654.3A CN112786108B (en) 2021-01-21 2021-01-21 Training method, device, equipment and medium of molecular understanding model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110082654.3A CN112786108B (en) 2021-01-21 2021-01-21 Training method, device, equipment and medium of molecular understanding model

Publications (2)

Publication Number Publication Date
CN112786108A true CN112786108A (en) 2021-05-11
CN112786108B CN112786108B (en) 2023-10-24

Family

ID=75758044

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110082654.3A Active CN112786108B (en) 2021-01-21 2021-01-21 Training method, device, equipment and medium of molecular understanding model

Country Status (1)

Country Link
CN (1) CN112786108B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105308604A (en) * 2013-04-23 2016-02-03 菲利普莫里斯生产公司 Systems and methods for using mechanistic network models in systems toxicology
US20200034436A1 (en) * 2018-07-26 2020-01-30 Google Llc Machine translation using neural network models
US20200365270A1 (en) * 2019-05-15 2020-11-19 International Business Machines Corporation Drug efficacy prediction for treatment of genetic disease
CN110598206A (en) * 2019-08-13 2019-12-20 平安国际智慧城市科技股份有限公司 Text semantic recognition method and device, computer equipment and storage medium
CN110534164A (en) * 2019-09-26 2019-12-03 广州费米子科技有限责任公司 Drug molecule generation method based on deep learning
CN110929869A (en) * 2019-12-05 2020-03-27 同盾控股有限公司 Attention model training method, device, equipment and storage medium
CN111640471A (en) * 2020-05-27 2020-09-08 牛张明 Method and system for predicting activity of drug micromolecules based on two-way long-short memory model
CN111916067A (en) * 2020-07-27 2020-11-10 腾讯科技(深圳)有限公司 Training method and device of voice recognition model, electronic equipment and storage medium
CN111967224A (en) * 2020-08-18 2020-11-20 深圳市欢太科技有限公司 Method and device for processing dialog text, electronic equipment and storage medium
CN112016300A (en) * 2020-09-09 2020-12-01 平安科技(深圳)有限公司 Pre-training model processing method, pre-training model processing device, downstream task processing device and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MIKOLOV T et al.: "Distributed representations of words and phrases and their compositionality", Advances in Neural Information Processing Systems *
周奇安; 李舟军: "Improved model and tuning method for natural language understanding in BERT-based task-oriented dialogue systems", Journal of Chinese Information Processing (中文信息学报), no. 05 *
李舟军; 范宇; 吴贤杰: "A survey of pre-training techniques for natural language processing", Computer Science (计算机科学), no. 03 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114937478A (en) * 2022-05-18 2022-08-23 北京百度网讯科技有限公司 Method for training a model, method and apparatus for generating molecules
CN114937478B (en) * 2022-05-18 2023-03-10 北京百度网讯科技有限公司 Method for training a model, method and apparatus for generating molecules
CN115565607A (en) * 2022-10-20 2023-01-03 抖音视界有限公司 Method, device, readable medium and electronic equipment for determining protein information
CN115565607B (en) * 2022-10-20 2024-02-23 抖音视界有限公司 Method, device, readable medium and electronic equipment for determining protein information
CN117153294A (en) * 2023-10-31 2023-12-01 烟台国工智能科技有限公司 Molecular generation method of single system
CN117153294B (en) * 2023-10-31 2024-01-26 烟台国工智能科技有限公司 Molecular generation method of single system

Also Published As

Publication number Publication date
CN112786108B (en) 2023-10-24

Similar Documents

Publication Publication Date Title
CN112487173B (en) Man-machine conversation method, device and storage medium
CN112786108B (en) Training method, device, equipment and medium of molecular understanding model
CN112507706B (en) Training method and device for knowledge pre-training model and electronic equipment
CN115309877B (en) Dialogue generation method, dialogue model training method and device
CN113239157B (en) Method, device, equipment and storage medium for training conversation model
CN112861548A (en) Natural language generation and model training method, device, equipment and storage medium
CN112559885A (en) Method and device for determining training model of map interest point and electronic equipment
CN113641805A (en) Acquisition method of structured question-answering model, question-answering method and corresponding device
CN115640520A (en) Method, device and storage medium for pre-training cross-language cross-modal model
CN113642324B (en) Text abstract generation method and device, electronic equipment and storage medium
CN115358243A (en) Training method, device, equipment and storage medium for multi-round dialogue recognition model
CN112989797B (en) Model training and text expansion methods, devices, equipment and storage medium
CN113468857A (en) Method and device for training style conversion model, electronic equipment and storage medium
CN113204616B (en) Training of text extraction model and text extraction method and device
CN115757788A (en) Text retouching method and device and storage medium
CN112905917B (en) Inner chain generation method, model training method, related device and electronic equipment
CN115577705A (en) Method, device and equipment for generating text processing model and storage medium
CN115357710A (en) Training method and device for table description text generation model and electronic equipment
CN115292467A (en) Information processing and model training method, apparatus, device, medium, and program product
CN114841172A (en) Knowledge distillation method, apparatus and program product for text matching double tower model
CN114416941A (en) Generation method and device of dialogue knowledge point determination model fusing knowledge graph
CN113886543A (en) Method, apparatus, medium, and program product for generating an intent recognition model
CN113553413A (en) Dialog state generation method and device, electronic equipment and storage medium
CN113033179A (en) Knowledge acquisition method and device, electronic equipment and readable storage medium
CN113553833A (en) Text error correction method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant