CN110415763B - Method, device, equipment and storage medium for predicting interaction between medicine and target - Google Patents


Info

Publication number
CN110415763B
Authority
CN
China
Legal status
Active
Application number
CN201910722794.5A
Other languages
Chinese (zh)
Other versions
CN110415763A (en)
Inventor
杨旸
俞植淼
Current Assignee
Shanghai Jiaotong University
Tencent Technology Shenzhen Co Ltd
Original Assignee
Shanghai Jiaotong University
Tencent Technology Shenzhen Co Ltd
Application filed by Shanghai Jiaotong University and Tencent Technology Shenzhen Co Ltd
Priority to CN201910722794.5A
Publication of CN110415763A
Application granted
Publication of CN110415763B

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/30 Drug targeting using structural data; Docking or binding prediction
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30 Data warehousing; Computing architectures

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Bioethics (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The application discloses a method, a device, equipment and a storage medium for predicting interactions between a drug and a target, and belongs to the field of computer technology. The method comprises: obtaining a sample set of drug-target pairs; acquiring a first feature of each drug and a first feature of each target in the sample set; training a projection model from a drug space to a target space based on the first features of the drugs, the first features of the targets and the interaction information of the drug-target pairs in the sample set, to obtain a target projection model; and, when a prediction instruction for a target drug and a target is received, acquiring the first feature of the target drug and the first feature of the target, inputting both into the target projection model, and outputting an interaction result for the target drug and the target. With this process, the target projection model is trained effectively, and the drug-target interactions predicted with it are highly accurate.

Description

Method, device, equipment and storage medium for predicting interaction between medicine and target
Technical Field
The embodiments of the present application relate to the field of computer technology, and in particular to a method, a device, equipment and a storage medium for predicting interactions between a drug and a target.
Background
At present, developing drugs with entirely new structures is costly, risky and slow, and has a low success rate. Drug repositioning is an important strategy for addressing these problems. Drug repositioning refers to screening, combining or modifying existing drugs to discover new indications for them. During drug repositioning, predicting interactions between drugs and targets helps researchers find new targets for existing drugs or discover new drugs that act on known targets. How to predict drug-target interactions is therefore a focus of research in drug repositioning.
In the related art, the interaction between a drug and a target is predicted as follows: a feature vector of the drug is extracted from its chemical structure, a feature vector of the target is extracted from its protein sequence, a classifier is trained with the extracted feature vectors as input, and the trained classifier is used to predict the interaction between the drug and the target.
In the process of implementing the present application, the inventors found that the related art has at least the following problems:
in the related art, because the feature vector of a drug is extracted only from its own chemical structure and the feature vector of a target only from its own protein sequence, the extraction considers each drug and each target in isolation. The extracted feature vectors therefore cannot fully reflect the original characteristics of the drugs and targets, so the classifier is trained poorly and the drug-target interactions it predicts have low accuracy.
Disclosure of Invention
The embodiments of the present application provide a method, a device, equipment and a storage medium for predicting interactions between a drug and a target, which can be used to solve the problems in the related art. The technical solution is as follows:
in one aspect, an embodiment of the present application provides a method for predicting interactions between a drug and a target, the method comprising:
obtaining a sample set of drug-target pairs, the sample set comprising drug-target pairs with known interaction information as positive samples and drug-target pairs without known interaction information as negative samples;
acquiring a first feature of each drug in the sample set and a first feature of each target, the first feature of a drug representing the characteristics of that drug relative to other drugs, and the first feature of a target representing the characteristics of that target relative to other targets;
training a projection model from a drug space to a target space based on the first features of the drugs in the sample set, the first features of the targets, and the interaction information of the drug-target pairs, to obtain a target projection model;
when a prediction instruction for a target drug and a target is received, acquiring the first feature of the target drug and the first feature of the target, inputting both into the target projection model, and outputting an interaction result for the target drug and the target.
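The method does not fix the form of the projection model at this point. As a purely illustrative sketch, assume a bilinear model: a learned matrix W projects a drug's first feature x into the target space, the interaction score is the inner product of that projection with the target's first feature y, and a logistic function turns the score into a probability. W is fitted by gradient descent on the log-loss over the labelled sample set. The function names `train_projection` and `predict`, the learning rate, the epoch count and the L2 penalty are hypothetical choices, not values taken from the application.

```python
import numpy as np


def train_projection(drug_feats, target_feats, pairs, labels,
                     lr=0.5, epochs=500, l2=1e-3, seed=0):
    """Fit a hypothetical bilinear projection W (drug space -> target space).

    Assumption: score(d, t) = sigmoid(x_d @ W @ y_t), trained by full-batch
    gradient descent on the mean logistic loss plus an L2 penalty.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(drug_feats.shape[1], target_feats.shape[1]))
    X = drug_feats[[d for d, _ in pairs]]    # first features of the drugs, (P, kd)
    Y = target_feats[[t for _, t in pairs]]  # first features of the targets, (P, kt)
    y = np.asarray(labels, dtype=float)      # 1 = positive sample, 0 = negative sample
    for _ in range(epochs):
        projected = X @ W                        # drugs mapped into the target space
        scores = np.sum(projected * Y, axis=1)   # inner product with each paired target
        probs = 1.0 / (1.0 + np.exp(-scores))
        err = probs - y                          # d(log-loss)/d(score) for each pair
        grad = X.T @ (err[:, None] * Y) / len(y) + l2 * W
        W -= lr * grad
    return W


def predict(W, drug_vec, target_vec):
    """Interaction probability for one drug-target pair under the sketch model."""
    return float(1.0 / (1.0 + np.exp(-(drug_vec @ W @ target_vec))))
```

With one-hot toy features for two drugs and two targets, the fitted W scores the two labelled positive pairs above 0.5 and the two negative pairs below it.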
In another aspect, a device for predicting interactions between a drug and a target is provided, the device comprising:
an acquisition module, configured to acquire a sample set of drug-target pairs, the sample set comprising drug-target pairs with known interaction information as positive samples and drug-target pairs without known interaction information as negative samples;
the acquisition module is further configured to acquire a first feature of each drug and a first feature of each target in the sample set, the first feature of a drug representing the characteristics of the drug relative to other drugs, and the first feature of a target representing the characteristics of the target relative to other targets;
a training module, configured to train a projection model from a drug space to a target space based on the first features of the drugs in the sample set, the first features of the targets, and the interaction information of the drug-target pairs, to obtain a target projection model;
the acquisition module is further configured to acquire the first feature of a target drug and the first feature of a target when a prediction instruction for the target drug and the target is received;
an input module, configured to input the first feature of the target drug and the first feature of the target into the target projection model;
and an output module, configured to output an interaction result for the target drug and the target.
In one possible implementation, the device further comprises:
an extraction module, configured to extract the first feature of each drug and the first feature of each target;
the extraction module comprises:
an acquiring unit, configured to acquire an initial feature matrix of each drug and an initial feature matrix of each target based on basic information of a plurality of drugs and basic information of a plurality of targets, where the initial feature matrix of a drug represents the basic features of that drug and the initial feature matrix of a target represents the basic features of that target;
the acquiring unit is further configured to acquire a first similarity matrix of the plurality of drugs and a first similarity matrix of the plurality of targets based on the initial feature matrices of the drugs and of the targets, respectively, where each element of the first similarity matrix of the drugs represents the degree of similarity between two drugs and each element of the first similarity matrix of the targets represents the degree of similarity between two targets;
a decomposition unit, configured to perform singular value decomposition (SVD) on the first similarity matrix of the plurality of drugs and on the first similarity matrix of the plurality of targets, respectively, to obtain a target feature matrix of the drugs and a target feature matrix of the targets, where each row of the target feature matrix of the drugs represents the first feature of one drug and each row of the target feature matrix of the targets represents the first feature of one target.
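The decomposition step can be sketched with NumPy's `numpy.linalg.svd`. The application states only that SVD is applied and that each row of the result is one item's first feature; keeping the top k singular directions and scaling them by the square root of the singular values is an assumption made here (a common choice for symmetric similarity matrices), as is the helper name `low_rank_features`.

```python
import numpy as np


def low_rank_features(similarity, k):
    """Rows of the returned (num_items, k) matrix are per-item first features.

    Assumption: keep the top-k singular directions and scale them by
    sqrt(singular value); for a symmetric positive semi-definite similarity
    matrix, F @ F.T then approximates the similarity matrix.
    """
    S = np.asarray(similarity, dtype=float)
    U, s, _ = np.linalg.svd(S)            # s is sorted in descending order
    return U[:, :k] * np.sqrt(s[:k])      # column j scaled by sqrt(s[j]) via broadcasting
```

When k equals the matrix size and the similarity matrix is symmetric positive definite, the inner products of the rows reproduce the similarities exactly; a smaller k gives a compressed approximation.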
In one possible implementation, the acquiring unit is further configured to acquire an initial scoring matrix of each target based on the basic information of the plurality of targets, where each element of a target's initial scoring matrix represents a score obtained for the target with a target scoring tool;
the extraction module further comprises:
a calculation unit, configured to, for each column of a target's initial scoring matrix, add element-wise all rows in which that column has a non-zero element, to obtain an initial row matrix corresponding to the column, and to divide the initial row matrix by the number of rows of the initial scoring matrix, to obtain a target row matrix corresponding to the column;
and a splicing unit, configured to splice the target row matrices corresponding to all columns of the target's initial scoring matrix, to obtain the initial feature matrix of the target.
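The calculation and splicing units above can be sketched as follows for an N×20 PSSM. This follows one reading of the text: for column j, the rows whose entry in column j is non-zero are summed element-wise and divided by N, and the 20 resulting row vectors are stacked into a 20×20 initial feature matrix. The function name `pssm_to_feature_matrix` is hypothetical.

```python
import numpy as np


def pssm_to_feature_matrix(pssm):
    """Turn an N x 20 scoring matrix into a 20 x 20 initial feature matrix.

    Row j of the result is the element-wise sum of the PSSM rows whose
    entry in column j is non-zero, divided by N (the number of PSSM rows).
    """
    P = np.asarray(pssm, dtype=float)
    n_rows, n_cols = P.shape
    rows = []
    for j in range(n_cols):
        mask = P[:, j] != 0                # rows with a non-zero element in column j
        summed = P[mask].sum(axis=0)       # initial row matrix for column j (zeros if none)
        rows.append(summed / n_rows)       # target row matrix for column j
    return np.vstack(rows)                 # splice the row matrices of all columns
```

For a real PSSM with 20 columns the output is always 20×20, independent of the protein length N, which gives every target an initial feature matrix of the same size.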
In one possible implementation, the extraction module further comprises:
a random walk unit, configured to perform random walk with restart (RWR) on the first similarity matrix of the plurality of drugs and on the first similarity matrix of the plurality of targets, respectively, to obtain a first correlation matrix of the drugs and a first correlation matrix of the targets;
the decomposition unit is configured to perform singular value decomposition on the first correlation matrix of the drugs and on the first correlation matrix of the targets, respectively, to obtain the target feature matrix of the drugs and the target feature matrix of the targets.
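A minimal sketch of random walk with restart on a similarity matrix, assuming the common formulation s ← (1 - r)·P·s + r·e, where P is the column-normalized similarity matrix and e restarts the walk at the starting item. The restart probability r = 0.5, the removal of self-similarities and the function name `random_walk_with_restart` are assumptions; the application does not give these details here. Row i of the result is the stationary visiting distribution of a walk that restarts at item i, which would then be passed to the SVD step.

```python
import numpy as np


def random_walk_with_restart(similarity, restart=0.5, tol=1e-10, max_iter=1000):
    """Return a correlation matrix whose row i is the RWR profile of item i."""
    A = np.asarray(similarity, dtype=float).copy()
    np.fill_diagonal(A, 0.0)                 # assumption: ignore self-similarity
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0            # avoid division by zero for isolated items
    P = A / col_sums                         # column-stochastic transition matrix
    E = np.eye(A.shape[0])                   # column i restarts the walk at item i
    S = E.copy()
    for _ in range(max_iter):
        S_next = (1.0 - restart) * P @ S + restart * E
        converged = np.abs(S_next - S).max() < tol
        S = S_next
        if converged:
            break
    return S.T                               # row i: visiting probabilities from item i
```

Because each update mixes the walk with a restart of probability r, the iteration is a contraction and converges quickly; each row of the result sums to 1 when every item has at least one non-zero similarity to another item.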
In one possible implementation, the extraction module further comprises:
an alignment unit, configured to align the initial feature matrices of any two drugs, to obtain an alignment matrix for each of the two drugs;
the acquiring unit is configured to acquire a second feature of each of the two drugs based on its alignment matrix;
the calculation unit is configured to calculate the similarity between the two drugs based on their second features, and to obtain the first similarity matrix of the plurality of drugs from the similarities between every pair of drugs;
the acquiring unit is configured to acquire a second feature of each of any two targets based on its initial feature matrix;
the calculation unit is configured to calculate the similarity between the two targets based on their second features, and to obtain the first similarity matrix of the plurality of targets from the similarities between every pair of targets.
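The application does not specify how the similarity between two second features is computed. Assuming cosine similarity over flattened second-feature vectors, the first similarity matrix could be assembled as below; the function name `similarity_matrix` is hypothetical, and the alignment step that produces the drugs' second features is not reproduced.

```python
import numpy as np


def similarity_matrix(second_features):
    """Pairwise cosine similarity of items (assumed measure).

    second_features: one array-like per item; each is flattened, and all
    flattened vectors must have the same length.
    """
    X = np.vstack([np.ravel(f) for f in second_features]).astype(float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0              # all-zero features get similarity 0 to everything
    Xn = X / norms
    return Xn @ Xn.T                     # entry (i, j): cosine similarity of items i and j
```

Flattening lets the same helper accept vector-valued drug features and, for example, 20×20 target feature matrices.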
In one possible implementation, the acquiring unit is further configured to acquire an adjacency matrix of the plurality of drugs and an adjacency matrix of the plurality of targets based on the basic information of the drugs and of the targets, where each element of an adjacency matrix indicates whether an interaction exists; and to acquire a second similarity matrix of the drugs and a second similarity matrix of the targets based on these adjacency matrices;
the random walk unit is further configured to perform random walk with restart on the second similarity matrix of the drugs and on the second similarity matrix of the targets, respectively, to obtain a second correlation matrix of the drugs and a second correlation matrix of the targets;
the splicing unit is further configured to splice the second correlation matrix of the drugs onto the first similarity matrix of the drugs, to obtain a spliced first similarity matrix of the drugs, and to splice the second correlation matrix of the targets onto the first similarity matrix of the targets, to obtain a spliced first similarity matrix of the targets;
the decomposition unit is further configured to perform singular value decomposition on the spliced first similarity matrix of the drugs and on the spliced first similarity matrix of the targets, to obtain the target feature matrix of the drugs and the target feature matrix of the targets.
In one possible implementation, the calculation unit is further configured to calculate the similarity between any two of the plurality of drugs based on the adjacency matrix of the drugs, and to obtain the second similarity matrix of the drugs corresponding to that adjacency matrix from these similarities; and to calculate the similarity between any two of the plurality of targets based on the adjacency matrix of the targets, and to obtain the second similarity matrix of the targets corresponding to that adjacency matrix from these similarities.
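A sketch of this second-similarity branch, under stated assumptions: the similarity between two rows of a 0/1 adjacency matrix is taken to be the Jaccard index (the application does not name a measure); random walk with restart is computed in closed form as r·(I - (1 - r)P)^(-1) with r = 0.5; and "splicing" is read as placing the second correlation matrix beside the first similarity matrix column-wise before SVD. The names `jaccard_similarity`, `rwr_closed_form` and `spliced_similarity` are hypothetical.

```python
import numpy as np


def jaccard_similarity(adjacency):
    """Row-wise Jaccard index |N(i) & N(j)| / |N(i) | N(j)| of a 0/1 matrix."""
    A = (np.asarray(adjacency) > 0).astype(float)
    inter = A @ A.T                              # shared interaction partners
    deg = A.sum(axis=1)
    union = deg[:, None] + deg[None, :] - inter
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(union > 0, inter / union, 0.0)


def rwr_closed_form(similarity, restart=0.5):
    """Stationary RWR profiles; row i belongs to walks restarting at item i."""
    A = np.asarray(similarity, dtype=float).copy()
    np.fill_diagonal(A, 0.0)                     # assumption: ignore self-similarity
    col = A.sum(axis=0)
    col[col == 0] = 1.0
    P = A / col                                  # column-normalized transition matrix
    n = A.shape[0]
    return (restart * np.linalg.inv(np.eye(n) - (1.0 - restart) * P)).T


def spliced_similarity(first_similarity, adjacency, restart=0.5):
    """k x 2k matrix: first similarity | second correlation (input to SVD)."""
    second_corr = rwr_closed_form(jaccard_similarity(adjacency), restart)
    return np.hstack([np.asarray(first_similarity, dtype=float), second_corr])
```

If several adjacency matrices are available (for example drug-drug, drug-disease and drug-side effect), the same pattern could be repeated and each resulting correlation matrix appended column-wise.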
In another aspect, a computer device is provided, comprising a processor and a memory, the memory storing at least one piece of program code that is loaded and executed by the processor to implement any of the above methods for predicting interactions between a drug and a target.
In another aspect, a computer-readable storage medium is provided, storing at least one piece of program code that is loaded and executed by a processor to implement any of the above methods for predicting interactions between a drug and a target.
The technical solutions provided by the embodiments of the present application bring at least the following beneficial effects:
a projection model from the drug space to the target space is trained based on the first features of the drugs in the sample set, the first features of the targets, and the interaction information of the drug-target pairs. Because the first feature of a drug represents its characteristics relative to other drugs, and the first feature of a target represents its characteristics relative to other targets, these features capture not only each drug and each target on its own but also the relationships among different drugs and among different targets. They therefore reflect the original characteristics of the drugs and targets more fully, so the target projection model is trained better and the drug-target interactions predicted with it are more accurate.
Drawings
To describe the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; a person skilled in the art may derive other drawings from them without inventive effort.
FIG. 1 is a schematic illustration of drug repositioning provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of an implementation environment provided in an embodiment of the present application;
FIG. 3 is a flow chart of a method for predicting interactions between a drug and a target provided in an embodiment of the present application;
FIG. 4 is a flow chart of a process for extracting the first feature of a drug and the first feature of a target provided in an embodiment of the present application;
FIG. 5 is a schematic illustration of a process for predicting interactions between a drug and a target provided in an embodiment of the present application;
FIG. 6 is a flow chart of a process for extracting the first feature of a drug and the first feature of a target provided in an embodiment of the present application;
FIG. 7 is a schematic illustration of a process for predicting interactions between a drug and a target provided in an embodiment of the present application;
FIG. 8 is a schematic diagram of a device for predicting interactions between a drug and a target provided in an embodiment of the present application;
FIG. 9 is a schematic diagram of a device for predicting interactions between a drug and a target provided in an embodiment of the present application;
FIG. 10 is a schematic structural diagram of an extraction module provided in an embodiment of the present application;
FIG. 11 is a schematic structural diagram of an extraction module provided in an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a computer device provided in an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings.
At present, developing drugs with entirely new structures is costly, risky and slow, and has a low success rate. Drug repositioning is an important strategy for addressing these problems. Drug repositioning refers to screening, combining or modifying existing drugs to discover new indications for them. During drug repositioning, predicting interactions between drugs and targets helps researchers find new targets for existing drugs or discover new drugs that act on known targets. For example, as shown in FIG. 1, a known drug binds to a known target and can thereby treat a known disease; through drug repositioning, it may be predicted that the drug also interacts with another target, and the drug acting on this new target may be used to treat a new disease.
To this end, the embodiments of the present application provide a method for predicting interactions between a drug and a target. Refer to FIG. 2, which is a schematic diagram of an implementation environment of the method. The implementation environment may include a terminal 210 and a service platform 220. The service platform 220 may include a server 2201 and a database 2202. The server 2201 is configured to predict interactions between drugs and targets, and the database 2202 is configured to store basic information of the drugs and of the targets.
The terminal 210 may send the target drug and the target to be predicted to the service platform 220, and receive the interaction prediction result for the target drug and the target returned by the service platform 220. The server 2201 may acquire the basic information of the drugs and of the targets from the database 2202, receive the target drug and the target to be predicted from the terminal 210, and return the interaction prediction result for the target drug and the target to the terminal 210.
In one possible implementation, the terminal 210 may be a smart device such as a mobile phone, a tablet computer or a personal computer. The server 2201 may be a single server, a server cluster composed of multiple servers, or a cloud computing service center. The terminal 210 communicates with the service platform 220 through a wired or wireless network.
Those skilled in the art will appreciate that the terminal 210 and the service platform 220 described above are merely examples; other existing or future terminals or service platforms, where applicable, also fall within the scope of the present application and are incorporated herein by reference.
Based on the implementation environment shown in FIG. 2, an embodiment of the present application provides a method for predicting interactions between a drug and a target, described here with a server as the executing entity. As shown in FIG. 3, the method may include the following steps:
In step 301, the server obtains a sample set of drug-target pairs, the sample set comprising drug-target pairs with known interaction information as positive samples and drug-target pairs without known interaction information as negative samples.
A sample set is a collection of multiple drug-target pairs, each containing one drug and one target. In general, a drug is a bioactive small molecule, and a target is a site with which a drug can interact to exert a therapeutic effect by blocking an aberrant biological process.
The sample set includes positive samples with known interaction information and negative samples without known interaction information. Interaction information indicates that a drug and a target interact: a pair with known interaction information is taken to interact, and a pair without known interaction information is treated as not interacting. An interaction between a drug and a target means that the drug binds to the target and exerts a therapeutic effect. In one possible implementation, the sample set contains the same number of negative samples as positive samples.
In a specific implementation, the server may obtain drug-target pairs with known interaction information from the DrugBank database and use them as positive samples, then randomly pair the remaining drugs and targets and use the resulting drug-target pairs as negative samples. DrugBank is a bioinformatics and cheminformatics database provided by the University of Alberta that contains detailed information on a large number of drugs.
The interaction information of each drug-target pair in the sample set may be represented by a label taking the value 0 or 1: the label is 1 when the pair has known interaction information and 0 when it does not. That is, positive samples are labelled 1 and negative samples are labelled 0.
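A sketch of the sample-set construction described in step 301: known pairs become positives labelled 1, and an equal number of negatives labelled 0 is drawn at random from the drug-target combinations with no known interaction. The function name `build_sample_set` is hypothetical, and the identifiers in the usage test are placeholders rather than real DrugBank entries.

```python
import random


def build_sample_set(known_pairs, drugs, targets, seed=0):
    """Return a shuffled list of ((drug, target), label) with balanced classes."""
    rng = random.Random(seed)
    positives = set(known_pairs)
    # every drug-target combination without known interaction information
    candidates = [(d, t) for d in drugs for t in targets if (d, t) not in positives]
    if len(candidates) < len(positives):
        raise ValueError("not enough unlabelled pairs to balance the sample set")
    negatives = rng.sample(candidates, len(positives))   # random pairing, no repeats
    samples = [(p, 1) for p in sorted(positives)] + [(p, 0) for p in negatives]
    rng.shuffle(samples)
    return samples
```

Enumerating all combinations is simple but grows with the number of drugs times the number of targets; for very large catalogues, rejection sampling of random pairs would avoid building the full candidate list.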
In step 302, the server acquires the first feature of each drug in the sample set, which represents the characteristics of the drug relative to other drugs, and the first feature of each target, which represents the characteristics of the target relative to other targets.
The first feature of a drug may be extracted with reference to other drugs, so that it represents the characteristics of the drug relative to them; likewise, the first feature of a target may be extracted with reference to other targets. Because a model is trained on features and labels, after the sample set is obtained the server further needs to acquire the first feature of each drug and of each target in it.
Before acquiring these first features for the sample set, the server first extracts the first features of the drugs and of the targets.
In one possible implementation, referring to FIG. 4, the process of extracting the first features of the drugs and of the targets includes the following steps:
step 401: acquiring an initial feature matrix of each drug and an initial feature matrix of each target based on the basic information of the plurality of drugs and the basic information of the plurality of targets.
The initial feature matrix of a drug represents the basic features of that drug, and the initial feature matrix of a target represents the basic features of that target.
The basic information of a drug is information related to the drug, including but not limited to its structural information and its interaction information. The structural information includes but is not limited to the drug's SMILES (Simplified Molecular Input Line Entry System) code; SMILES is a specification that describes molecular structure unambiguously as a character string, so a drug's SMILES code represents its structure as a string of characters. The interaction information includes but is not limited to drug-drug interactions, drug-side effect interactions, drug-disease interactions and drug-physicochemical property interactions.
The basic information of a target is information related to the target, including but not limited to its structural information and its interaction information. The structural information includes but is not limited to the target's protein sequence information, and the interaction information includes but is not limited to target-target interactions and target-disease interactions. Note that targets generally include enzymes, ion channels and receptors, all of which are proteins; that is, most drugs act on the human body by interacting with proteins. The structural information of a target can therefore be represented by its protein sequence information, i.e. the sequence of amino acids that makes up the protein.
In a specific implementation, the server may obtain the basic information of the drugs as follows: drug structure information, drug-drug interactions and drug-physicochemical property interactions are obtained from the DrugBank database, drug-side effect interactions from the Side Effect Resource (SIDER) database, and drug-disease interactions from the Comparative Toxicogenomics Database (CTD). The basic information of the targets may be obtained as follows: target structure information, target-target interactions and target-disease interactions are obtained from the Human Protein Reference Database (HPRD).
In one possible implementation, the initial feature matrix of each drug is acquired from the basic information of the drugs as follows: for each drug, the feature vector corresponding to each character of its SMILES code is obtained, and the feature vectors of all characters are spliced to obtain the drug's initial feature matrix.
Specifically, after the SMILES code of a drug is obtained, the feature vector of each of its characters is looked up in a SMILES corpus. If the SMILES code has length m, i.e. contains m characters, m feature vectors are retrieved and spliced into the initial feature matrix of the drug. If each character's feature vector has length n, the initial feature matrix of each drug has size m×n, where n is fixed and m depends on the drug: different drugs have different SMILES codes, whose lengths may or may not be equal.
Note that the SMILES corpus records the feature vector of each distinct SMILES character; the embodiments of the present application may use a SMILES corpus trained in the related art.
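Building the m×n initial feature matrix of a drug amounts to a per-character lookup. In the sketch below, `char_vectors` is a hypothetical stand-in for the trained SMILES corpus (a mapping from each character to an n-dimensional vector), `smiles_to_matrix` is a hypothetical name, and the SMILES string is split into single characters as the text describes, so a two-letter atom such as `Cl` becomes two rows.

```python
def smiles_to_matrix(smiles, char_vectors):
    """Stack the vector of each SMILES character into an m x n matrix (list of rows).

    char_vectors: hypothetical corpus mapping each character to a length-n vector.
    """
    missing = sorted({c for c in smiles if c not in char_vectors})
    if missing:
        raise KeyError(f"no corpus vector for characters: {missing}")
    return [list(char_vectors[c]) for c in smiles]   # one row per character, in order
```

For aspirin, `CC(=O)OC1=CC=CC=C1C(=O)O`, the result has m = 24 rows; a drug with a longer SMILES code yields a taller matrix with the same n columns.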
In one possible implementation, the initial feature matrix of each target is acquired from the basic information of the targets in the following three steps:
Step one: based on the basic information of the targets, an initial scoring matrix is obtained for each target; the elements of a target's initial scoring matrix represent the scores assigned to the target by a target scoring tool.
The initial scoring matrix may be a position-specific scoring matrix (Position-Specific Scoring Matrix, PSSM), and the target scoring tool may be PSI-BLAST (Position-Specific Iterative Basic Local Alignment Search Tool).
The protein sequences of the targets are obtained from their basic information, and the PSSM of each target is computed from its protein sequence with PSI-BLAST. The PSSM has size N×20, where N is the length of the protein sequence and 20 is the number of standard amino acid types. Each element of the PSSM is the score of an amino acid at a specific position in the target's protein sequence.
Specifically, PSI-BLAST includes a protein sequence database for which a frequency table is recorded, giving how often each amino acid occurs at each of the 20 positions; the table is generated from the position of each amino acid in each protein sequence of the database. For any target, its protein sequence is input to PSI-BLAST, which scans the sequence, extracts from the frequency table how often each amino acid of the sequence occurs at each of the 20 positions, and constructs the PSSM of the target from these frequencies.
When the PSSM of each target is obtained with PSI-BLAST, both the types of amino acids in the protein sequence and their spatial arrangement within the sequence are taken into account.
Step two: for each column of the target's initial scoring matrix, the rows in which that column has a non-zero element are added together element-wise (elements in the same position are summed) to obtain the initial row matrix corresponding to the column; the initial row matrix is then divided by the number of rows of the initial scoring matrix to obtain the target row matrix corresponding to the column.
Specifically, let the protein sequence of the target have length N. For the i-th column of the target's PSSM, suppose its non-zero elements are located in the j-th and k-th rows. The j-th and k-th rows are then added element-wise: the 1st element of the j-th row is added to the 1st element of the k-th row, the 2nd element of the j-th row to the 2nd element of the k-th row, and so on up to the 20th elements. This yields the initial row matrix corresponding to the i-th column of the PSSM. Dividing each element of this initial row matrix by N gives the target row matrix corresponding to the i-th column, of size 1×20.
Since the PSSM of a target has 20 columns, 20 target row matrices of size 1×20 are obtained.
Step three: the target row matrices corresponding to all columns of the target's initial scoring matrix are concatenated to obtain the initial feature matrix of the target.
The 20 target row matrices of size 1×20 obtained in step two are stacked vertically to obtain an initial feature matrix of size 20×20.
In one possible implementation, steps two and three may be implemented with the following code:
[Code listing for steps two and three: reproduced only as an image in the original publication.]
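Because the listing for steps two and three is available only as an image, the following is a hypothetical Python sketch written from the description of those steps rather than a reproduction of the original code. It assumes the PSSM is given as a list of N rows of 20 scores; the function name `pssm_to_feature_matrix` is illustrative.

```python
from typing import List

Matrix = List[List[float]]


def pssm_to_feature_matrix(pssm: Matrix) -> Matrix:
    """Steps two and three: turn an N x 20 PSSM into a fixed-size feature matrix.

    For column i, every row whose entry in column i is non-zero is added
    element-wise (the initial row matrix), and the sum is divided by N (the
    target row matrix). Stacking one such row per column gives 20 x 20.
    """
    n_rows = len(pssm)      # N, the protein sequence length
    n_cols = len(pssm[0])   # 20 amino acid types in a real PSSM
    feature = []
    for i in range(n_cols):
        summed = [0.0] * n_cols
        for row in pssm:
            if row[i] != 0:
                summed = [s + v for s, v in zip(summed, row)]
        feature.append([s / n_rows for s in summed])
    return feature


# Toy N = 5 PSSM whose entries are all 1: every column keeps all five rows.
toy = [[1.0] * 20 for _ in range(5)]
result = pssm_to_feature_matrix(toy)
print(len(result), len(result[0]))  # 20 20
```

The output size depends only on the number of PSSM columns, which is why sequences of any length end up with the same 20×20 shape.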
Regardless of whether the protein sequences of the targets have the same length, after steps one to three the initial feature matrices of all targets have the same size (20×20), which makes it straightforward to compute the similarity between different targets from their initial feature matrices.
Step 402, acquiring a first similarity matrix of a plurality of drugs and a first similarity matrix of a plurality of targets based on the initial feature matrix of each drug and the initial feature matrix of each target, respectively.
Each element of the first similarity matrix of the drugs represents the degree of similarity between two drugs; for example, element $a_{i,j}$ represents the degree of similarity between drug i and drug j. Likewise, each element of the first similarity matrix of the targets represents the degree of similarity between two targets; for example, element $b_{i,j}$ represents the degree of similarity between target i and target j.
The initial feature matrix of each drug describes only the basic features of that drug itself, and the initial feature matrix of each target describes only the basic features of that target itself. The first similarity matrices of the drugs and of the targets are therefore computed as well, so that the correlations between different drugs can be studied through their similarities, and likewise the correlations between different targets.
In one possible implementation, the first similarity matrix of the drugs is obtained from the initial feature matrix of each drug as follows: for any two drugs, their initial feature matrices are aligned to obtain an alignment matrix for each of the two drugs; the second feature of each of the two drugs is obtained from its alignment matrix; the similarity between the two drugs is calculated from their second features; and the first similarity matrix of the drugs is assembled from these pairwise similarities.
Specifically, for drug P and drug Q, assume that the SMILES code of drug P has length $m_P = 80$, the SMILES code of drug Q has length $m_Q = 70$, and the feature vector of each SMILES character has length $n = 100$. The initial feature matrix of drug P is then 80×100 and that of drug Q is 70×100. Aligning the two initial feature matrices means cropping a 70×100 submatrix from the upper-left corner of each and using the cropped matrices as the alignment matrices of drug P and drug Q, respectively. The alignment matrix of drug P is then flattened into the second feature of drug P, and the alignment matrix of drug Q into the second feature of drug Q. Flattening is done by concatenating the rows of the alignment matrix one after another, so the second feature may take the form of a vector.
Next, the similarity between drug P and drug Q is calculated from their second features. When the second features are vectors, the similarity between the second feature vector of drug P and that of drug Q can be calculated. The embodiments of the present application do not limit how the similarity is calculated; for example, the cosine similarity between the two second feature vectors may be calculated with the following formula:
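The alignment and flattening for drugs P and Q can be sketched as follows, assuming both initial feature matrices are plain Python lists with the same column count n; `align` and `flatten` are hypothetical helper names, and the constant matrix values are placeholders.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def align(p: Matrix, q: Matrix) -> Tuple[Matrix, Matrix]:
    """Crop both matrices to their top-left min(m_P, m_Q) x n block.

    Both inputs are assumed to share the same column count n.
    """
    m = min(len(p), len(q))
    return [row[:] for row in p[:m]], [row[:] for row in q[:m]]


def flatten(matrix: Matrix) -> List[float]:
    """Concatenate the rows one after another to form the second feature vector."""
    return [value for row in matrix for value in row]


# Shapes from the example: drug P is 80 x 100, drug Q is 70 x 100.
drug_p = [[0.1] * 100 for _ in range(80)]
drug_q = [[0.2] * 100 for _ in range(70)]
aligned_p, aligned_q = align(drug_p, drug_q)
second_p, second_q = flatten(aligned_p), flatten(aligned_q)
print(len(aligned_p), len(second_p), len(second_q))  # 70 7000 7000
```

Cropping keeps the first 70 characters of the longer SMILES code, so both second features have 70 × 100 = 7000 components and can be compared element by element.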
$$\operatorname{sim}(A,B)=\frac{\sum_{i} A_i B_i}{\sqrt{\sum_{i} A_i^{2}}\;\sqrt{\sum_{i} B_i^{2}}}$$
where A and B denote the second feature vectors of drug P and drug Q, respectively, $A_i$ and $B_i$ denote the i-th components of A and B, and $\operatorname{sim}(A,B)$ denotes the cosine similarity of the second feature vectors of drugs P and Q.
Cosine similarity measures the similarity between two vectors by the cosine of the angle between them, which indicates whether the two vectors point in roughly the same direction. When the two vectors point in the same direction, the cosine similarity is 1; when they are at 90 degrees, it is 0; when they point in exactly opposite directions, it is -1. The result depends only on the directions of the vectors, not on their lengths. Cosine similarity is usually applied in a positive space, where it ranges from 0 to 1; the closer it is to 1, the closer the directions of the two vectors and hence the more similar the two drugs.
In this way the similarity between any two drugs can be calculated, yielding the first similarity matrix, which represents the pairwise similarities between the drugs.
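A minimal cosine similarity helper illustrating the formula and the three cases above (same direction, orthogonal, opposite); `cosine_similarity` is a hypothetical name, and the sketch rejects zero vectors, for which the measure is undefined.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """sim(A, B) = sum(A_i * B_i) / (||A|| * ||B||)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1, 0], [2, 0]))   # 1.0 (same direction)
print(cosine_similarity([1, 0], [0, 3]))   # 0.0 (orthogonal)
print(cosine_similarity([1, 0], [-1, 0]))  # -1.0 (opposite)
```

The first call also shows the length independence: [2, 0] is twice as long as [1, 0], yet the similarity is exactly 1.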
In one possible implementation, the first similarity matrix of the targets is obtained from the initial feature matrix of each target as follows: for any two targets, the second feature of each target is obtained from its initial feature matrix; the similarity between the two targets is calculated from their second features; and the first similarity matrix of the targets is assembled from these pairwise similarities.
Since the initial feature matrix of every target is 20×20, no alignment is needed: the rows of each initial feature matrix can be concatenated directly to obtain the second feature of the target. Obtaining the first similarity matrix of the targets from these second features works the same way as for the drugs and is not repeated here.
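Because every target matrix has the same size, the target-side computation reduces to flattening plus pairwise cosine similarity. The sketch below assumes non-zero feature matrices and uses hypothetical names; the 2×2 toy matrices only stand in for real 20×20 ones.

```python
import math
from typing import List

Matrix = List[List[float]]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def target_similarity_matrix(features: List[Matrix]) -> Matrix:
    """First similarity matrix of the targets from equally sized feature matrices.

    Each matrix is flattened row by row (no alignment needed, all are 20 x 20),
    then every pair of flattened vectors is compared with cosine similarity.
    """
    flat = [[v for row in f for v in row] for f in features]
    k = len(flat)
    return [[cosine(flat[i], flat[j]) for j in range(k)] for i in range(k)]


# Three toy targets; 2 x 2 matrices stand in for the real 20 x 20 ones.
toy_targets = [[[1.0, 0.0], [0.0, 0.0]],
               [[2.0, 0.0], [0.0, 0.0]],
               [[0.0, 0.0], [0.0, 5.0]]]
for row in target_similarity_matrix(toy_targets):
    print(row)
# [1.0, 1.0, 0.0]
# [1.0, 1.0, 0.0]
# [0.0, 0.0, 1.0]
```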
It should be noted that each element of the first similarity matrix of the drugs reflects only the similarity between two drugs and ignores all other drugs. A random walk with restart (Random Walk with Restart, RWR) may therefore be performed on the first similarity matrix of the drugs to obtain a correlation score between two drugs that takes all other drugs into account. Likewise, a random walk with restart may be performed on the first similarity matrix of the targets to obtain a correlation score between two targets that takes all other targets into account.
The random walk with restart proceeds as follows: it is performed on the first similarity matrix of the drugs and on the first similarity matrix of the targets to obtain a first correlation matrix of the drugs and a first correlation matrix of the targets. Element $a_{i,j}$ of the first correlation matrix of the drugs represents the correlation score of drug j with respect to drug i, and element $b_{i,j}$ of the first correlation matrix of the targets represents the correlation score of target j with respect to target i.
In a random walk with restart, the walk starts from a node and, at each step, either moves to a randomly selected neighbor of the current node or jumps back to the starting node. The algorithm has a parameter a, the restart probability, so the walk moves to a neighbor with probability 1-a. After enough iterations the probability distribution over the nodes becomes stable, and this stationary distribution can be regarded as the influence of the starting node on all other nodes. Because it is computed from global information, the random walk with restart can capture multifaceted relationships between two nodes.
For the first similarity matrix of M drugs, in each round one drug that has not yet been used is randomly selected as the starting node, and the rounds continue until every drug has served as the starting node. Each round produces an M×1 column vector whose elements are the correlation scores of the other drugs with respect to the starting drug. After all M drugs have served as the starting node, M column vectors of size M×1 are obtained, and combining them gives the M×M first correlation matrix. In the same way, an N×N first correlation matrix is obtained for N targets.
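A minimal sketch of the random walk with restart follows. The text does not specify how the similarity matrix becomes a transition matrix, so column normalization is an assumption here, as are the default restart probability a = 0.5 and the convergence tolerance. Each restart node's stationary distribution is stored as a row, so that entry (i, j) matches the definition of $a_{i,j}$ as the score of node j with respect to node i; transpose the result for the column layout described above. All function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def column_normalize(s: Matrix) -> Matrix:
    """Turn a non-negative similarity matrix into a column-stochastic transition matrix."""
    n = len(s)
    col_sums = [sum(s[i][j] for i in range(n)) for j in range(n)]
    return [[s[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def rwr(w: Matrix, start: int, a: float, tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """Iterate p <- (1 - a) * W p + a * e_start until p stops changing."""
    n = len(w)
    e = [1.0 if i == start else 0.0 for i in range(n)]
    p = e[:]
    for _ in range(max_iter):
        nxt = [(1 - a) * sum(w[i][j] * p[j] for j in range(n)) + a * e[i]
               for i in range(n)]
        converged = sum(abs(x - y) for x, y in zip(nxt, p)) < tol
        p = nxt
        if converged:
            break
    return p


def first_correlation_matrix(s: Matrix, a: float = 0.5) -> Matrix:
    """Row i is the stationary RWR distribution restarted at node i,
    so entry (i, j) is the correlation score of node j with respect to node i."""
    w = column_normalize(s)
    return [rwr(w, i, a) for i in range(len(s))]


corr = first_correlation_matrix([[1.0, 1.0], [1.0, 1.0]])
print(corr)  # [[0.75, 0.25], [0.25, 0.75]]
```

Since every column of the transition matrix sums to 1, each row of the result stays a probability distribution; a larger restart probability keeps more mass on the starting node.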
Step 403, performing singular value decomposition on the first similarity matrix of the drugs and the first similarity matrix of the targets, respectively, to obtain a target feature matrix of the drugs and a target feature matrix of the targets.
Wherein each row of the target feature matrix of the plurality of drugs is used to represent a first feature of one drug and each row of the target feature matrix of the plurality of targets is used to represent a first feature of one target.
Singular value decomposition (Singular Value Decomposition, SVD) is a matrix factorization method from linear algebra. Suppose M is an m×n matrix whose elements all belong to a field K, which may be the field of real numbers or of complex numbers. Then there exists a factorization $M = C\,E\,D^{\mathsf{T}}$ (with the conjugate transpose $D^{*}$ in place of $D^{\mathsf{T}}$ when K is the complex field).
Here C is an m×r matrix with orthonormal columns, E is an r×r positive semidefinite diagonal matrix, and D is an n×r matrix with orthonormal columns; this factorization is the singular value decomposition of M. The diagonal elements of E are the singular values of M. Multiplying C by E gives the target feature matrix of M. In the embodiments of the present application, r may be a preset value that sets the dimension of the first feature; when r is smaller than the rank of M, keeping only the r largest singular values makes the factorization an approximation, $M \approx C\,E\,D^{\mathsf{T}}$.
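The step of multiplying C by E can be sketched without a linear algebra library by using the identity M D = C E, which holds because D has orthonormal columns. This sketch swaps in power iteration with deflation on M^T M to find the top-r right singular vectors; it is an illustrative substitute for a library routine such as `numpy.linalg.svd`, assumes well-separated singular values, and `svd_features` is a hypothetical name.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(x: Matrix) -> Matrix:
    return [list(col) for col in zip(*x)]


def svd_features(m: Matrix, r: int, iters: int = 500) -> Matrix:
    """Return C·E (rows = first features) for the rank-r truncated SVD of M.

    Since M = C E D^T and D has orthonormal columns, M D = C E. The top-r right
    singular vectors (columns of D) are found by power iteration with deflation
    on M^T M, whose eigenvalues are the squared singular values.
    """
    gram = matmul(transpose(m), m)
    n = len(gram)
    columns = []
    for _ in range(r):
        v = [1.0 / math.sqrt(n)] * n  # fixed start vector (assumed not orthogonal to the answer)
        for _ in range(iters):
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # remaining singular values are zero
            v = [x / norm for x in w]
        lam = sum(v[i] * gram[i][j] * v[j] for i in range(n) for j in range(n))
        # Deflate so that the next pass converges to the next singular vector.
        gram = [[gram[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
        columns.append(v)
    return matmul(m, transpose(columns))  # m x r


features = svd_features([[3.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]], r=2)
print([[round(x, 6) for x in row] for row in features])  # close to [[3, 0], [0, 2], [0, 0]]
```

Each row of the result is the r-dimensional first feature of one drug or target; the signs of singular vectors are arbitrary, so only magnitudes and relative geometry are meaningful.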
Following this process, the target feature matrix of the drugs and the target feature matrix of the targets are obtained.
In one possible implementation, after the first correlation matrices of the drugs and of the targets have been obtained by random walk with restart, singular value decomposition may be performed on the first correlation matrix of the drugs and on the first correlation matrix of the targets, respectively, to obtain the target feature matrix of the drugs and the target feature matrix of the targets.
Since the first feature of each drug can be read from the target feature matrix of the drugs, and the first feature of each target from the target feature matrix of the targets, steps 401 to 403 complete the extraction of the first features of the drugs and targets. It should be noted that, besides steps 401 to 403, the first features of the drugs and targets may also be extracted by other processes, as detailed in the embodiment of FIG. 6.
Once the first features of the drugs and targets have been extracted, the first features of the drug and the target in each sample of the sample set can be looked up among them, so that every sample contains the first feature of one drug and the first feature of one target.
In step 303, the server trains a projection model from the drug space to the target space based on the first feature of the drug in the sample set, the first feature of the target, and the interaction information of the drug-target pair, resulting in a target projection model.
Each sample in the sample set contains the first feature of one drug, the first feature of one target, and the interaction information of that drug-target pair. The interaction information has two possible states: the pair either has or does not have known interaction information. It can be represented by a label taking the value 0 or 1: the label is 1 when the drug-target pair has known interaction information and 0 when it does not. That is, positive samples are labeled 1 and negative samples are labeled 0.
The sample set is divided into two subsets: a training set used to train the projection model and a test set used to test its accuracy. The embodiments of the present application do not limit the ratio between the two; for example, 90% of the samples may form the training set and 10% the test set. Both subsets contain positive and negative samples. When the sample set contains equal numbers of positive and negative samples, the positive-to-negative ratio is 1:1 in both the test set and the training set.
Specifically, the projection model from the drug space to the target space may take the form of a matrix, and the projection matrix may be trained as follows:
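The 1:1 ratio in both subsets holds when the split is stratified, that is, when positives and negatives are divided separately. A minimal sketch, assuming 0/1 labels and the 90/10 example split; `stratified_split` and the fixed seed are hypothetical choices.

```python
import random
from typing import List, Sequence, Tuple


def stratified_split(labels: Sequence[int], test_fraction: float = 0.1,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split sample indices so each subset keeps the positive/negative ratio.

    Positives (label 1) and negatives (label 0) are shuffled and split
    separately; the 10% default mirrors the 90/10 example in the text.
    """
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for cls in (1, 0):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test


labels = [1] * 50 + [0] * 50  # balanced toy sample set
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 90 10
print(sum(labels[i] for i in test_idx))  # 5 positives in the test set
```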
For any positive sample in the training set, three matrices are multiplied: the matrix formed by the first feature of the drug in the sample, the initial projection matrix, and the transpose of the matrix formed by the first feature of the target in the sample. The elements of the output matrix are then compared with a reference threshold. If they are greater than the reference threshold, the projection matrix is left unchanged and training continues with the other samples in the training set; otherwise, the elements of the projection matrix are adjusted until the elements of the output matrix are greater than the reference threshold, and training then continues on the adjusted projection matrix with the other samples in the training set. The reference threshold may be set empirically, for example to 0.7.
For any negative sample in the training set, the same three matrices are multiplied and the elements of the output matrix are compared with the reference threshold. If they are not greater than the reference threshold, the projection matrix is left unchanged and training continues with the other samples in the training set; otherwise, the elements of the projection matrix are adjusted until the elements of the output matrix are no longer greater than the reference threshold, and training then continues on the adjusted projection matrix with the other samples in the training set. The reference threshold may be set empirically, for example to 0.7.
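The text says only that the elements of the projection matrix are "adjusted"; the sketch below assumes a perceptron-style update that adds or subtracts a multiple of the outer product of the drug and target feature vectors, which raises or lowers the score d·W·t^T by lr·‖d‖²·‖t‖² per step. Here the drug and target first features are single row vectors, so the product of the three matrices is a 1×1 matrix, i.e. a single score. Function names, the learning rate and the step cap are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def score(d: Vector, w: Matrix, t: Vector) -> float:
    """d (1 x r) · W (r x q) · t^T (q x 1) -> a single interaction score."""
    return sum(d[i] * w[i][j] * t[j] for i in range(len(d)) for j in range(len(t)))


def train_on_sample(d: Vector, t: Vector, label: int, w: Matrix,
                    threshold: float = 0.7, lr: float = 0.1,
                    max_steps: int = 1000) -> Matrix:
    """Nudge W along outer(d, t) until the sample falls on the correct side of the threshold.

    The update rule is an assumed perceptron-style step; W is modified in place
    and returned. A correctly scored sample leaves W unchanged.
    """
    sign = 1.0 if label == 1 else -1.0
    for _ in range(max_steps):
        s = score(d, w, t)
        if (label == 1 and s > threshold) or (label == 0 and s <= threshold):
            break  # already on the correct side: keep W as it is
        for i in range(len(d)):
            for j in range(len(t)):
                w[i][j] += sign * lr * d[i] * t[j]
    return w


w = [[0.0, 0.0], [0.0, 0.0]]
train_on_sample([1.0, 0.0], [1.0, 0.0], label=1, w=w)
print(score([1.0, 0.0], w, [1.0, 0.0]) > 0.7)  # True
```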
The projection matrix trained by the above process serves as the projection model from the drug space to the target space. Its prediction accuracy is then evaluated on the test set: if it does not reach the target accuracy, training continues; if it does, the trained projection model is taken as the target projection model. The target accuracy may be set empirically, for example to 90%.
In this step, the output of the projection model is compared with the sample labels, so the projection model is trained in a supervised manner, which improves its training effect.
In one possible implementation, ten-fold cross-validation may be performed on the trained projection model to further improve the training effect. The basic procedure is as follows: the samples are randomly divided into 10 parts; in each round, 1 part serves as the test set and the remaining 9 parts as the training set; this is repeated 10 times so that every part serves once as the test set, and the accuracies of the 10 tests are averaged. When the average test accuracy from ten-fold cross-validation reaches the target accuracy, the trained projection model is taken as the target projection model.
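A minimal sketch of the ten-fold split, assuming samples are identified by index; `k_fold_splits` and `mean_accuracy` are hypothetical helpers, and training and testing the projection model inside each fold are left out.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def k_fold_splits(n_samples: int, k: int = 10,
                  seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds; every sample is tested once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # k roughly equal parts
    for f in range(k):
        test = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, test


def mean_accuracy(accuracies: Sequence[float]) -> float:
    """Average of the per-fold test accuracies."""
    return sum(accuracies) / len(accuracies)


splits = list(k_fold_splits(25, k=10))
print(len(splits), sorted(len(test) for _, test in splits))
# 10 [2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
```

The averaged accuracy from `mean_accuracy` is what would be compared with the target accuracy.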
In step 304, when a prediction instruction of the target drug and the target is received, the server acquires the first feature of the target drug and the first feature of the target, inputs the first feature of the target drug and the first feature of the target into the target projection model, and outputs an interaction result of the target drug and the target.
When a prediction instruction of the target drug and the target is received, the server inputs the first characteristics of the target drug and the first characteristics of the target into a target projection model, and whether interaction exists between the target drug and the target can be judged according to the output interaction result.
Specifically, three matrices are multiplied: the matrix formed by the first feature of the target drug, the projection matrix of the target projection model, and the transpose of the matrix formed by the first feature of the target. The elements of the resulting matrix represent the interaction result of the target drug and the target: if an element is greater than the reference threshold, there is an interaction between the target drug and the target; otherwise, there is none. The reference threshold may be set empirically, for example to 0.7.
The overall process by which the server predicts drug-target interactions may be as shown in FIG. 5. First, the initial feature matrix of each drug is obtained from its SMILES code, and the initial feature matrix of each target is obtained from its protein sequence. Next, the first similarity matrix of the drugs and the first similarity matrix of the targets are obtained from these initial feature matrices. Then, operations such as random walk with restart and singular value decomposition are applied to the two first similarity matrices to obtain the target feature matrix of the drugs and the target feature matrix of the targets. Finally, the projection model is trained in a supervised manner on the two target feature matrices and the sample set, and the interaction result of the target drug and the target is obtained from the trained target projection model. Here the target feature matrix of the drugs has size m×n, where m is the number of drugs and n is the drug feature dimension, and the target feature matrix of the targets has size p×q, where p is the number of targets and q is the target feature dimension.
In the embodiments of the present application, a projection model from the drug space to the target space is trained on the first features of the drugs and targets in the sample set together with the interaction information of the drug-target pairs. Because the first feature of a drug describes the drug relative to other drugs, and the first feature of a target describes the target relative to other targets, these features capture not only each drug and target itself but also the relationships among different drugs and among different targets, and therefore reflect the original characteristics of the drugs and targets more fully. This improves the training of the target projection model and the accuracy of the drug-target interactions it predicts.
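Scoring one pair at a time generalizes to all pairs at once: with the target feature matrix of the drugs D (one drug per row), the projection matrix W, and the target feature matrix of the targets T (one target per row), D·W·T^T holds one score per drug-target pair. The sketch assumes plain nested lists and the example threshold of 0.7; `predict_interactions` is a hypothetical name.

```python
from typing import List

Matrix = List[List[float]]


def predict_interactions(drugs: Matrix, w: Matrix, targets: Matrix,
                         threshold: float = 0.7) -> List[List[bool]]:
    """Score every drug-target pair as D · W · T^T and compare with the threshold.

    drugs: m x n (one drug per row), w: n x q, targets: p x q (one target per row).
    Entry (i, j) is True when drug i is predicted to interact with target j.
    """
    dw = [[sum(row[k] * w[k][j] for k in range(len(w))) for j in range(len(w[0]))]
          for row in drugs]
    scores = [[sum(a * b for a, b in zip(dw_row, t_row)) for t_row in targets]
              for dw_row in dw]
    return [[s > threshold for s in row] for row in scores]


toy_drugs = [[1.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
toy_targets = [[1.0, 0.0], [0.0, 0.5], [0.9, 0.0]]
print(predict_interactions(toy_drugs, identity, toy_targets))
# [[True, False, True], [False, False, False]]
```

With m drugs and p targets the result is an m×p grid, so W must be n×q to connect the drug feature dimension n to the target feature dimension q.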
An embodiment of the present application provides a process for extracting the first features of drugs and targets, described here taking its application to a server as an example. As shown in FIG. 6, the extraction process may include the following steps:
In step 601, the server acquires the initial feature matrix of each drug and the initial feature matrix of each target based on the basic information of the drugs and the basic information of the targets.
The implementation of this step is detailed in step 401, and will not be described here again.
In step 602, the server obtains adjacency matrices of the drugs and adjacency matrices of the targets based on the basic information of the drugs and of the targets; each element of an adjacency matrix indicates whether an interaction exists.
There may be several adjacency matrices for the drugs and several for the targets. For example, from the drug-drug, drug-side effect, drug-disease and drug-physicochemical property interaction information in the basic information of the drugs, four adjacency matrices can be constructed: drug-drug, drug-side effect, drug-disease and drug-physicochemical property. From the target-target and target-disease interaction information in the basic information of the targets, two adjacency matrices can be constructed: target-target and target-disease. The elements of an adjacency matrix take only the values 1 and 0: 1 indicates that an interaction exists and 0 that it does not.
In step 603, the server obtains the first similarity matrix of the drugs and the first similarity matrix of the targets from the initial feature matrices of the drugs and of the targets, respectively, and obtains second similarity matrices of the drugs and of the targets from the adjacency matrices of the drugs and of the targets, respectively.
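A minimal sketch of building one such 0/1 adjacency matrix from known interaction pairs, using hypothetical drug and side-effect identifiers; the same function applies to the other drug and target adjacency matrices, and `adjacency_matrix` is a hypothetical name.

```python
from typing import Dict, List, Set


def adjacency_matrix(links: Dict[str, Set[str]], rows: List[str],
                     cols: List[str]) -> List[List[int]]:
    """Binary matrix: entry (i, j) is 1 if rows[i] is known to interact with cols[j], else 0."""
    return [[1 if c in links.get(r, set()) else 0 for c in cols] for r in rows]


# Hypothetical drug-side effect links.
known = {"drug_i": {"se1", "se2", "se3"}, "drug_j": {"se2", "se3", "se4"}}
adj = adjacency_matrix(known, ["drug_i", "drug_j"], ["se1", "se2", "se3", "se4"])
print(adj)  # [[1, 1, 1, 0], [0, 1, 1, 1]]
```

A drug with no recorded links simply gets an all-zero row.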
The implementation process of acquiring the first similarity matrix of the plurality of drugs and the first similarity matrix of the plurality of targets based on the initial feature matrix of each drug and the initial feature matrix of each target is detailed in step 402, and is not described herein.
In one possible implementation, after the first similarity matrices of the drugs and of the targets have been obtained, a random walk with restart may be performed on each of them to obtain the first correlation matrix of the drugs and the first correlation matrix of the targets.
In one possible implementation, the second similarity matrices of the drugs and of the targets are obtained from the adjacency matrices of the drugs and of the targets, respectively, as follows:
The similarity between any two drugs is calculated from an adjacency matrix of the drugs, and the second similarity matrix of the drugs corresponding to that adjacency matrix is obtained from these pairwise similarities. Likewise, the similarity between any two targets is calculated from an adjacency matrix of the targets, and the second similarity matrix of the targets corresponding to that adjacency matrix is obtained from these pairwise similarities.
The number of second similarity matrices of the drugs equals the number of adjacency matrices of the drugs, and the number of second similarity matrices of the targets equals the number of adjacency matrices of the targets: when there are several adjacency matrices, each one yields its own second similarity matrix.
In one possible implementation, the similarity between any two drugs or any two targets is calculated from an adjacency matrix as follows: for each adjacency matrix, the Jaccard similarity coefficient of the two drugs (or the two targets) is calculated and used as their similarity.
The Jaccard similarity coefficient is a statistic that measures the similarity between two sets. Taking the drug-side effect adjacency matrix as an example, it is calculated as follows:
$$S(i,j)=\frac{\lvert SE_i \cap SE_j \rvert}{\lvert SE_i \cup SE_j \rvert}$$
wherein SE is i Representing the set of side effects of drug i, SE j Representing the set of side effects of drug j.
For example, side effects of drug i are side effect 1, side effect 2 and side effect 3, side effect of drug j is side effect 2, side effect 3 and side effect 4, and then for the adjacency matrix of drug-side effect, the jekade similarity coefficient of drug i and drug j is S (i, j) =2/4=1/2.
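The Jaccard coefficient can be computed directly from the rows of a 0/1 adjacency matrix, treating each row as the set of columns marked 1. The sketch reproduces the example above; returning 0.0 when both rows are empty is an assumed convention, since the coefficient is undefined there, and `jaccard_rows` is a hypothetical name.

```python
from typing import List


def jaccard_rows(adj: List[List[int]]) -> List[List[float]]:
    """Second similarity matrix: Jaccard coefficient between every pair of rows of a 0/1 matrix."""
    sets = [{j for j, v in enumerate(row) if v} for row in adj]
    k = len(sets)
    sim = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(k):
            union = sets[a] | sets[b]
            # Assumed convention: two empty rows get similarity 0.0.
            sim[a][b] = len(sets[a] & sets[b]) / len(union) if union else 0.0
    return sim


# Drug i: side effects 1-3; drug j: side effects 2-4 (columns = side effects 1..4).
print(jaccard_rows([[1, 1, 1, 0], [0, 1, 1, 1]]))  # [[1.0, 0.5], [0.5, 1.0]]
```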
In step 604, the server performs random walk with restart on the second similarity matrix of the drugs and the second similarity matrix of the targets, respectively, to obtain a second correlation matrix of the drugs and a second correlation matrix of the targets.
This step is implemented as described in step 402 and is not repeated here.
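Step 402 lies outside this section, so the details below are assumptions rather than the patent's exact procedure. The random restart walk (usually called random walk with restart, RWR) is commonly run by row-normalizing the similarity matrix into a transition matrix P and iterating S <- (1 - r) S P + r I until convergence, so that row i of S is the visiting distribution of a walker that keeps restarting at node i. The function name and the default restart probability r = 0.5 are hypothetical.

```python
import numpy as np


def random_walk_with_restart(sim: np.ndarray, restart_prob: float = 0.5,
                             tol: float = 1e-8, max_iter: int = 1000) -> np.ndarray:
    """Random walk with restart from every node of a similarity graph.

    Row i of the result is the visiting distribution of a walker that starts
    at node i and jumps back to node i with probability `restart_prob`.
    """
    sim = np.asarray(sim, dtype=float)
    n = sim.shape[0]
    # Row-normalize similarities into a transition matrix P.
    # Assumption: a row with zero total similarity stays all-zero.
    row_sums = sim.sum(axis=1, keepdims=True)
    P = np.divide(sim, row_sums, out=np.zeros_like(sim), where=row_sums > 0)
    restart = np.eye(n)            # row i is the one-hot restart vector of node i
    S = restart.copy()
    for _ in range(max_iter):
        S_next = (1.0 - restart_prob) * (S @ P) + restart_prob * restart
        if np.abs(S_next - S).max() < tol:   # stop once the update is negligible
            return S_next
        S = S_next
    return S


# Usage on a small 3-node similarity matrix.
sim = np.array([[1.0, 0.5, 0.0],
                [0.5, 1.0, 0.2],
                [0.0, 0.2, 1.0]])
corr = random_walk_with_restart(sim)
print(corr.round(3))
```

When every row of P sums to 1, each row of the result also sums to 1, and the iteration converges geometrically at rate 1 - r; the fixed point equals r (I - (1 - r) P)^-1.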
In step 605, the server splices (concatenates) the second correlation matrix of the drugs onto the first similarity matrix of the drugs to obtain a spliced first similarity matrix of the drugs, and splices the second correlation matrix of the targets onto the first similarity matrix of the targets to obtain a spliced first similarity matrix of the targets.
The second correlation matrix of the drugs may be spliced horizontally (column-wise) onto the first similarity matrix of the drugs, and likewise for the targets. With this splicing, the number of rows of the spliced first similarity matrix of the drugs equals the number of drugs, and the number of rows of the spliced first similarity matrix of the targets equals the number of targets.
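A minimal sketch of the horizontal splicing, assuming NumPy arrays with one row per drug (or target) and one second correlation matrix per adjacency matrix; the function name `splice_horizontally` is hypothetical. It shows why the row count of the spliced matrix stays equal to the number of drugs (or targets) while only the column count grows.

```python
from typing import Sequence

import numpy as np


def splice_horizontally(first_similarity: np.ndarray,
                        second_correlations: Sequence[np.ndarray]) -> np.ndarray:
    """Append each second correlation matrix to the right of the first similarity matrix."""
    n = first_similarity.shape[0]
    for m in second_correlations:
        # Horizontal splicing needs one row per drug (or target) in every block.
        if m.shape[0] != n:
            raise ValueError("every matrix must have one row per drug (or target)")
    return np.hstack([first_similarity, *second_correlations])


# Usage: 5 drugs, one first similarity matrix and two adjacency-derived matrices.
first = np.eye(5)
spliced = splice_horizontally(first, [np.ones((5, 5)), np.zeros((5, 5))])
print(spliced.shape)  # (5, 15)
```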
In one possible implementation, when the first correlation matrix of the drugs and the first correlation matrix of the targets have been obtained, the second correlation matrix of the drugs may instead be spliced onto the first correlation matrix of the drugs to obtain a spliced first correlation matrix of the drugs, and the second correlation matrix of the targets may be spliced onto the first correlation matrix of the targets to obtain a spliced first correlation matrix of the targets.
In step 606, the server performs singular value decomposition on the spliced first similarity matrix of the drugs and the spliced first similarity matrix of the targets to obtain a target feature matrix of the drugs and a target feature matrix of the targets.
This step is implemented as described in step 403 and is not repeated here.
In one possible implementation, when the spliced first correlation matrix of the drugs and the spliced first correlation matrix of the targets have been obtained, singular value decomposition may be performed on each of them to obtain the target feature matrix of the drugs and the target feature matrix of the targets.
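Step 403 is described outside this section, so the scaling below is an assumption rather than the patent's formula. One common way to turn a (possibly spliced) similarity or correlation matrix into a feature matrix with one row per drug (or target) is a truncated SVD that keeps the top k left singular vectors, scaled by the square roots of the singular values; k and the function name `svd_features` are hypothetical.

```python
import numpy as np


def svd_features(matrix: np.ndarray, k: int) -> np.ndarray:
    """One feature row per drug (or target) from a truncated SVD.

    Assumption: features are U_k * sqrt(s_k); k and the scaling are illustrative.
    """
    U, s, _ = np.linalg.svd(np.asarray(matrix, dtype=float), full_matrices=False)
    k = min(k, s.size)                # cannot keep more components than exist
    return U[:, :k] * np.sqrt(s[:k])  # singular values are sorted in descending order


# Usage: a 6 x 6 rank-2 positive semidefinite matrix gives 6 two-dimensional features.
A = np.random.default_rng(0).normal(size=(6, 2))
M = A @ A.T
X = svd_features(M, k=2)
print(X.shape)                   # (6, 2)
print(np.allclose(X @ X.T, M))   # True
```

For a symmetric positive semidefinite input, X X^T equals the best rank-k approximation of the input; for a rectangular spliced matrix the rows of X still correspond one-to-one to the drugs (or targets).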
Based on the process for extracting the first features of drugs and targets provided in the embodiments of the present application, the overall process of predicting drug-target interactions may be as shown in fig. 7. It differs from the process shown in fig. 5 in that, in fig. 7, the target feature matrix of the drugs is obtained from both the first similarity matrix and the adjacency matrix of the drugs, and the target feature matrix of the targets is obtained from both the first similarity matrix and the adjacency matrix of the targets. The remaining steps are the same as in fig. 5 and are not described again here.
In this embodiment of the application, the adjacency matrices of the drugs and of the targets are taken into account, in addition to the initial feature matrix of each drug and of each target, when extracting the first features. Because information about the drugs and targets is drawn from multiple aspects, the extracted first features of the drugs and targets are more comprehensive.
Based on the same technical concept, and referring to fig. 8, an embodiment of the present application provides a device for predicting interactions between a drug and a target, the device including:
an acquisition module 801, configured to acquire a sample set of drug-target pairs, the sample set comprising drug-target pairs with known interaction information as positive samples and drug-target pairs without known interaction information as negative samples;
the acquisition module 801 is further configured to acquire a first feature of each drug and a first feature of each target in the sample set, where the first feature of a drug represents the characteristics of that drug relative to other drugs, and the first feature of a target represents the characteristics of that target relative to other targets;
a training module 802, configured to train a projection model from the drug space to the target space based on the first features of the drugs in the sample set, the first features of the targets, and the interaction information of the drug-target pairs, to obtain a target projection model;
the acquisition module 801 is further configured to acquire the first feature of a drug of interest and the first feature of a target of interest when a prediction instruction for that drug and target is received;
an input module 803, configured to input the first feature of the drug of interest and the first feature of the target of interest into the target projection model;
an output module 804, configured to output the interaction information of the drug of interest and the target of interest.
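The training procedure of the projection model is given in the method embodiments and is not reproduced in this section. Purely as an illustration, the sketch below assumes a bilinear projection: a matrix W maps a drug's first feature x into target space, the interaction score of a target with first feature y is sigmoid(x W y^T), and W is fitted by gradient descent on the logistic loss over the positive and negative samples with an L2 penalty. The loss, learning rate, regularization and the function names `train_projection` and `predict` are all assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_projection(drug_feats, target_feats, pairs, labels,
                     lr=0.5, l2=1e-3, epochs=500, seed=0):
    """Fit a bilinear projection W so that sigmoid(x_d W y_t^T) matches the labels.

    drug_feats:   (n_drugs, p) first features of drugs
    target_feats: (n_targets, q) first features of targets
    pairs:        (m, 2) integer (drug index, target index) pairs from the sample set
    labels:       (m,) 1 for positive samples, 0 for negative samples
    """
    pairs = np.asarray(pairs)
    X = np.asarray(drug_feats, dtype=float)[pairs[:, 0]]     # (m, p)
    Y = np.asarray(target_feats, dtype=float)[pairs[:, 1]]   # (m, q)
    y = np.asarray(labels, dtype=float)
    W = 0.01 * np.random.default_rng(seed).standard_normal((X.shape[1], Y.shape[1]))
    for _ in range(epochs):
        probs = sigmoid(np.einsum("mp,pq,mq->m", X, W, Y))
        # Gradient of the mean logistic loss plus an L2 penalty on W.
        grad = X.T @ ((probs - y)[:, None] * Y) / len(y) + l2 * W
        W -= lr * grad
    return W


def predict(W, drug_feat, target_feat):
    """Interaction probability for one drug-target pair of interest."""
    return sigmoid(np.asarray(drug_feat) @ W @ np.asarray(target_feat))


# Toy usage: two drugs, two targets, drug 0 binds target 0 and drug 1 binds target 1.
E = np.eye(2)
pairs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
W = train_projection(E, E, pairs, labels=[1, 0, 0, 1])
print(predict(W, E[0], E[0]), predict(W, E[0], E[1]))
```

In this sketch the acquisition, input and output modules map onto building `pairs`/`labels`, passing the two first features to `predict`, and returning the resulting probability.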
In one possible implementation, referring to fig. 9, the apparatus further includes:
an extraction module 805 for extracting a first feature of the drug and a first feature of the target.
Referring to fig. 10, the extraction module 805 includes:
an acquiring unit 8051 configured to acquire an initial feature matrix of each drug and an initial feature matrix of each target based on basic information of a plurality of drugs and basic information of a plurality of targets, the initial feature matrix of each drug being used to represent basic features of each drug, the initial feature matrix of each target being used to represent basic features of each target;
the acquiring unit 8051 is further configured to acquire a first similarity matrix of the drugs and a first similarity matrix of the targets based on the initial feature matrix of each drug and the initial feature matrix of each target, where each element of the first similarity matrix of the drugs represents the degree of similarity between two drugs, and each element of the first similarity matrix of the targets represents the degree of similarity between two targets;
a decomposition unit 8052, configured to perform singular value decomposition on the first similarity matrix of the drugs and the first similarity matrix of the targets to obtain a target feature matrix of the drugs and a target feature matrix of the targets, where each row of the target feature matrix of the drugs represents the first feature of one drug and each row of the target feature matrix of the targets represents the first feature of one target.
In one possible implementation, the acquiring unit 8051 is further configured to acquire an initial score matrix of each target based on the basic information of the targets, where each element of a target's initial score matrix represents a score assigned to that target by a target scoring tool;
referring to fig. 11, the extraction module 805 further includes:
a calculating unit 8053, configured to, for each column of a target's initial score matrix, add element-wise all rows that contain a non-zero element in that column to obtain an initial row matrix for the column, and divide the initial row matrix by the number of rows of the initial score matrix to obtain the target row matrix for the column;
a splicing unit 8054, configured to splice the target row matrices corresponding to all columns of the initial score matrix to obtain the initial feature matrix of the target.
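The column-wise averaging performed by the calculating unit 8053 and the splicing performed by the splicing unit 8054 can be sketched in NumPy as follows. What the rows and columns of the score matrix stand for depends on the target scoring tool, which this section does not specify; stacking the per-column row matrices as the rows of a square matrix is an assumption, as is the function name `initial_feature_matrix`.

```python
import numpy as np


def initial_feature_matrix(scores: np.ndarray) -> np.ndarray:
    """Initial feature matrix of one target from its initial score matrix."""
    scores = np.asarray(scores, dtype=float)
    n_rows, n_cols = scores.shape
    target_rows = []
    for j in range(n_cols):
        mask = scores[:, j] != 0                    # rows with a non-zero entry in column j
        initial_row = scores[mask].sum(axis=0)      # element-wise sum of those rows
        target_rows.append(initial_row / n_rows)    # divide by the number of rows
    # Assumption: splicing = stacking the per-column row matrices as rows.
    return np.vstack(target_rows)


scores = np.array([[1, 0, 2],
                   [0, 3, 0],
                   [4, 0, 0]])
print(initial_feature_matrix(scores))
```

For column 0 of this example, rows 0 and 2 are non-zero, so the first output row is ([1, 0, 2] + [4, 0, 0]) / 3 = [5/3, 0, 2/3]. A column with no non-zero entries yields an all-zero row.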
In one possible implementation, referring to fig. 11, the extracting module 805 further includes:
a random walk unit 8055, configured to perform random walk with restart on the first similarity matrix of the drugs and the first similarity matrix of the targets, respectively, to obtain a first correlation matrix of the drugs and a first correlation matrix of the targets;
the decomposition unit 8052 is configured to perform singular value decomposition on the first correlation matrix of the plurality of drugs and the first correlation matrix of the plurality of targets, to obtain a target feature matrix of the plurality of drugs and a target feature matrix of the plurality of targets.
In one possible implementation, referring to fig. 11, the extracting module 805 further includes:
an alignment unit 8056, configured to, for any two drugs, align the initial feature matrices of the two drugs to obtain an alignment matrix for each of them;
the acquiring unit 8051 is further configured to acquire a second feature of each of the two drugs based on its alignment matrix;
the calculating unit 8053 is configured to calculate the similarity between the two drugs based on their second features, and to obtain a first similarity matrix of the drugs based on the similarity between every two drugs;
the acquiring unit 8051 is further configured to, for any two targets, acquire a second feature of each of the two targets based on its initial feature matrix;
the calculating unit 8053 is further configured to calculate the similarity between the two targets based on their second features, and to obtain a first similarity matrix of the targets based on the similarity between every two targets.
In one possible implementation, the acquiring unit 8051 is further configured to acquire an adjacency matrix of the drugs and an adjacency matrix of the targets based on the basic information of the drugs and the targets, where each element of an adjacency matrix indicates whether an interaction exists, and to acquire a second similarity matrix of the drugs and a second similarity matrix of the targets based on these adjacency matrices;
the random walk unit 8055 is further configured to perform random walk with restart on the second similarity matrix of the drugs and the second similarity matrix of the targets, respectively, to obtain a second correlation matrix of the drugs and a second correlation matrix of the targets;
the splicing unit 8054 is further configured to splice the second correlation matrix of the drugs onto the first similarity matrix of the drugs to obtain a spliced first similarity matrix of the drugs, and to splice the second correlation matrix of the targets onto the first similarity matrix of the targets to obtain a spliced first similarity matrix of the targets;
the decomposition unit 8052 is further configured to perform singular value decomposition on the spliced first similarity matrix of the drugs and the spliced first similarity matrix of the targets to obtain a target feature matrix of the drugs and a target feature matrix of the targets.
In one possible implementation, the calculating unit 8053 is further configured to calculate the similarity between any two of the drugs based on the adjacency matrix of the drugs and obtain the second similarity matrix of the drugs corresponding to that adjacency matrix from these similarities, and to calculate the similarity between any two of the targets based on the adjacency matrix of the targets and obtain the second similarity matrix of the targets corresponding to that adjacency matrix from these similarities.
In the embodiments of the present application, a projection model from the drug space to the target space is trained based on the first features of the drugs in the sample set, the first features of the targets, and the interaction information of the drug-target pairs. Because the first feature of a drug describes the drug relative to other drugs, and the first feature of a target describes the target relative to other targets, these features capture not only each drug and target individually but also the relationships among different drugs and among different targets, and can therefore fully reflect their original characteristics. As a result, the target projection model is trained more effectively, and the drug-target interactions predicted with it are more accurate.
It should be noted that, when the device provided in the foregoing embodiment performs its functions, the division into the functional modules above is used only as an example. In practical applications, these functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above. In addition, the device embodiment and the method embodiments provided above belong to the same concept; for the specific implementation process of the device, refer to the method embodiments, which are not repeated here.
Fig. 12 is a schematic structural diagram of a computer device according to an embodiment of the present invention. The device may be a server, either a standalone server or a server cluster. The server may include one or more processors (central processing units, CPUs) 1201 and one or more memories 1202, where the one or more memories 1202 store at least one program code that is loaded and executed by the one or more processors 1201 to implement the method for predicting drug-target interactions provided by the method embodiments above. The server may also have a wired or wireless network interface, a keyboard, an input/output interface, and other components for implementing device functions, which are not described here.
In an exemplary embodiment, a computer-readable storage medium is also provided, storing at least one program code that is loaded and executed by a processor of a computer device to implement any of the above methods for predicting drug-target interactions.
Optionally, the computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
It should be understood that "a plurality" herein means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that A exists alone, that A and B both exist, or that B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
The serial numbers of the foregoing embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
The foregoing descriptions of exemplary embodiments of the present application are not intended to limit the invention to the particular embodiments disclosed; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.

Claims (10)

1. A method of predicting interactions of a drug with a target, the method comprising:
obtaining a sample set of drug-target pairs, the sample set comprising drug-target pairs as positive samples having known interaction information and drug-target pairs as negative samples having no known interaction information;
acquiring a first feature of each drug and a first feature of each target in the sample set, wherein the first feature of a drug represents the characteristics of the drug relative to other drugs, and the first feature of a target represents the characteristics of the target relative to other targets;
training a projection model from a drug space to a target space based on the first feature of the drug in the sample set, the first feature of the target, and the interaction information of the drug-target pair to obtain a target projection model;
when a prediction instruction for a drug of interest and a target of interest is received, acquiring the first feature of the drug of interest and the first feature of the target of interest, inputting both first features into the target projection model, and outputting an interaction result for the drug of interest and the target of interest.
2. The method of claim 1, wherein the extraction of the first feature of the drug and the first feature of the target comprises:
acquiring an initial feature matrix of each drug and an initial feature matrix of each target based on the basic information of a plurality of drugs and the basic information of a plurality of targets, wherein the initial feature matrix of each drug is used for representing the basic feature of each drug, and the initial feature matrix of each target is used for representing the basic feature of each target;
acquiring a first similarity matrix of the plurality of drugs and a first similarity matrix of the plurality of targets based on the initial feature matrix of each drug and the initial feature matrix of each target, respectively, wherein each element of the first similarity matrix of the drugs represents the degree of similarity between two drugs, and each element of the first similarity matrix of the targets represents the degree of similarity between two targets;
performing singular value decomposition on the first similarity matrix of the drugs and the first similarity matrix of the targets, respectively, to obtain a target feature matrix of the drugs and a target feature matrix of the targets, wherein each row of the target feature matrix of the drugs represents the first feature of one drug and each row of the target feature matrix of the targets represents the first feature of one target.
3. The method of claim 2, wherein the process of obtaining the initial feature matrix for each target comprises:
acquiring an initial score matrix of each target based on the basic information of the targets, wherein each element of a target's initial score matrix represents a score assigned to the target by a target scoring tool;
for each column of the target's initial score matrix, adding element-wise all rows that contain a non-zero element in that column to obtain an initial row matrix for the column, and dividing the initial row matrix by the number of rows of the initial score matrix to obtain a target row matrix for the column;
splicing the target row matrices corresponding to all columns of the initial score matrix to obtain the initial feature matrix of the target.
4. The method of claim 2, wherein after the obtaining the first similarity matrix for the plurality of drugs and the first similarity matrix for the plurality of targets based on the initial feature matrix for each drug and the initial feature matrix for each target, respectively, the method further comprises:
performing random walk with restart on the first similarity matrix of the plurality of drugs and the first similarity matrix of the plurality of targets, respectively, to obtain a first correlation matrix of the drugs and a first correlation matrix of the targets;
and the performing singular value decomposition on the first similarity matrix of the plurality of drugs and the first similarity matrix of the plurality of targets to obtain a target feature matrix of the drugs and a target feature matrix of the targets comprises:
Singular value decomposition is carried out on the first correlation matrix of the medicines and the first correlation matrix of the targets respectively to obtain target feature matrices of the medicines and target feature matrices of the targets.
5. The method of claim 2, wherein the obtaining the first similarity matrix for the plurality of drugs and the first similarity matrix for the plurality of targets based on the initial feature matrix for each drug and the initial feature matrix for each target, respectively, comprises:
for any two drugs, aligning the initial feature matrices of the two drugs to obtain an alignment matrix for each of them; acquiring a second feature of each of the two drugs based on its alignment matrix; calculating the similarity between the two drugs based on their second features; and obtaining the first similarity matrix of the plurality of drugs based on the similarity between every two drugs;
for any two targets, acquiring a second feature of each of the two targets based on its initial feature matrix; calculating the similarity between the two targets based on their second features; and obtaining the first similarity matrix of the plurality of targets based on the similarity between every two targets.
6. The method of claim 2, wherein before performing singular value decomposition on the first similarity matrix of the plurality of drugs and the first similarity matrix of the plurality of targets, respectively, to obtain the target feature matrix of the plurality of drugs and the target feature matrix of the plurality of targets, the method further comprises:
acquiring an adjacency matrix of the plurality of drugs and an adjacency matrix of the plurality of targets based on the basic information of the drugs and the basic information of the targets, wherein each element of an adjacency matrix indicates whether an interaction exists;
acquiring a second similarity matrix of the plurality of drugs and a second similarity matrix of the plurality of targets based on the adjacency matrix of the plurality of drugs and the adjacency matrix of the plurality of targets;
performing random walk with restart on the second similarity matrix of the plurality of drugs and the second similarity matrix of the plurality of targets, respectively, to obtain a second correlation matrix of the drugs and a second correlation matrix of the targets;
and the performing singular value decomposition on the first similarity matrix of the plurality of drugs and the first similarity matrix of the plurality of targets to obtain a target feature matrix of the drugs and a target feature matrix of the targets comprises:
splicing the second correlation matrix of the drugs onto the first similarity matrix of the drugs to obtain a spliced first similarity matrix of the drugs; splicing the second correlation matrix of the targets onto the first similarity matrix of the targets to obtain a spliced first similarity matrix of the targets;
singular value decomposition is carried out on the spliced first similarity matrix of the plurality of medicines and the spliced first similarity matrix of the plurality of targets respectively to obtain target feature matrixes of the plurality of medicines and target feature matrixes of the plurality of targets.
7. The method of claim 6, wherein the obtaining a second similarity matrix for the plurality of drugs and a second similarity matrix for the plurality of targets based on the adjacency matrix for the plurality of drugs and the adjacency matrix for the plurality of targets comprises:
calculating the similarity between any two of the plurality of drugs based on the adjacency matrix of the drugs, and obtaining the second similarity matrix of the drugs corresponding to the adjacency matrix based on the similarity between every two drugs;
calculating the similarity between any two of the plurality of targets based on the adjacency matrix of the targets, and obtaining the second similarity matrix of the targets corresponding to the adjacency matrix based on the similarity between every two targets.
8. A device for predicting interactions between a drug and a target, the device comprising:
an acquisition module, configured to acquire a sample set of drug-target pairs, the sample set comprising drug-target pairs with known interaction information as positive samples and drug-target pairs without known interaction information as negative samples;
the acquisition module is further configured to acquire a first feature of each drug and a first feature of each target in the sample set, the first feature of a drug representing the characteristics of the drug relative to other drugs, and the first feature of a target representing the characteristics of the target relative to other targets;
a training module, configured to train a projection model from the drug space to the target space based on the first features of the drugs in the sample set, the first features of the targets, and the interaction information of the drug-target pairs, to obtain a target projection model;
the acquisition module is further configured to acquire the first feature of a drug of interest and the first feature of a target of interest when a prediction instruction for that drug and target is received;
an input module, configured to input the first feature of the drug of interest and the first feature of the target of interest into the target projection model;
and an output module, configured to output an interaction result for the drug of interest and the target of interest.
9. A computer device comprising a processor and a memory, wherein the memory has stored therein at least one program code that is loaded and executed by the processor to implement a method of predicting interactions of a drug with a target as claimed in any one of claims 1 to 7.
10. A computer readable storage medium having stored therein at least one program code loaded and executed by a processor to implement a method of predicting interactions of a drug with a target as claimed in any one of claims 1 to 7.
CN201910722794.5A 2019-08-06 2019-08-06 Method, device, equipment and storage medium for predicting interaction between medicine and target Active CN110415763B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910722794.5A CN110415763B (en) 2019-08-06 2019-08-06 Method, device, equipment and storage medium for predicting interaction between medicine and target

Publications (2)

Publication Number Publication Date
CN110415763A CN110415763A (en) 2019-11-05
CN110415763B true CN110415763B (en) 2023-05-23

Family

ID=68366218

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910722794.5A Active CN110415763B (en) 2019-08-06 2019-08-06 Method, device, equipment and storage medium for predicting interaction between medicine and target

Country Status (1)

Country Link
CN (1) CN110415763B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110957002B (en) * 2019-12-17 2023-04-28 University of Electronic Science and Technology of China Drug-target interaction prediction method based on collaborative matrix factorization
CN111191014A (en) * 2019-12-26 2020-05-22 Shanghai Science and Technology Development Co., Ltd. Drug repositioning method, system, terminal and medium
CN111524546B (en) * 2020-04-14 2022-05-03 Hunan University Drug-target interaction prediction method based on heterogeneous information
CN111755078B (en) * 2020-07-30 2022-09-23 Tencent Technology (Shenzhen) Co., Ltd. Method, device and storage medium for determining drug molecule properties
CN112133367B (en) * 2020-08-17 2024-07-12 Central South University Method and device for predicting interaction relationships between drugs and targets
CN112151128A (en) * 2020-10-16 2020-12-29 Tencent Technology (Shenzhen) Co., Ltd. Method, device and equipment for determining interaction information, and storage medium
CN112331279A (en) * 2020-11-27 2021-02-05 Shanghai SenseTime Intelligent Technology Co., Ltd. Information processing method and device, electronic equipment and storage medium
CN112331262A (en) * 2021-01-06 2021-02-05 Beijing Baidu Netcom Science and Technology Co., Ltd. Affinity prediction method, model training method, device, equipment and medium
CN113327644B (en) * 2021-04-09 2024-05-14 Sun Yat-sen University Drug-target interaction prediction method based on deep embedding learning of graphs and sequences
CN113160894B (en) * 2021-04-23 2023-10-24 Ping An Technology (Shenzhen) Co., Ltd. Method, device, equipment and storage medium for predicting interaction between drug and target
CN113345523A (en) * 2021-05-28 2021-09-03 Shandong Normal University Microorganism-disease association prediction method and system based on graph attention network
CN115359837A (en) * 2022-08-18 2022-11-18 BOE Technology Group Co., Ltd. Method and device for predicting interaction between drug and target, and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN107862179A (en) * 2017-11-06 2018-03-30 Central South University miRNA-disease association prediction method based on similarity and logistic matrix factorization
GB201804870D0 (en) * 2018-03-27 2018-05-09 Innoplexus Ag System and method for identifying potential targets for pharmaceutical compound
CN108509765A (en) * 2018-03-26 2018-09-07 Sun Yat-sen University Drug-target interaction prediction method based on FM-N-DNN
CN109887540A (en) * 2019-01-15 2019-06-14 Central South University Drug-target interaction prediction method based on heterogeneous network embedding

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CA2877126A1 (en) * 2012-06-21 2013-12-27 Georgetown University Method for predicting drug-target interactions and uses for drug repositioning
WO2016200681A1 (en) * 2015-06-08 2016-12-15 Georgetown University Predicting drug-target interactions and uses for drug repositioning and repurposing
WO2019005946A2 (en) * 2017-06-27 2019-01-03 Leighton Bonnie Berger Secure genome crowdsourcing for large-scale association studies

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
CN107862179A (en) * 2017-11-06 2018-03-30 Central South University miRNA-disease association prediction method based on similarity and logistic matrix factorization
CN108509765A (en) * 2018-03-26 2018-09-07 Sun Yat-sen University Drug-target interaction prediction method based on FM-N-DNN
GB201804870D0 (en) * 2018-03-27 2018-05-09 Innoplexus Ag System and method for identifying potential targets for pharmaceutical compound
CN109887540A (en) * 2019-01-15 2019-06-14 Central South University Drug-target interaction prediction method based on heterogeneous network embedding

Non-Patent Citations (1)

Title
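A multi-information fusion algorithm for drug-target association prediction; Peng Lihong; Li Zejun; Chen Min; Ren Rili; Computer Engineering (No. 06); 218-223 *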

Also Published As

Publication number Publication date
CN110415763A (en) 2019-11-05

Similar Documents

Publication Publication Date Title
CN110415763B (en) Method, device, equipment and storage medium for predicting interaction between medicine and target
Peng et al. Pocket2mol: Efficient molecular sampling based on 3d protein pockets
US11810648B2 (en) Systems and methods for adaptive local alignment for graph genomes
Ezzat et al. Computational prediction of drug–target interactions using chemogenomic approaches: an empirical survey
Neyshabur et al. NETAL: a new graph-based method for global alignment of protein–protein interaction networks
US10204207B2 (en) Systems and methods for transcriptome analysis
Kalaev et al. Fast and accurate alignment of multiple protein networks
Aguilera-Mendoza et al. Automatic construction of molecular similarity networks for visual graph mining in chemical space of bioactive peptides: an unsupervised learning approach
Che et al. Drug target group prediction with multiple drug networks
US10354745B2 (en) Aligning and clustering sequence patterns to reveal classificatory functionality of sequences
Madaoui et al. Coevolution at protein complex interfaces can be detected by the complementarity trace with important impact for predictive docking
CN111402973B (en) Information matching analysis method, device, computer system and readable storage medium
CN111627494A (en) Protein property prediction method and device based on multi-dimensional features and computing equipment
Saleh et al. A population-based evolutionary search approach to the multiple minima problem in de novo protein structure prediction
CN111429991B (en) Medicine prediction method, medicine prediction device, computer equipment and storage medium
CN117312881B (en) Clinical trial treatment effect evaluation method, device, equipment and storage medium
CN111048145B (en) Method, apparatus, device and storage medium for generating protein prediction model
Materese et al. Hierarchical organization of eglin c native state dynamics is shaped by competing direct and water-mediated interactions
US20230335228A1 (en) Active Learning Using Coverage Score
Takaba et al. Edge expansion parallel cascade selection molecular dynamics simulation for investigating large-amplitude collective motions of proteins
WO2023240720A1 (en) Drug screening model construction method and apparatus, screening method, device, and medium
CN112133367B (en) Method and device for predicting interaction relationship between medicine and target point
CN115359837A (en) Method and device for predicting interaction between drug and target, and storage medium
Liaw et al. QSAR modeling: prediction of biological activity from chemical structure
CN115881211A (en) Protein sequence alignment method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant