CN111785319A - Drug relocation method based on differential expression data - Google Patents

Drug relocation method based on differential expression data Download PDF

Info

Publication number
CN111785319A
CN111785319A CN202010594317.8A CN202010594317A CN111785319A CN 111785319 A CN111785319 A CN 111785319A CN 202010594317 A CN202010594317 A CN 202010594317A CN 111785319 A CN111785319 A CN 111785319A
Authority
CN
China
Prior art keywords
gene
disease
logfc
genes
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010594317.8A
Other languages
Chinese (zh)
Other versions
CN111785319B (en
Inventor
鱼亮
何丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202010594317.8A priority Critical patent/CN111785319B/en
Publication of CN111785319A publication Critical patent/CN111785319A/en
Application granted granted Critical
Publication of CN111785319B publication Critical patent/CN111785319B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/30Drug targeting using structural data; Docking or binding prediction
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30Data warehousing; Computing architectures
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00ICT specially adapted for the handling or processing of medical references
    • G16H70/40ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Bioethics (AREA)
  • Public Health (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Evolutionary Computation (AREA)
  • Physiology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Analytical Chemistry (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Toxicology (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)

Abstract

The invention provides a drug relocation method based on differential expression data, which mainly solves the problems that differential genes are not screened and the accuracy of drug prediction is low in the prior art, and the scheme is as follows: acquiring gene data, calculating a gene expression change value and a significant value thereof, and screening out a gene set with significant differential expression; constructing two inquiry gene sets of diseases by using the gene set and the pathogenic genes; respectively acquiring correlation values of the two gene sets and the medicines and corresponding medicine prediction accuracy; comparing the accuracy of the predicted drugs of the two gene sets; calculating a difference value between the screened gene set and the standard normal distribution, and constructing a prediction model of the optimal disease query gene set according to the difference value and the calculated expression threshold value; the model is used for predicting gene sets of other diseases, calculating the accuracy of the predicted drugs and screening out candidate drugs with potential treatment effects. The prediction accuracy is high, and the method can be used for predicting candidate drugs with potential treatment effects on diseases.

Description

Drug relocation method based on differential expression data
Technical Field
The invention belongs to the technical field of data mining, and particularly relates to a drug relocation method which can be used for predicting candidate drugs with good treatment effects.
Background
It is well known in the art that drug development takes a long time, and it usually takes 11.4-13.5 years to develop new drugs from beginning to end, with costs of each drug being $ 1800 and 1800 billion. Despite the significant time and money invested in drug development, over 90% of drugs end up failing. For example, from 1949 to 2014 the U.S. food and drug administration FDA co-approved 150 drugs for cancer treatment, making the available drugs insufficient due to the existence of cancer subtypes and drug resistance, and it is therefore of great importance to explore new drug development strategies. Drug relocation is a new strategy for old drugs, saving a lot of time and cost required for drug development. Traditional drug development typically involves five stages: drug discovery and preclinical testing, drug safety review, clinical study of the drug, FDA review approval, and post-marketing FDA safety monitoring. While drug relocation has only four steps: compound identification, compound acquisition, drug development, and FDA after-market safety monitoring. The existing drug is used as a potential candidate drug to evaluate the treatment effect on the disease. This has the advantage of making the process of presuming a drug faster and less costly, taking advantage of several important drug properties that have been established including efficacy, pharmacokinetics, pharmacodynamics, and toxicity.
The current methods for drug relocation main stream can be divided into: methods based on drug information, genomics and transcriptomics data.
A method for drug relocation based on drug information.
Drug information-based approaches are primarily directed to predicting new indications for drugs by using four classes of information for drugs to explore common features among compounds. The first category is the use of chemical structure information of drugs to relocate drugs, and this category of methods uses the structural and chemical properties of drug molecules to calculate similarities between drugs based on known quantitative relationships between chemical structures and biochemical activities. Its activity is significantly increased or lost due to a slight change in the structure of the molecule. Therefore, such methods have a high false positive in application. The second method is to use the spatial structure information of the drug to reposition the drug, and this method mainly utilizes the spatial structure modeling of the drug molecule and the protein to simulate their direct physical interaction, which depends on the analysis of the drug molecule and the protein structure, however, some of the protein structures analyzed at present have errors and many important protein structures are not completely decomposed, which makes the modeling of the interaction between the compound and the protein incomplete, and therefore, this method also has high false positive. The third category is to use the side effect information of the drug to relocate the drug, the side effect information of the drug reflects the physiological consequences and the phenotypic expression of the drug, most of the methods are performed on the assumption that the side effect is associated with the disease when the drugs related to the disease share a plurality of side effects, and the accuracy of the prediction of the drug depends on whether the characteristics of the side effect of the drug have definite definitions. It is well known in the art that although strict preclinical assessments of side effects of drugs are performed, years of clinical use and post-market monitoring are required before complete identification of the side effects of a newly approved drug is possible. In addition, data redundancy is also a concern in the side effect field. The fourth category is the use of drug target information to relocate drugs, and such methods generally use drug targets and disease-related genes as drug and disease mediators to predict new drug-disease relationships. However, due to the small number of targets for current drugs, such methods often require the addition of additional information, introducing new noise. In summary, many drug relocation methods based on drug information face the problem of high false positives due to insufficient information.
Secondly, a drug relocation method based on genomics.
Genomics is the study of the structure, function and inheritance of a genome, including the interactions between genes and the human environment. Its main task is to determine the molecular sequences that make up the deoxyribonucleotide DNAs of an organism's genome. Current methods of drug relocation based on genomics can be divided into mathematical statistics and network methods. The mathematical statistics method is mainly used for judging the relationship between the drugs and the diseases by performing enrichment analysis on the gene sets corresponding to the drugs and the diseases. This approach is simple, easy to understand and implement, but depending on the disease or drug related gene set, the accuracy of the gene set may affect the accuracy of the results. The method using the network mainly constructs a genetic-related network or a disease network by using various data such as whole genome association study data, gene interaction data, metabolic data, and electronic medical records. The dependence of the method on the original data is large, the reliability of the result can be directly influenced by the incompleteness of the data set, for example, the existing protein network is incomplete, about 88 percent of the single nucleotide mutations related to diseases are positioned in intergenic and intronic regions, and the genetic effect of the parts can be hardly explained by the existing gene interaction data.
And thirdly, a drug relocation method based on transcriptomics.
Transcriptomes are the necessary links of proteomes that link genomic genetic information with biological functions, and are the main means for studying gene expression. The transcriptomics-based drug relocation method mainly aims at establishing a connection relation between differentially expressed genes or non-coding RNA in diseases and differentially expressed genes under drug perturbation to predict the influence of drugs on the diseases. For example, Chen et al, in the context of up-and down-regulation of genes, relocate the drug by dividing the genes into two parts, up-and down-regulating UCDB in disease and DCUB in disease but up-and down-regulating DCUB in drug. The method comprises the steps of using a human function connection network FLN as a background network, respectively mapping k genes with highest expression in diseases and m genes with lowest expression under the action of drugs into the network, using the k genes as seed genes, calculating prediction scores of the k genes and neighbor nodes, arranging the genes in a descending order according to the prediction scores, representing the condition that the m down-regulated genes are ranked in front through an area AUC under a characteristic curve of a subject, and using the AUC value as the score of the disease prediction drugs. Similarly, the m down-regulated genes are used as seed genes to calculate the fraction of the drug predicted disease. And synthesizing the two prediction scores to obtain the mutual prediction scores of the two groups of genes under the parameters k and m, sequencing the medicines in a descending order through the scores, calculating AUC by taking the medicines approved by FDA and the medicines used clinically as standards, obtaining the prediction result of UCDB, updating the parameters, repeating the experiment, and taking the result with the highest finally calculated AUC value as the final result of UCDB. Similarly, the prediction result of DCUB can be calculated. The accuracy of this method depends largely on the accuracy of the background network. In addition, the parameters of the disease are different in both modes, so a large number of experiments are required, which undoubtedly increases the difficulty of using the algorithm.
In addition, the join graph approach was first proposed by the join graph CMap database. The method utilizes transcriptome data to carry out reverse matching on the expression patterns of the drugs and the diseases, namely the expression patterns of genes in the diseases and the drugs are opposite, and the high expression in the diseases needs to be under the action of the drugs and vice versa. Whether the effect of the candidate drug on the disease is treatment or aggravation can be judged through the positive and negative matching scores so as to achieve the aim of screening the drug. The framework of the connection diagram method is mainly divided into three parts: gene expression profiles, disease query gene sets, and matching algorithms. At present, most of researches based on a connection diagram method framework are focused on gene expression profiles and matching algorithms, and the selection of a disease query gene set has a crucial influence on the accuracy of a prediction result. In addition, the methods select genes with remarkable differential expression, particularly those with high differential expression, as a query gene set of diseases, and completely do not consider pathogenic genes which are closely related to the diseases, so that the accuracy of the predicted result is low.
Disclosure of Invention
The invention aims to overcome the defects of the transcriptomics-based drug relocation method, and provides a drug relocation method based on differential expression data so as to provide a model for predicting an optimal disease query gene set and improve the accuracy of a prediction result.
The technical idea of the invention is as follows: finding an optimal query gene set for the disease through pathogenic genes and differential expression genes of the disease; constructing a linear model for predicting an optimal inquiry gene set of diseases according to experimental results of various diseases; and predicting the optimal query gene set of other diseases through the linear model to calculate the association score of the diseases and the medicines.
According to the technical idea, the implementation scheme of the invention comprises the following steps:
(1) downloading pathogenic gene data by using an OMIM database, and searching a pathogenic gene set L of the disease;
(2) downloading gene expression data, calculating each gene expression change value logFC and corresponding false positive FDR, and screening genes;
(3) two query gene sets were constructed:
(3a) deleting the top m genes and the bottom k genes in the screened gene set with obvious differential expression to obtain a disease de-noising differential gene set A, wherein k is more than or equal to 0, and m is more than or equal to 0;
(3b) sorting the pathogenic gene set L from small to large according to the logFC sequence of each gene, adding the gene with the logFC <0 to the top of the disease de-noising difference gene set A, and adding the gene with the logFC >0 to the bottom of the disease de-noising difference gene set A to obtain a disease gene set B of the difference and pathogenic genes;
(4) respectively calculating correlation value vectors AS of the difference gene set A and each medicine for denoising diseasesAAnd correlation value vector AS of disease gene set B of difference and pathogenic gene and each medicineB
(5) Calculating the accuracy P of the predicted drug of the denoised differential gene set AAAnd calculating the difference and the accuracy P of the disease gene set B for predicting the medicineB
(6) Comparing the accuracy P obtained in the step (5)AAnd PB
If PBHigher than PAIncreasing the values of the parameters k and m, deleting more noises, and repeating the steps (3) - (5) until PB≤PA(ii) a Otherwise, executing the step (7);
(7) recording the current difference gene set A of denoising the disease as an optimal inquiry gene set A 'of the disease, and determining a threshold value T at two ends of the optimal inquiry gene set A' of the diseaseupAnd Tdown
(8) Setting the minimum value of the total threshold number of the sum of the top threshold number and the bottom threshold number as 20, and judging whenWhether the sum of the top threshold number and the bottom threshold number of the front is lower than a minimum value of 20: if yes, continuing to download other disease data from the TCGA database, and obtaining two end threshold values of the newly downloaded disease by adopting the steps (3) to (7)
Figure BDA0002556935170000041
And
Figure BDA0002556935170000042
executing (9) until the total threshold number sum is not less than 20; otherwise, directly executing (9);
(9) constructing a prediction model of the optimal difference gene set threshold of the disease:
(9a) randomly taking 100000 numerical points from standard normal distribution as background values, recording the numerical points as vectors E, using Kolmogorov-Smirnov test statistic J to represent distribution difference between differential expression value of disease genes and 100000 background points, using the distribution difference as independent variable X of model, and using threshold values T at two ends of optimal query gene set of all diseasesupAnd TdownA dependent variable Y as a model;
(9b) according to the parameters obtained in the step (9a), constructing a prediction model as a linear model of | Y | ═ aX + b, wherein a is the slope of the model, and b is the intercept of the model;
(10) predicting the optimal query gene set of other diseases by using a prediction model, and calculating the accuracy P of M drugs before drug ranking through (4) - (5)MAnd then selecting the M medicines to obtain all candidate medicines with potential treatment effects on diseases, wherein M represents the number of medicines and is more than or equal to 10.
Compared with the prior art, the invention has the following advantages:
1. in the invention, in the process of acquiring the disease query gene set, pathogenic gene data are added, the query gene sets of two diseases are constructed by combining the differential genes and the pathogenic genes, the influence of the pathogenic genes on the diseases is fully considered, and the noise removal operation is carried out on the differential genes of the diseases by comparing the query gene sets of the two diseases.
2. The invention uses not less than 10 diseases to denoise the differential genes, uses the threshold value to express the gene set, uses the experimental data of various diseases and the threshold value thereof to construct the prediction model of the optimal differential gene set, uses the model to predict the optimal differential gene set for other diseases, and obtains more accurate medicine prediction accuracy compared with the prior art.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a graph showing the accuracy of the prediction of sarcoma drug relocation using the present invention.
Detailed Description
The embodiments and effects of the present invention will be described in further detail below with reference to the accompanying drawings and specific embodiments.
This example illustrates breast cancer in the TCGA database of cancer genomes, where all diseases in the TCGA database where the causative gene is present are used.
Referring to fig. 1, the drug relocation method based on differential expression data is implemented by the following steps:
step 1, downloading pathogenic gene data.
This example utilizes the OMIM database to search the set of causative genes for breast cancer, for a total of 22 genes, designated set L.
And 2, downloading gene expression data, calculating each gene expression change value logFC and corresponding false positive FDR, and screening genes.
2.1) downloading breast cancer data and calculating the gene expression change value logFC:
2.1.1) downloading gene expression data of the breast cancer data from a TCGA database, and identifying the sample types of the gene expression data in the TCGA according to a Barcode code provided by the TCGA to obtain a cancer patient sample set and a normal person sample set;
2.1.2) calculating the expression change value logFC of the genes in the breast cancer according to the downloaded breast cancer data:
Figure BDA0002556935170000051
wherein e islThe expression value of the gene in a sample l is represented, l represents the sample serial number in the gene expression data, when S +1 is not less than l and not more than S + W, the sample l is a cancer patient sample, when S +1 is not less than l and not more than S + W, the sample l is a normal person sample, S represents the number of the cancer patient samples, and W represents the number of the normal person samples;
2.2) calculating the false positive FDR corresponding to the gene in the breast cancer:
2.2.1) constructing an expression value vector θ for a gene from gene expression data in breast cancer:
θ=(e1,e2,…,el,…,es,es+1,es+2,…,es+f,…,eS+W),
wherein e islExpressing the expression value of the gene in the cancer sample l, and 1. ltoreq. l.ltoreq.S, eS+fExpressing the expression value of the gene in a normal human sample S + f, wherein S + f is more than or equal to 1 and less than or equal to W, f represents the sample serial number of the normal human in the gene expression data, and S + W represents the total sample number of the gene expression data;
2.2.2) randomly scrambling the serial numbers of the samples in the expression value vector theta of the genes in the breast cancer to obtain the expression value vector labeled by the scrambled samples
Figure BDA0002556935170000061
Wherein,
Figure BDA0002556935170000062
expressing the expression value of the gene in the cancer sample l after the label is disturbed,
Figure BDA0002556935170000063
expressing the expression value of the gene in a normal human sample S + f after the label is disturbed;
2.2.3) vector θ*Performing random test, and calculating theta in the v random test*Is calculated as a statistical result value Hv
Figure BDA0002556935170000064
2.2.4) repeating the step (2.2.3) for 1000 times, and calculating the statistical significance value F of each gene expression change value:
Figure BDA0002556935170000065
wherein C represents the checking statistic value, and P represents theta in the random test*Is higher than the set of C, | P { H |V|HVV is more than C and more than or equal to 1 and less than or equal to 1000, and represents the number of elements in the set P;
2.2.5) rank the F from small to large, calculate the false positive FDR of the gene:
Figure BDA0002556935170000066
wherein N represents the total number of genes and i represents the ordering of the genes;
2.3) screening genes from the gene expression variance logFC and false positive FDR corresponding to the genes:
2.3.1) setting the threshold for differential changes in gene expression change value logFC to 1.5 and the threshold for false positive FDR to α to 0.05, respectively;
2.3.2) comparing the gene expression change value logFC and the false positive FDR with respective threshold values, and screening genes of which the logFC | is more than or equal to 1.5 and the FDR is less than or equal to 0.05 as the genes with obvious differential expression of the breast cancer, wherein 3166 genes with obvious differential expression of the breast cancer are selected in the example.
And 3, constructing two query gene sets of the breast cancer.
3.1) deleting the topmost m genes and the bottommost k genes in the screened gene set with obvious differential expression to obtain a de-noised differential gene set A, wherein k is more than or equal to 0, m is more than or equal to 0, and the parameter selected in the de-noised differential gene set A for the breast cancer in the example is k which is 0 and m which is 0;
3.2) sequencing a pathogenic gene set L of the breast cancer from small to large according to the logFC sequence of each gene, adding the gene with the logFC <0 to the top of a differential gene set A for denoising the breast cancer, and adding the gene with the logFC >0 to the bottom of the differential gene set A for denoising the breast cancer to obtain a disease gene set B of the difference of the breast cancer and the pathogenic gene;
step 4, respectively calculating correlation value vectors AS of the difference gene set A for breast cancer denoising and each medicineAAnd the difference of breast cancer and the correlation value vector AS of the disease gene set B of the pathogenic gene and each medicineB
4.1) calculating the correlation value vector AS of the difference gene set A and each medicine for denoising the breast cancerA
4.1.1) taking a difference gene set A for denoising the breast cancer as a query list of a Kolmogorov-Smirnov method, and taking an ordered list of genes under the action of each medicament as a reference list set in the Kolmogorov-Smirnov method;
4.1.2) constructing a position vector V (1.. n) of each gene in the reference list set, wherein n represents the number of genes in the reference list;
4.1.3) will query logFC in the list>0 as a query list Q of Kolmogorov-SmirnovupWill query logFC in the list<0 as a Down-Regulation query List Q of Kolmogorov-SmirnovdownSeparately, for each drug, a set of query genes Q is computationally investigatedupEnrichment score of ESupAnd downregulation of the query Gene set QdownEnrichment score of ESdown
Figure BDA0002556935170000071
Figure BDA0002556935170000072
Wherein,
Figure BDA0002556935170000073
score representing genes ranked top in the query list and reference list
Figure BDA0002556935170000074
Denotes the score of a gene before the query list and after the reference list, p denotes the gene rank, Vup(p) represents the position of the genes ranked as p in the upper query list in the reference list, and s1 represents the number of genes in the upper query list;
Figure BDA0002556935170000081
scores representing genes ranked later in the query list and the reference list;
Figure BDA0002556935170000082
score, V, representing genes ranked later in the query list and earlier in the reference listdown(p) represents the position of the genes ranked as p in the query list below in the reference list, and s2 represents the number of genes in the query list below;
4.1.4) enrichment score ES of the query Gene set according to (4.1.3)upAnd downregulating enrichment scores ES of the query gene setdownCalculating the correlation value as of the difference gene set A and each medicine for denoising the diseaseA
If ESupAnd ESdownThe same sign, then asA=0,
Otherwise, asA=ESup-ESdown
4.1.5) as for all drugs in the differential Gene set A byAThe scores are normalized:
Figure BDA0002556935170000083
wherein,
Figure BDA0002556935170000084
the j is more than or equal to 1 and less than or equal to q, q represents the total number of the medicaments,
Figure BDA0002556935170000085
indicates all the drug scores as in the differential gene set AAThe maximum value of (a) is,
Figure BDA0002556935170000086
indicates all the drug scores as in the differential gene set AAThe minimum value of (a) is determined,
Figure BDA0002556935170000087
representing the correlation value of the j < th > drug and the disease after normalization in the difference gene set A;
4.1.6) combining the medicines with the normalized values of the diseases thereof to obtain the correlation value vector of the difference gene set A for denoising the breast cancer and all medicines:
Figure BDA0002556935170000088
wherein,
Figure BDA0002556935170000089
expressing the normalized correlation value of the jth drug and the disease of the difference gene set A for denoising the breast cancer, wherein j is more than or equal to 1 and less than or equal to q, and q expresses the total number of the drugs;
4.2) calculating correlation value vector AS of disease gene set B of breast cancer difference and pathogenic genes and each medicineB:
4.2.1) taking the disease gene set B as a query list of the Kolmogorov-Smirnov method, taking the ordered list of the genes under the action of each medicament as a reference list set in the Kolmogorov-Smirnov method, and adopting the same steps as (4.1.1) to (4.1.5) to calculate the difference of the breast cancer and the normalized correlation value of the disease gene set B of the pathogenic genes and the medicament j
Figure BDA0002556935170000091
4.2.2) combining the normalized values of the drugs and the breast cancer to obtain the difference of the breast cancer and the correlation value vector of the disease gene set B of the pathogenic genes and all the drugs:
Figure BDA0002556935170000092
wherein,
Figure BDA0002556935170000093
the correlation value of the j drug of the disease gene set B representing the difference of the breast cancer and the pathogenic gene after being normalized with the disease.
Step 5, respectively calculating the predicted medicine accuracy P of two gene sets, namely a difference gene set A for denoising diseases and a disease gene set B for differentiating and causing diseasesAAnd PB
5.1) utilizing the correlation value vector AS of the difference gene set A and all medicines for denoising the breast cancer in the step 4ACalculating the predicted drug accuracy P of the difference gene set A for denoising the breast cancerA
5.1.1) according to ASAOrdering the medicines in a small-to-large order to obtain a first ordered medicine sequence R in a difference gene set A for denoising the breast cancerA
5.1.2) according to the correlation value vector AS of the difference gene set A and all the medicines for denoising the breast cancer in the step 4ACalculating a relevance value vector ASAAbsolute value of (a):
Figure BDA0002556935170000094
wherein,
Figure BDA0002556935170000095
represents the absolute value of the correlation score of the jth drug and the disease in the differential gene set A, wherein j is more than or equal to 1 and less than or equal to q, and q represents the total number of the drugs;
5.1.3) according to absolute value | ASAI rule from big to small, all the medicines are sequenced to obtain a second ordered medicine sequence AR of the difference gene set A for denoising the breast cancerA
5.1.4) downloading a drug set omega related to the breast cancer from a drug toxicity database CTD as a standard drug set related to the breast cancer;
5.1.5) two drug orderings R in the differential Gene set A for Breast cancer De-noisingAAnd ARAAnd the standard drug set omega of breast cancer, calculating the first drug ranking R of breast cancerAAccuracy P ofA1And second drug order AR for breast cancerAAccuracy P ofA2
Figure BDA0002556935170000101
Figure BDA0002556935170000102
Wherein M isA1Is shown in the drug order RANumber of top ranked drugs, UA1Represents MA1Number in drug set Ω, M, of individual drugsA2Expressed in the drug order ARANumber of top ranked drugs, UA2Represents MA2The number of drugs in the drug set Ω;
5.1.6) comparison of PA1And PA2If P is the size ofA1≥PA2The accuracy P of medicine prediction is carried out on the difference gene set A of breast cancer denoisingA=PA1Otherwise, PA=PA2
5.2) correlation value vector AS of disease gene set B and all drugs according to the differences and pathogenic genes in step 4BCalculating the accuracy P of the predicted drugB
5.2.1) correlation value vector AS of disease Gene set B with all drugs based on differences and disease genesBThe same procedure as in (5.1.1) to (5.1.5) was used to obtain the difference in breast cancer and the accuracy P of the disease gene set B of the causative genes in both ranksB1And PB2
5.2.2) comparison of PB1And PB2If P is the size ofB1≥PB2The difference of breast cancerAccuracy P of drug prediction of disease gene set B of heterogenous and pathogenic genesB=PB1Otherwise, PB=PB2
Step 6, comparing the accuracy P of the predicted medicine of the two inquiry gene sets A, B of the breast cancerAAnd PB
Predicting accuracy P of medicine by using difference gene set A for denoising breast cancerAAccuracy P of predicting drugs with disease Gene set B of differences and disease genesBAnd (3) comparison:
if PBHigher than PAIncreasing the values of the parameters k and m, deleting more noises, and repeating the steps 3-5 until PB≤PA
Otherwise, executing step 7;
step 7, recording the current difference gene set A for denoising the breast cancer as an optimal query gene set A 'of the breast cancer, and determining threshold values T at two ends of the optimal query gene set A' of the breast cancerupAnd Tdown
7.1) constructing two normal distributions for the gene set obtained in step 2:
7.1.1) a normal distribution symmetrical about logFC ═ 1.5 was constructed for genes in the gene set with logFC > 0:
(logFCg)down=-logFCg+3,logFCg>0,
therein, logFCgExpression Change value of Gene g, (logFC)g)downRepresents a value symmetrical about logFC of 1.5 constructed by an existing gene g;
7.1.2) constructing a normal distribution for the genes of logFC <0 in the gene set that is symmetric about logFC-1.5:
(logFCg)up=-logFCg-3,logFCg<0,
wherein (logFC)g)upRepresents a value symmetrical about logFC-1.5 constructed by an existing gene g;
7.2) calculating the mean value μ of the normal distribution in (7.1.1) symmetrical about logFC ═ 1.5downHebiaoTolerance sigmadown:
Figure BDA0002556935170000111
Figure BDA0002556935170000112
Wherein G represents the number of genes in a normal distribution;
7.3) calculating the mean value μ of the normal distribution in (7.1.2) symmetrical about logFC-1.5upAnd standard deviation σup
Figure BDA0002556935170000113
Figure BDA0002556935170000114
7.4) the maximum and minimum values of logFC in the Breast cancer optimal query Gene set A' were respectively recorded as (logFC)maxAnd (logFC)minCalculating corresponding threshold value T according to the results of (7.2) - (7.3)upAnd Tdown
Figure BDA0002556935170000115
Figure BDA0002556935170000116
In this example, the bottom gene threshold T of the optimal differential gene set A' determined for breast cancerupTop gene threshold T of optimal differential gene set a ═ 2.03down=-1.56;
And 8, determining whether to download other disease data according to the sum of the current top threshold number and the current bottom threshold number.
8.1) adding the current top threshold quantity and the bottom threshold quantity to obtain the total threshold quantity sum;
8.2) setting the minimum value of the total threshold number of the sum of the top threshold number and the bottom threshold number to 20;
8.3) comparing the total threshold number sum with a set total threshold number minimum value of 20:
if sum<20, continuing to download other disease data from the TCGA database, and adopting the steps 3-7 to obtain two end thresholds of the newly downloaded disease
Figure BDA0002556935170000121
And
Figure BDA0002556935170000122
executing step 9 until the total threshold number sum is not less than 20;
otherwise, step 9 is performed directly.
And 9, constructing a prediction model of the optimal difference gene set threshold value of the disease.
9.1) determining independent variable X and dependent variable Y of the prediction model:
9.1.1) randomly taking 100000 numerical points from the standard normal distribution as background values and marking as a vector E;
9.1.2) characterization of the distribution differences between differential expression values of disease genes and vector E using Kolmogorov-Smirnov test statistic J as independent variable X of the model;
9.1.3) two-terminal threshold T of optimal query Gene set for all diseasesupAnd TdownA dependent variable Y as a model;
9.2) according to the parameters obtained in (9.1), constructing a prediction model as a linear model with | Y | ═ aX + b, wherein a is the slope of the model, b is the intercept of the model, and the values of a and b are calculated by the following formula:
Figure BDA0002556935170000123
Figure BDA0002556935170000124
wherein,
Figure BDA0002556935170000125
means representing the independent variable X
Figure BDA0002556935170000126
Figure BDA0002556935170000127
Means of absolute value of dependent variable Y
Figure BDA0002556935170000128
D represents the number of independent variables and dependent variables used to construct the prediction model, XdDenotes the d-th value, Y, in the argumentdRepresenting the d-th value in the dependent variable.
In this example, the prediction model is: y | ═ 14.889X + 1.182.
And step 10, predicting the optimal query gene set of the sarcoma and carrying out drug prediction.
10.1) predicting the optimal query gene set of the sarcoma by using the prediction model constructed in the step 9, taking the predicted optimal query gene set of the sarcoma as an input set of Kolmogorov-Smirnov, and calculating the accuracy rates P of M drugs before drug ranking through the steps 4-5M
10.2) selecting the first M medicaments of the medicament ranking in the step (10.1) to obtain all candidate medicaments with potential treatment effect on diseases, wherein M represents the number of the medicaments, and M is more than or equal to 10.
In this example, the accuracy is P20All the drugs listed in the top 20, M-20, were selected to obtain drug candidates with potential therapeutic effects on sarcomas.
The technical effects of the present invention will be described below with reference to simulation experiments.
1. Simulation conditions are as follows:
the computer hardware CPU of the simulation experiment is Intel Core (TM) i5, the memory of the computer hardware is 4G, and the computer software: RStudio integrated development software on a WINDOWS 7V system.
2. Simulation content:
screening differentially expressed genes of colorectal adenocarcinoma, breast cancer, lung adenocarcinoma, lung squamous cell carcinoma, hepatocellular carcinoma, prostate cancer, pancreatic ductal adenocarcinoma, bladder urothelial carcinoma, glioblastoma and head and neck squamous cell carcinoma by using the method to obtain upper and lower regulated gene thresholds of an optimal differential gene set, and constructing a model; and then the model is used for carrying out simulation experiments on the accuracy distribution of each ranking section of the prediction results of the potential therapeutic drugs for the sarcoma with the disease, and the results are shown in figure 2. The vertical axis in FIG. 2 represents the accuracy of each rank of the sarcoma drug-relocation prediction result, and the horizontal axis represents the number of drugs.
As can be seen from FIG. 2, the accuracy of the predicted results of the top 10 ranked drugs reaches 90%, and the accuracy of the predicted results of the top 20 ranked drugs reaches 70%. As can be seen from the accuracy of the sarcoma drug prediction result, the invention effectively improves the accuracy of the drug relocation prediction result.

Claims (8)

1. A method of drug relocation based on differential expression data comprising the steps of:
(1) downloading pathogenic gene data by using an OMIM database, and searching a pathogenic gene set L of the disease;
(2) downloading gene expression data, calculating each gene expression change value logFC and corresponding false positive FDR, and screening genes;
(3) two query gene sets were constructed:
(3a) deleting the top m genes and the bottom k genes in the screened gene set with obvious differential expression to obtain a disease de-noising differential gene set A, wherein k is more than or equal to 0, and m is more than or equal to 0;
(3b) sorting the pathogenic gene set L from small to large according to the logFC sequence of each gene, adding the gene with the logFC <0 to the top of the disease de-noising difference gene set A, and adding the gene with the logFC >0 to the bottom of the disease de-noising difference gene set A to obtain a disease gene set B of the difference and pathogenic genes;
(4) respectively calculating correlation value vectors AS of the difference gene set A and each medicine for denoising diseasesAAnd correlation value vector AS of disease gene set B of difference and pathogenic gene and each medicineB
(5) Calculating the accuracy P of the predicted drug of the denoised differential gene set AAAnd calculating the difference and the accuracy P of the disease gene set B for predicting the medicineB
(6) Comparing the accuracy P obtained in the step (5)AAnd PB
If PBHigher than PAIncreasing the values of the parameters k and m, deleting more noises, and repeating the steps (3) - (5) until PB≤PA(ii) a Otherwise, executing the step (7);
(7) recording the current difference gene set A of denoising the disease as an optimal inquiry gene set A 'of the disease, and determining a threshold value T at two ends of the optimal inquiry gene set A' of the diseaseupAnd Tdown
(8) Setting the minimum value of the total threshold number of the sum of the top threshold number and the bottom threshold number as 20, and judging whether the sum of the current top threshold number and the current bottom threshold number is lower than the minimum value of 20: if yes, continuing to download other disease data from the TCGA database, and obtaining two end threshold values of newly downloaded diseases by adopting the steps (3) - (7)
Figure FDA0002556935160000011
And
Figure FDA0002556935160000012
executing (9) until the total threshold number sum is not less than 20; otherwise, directly executing (9);
(9) constructing a prediction model of the optimal difference gene set threshold of the disease:
(9a) randomly taking 100000 numerical points from standard normal distribution as background values, recording the numerical points as vectors E, using Kolmogorov-Smirnov test statistic J to represent distribution difference between differential expression value of disease genes and 100000 background points, using the distribution difference as independent variable X of model, and using threshold values T at two ends of optimal query gene set of all diseasesupAnd TdownA dependent variable Y as a model;
(9b) according to the parameters obtained in the step (9a), constructing a prediction model as a linear model of | Y | ═ aX + b, wherein a is the slope of the model, and b is the intercept of the model;
(10) predicting the optimal query gene set of other diseases by using a prediction model, and calculating the accuracy P of M drugs before drug ranking through (4) - (5)MAnd then selecting the M medicines to obtain all candidate medicines with potential treatment effects on diseases, wherein M represents the number of medicines and is more than or equal to 10.
2. The method according to claim 1, wherein the expression change value logFC of each gene is calculated in (2) by the following formula:
Figure FDA0002556935160000021
wherein e islThe expression value of the gene in the sample l is shown, l is the sample number in the gene expression data, when S +1 is not less than l and not more than S + W, the sample l is the cancer patient sample, when S +1 is not less than l and not more than S + W, the sample l is the normal human sample, S is the number of the cancer patient samples, and W is the number of the normal human samples.
3. The method of claim 1, wherein the false positive FDR for gene correspondence is calculated in (2) as follows:
(2a) constructing an expression value vector θ of the gene from the gene expression data:
θ=(e1,e2,…,el,…,eS,eS+1,eS+2,…,eS+f,…,eS+W),
wherein l represents the sample number of the cancer patient in the gene expression data, elExpressing the expression value of the gene in the cancer sample l, and 1. ltoreq. l.ltoreq.S, S representing the number of cancer patient samples, es+fExpressing the expression value of the gene in a normal human sample S + f, wherein S + f is more than or equal to 1 and less than or equal to W, f represents the sample number of the normal human in the gene expression data, W represents the number of normal human samples, and S + W represents a baseTotal number of samples due to expression data;
(2b) randomly scrambling the serial numbers of the samples in the expression value vector theta of the gene to obtain the expression value vector labeled by the scrambled samples
Figure FDA0002556935160000031
For vector theta*Performing random test, and calculating theta in the v random test*Is calculated as a statistical result value Hv
Figure FDA0002556935160000032
(2c) Repeating the step (2b) for 1000 times, and calculating the statistical significance value F of each gene expression change value:
Figure FDA0002556935160000033
wherein C represents the checking statistic value, and P represents theta in the random test*Is higher than the set of C, | P { H |V|HVV is more than C and more than or equal to 1 and less than or equal to 1000, and represents the number of elements in the set P;
(2e) sorting the F from small to large, calculating the false positive FDR of the gene:
Figure FDA0002556935160000034
where N represents the total number of genes and i represents the ordering of the genes.
4. The method according to claim 1, wherein the screening of the gene from the gene expression change value logFC and the false positive FDR corresponding to the gene in (2) comprises setting a threshold for differential change of the gene expression change value logFC and a threshold for false positive FDR as α, respectively, and comparing the gene expression change value logFC and its false positive FDR with their respective thresholds to screen the gene with | logFC | ≧ and FDR ≦ α as the gene whose differential expression is significant.
5. The method according to claim 1, wherein (4) the correlation value vector AS of the denoised differential gene set A and each drug is calculated separatelyAAnd correlation value vector AS of disease gene set B of difference and pathogenic gene and each medicineBThe implementation is as follows:
(4a) taking a difference gene set A subjected to disease denoising AS a query list of a Kolmogorov-Smirnov method, taking an ordered list of genes under the action of each drug AS a reference list set in the Kolmogorov-Smirnov method, and calculating a correlation value vector AS of the disease query gene set A and all drugsA
(4a1) Constructing a position vector V (1.. n) of each gene in the reference list set, wherein n represents the number of genes in the reference list;
(4a2) query logFC in list>0 as a query list Q of Kolmogorov-SmirnovupWill query logFC in the list<0 as a Down-Regulation query List Q of Kolmogorov-SmirnovdownSeparately, for each drug, a set of query genes Q is computationally investigatedupAnd downregulation of the query Gene set QdownEnrichment score of ESupAnd ESdown
Figure FDA0002556935160000041
Figure FDA0002556935160000042
Wherein,
Figure FDA0002556935160000043
a score representing the gene ranked top in both the query list and the reference list,
Figure FDA0002556935160000044
denotes the score of a gene before the query list and after the reference list, p denotes the gene rank, Vup(p) represents the position of the genes ranked as p in the upper query list in the reference list, s1 represents the number of genes in the upper query list,
Figure FDA0002556935160000045
representing the scores of genes ranked further down in both the query list and the reference list,
Figure FDA0002556935160000046
score, V, representing genes ranked later in the query list and earlier in the reference listdown(p) represents the position of the genes ranked as p in the query list below in the reference list, and s2 represents the number of genes in the query list below;
(4a3) calculating the correlation value as of the difference gene set A and each medicine for denoising the diseaseAIf ES in (4c)upAnd ESdownThe same sign, then asA0, otherwise, asA=ESup-ESdownAs for all drugs by the following formulaAAnd (3) score normalization:
Figure FDA0002556935160000047
wherein,
Figure FDA0002556935160000048
represents the correlation value of the jth drug and the disease, j is more than or equal to 1 and less than or equal to q, q represents the total number of the drugs,
Figure FDA0002556935160000049
indicates the all-drug score asAThe maximum value of (a) is,
Figure FDA00025569351600000410
indicates the all-drug score asAThe minimum value of (a) is determined,
Figure FDA00025569351600000411
expressing the correlation value of the j < th > drug and the disease after normalization;
(4a4) combining the normalized values of the drugs and the diseases to obtain a correlation value vector:
Figure FDA0002556935160000051
(4b) calculating correlation value vector AS of disease gene set B of difference and pathogenic genes and all medicinesB
(4b1) The disease gene set B is used as a query list of a Kolmogorov-Smirnov method, an ordered list of genes under the action of each medicament is used as a reference list set in the Kolmogorov-Smirnov method, and the normalized correlation values of the disease gene set B of the difference and pathogenic genes and the medicament j are calculated by adopting the same steps as (4a1) - (4a3)
Figure FDA0002556935160000052
(4b2) Combining the normalized values of the drugs and the diseases to obtain the correlation value vectors of the disease gene set B of the difference and the pathogenic genes and all the drugs:
Figure FDA0002556935160000053
6. the method of claim 1, wherein the accuracy P of the disease de-noised differential gene set A predictive drug is calculated in (5)AAnd calculating the difference and the accuracy P of the disease gene set B for predicting the medicineBThe implementation is as follows;
(5a) using the correlation value vector AS in (4)ACalculating the accuracy P of the predicted drug of the denoised differential gene set AA
(5a1) The medicines are sequenced according to the sequence from small to large to obtain a first ordered medicine sequence R of the differential gene set A for denoising diseasesACalculating a correlation value vector ASAAbsolute value of (a):
Figure FDA0002556935160000054
wherein,
Figure FDA0002556935160000055
j is more than or equal to 1 and less than or equal to q, and q represents the total number of the medicaments;
(5a2) sequencing all the medicaments according to the rule that the absolute value is from large to small to obtain a second ordered medicament sequence AR of the differential gene set A for denoising the diseaseA
(5a3) Downloading a medicine set omega related to the disease from a related database, and respectively sorting R two medicines of a difference gene set A for denoising the diseaseAAnd ARACalculation accuracy PA1And PA2
Figure FDA0002556935160000056
Figure FDA0002556935160000061
Wherein M isA1Is shown in the drug order RANumber of top ranked drugs, UA1Is shown in the drug order RAMiddle front MA1The number of known drugs in the drug set omega, MA2Expressed in the drug order ARANumber of top ranked drugs, UA2Expressed in the drug order ARAMiddle front MA2The number of drugs known in the drug set Ω;
(5a4) comparison PA1And PA2If P is the size ofA1≥PA2The accuracy P of the drug prediction of the disease de-noised difference gene set AA=PA1Otherwise, PA=PA2
(5b) Using the correlation value vector AS in (4)BCalculating the accuracy P of the drug predicted by the disease gene set B of differences and causative genesB
(5b1) Correlation value vector AS of disease gene set B and all drugs according to difference and pathogenic genesBThe accuracies P of the disease gene sets B for which the differences and causative genes were obtained in the two ranks were calculated in the same steps as (5a1) - (5a3)B1And PB2
(5b2) Comparison PB1And PB2If P is the size ofB1≥PB2Accuracy P of drug prediction of disease Gene set B of differences and disease genesB=PB1Otherwise, PB=PB2
7. The method of claim 1, wherein (7) a threshold T is determined across the set A' of disease-optimal query genesupAnd TdownThe implementation is as follows:
(7a) genes of logFC >0 and logFC <0 in the gene set whose differential expression was significant, obtained for (2), respectively, constructed normal distributions symmetrical with respect to logFC and symmetrical with respect to logFC:
(logFCg)up=-logFCg-2*,logFCg<0,
(logFCg)down=-logFCg+2*,logFCg>0,
therein, logFCgExpression Change value of Gene g, (logFC)g)upRepresents a value that is symmetric about logFC (logFC) constructed by an existing gene gg)downA threshold value indicating a difference change in gene expression change value logFC, which is a value symmetrical about logFC constructed by an existing gene g;
(7b) calculating the mean value μ of the normal distribution in (7a) symmetric about logFCdownAnd standard deviation σdown:
Figure FDA0002556935160000062
Figure FDA0002556935160000071
Wherein G represents the number of genes in a normal distribution;
(7c) calculating the mean value μ of the normal distribution in (7a) about logFC-symmetryupAnd standard deviation σup
Figure FDA0002556935160000072
Figure FDA0002556935160000073
(7d) The maximum value and the minimum value of logFC in the disease optimal query gene set A' are respectively recorded as (logFC)maxAnd (logFC)minCalculating corresponding threshold value T according to the results of (7b) - (7c)upAnd Tdown
Figure FDA0002556935160000074
Figure FDA0002556935160000075
8. The method of claim 1, wherein the values of slope a and intercept b of the prediction model | Y | ═ aX + b are calculated in (9b) by the following formula:
Figure FDA0002556935160000076
Figure FDA0002556935160000077
wherein,
Figure FDA0002556935160000078
means representing the independent variable X
Figure FDA0002556935160000079
Figure FDA00025569351600000710
Means of absolute value of dependent variable Y
Figure FDA00025569351600000711
D represents the number of independent variables and dependent variables used to construct the prediction model, XdDenotes the d-th value, Y, in the argumentdRepresenting the d-th value in the dependent variable.
CN202010594317.8A 2020-06-28 2020-06-28 Drug repositioning method based on differential expression data Active CN111785319B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010594317.8A CN111785319B (en) 2020-06-28 2020-06-28 Drug repositioning method based on differential expression data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010594317.8A CN111785319B (en) 2020-06-28 2020-06-28 Drug repositioning method based on differential expression data

Publications (2)

Publication Number Publication Date
CN111785319A true CN111785319A (en) 2020-10-16
CN111785319B CN111785319B (en) 2023-07-28

Family

ID=72761515

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010594317.8A Active CN111785319B (en) 2020-06-28 2020-06-28 Drug repositioning method based on differential expression data

Country Status (1)

Country Link
CN (1) CN111785319B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023001178A1 (en) * 2021-07-23 2023-01-26 高雄医学大学 Clinical therapeutic drug prediction and recommendation system and method for evaluating efficacy of second-generation hormone drug in treatment of prostate cancer

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005067640A2 (en) * 2004-01-07 2005-07-28 University Of Hawaii Methods for enhanced detection & analysis of differentially expressed genes using gene chip microarrays
CN107194203A (en) * 2017-06-09 2017-09-22 西安电子科技大学 Medicine method for relocating based on miRNA data and tissue specificity network
CN107292130A (en) * 2017-06-09 2017-10-24 西安电子科技大学 Medicine method for relocating based on gene mutation and gene expression
CN108664769A (en) * 2017-03-31 2018-10-16 中国科学院上海生命科学研究院 Large-scale medicine method for relocating based on cancer gene group and non-specific gene label
CN108694991A (en) * 2018-05-14 2018-10-23 武汉大学中南医院 It is a kind of to integrate the reorientation drug discovery method with drug targets information based on multiple transcription group data sets

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005067640A2 (en) * 2004-01-07 2005-07-28 University Of Hawaii Methods for enhanced detection & analysis of differentially expressed genes using gene chip microarrays
CN108664769A (en) * 2017-03-31 2018-10-16 中国科学院上海生命科学研究院 Large-scale medicine method for relocating based on cancer gene group and non-specific gene label
CN107194203A (en) * 2017-06-09 2017-09-22 西安电子科技大学 Medicine method for relocating based on miRNA data and tissue specificity network
CN107292130A (en) * 2017-06-09 2017-10-24 西安电子科技大学 Medicine method for relocating based on gene mutation and gene expression
CN108694991A (en) * 2018-05-14 2018-10-23 武汉大学中南医院 It is a kind of to integrate the reorientation drug discovery method with drug targets information based on multiple transcription group data sets

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
郝逸凡;杨光;: "基于熵和Fold Change算法的抗乳腺癌药物重定位分析", 生物数学学报, no. 02 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023001178A1 (en) * 2021-07-23 2023-01-26 高雄医学大学 Clinical therapeutic drug prediction and recommendation system and method for evaluating efficacy of second-generation hormone drug in treatment of prostate cancer

Also Published As

Publication number Publication date
CN111785319B (en) 2023-07-28

Similar Documents

Publication Publication Date Title
Amaratunga et al. Exploration and analysis of DNA microarray and protein array data
Shah et al. Data mining and genetic algorithm based gene/SNP selection
Wen et al. A classification model for lncRNA and mRNA based on k-mers and a convolutional neural network
US20020095260A1 (en) Methods for efficiently mining broad data sets for biological markers
CN112927757B (en) Gastric cancer biomarker identification method based on gene expression and DNA methylation data
CN113597645A (en) Methods and systems for reconstructing drug response and disease networks and uses thereof
Sun et al. A comprehensive comparison of supervised and unsupervised methods for cell type identification in single-cell RNA-seq
US20210090686A1 (en) Single cell rna-seq data processing
CN112131399A (en) Old medicine new use analysis method and system based on knowledge graph
CN107292130B (en) Drug method for relocating based on gene mutation and gene expression
Marko et al. Why is there a lack of consensus on molecular subgroups of glioblastoma? Understanding the nature of biological and statistical variability in glioblastoma expression data
Wu et al. DeepRetention: a deep learning approach for intron retention detection
CN114913919A (en) Intelligent reading and reporting method, system and server for genetic variation of single-gene disease
CN111785319B (en) Drug repositioning method based on differential expression data
KR20210110241A (en) Prediction system and method of cancer immunotherapy drug Sensitivity using multiclass classification A.I based on HLA Haplotype
Yan et al. Machine learning in brain imaging genomics
CN115035951A (en) Mutation signature prediction method and device, terminal equipment and storage medium
Cutigi et al. A proposal of a graph-based computational method for ranking significant set of related genes in cancer
Madjar Survival models with selection of genomic covariates in heterogeneous cancer studies
Guo et al. Recent advances in differential expression analysis for single-cell RNA-seq and spatially resolved transcriptomic studies
Su et al. DA-SRN: Omics data analysis based on the sample network optimization for complex diseases
CN110993031B (en) Analysis method, analysis device, apparatus and storage medium for autism candidate gene
Zheng et al. A structural variation genotyping algorithm enhanced by CNV quantitative transfer
TWI650664B (en) Method for establishing assessment model for protein loss of function and risk assessment method and system using the assessment model
Trajkovski Functional interpretation of gene expression data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant