WO2019232748A1 - Computer screening method for small-molecule chemical drugs targeting RNA - Google Patents

Computer screening method for small-molecule chemical drugs targeting RNA

Info

Publication number
WO2019232748A1
Authority
WO
WIPO (PCT)
Prior art keywords
rna
computer
chemical
small molecule
small
Prior art date
Application number
PCT/CN2018/090267
Other languages
English (en)
French (fr)
Inventor
崔庆华
周源
曾攀
Original Assignee
北京大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京大学

Definitions

  • The invention relates to a computer-based drug screening method, and in particular to a computer screening method for small-molecule chemical drugs that target RNA.
  • DNA stores genetic material and directs the construction of proteins. Proteins are regarded as the molecules that ultimately carry out specific biological functions, while RNA is regarded as the intermediate molecule that connects DNA to proteins. Research has therefore traditionally concentrated on proteins (including protein-coding DNA), and RNA has received comparatively little study and attention. Traditional drug development likewise focuses on protein targets: more than 95% of the drugs listed in the DrugBank database act on proteins. However, the vast majority of proteins are not druggable targets. Only about 400 proteins can currently be targeted, so developing drugs that target other types of molecules is an urgent task for disease prevention and treatment.
  • With the rapid development of high-throughput technologies such as RNA-Seq, a large number of noncoding RNAs (ncRNAs) have been found: nearly 40,000 miRNAs (microRNAs) and nearly 100,000 long noncoding RNAs (lncRNAs) in humans. Studies have shown that these RNA molecules have important biological functions and are closely related to disease. Even the functions of messenger RNA are not limited to linking DNA and proteins; it has a variety of important functions at the RNA level. RNA is increasingly recognized as a potential key target for disease intervention, and drug development targeting RNA is attracting wide attention.
  • One major class of RNA-targeting molecules with drug potential is RNA or DNA itself (called nucleic acids here, to distinguish them from RNA targets), such as small interfering RNA (siRNA), antisense oligonucleotides (ASO), miRNA and aptamers.
  • siRNA: small interfering RNA
  • ASO: antisense oligonucleotide
  • miRNA: microRNA
  • aptamer: nucleic acid aptamer
  • Miravirsen, Roche's nucleic acid drug for the treatment of hepatitis C, which targets the human liver-specific miRNA miR-122, has begun phase 2 clinical trials.
  • Nucleic acid drugs have inherent drawbacks, such as off-target effects, a tendency of exogenous macromolecules to provoke an immune response, poor stability and difficulty entering cells.
  • Small molecules have historically succeeded in targeting RNA: streptomycin and tetracycline, for example, target bacterial RNA.
  • A major bottleneck that currently seriously hinders the field is the lack of computational methods for screening small chemical molecules that target RNA.
  • Research groups internationally, including the applicant's, have made attempts in this field.
  • Examples include miREnvironment, a bioinformatics database and prediction platform for miRNA-environmental factor associations (most of the factors being small chemical molecules) based on the miRNA or mRNA transcriptome, and SM2miR, a database of small molecule-miRNA associations, together with related prediction algorithms. In essence, however, these methods predict "functional" associations between miRNAs and small molecules rather than drugs that truly target miRNAs.
  • The object of the present invention is to address the shortcomings of the prior art by providing a computer screening method for small-molecule chemical drugs targeting RNA.
  • The method builds a random forest model from information derived from RNA sequences and the physicochemical properties of small chemical molecules, and helps screen small chemical molecules that target RNA more conveniently and effectively.
  • In the present invention, a small chemical molecule is an organic molecule with a molecular weight below 900 daltons.
  • A computer screening method for small-molecule chemical drugs targeting RNA, comprising the following steps: (1) collecting and organizing data sets, (2) mining features for training the prediction method, (3) creating the prediction method and model, and (4) validating the prediction method and model.
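  • Preferably, step (1), collecting and organizing data sets, includes: (a) retrieving from the PDB database structures consisting only of RNA and small molecules, and extracting the corresponding information from these structures as the training data set; (b) collecting RNA-small molecule interaction data from the SMMRNA database and literature reports as the test data set.
  • The information in step (a) includes whether the RNA interacts with the small molecule and the specific location at which the RNA interacts with the small molecule.
  • Step (b) collects, from the SMMRNA database and literature reports, RNA-small molecule interaction data not contained in the PDB database as the test data set.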
  • Preferably, step (2), mining features for training the prediction method, includes: (a) extracting relevant features of the RNA; (b) calculating the physicochemical properties of the small molecule.
  • The relevant features in step (a) cover sequence, structure and function.
  • The physicochemical properties in step (b) include the number of hydrogen bond acceptors, the number of hydrogen bond donors, the octanol/water partition coefficient, the molar refractivity, the molecular weight and the topological polar surface area.
  • Further, the relevant features in step (a) include: nucleotide type, functional sites, the nucleotide distance sum (NDS) curve, nucleotide frequency and pairing status.
  • Preferably, step (3), creating the prediction method and model, includes: using a balanced random forest model to create a computational method for predicting RNA-small chemical molecule interactions.
  • Preferably, creating the computational method with the balanced random forest model includes: dividing the negative samples in the training data set into multiple portions to narrow the gap in number between each portion of negative samples and the positive samples, matching each portion with the positive samples for model training, and aggregating the outputs of these models.
  • Preferably, step (4), validating the prediction method and model, includes: evaluating the performance of the model obtained in step (3).
  • The performance evaluation includes cross-validation using the training data set.
  • The performance evaluation includes independent validation using the test data set.
  • The performance evaluation includes selecting 5 positive and 5 negative predictions for biological validation.
  • The invention also provides the use of the computer screening method for small-molecule chemical drugs targeting RNA in a high-throughput screening platform.
  • The invention also provides the use of the method in the computer screening of compounds with RNA as the target.
  • The invention also provides the use of the method with the PDB database.
  • The invention also provides the use of the method with the SMMRNA database.
  • The invention also provides the use of the method with miREnvironment, a miRNA-based research and development platform for environmental factors.
  • The invention also provides the use of the method in targeted drug screening.
  • The invention also provides the use of the method in the prevention and treatment of major diseases.
  • The invention also provides the use of the method in targeted drug screening. Using this method, the small chemical molecules kaempferol and quercetin were predicted to target lncSHGL.
  • The invention also provides the use of the method in the prevention and treatment of major diseases. The applicant previously discovered a new lncRNA, lncSHGL, which plays a key role in hepatic glucose and lipid metabolism and is a new drug target for intervention in metabolic diseases such as fatty liver and diabetes. Using this method, kaempferol and quercetin were predicted to bind lncSHGL, which makes these two small chemical molecules potential drugs for the prevention and treatment of fatty liver and diabetes.
  • The invention addresses the important problem of screening small-molecule chemical drugs against RNA, an emerging class of disease intervention targets. Given current limitations such as scarce RNA spatial structure data, structural flexibility and unknown force fields, a machine-learning-based computational method (random forests in the present invention) for screening small chemical molecules that target RNA was created by analyzing RNA sequence features and small-molecule physicochemical properties.
  • The invention can be used for the computer screening of small chemical molecules that target RNA, and provides a new strategy for RNA-based prevention and treatment of major diseases.
  • The invention provides new ideas, strategies and methods for RNA-targeted drug screening.
  • Structures in the PDB database consisting only of RNA chains and small molecules were retrieved.
  • The downloaded PDB structure data were cleaned and used as the source of the training data set. A PDB structure was discarded if all the small molecules it contains are metal ions or solvent molecules from buffers commonly used in structural biology research, or if the length of the RNA chain it contains does not exceed 20.
  • RNA-small molecule interaction information was extracted from the retained PDB structures. Because 4.0 angstroms is roughly the turning point between the weakest hydrogen bond and the strongest van der Waals force, 4.0 angstroms was adopted as the threshold for judging interaction between a small molecule and RNA: if the closest distance between a nucleotide and an atom of the small molecule is less than 4.0 angstroms, the two are considered to interact.
  • Because the PDB structures used as the source of the training data set rarely contain non-interacting RNA-small molecule pairs, all the small molecules involved in the PDB structures were first collected and the Euclidean distances between their physicochemical properties were calculated. Then, for each of the one or more small molecules that interact with the RNA chain in a given PDB structure, the remaining small molecules were ranked by the Euclidean distance between their physicochemical properties and those of that small molecule. To minimize the chance of generating false-negative RNA-small molecule pairs, the intersection of the small molecules whose distances rank between the 80th and 90th percentiles was selected to artificially generate "non-interacting" RNA-small molecule pairs.
  • RNA-related features are extracted from several perspectives, including sequence, structure and function. Specifically, the following features are extracted in turn for each nucleotide: (1) the type of the nucleotide itself (A, U, C, G or N); (2) whether it pairs with another nucleotide; (3) whether it is a functional site predicted by the Rsite2 algorithm previously proposed by the applicant; (4) its normalized geometric distance score in the secondary structure, the NNDS value:
  • $\mathrm{NNDS} = \sum_j \mathrm{dist}(nt_i - nt_j) \,/\, \sum_j \mathrm{dist}(nt_{centroid} - nt_j)$
  • where $nt_i$, $nt_j$ and $nt_{centroid}$ are the coordinate vectors of the nucleotide under test, any nucleotide in the RNA, and the center of the RNA, respectively. Euclidean distance is used when calculating nucleotide distances.
  • Because the RNA is fragmented, features (1)-(3) above are placed into the vector of the corresponding fragment, while (4) is converted to its average value and assigned to the corresponding fragment. Whether a fragment interacts with a small molecule is determined by whether the nucleotide at the center of the fragment interacts with the small molecule. Missing values in fragments that extend beyond either end of the RNA sequence are filled by default with (1) N, (2) no, (3) no and (4) the normalized NDS value of the first or last nucleotide, respectively.
  • The frequency of each nucleotide and of each nucleotide triplet is also counted within each fragment.
  • The RNA secondary structure used to determine nucleotide pairing status comes from several sources: extraction from PDB structures with RNApdbee (http://rnapdbee.cs.put.poznan.pl/), manual annotation based on relevant literature reports, and prediction from the RNA sequence with RNAfold.
  • RNApdbee: http://rnapdbee.cs.put.poznan.pl/
  • The small chemical molecule structure files include structure data format (SDF) files obtained directly from the PDB database and simplified molecular input line entry specification (SMILES) files retrieved from NCBI's PubChem database (https://pubchem.ncbi.nlm.nih.gov/).
  • SDF: structure data format
  • PubChem database: https://pubchem.ncbi.nlm.nih.gov/
  • SMILES: simplified molecular input line entry specification
  • The Open Babel software package was used to calculate physicochemical properties from the small chemical molecule structure files obtained above, including the number of hydrogen bond acceptors (HBA), the number of hydrogen bond donors (HBD), the octanol/water partition coefficient (logP), the molecular weight (MW), the molar refractivity (MR) and the topological polar surface area (TPSA).
  • Because RNA interacts with small chemical molecules only locally, the applicant proposes fragmenting the RNA. The RNA-related features fed to the model are therefore derived from RNA sequence fragments, and the model directly predicts whether an RNA sequence fragment interacts with a small chemical molecule. The fragment-level predictions must then be integrated at the level of the whole RNA molecule to give an overall assessment of the propensity of the RNA molecule to interact with the small chemical molecule. To this end, the fragments of the RNA sequence predicted to interact with the small chemical molecule are first identified. For each such fragment, the proportion of fragments predicted to interact is calculated among the fragment itself and up to 5 neighbouring fragments on each side. The fragments are then ranked by this proportion, at most the top 5 are taken, and the ratio of the mean of their proportions to the mean sequence distance between their fragment centers is calculated. A higher ratio implies that the RNA sequence fragments on which the small chemical molecule may act are more densely distributed along the RNA molecule. This interaction propensity score is referred to as the DRIP (Drug-RNA interaction predictor) score.
  • BRF: balanced random forest
  • A random forest is an ensemble classifier composed of a number of decision trees, each trained on a subset of the samples. In a tree, the paths from the root node to the leaf nodes specify how the value conditions θ(x_i) of the different features should be combined according to the weights w in order to classify the selected subset of samples. The random forest model finally integrates a series of decision trees to predict the classification vector y.
  • Because the model integrates many features and the construction pipeline has many adjustable parameters, it is optimized step by step.
  • GI: Gini importance
  • The goodness of classification produced by each split κ on a feature in each tree is expressed as the Gini impurity i(κ), and the Gini impurity is then aggregated over all trees T to obtain the overall importance score of the feature.
  • When a small molecule has too few negative fragments, the gap is filled with pseudo-negative fragments generated by randomly sampling existing negative fragments and randomly mutating one nucleotide in each.
  • Apart from the sequence, the features of a pseudo-negative fragment are identical to those of the original negative fragment. When a small molecule has more negative fragments than needed, the CD-HIT tool is used to cluster the RNA sequence fragments, both among the negative fragments and between negative and positive fragments.
  • MCC: Matthews correlation coefficient
  • A reliable set of confirmed RNA-small chemical molecule interaction data is the basis for creating a computational method for screening small chemical molecules that target RNA.
  • The applicant downloads the relevant data from the PDB database and analyzes and organizes them as the training data set.
  • New RNA-small chemical molecule interaction pairs not included in the PDB were obtained from the SMMRNA (Small molecule modulators of RNA) database, and new experimentally confirmed pairs were retrieved manually from the published literature.
  • SMMRNA: Small molecule modulators of RNA
  • RNA-related features are extracted, such as nucleotide type, functional sites, the nucleotide distance sum (NDS) curve, nucleotide frequency and pairing status. Structure files are extracted from the small chemical molecule structure data and physicochemical properties are calculated, including the number of hydrogen bond acceptors (HBA), the number of hydrogen bond donors (HBD), the octanol/water partition coefficient (logP), the molar refractivity (MR), the molecular weight (MW) and the topological polar surface area (TPSA).
  • HBA: number of hydrogen bond acceptors
  • HBD: number of hydrogen bond donors
  • logP: octanol/water partition coefficient
  • MR: molar refractivity
  • MW: molecular weight
  • TPSA: topological polar surface area
  • The computational method for predicting RNA-small chemical molecule interactions is created with a balanced random forest (BRF) model, which divides the negative samples into multiple portions and matches each portion with the positive samples.
  • Because the model integrates many features and the construction pipeline has many adjustable parameters, it is optimized step by step.
  • To verify the accuracy of the RNA-small chemical molecule interaction prediction method, 5-fold cross-validation was performed on the training data set, and the prediction performance of the random forest model was evaluated using AUC values. The method was then run on an independent test data set and again evaluated using AUC values. Finally, the method was applied to the vascular smooth muscle-specific lncRNA AK098656 previously discovered by the applicant, and 5 positive and 5 negative predictions were selected for biological validation.
  • The applicant has created miREnvironment, a miRNA-based research and development platform for environmental factors (most of which are small chemical molecules) (Cui* et al. Bioinformatics 2011).
  • Small molecules generally act on a target protein at a functional site on the protein, so identifying the functional sites of an RNA is an important basis for intervening on a target RNA.
  • The applicant has successively proposed RNA functional site prediction methods including Rsite and Rsite2 (Cui* et al. Scientific Reports 2015, 2016), SRAMP (Cui* et al. Nucleic Acids Res 2016; m6A methylation site prediction) and PPUS (Cui* et al. Bioinformatics 2015; pseudouridine site prediction).
  • Functional sites obtained from the RNA sequence and from the spatial structure are significantly consistent and correlated (Figure 1). This indicates that the RNA sequence carries information about the RNA spatial structure, and further suggests that, when RNA spatial structure data are extremely scarce and RNA force fields are unknown, RNA sequence features may make it possible to predict the small chemical molecules that interact with an RNA.
  • The applicant has collected more than 300 RNA-small chemical molecule interaction pairs from the PDB database and obtained more than 100 pairs from the SMMRNA database and the literature. Analysis showed that some RNA sequence features are related to small chemical molecule interactions, such as triplet frequency and Rsite2 sites, and that some physicochemical properties of small molecules are also related to RNA interaction, such as the octanol/water partition coefficient and the topological polar surface area. The prediction method DRIP was preliminarily constructed based on random forests.
  • The computer screening method for small-molecule chemical drugs targeting RNA of the present invention can be used in industry and has industrial applicability.

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Pharmaceuticals Containing Other Organic And Inorganic Compounds (AREA)

Abstract

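A computer screening method for small-molecule chemical drugs targeting RNA, comprising the following steps: (1) collecting and organizing a training data set, (2) mining features for the prediction method, (3) creating the prediction method and model, and (4) validating the prediction method and model. The method provides a new strategy for the computer screening of small chemical molecules that target RNA and for RNA-based prevention and treatment of major diseases.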

Description

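Computer screening method for small-molecule chemical drugs targeting RNA
Technical Field
The invention relates to a computer-based drug screening method, and in particular to a computer screening method for small-molecule chemical drugs that target RNA.
Background
Genes (DNA) store genetic material and direct the construction of proteins. Proteins are regarded as the molecules that ultimately carry out specific biological functions, while RNA is regarded as the intermediate molecule connecting DNA and proteins. Research has therefore traditionally concentrated on proteins (including protein-coding DNA), and RNA has received comparatively little study and attention. Traditional drug development likewise focuses on protein targets: more than 95% of the drugs recorded in the DrugBank database act on proteins. However, the vast majority of proteins are not druggable targets, and only about 400 proteins can currently be targeted, so developing drugs that target other kinds of molecules has become an urgent task for disease prevention and treatment. In recent years, with the Human Genome Project and the ENCODE (Encyclopedia of DNA Elements) project, it was found, surprisingly, that protein-coding DNA accounts for only about 2% of all human DNA. Most of the remaining 98% can be transcribed into RNA but is not translated into protein, and is therefore called noncoding RNA (ncRNA). With the rapid development of high-throughput technologies such as RNA-Seq, a large number of noncoding RNAs have been discovered: nearly 40,000 miRNAs (microRNAs) and nearly 100,000 long noncoding RNAs (lncRNAs) have been found in humans. Studies have shown that these RNA molecules have important biological functions and are closely related to disease. Even the functions of messenger RNA are not limited to linking DNA and protein; it has a variety of important functions at the RNA level. RNA is increasingly recognized as a potential key target for disease intervention, and the development of RNA-targeting drugs is attracting wide attention.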
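One major class of RNA-targeting molecules with drug potential is RNA or DNA itself (referred to here as nucleic acids, to distinguish them from "RNA targets"), such as small interfering RNA (siRNA), antisense oligonucleotides (ASO), miRNA and nucleic acid aptamers. For example, Miravirsen, Roche's nucleic acid drug for hepatitis C that targets the human liver-specific miRNA miR-122, has entered phase 2 clinical trials. Nucleic acid drugs, however, have inherent drawbacks, such as off-target effects, a tendency of exogenous macromolecules to provoke an immune response, poor stability and difficulty entering cells. These drawbacks, the last two in particular, seriously hinder the druggability of nucleic acids. For example, siRNA is degraded within a few minutes of entering the bloodstream; this very poor stability is one of the main obstacles to nucleic acid drugs. In addition, over hundreds of millions of years of evolution cells have developed a lipid bilayer membrane to defend against harmful external substances. The membrane hinders exogenous nucleic acids from entering the cell and thus from regulating the target RNA, which is the other main obstacle. Therefore, besides continuing in-depth research on nucleic-acid-based RNA-targeting drugs, the international scientific community has begun to look at other possible strategies for targeting RNA, among which small chemical molecules are coming to the fore. In drug development, a small chemical molecule is an organic molecule with a molecular weight below 900 daltons.
Small chemical molecules are stable and enter cells easily, which largely overcomes the above drawbacks of nucleic acid drugs, and small molecules have historically succeeded in targeting RNA: streptomycin and tetracycline, for example, target bacterial RNA. A major bottleneck that currently seriously hinders the field, however, is the lack of computational methods for screening small chemical molecules that target RNA. Research groups internationally, including the applicant's, have made attempts in this area. Examples include miREnvironment, a bioinformatics database and prediction platform for miRNA-environmental factor associations (the great majority of the factors being small chemical molecules) based on the miRNA or mRNA transcriptome, and SM2miR, a database of small molecule-miRNA associations, together with related prediction algorithms. In essence, however, these methods predict "functional" associations between miRNAs and small molecules rather than drugs that truly target miRNAs. The Kuntz laboratory tried applying the "protein-small molecule" docking software "Dock 6.0" to "RNA-small molecule" docking, but this approach also has major drawbacks: 1) it depends on RNA tertiary structure, yet most RNA tertiary structures are unknown, and RNA tertiary structures are less rigid and more flexible than protein tertiary structures; 2) Dock 6 was designed for "protein-small molecule" docking, and the physicochemical properties and force field parameters of RNA differ greatly from those of proteins, so Dock 6 cannot simply be applied to RNA. More recently, the Disney laboratory first used biological methods to identify small chemical molecules that bind small RNA fragments containing hairpin loops and bulges, and then used these interaction data to design the prediction algorithm Inforna. This algorithm, however, applies only to small RNA fragments and not to large RNA molecules, which are more numerous, more complex and act through different mechanisms. Moreover, because the Inforna data, program and server have not been made public, its accuracy is unclear. In summary, these preliminary attempts all have shortcomings; RNA-targeted drug screening still has a long way to go and needs newer computational methods.
From the above analysis, screening drugs that directly target a molecule seems to require the molecule's spatial structure and force field. Yet very few RNA molecules have known spatial structures, and RNA force fields are also unclear, which appears to be a contradiction that is hard to reconcile.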
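Summary of the Invention
The object of the present invention is to address the above shortcomings of the prior art by providing a computer screening method for small-molecule chemical drugs targeting RNA. The method builds a random forest model from information derived from RNA sequences and the physicochemical properties of small chemical molecules, and helps screen small chemical molecules that target RNA more conveniently and effectively. In the present invention, a small chemical molecule is an organic molecule with a molecular weight below 900 daltons.
The object of the present invention is achieved by the following technical solutions:
A computer screening method for small-molecule chemical drugs targeting RNA, comprising the following steps: (1) collecting and organizing data sets, (2) mining features for training the prediction method, (3) creating the prediction method and model, and (4) validating the prediction method and model.
Preferably, step (1), collecting and organizing data sets, includes:
(a) retrieving from the PDB database structures consisting only of RNA and small molecules, and extracting the corresponding information from these structures as the training data set;
(b) collecting RNA-small molecule interaction data from the SMMRNA database and literature reports as the test data set.
Preferably, the information in step (a) includes whether the RNA interacts with the small molecule and the specific location at which the RNA interacts with the small molecule.
Preferably, step (b) collects, from the SMMRNA database and literature reports, RNA-small molecule interaction data not contained in the PDB database as the test data set.
Preferably, step (2), mining features for training the prediction method, includes:
(a) extracting relevant features of the RNA;
(b) calculating the physicochemical properties of the small molecule, including the number of hydrogen bond acceptors, the number of hydrogen bond donors, the octanol/water partition coefficient, the molar refractivity, the molecular weight and the topological polar surface area.
Preferably, the relevant features in step (a) cover sequence, structure and function.
Preferably, the physicochemical properties in step (b) include the number of hydrogen bond acceptors, the number of hydrogen bond donors, the octanol/water partition coefficient, the molar refractivity, the molecular weight and the topological polar surface area.
Further, the relevant features in step (a) include: nucleotide type, functional sites, the nucleotide distance sum (NDS) curve, nucleotide frequency and pairing status.
Preferably, step (3), creating the prediction method and model, includes: using a balanced random forest model to create a computational method for predicting RNA-small chemical molecule interactions.
Preferably, using a balanced random forest model to create the computational method for predicting RNA-small chemical molecule interactions includes: dividing the negative samples in the training data set into multiple portions to narrow the gap in number between each portion of negative samples and the positive samples, matching each portion with the positive samples for model training, and aggregating the outputs of these models.
Preferably, step (4), validating the prediction method and model, includes: evaluating the performance of the model obtained in step (3).
Preferably, the performance evaluation includes cross-validation using the training data set.
Preferably, the performance evaluation includes independent validation using the test data set.
Further preferably, the performance evaluation includes selecting 5 positive and 5 negative predictions for biological validation.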
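The invention also provides the use of the computer screening method for small-molecule chemical drugs targeting RNA in a high-throughput screening platform.
The invention also provides the use of the computer screening method for small-molecule chemical drugs targeting RNA in the computer screening of compounds with RNA as the target.
The invention also provides the use of the computer screening method for small-molecule chemical drugs targeting RNA with the PDB database.
The invention also provides the use of the computer screening method for small-molecule chemical drugs targeting RNA with the SMMRNA database.
The invention also provides the use of the computer screening method for small-molecule chemical drugs targeting RNA with miREnvironment, a miRNA-based research and development platform for environmental factors.
The invention also provides the use of the computer screening method for small-molecule chemical drugs targeting RNA in targeted drug screening.
The invention also provides the use of the computer screening method for small-molecule chemical drugs targeting RNA in the prevention and treatment of major diseases.
The invention also provides the use of the computer screening method for small-molecule chemical drugs targeting RNA in targeted drug screening. Using this method, we predicted the small chemical molecules kaempferol and quercetin as targeting lncSHGL.
The invention also provides the use of the computer screening method for small-molecule chemical drugs targeting RNA in the prevention and treatment of major diseases. We previously discovered a new lncRNA, lncSHGL, which plays a key role in hepatic glucose and lipid metabolism and is a new drug target for intervention in metabolic diseases such as fatty liver and diabetes. Using this method, we predicted that kaempferol and quercetin bind lncSHGL, which makes these two small chemical molecules potential drugs for the prevention and treatment of fatty liver and diabetes.
The beneficial effects of the present invention are as follows:
The invention addresses the important problem of screening small-molecule chemical drugs against RNA, an emerging class of disease intervention targets. Given current limitations such as scarce RNA spatial structure data, structural flexibility and unknown force fields, it creates a machine-learning-based computational method (random forests in the present invention) for screening small chemical molecules that target RNA, built on an analysis of RNA sequence features and small-molecule physicochemical properties. The invention provides a new strategy for the computer screening of small chemical molecules that target RNA and for RNA-based prevention and treatment of major diseases.
The invention provides new ideas, strategies and methods for RNA-targeted drug screening.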
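Brief Description of the Drawings
Figure 1. Nucleotide distances calculated from the RNA sequence (the secondary structure is first predicted from the sequence, then distances are calculated) are highly correlated with nucleotide distances calculated from the spatial structure;
Figure 2. AK098656 expression is highly specific to vascular smooth muscle cells;
Figure 3. After transfer of the AK098656 gene, both the systolic (a) and diastolic (b) blood pressure of rats rose significantly;
Figure 4. Cross-validation results of the created computational method (a) and test results on the independent data set drawn from SMMRNA and the literature (b).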
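Preferred Embodiments of the Invention
The following examples and experimental examples illustrate the present invention but do not limit its scope. The invention is further described below with reference to specific examples and experimental examples.
Example 1:
1. Collecting and organizing RNA-small chemical molecule interaction data
1) Training data set
Structures in the PDB database consisting only of RNA chains and small molecules were retrieved, and the downloaded PDB structure data were cleaned and used as the source of the training data set. A PDB structure was discarded if all the small molecules it contains are metal ions or solvent molecules from buffers commonly used in structural biology research, or if the length of the RNA chain it contains does not exceed 20. RNA-small molecule interaction information was then extracted from the retained PDB structures. Because 4.0 angstroms is roughly the turning point between the weakest hydrogen bond and the strongest van der Waals force, 4.0 angstroms was adopted as the threshold for judging interaction between a small molecule and RNA: if the closest distance between a nucleotide and an atom of the small molecule is less than 4.0 angstroms, the two are considered to interact. Because the PDB structures used as the source of the training data set rarely contain non-interacting RNA-small molecule pairs, all the small molecules involved in the PDB structures were first collected and the Euclidean distances between their physicochemical properties were calculated. Then, for each of the one or more small molecules that interact with the RNA chain in a given PDB structure, the remaining small molecules were ranked by the Euclidean distance between their physicochemical properties and those of that small molecule. To minimize the chance of generating false-negative RNA-small molecule pairs, the intersection of the small molecules whose distances rank between the 80th and 90th percentiles was selected to artificially generate "non-interacting" RNA-small molecule pairs.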
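The two rules in this paragraph, the 4.0 angstrom contact threshold and the selection of presumed non-binders from the 80th to 90th percentile band of property distances, can be illustrated with a minimal sketch. The function names, the dictionary-based input layout and the rank-based (non-interpolated) percentile cut are assumptions made here for illustration; the patent does not specify an implementation.

```python
import math

CUTOFF = 4.0  # angstroms, the interaction threshold stated in the text

def interacting_nucleotides(rna_atoms, ligand_atoms, cutoff=CUTOFF):
    """Return indices of nucleotides with any atom closer than `cutoff` to a ligand atom.

    rna_atoms: {nucleotide_index: [(x, y, z), ...]} (hypothetical layout)
    ligand_atoms: [(x, y, z), ...]
    """
    hits = []
    for idx, atoms in sorted(rna_atoms.items()):
        closest = min(math.dist(a, b) for a in atoms for b in ligand_atoms)
        if closest < cutoff:
            hits.append(idx)
    return hits

def percentile_band(ref, candidates, props, lo_pct=80, hi_pct=90):
    """Candidates whose property distance to `ref` ranks between the two percentiles."""
    ranked = sorted(candidates, key=lambda m: math.dist(props[ref], props[m]))
    start = lo_pct * len(ranked) // 100  # assumption: simple rank cut, no interpolation
    stop = hi_pct * len(ranked) // 100
    return set(ranked[start:stop])

def decoy_molecules(bound, all_molecules, props):
    """Intersect the 80th-90th percentile bands computed for each bound small molecule."""
    candidates = [m for m in all_molecules if m not in bound]
    bands = [percentile_band(ref, candidates, props) for ref in bound]
    return set.intersection(*bands) if bands else set()
```

Here `props` maps each small molecule to a tuple of physicochemical properties. Any scaling of those properties before the distance calculation is not described in the text and is left out of the sketch.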
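2) Independent test data set
RNA-small molecule interactions, together with any non-interacting RNA-small molecule pairs that could be found, were collected manually from the literature as the test data set, and new RNA-small molecule interaction data not included in the PDB database were obtained from the SMMRNA database.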
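2. Calculating RNA-related features and small-molecule physicochemical properties
On the one hand, RNA-related features are extracted from several perspectives, including sequence, structure and function. Specifically, the following features are extracted in turn for each nucleotide:
(1) the type of the nucleotide itself (A, U, C, G or N);
(2) whether it pairs with another nucleotide;
(3) whether it is a functional site predicted by the Rsite2 algorithm previously proposed by the applicant;
(4) the normalized geometric distance score of the nucleotide in the secondary structure, the NNDS value:
$$\mathrm{NNDS} = \frac{\sum_j \mathrm{dist}(nt_i - nt_j)}{\sum_j \mathrm{dist}(nt_{centroid} - nt_j)}$$
where $nt_i$, $nt_j$ and $nt_{centroid}$ are the coordinate vectors of the nucleotide under test, any nucleotide in the RNA, and the center of the RNA, respectively. Euclidean distance is used when calculating nucleotide distances.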
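As a rough illustration of the NNDS formula above, the sketch below computes one value per nucleotide from a list of coordinates. It assumes that the coordinates are already available (for example from a secondary-structure layout) and that the RNA center is the arithmetic mean of the nucleotide coordinates; neither detail is spelled out in the text.

```python
import math

def nnds(coords):
    """Normalized nucleotide distance sum (NNDS) for every nucleotide.

    coords: list of equal-length coordinate tuples, one per nucleotide.
    Assumption: the RNA center is the mean of all nucleotide coordinates.
    """
    n = len(coords)
    dim = len(coords[0])
    centroid = tuple(sum(c[d] for c in coords) / n for d in range(dim))
    # Denominator: summed distance from the center to every nucleotide.
    denom = sum(math.dist(centroid, c) for c in coords)
    # Numerator, per nucleotide: summed distance to every nucleotide.
    return [sum(math.dist(ci, cj) for cj in coords) / denom for ci in coords]
```

With three collinear nucleotides at x = 0, 1 and 2, the two ends score 1.5 and the middle nucleotide scores 1.0, so nucleotides far from the bulk of the structure receive higher values. The degenerate case in which all coordinates coincide (zero denominator) is not handled.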
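Subsequently, because the RNA is fragmented, features (1)-(3) above are placed into the vector of the corresponding fragment, while (4) is converted to its average value and assigned to the corresponding fragment. Whether a fragment interacts with a small molecule is determined by whether the nucleotide at the center of the fragment interacts with the small molecule. Missing values in fragments that extend beyond either end of the RNA sequence are filled by default with (1) N, (2) no, (3) no and (4) the normalized NDS value of the first or last nucleotide, respectively. In addition, the frequency of each nucleotide and of each nucleotide triplet is counted within each fragment. The RNA secondary structure used to determine nucleotide pairing status comes from several sources: extraction from PDB structures with RNApdbee (http://rnapdbee.cs.put.poznan.pl/), manual annotation based on relevant literature reports, and prediction from the RNA sequence with RNAfold.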
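The fragmentation and frequency counting described above might look like the following sketch. It assumes one fragment per nucleotide, centered on that nucleotide, with an odd fragment length; the text does not state the fragment length at this point, so `length` is a free parameter, and the function names are hypothetical.

```python
from collections import Counter

def fragments(seq, length):
    """Yield (center_index, fragment) for every nucleotide, padding both ends with 'N'.

    Assumption: `length` is odd, so each fragment has a single center nucleotide.
    """
    half = length // 2
    padded = "N" * half + seq + "N" * half
    for i in range(len(seq)):
        yield i, padded[i:i + length]

def frequencies(fragment, k=1):
    """Relative frequency of every k-mer in a fragment (k=1 nucleotides, k=3 triplets)."""
    kmers = [fragment[i:i + k] for i in range(len(fragment) - k + 1)]
    counts = Counter(kmers)
    return {kmer: c / len(kmers) for kmer, c in counts.items()}
```

For example, `list(fragments("AUGC", 3))` gives `[(0, "NAU"), (1, "AUG"), (2, "UGC"), (3, "GCN")]`, where the `N` characters are the default fill for positions beyond the sequence ends.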
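On the other hand, the small chemical molecule structure files include structure data format (SDF) files obtained directly from the PDB database and simplified molecular input line entry specification (SMILES) files retrieved from NCBI's PubChem database (https://pubchem.ncbi.nlm.nih.gov/). The Open Babel package was then used to calculate physicochemical properties from these structure files, including the number of hydrogen bond acceptors (HBA), the number of hydrogen bond donors (HBD), the octanol/water partition coefficient (logP), the molecular weight (MW), the molar refractivity (MR) and the topological polar surface area (TPSA). These indicators can be obtained by direct counting or by combining the known physicochemical properties of small-molecule fragments. For example, for a small molecule containing n kinds of fragments, the TPSA of each kind of fragment can be looked up and a sum weighted by the fragment counts computed: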
Figure PCTCN2018090267-appb-000001
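The equation image is not reproduced in this text. From the preceding sentence it presumably has the form below, where $c_k$ (the number of occurrences of fragment type $k$ in the molecule) and $\mathrm{TPSA}_k$ (the tabulated contribution of that fragment type) are symbols assumed here rather than taken from the original figure:

```latex
\mathrm{TPSA} = \sum_{k=1}^{n} c_k \cdot \mathrm{TPSA}_k
```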
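3. Creating the RNA-small chemical molecule interaction prediction method
1) Calculating the RNA-small chemical molecule interaction propensity score
Because RNA interacts with small chemical molecules only locally, the applicant proposes fragmenting the RNA. The RNA-related features fed to the model are therefore derived from RNA sequence fragments, and the model directly predicts whether an RNA sequence fragment interacts with a small chemical molecule. The fragment-level predictions must then be integrated at the level of the whole RNA molecule to give an overall assessment of the propensity of the RNA molecule to interact with the small chemical molecule. To this end, the fragments of the RNA sequence predicted to interact with the small chemical molecule are first identified. For each such fragment, the proportion of fragments predicted to interact is calculated among the fragment itself and up to 5 neighbouring fragments on each side. The fragments are then ranked by this proportion, at most the top 5 are taken, and the ratio of the mean of their proportions to the mean sequence distance between their fragment centers is calculated. A higher ratio implies that the RNA sequence fragments on which the small chemical molecule may act are more densely distributed along the RNA molecule. This interaction propensity score is referred to as the DRIP (Drug-RNA interaction predictor) score.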
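One possible reading of the DRIP score is sketched below. Several details are assumptions rather than statements from the text: the "mean distance between fragment centers" is taken as the mean pairwise distance between the centers of the selected fragments, ties in the ranking keep sequence order, an RNA with no predicted fragment scores 0, and a single selected fragment returns its proportion unchanged because no distance can be averaged.

```python
from itertools import combinations

def drip_score(predicted, centers, flank=5, top=5):
    """Sketch of a DRIP-style score from per-fragment predictions.

    predicted: list of booleans, one per fragment, in sequence order.
    centers: sequence position of the center nucleotide of each fragment.
    """
    positives = [i for i, hit in enumerate(predicted) if hit]
    if not positives:
        return 0.0
    ranked = []
    for i in positives:
        # The fragment itself plus up to `flank` neighbours on each side.
        window = predicted[max(0, i - flank):i + flank + 1]
        ranked.append((sum(window) / len(window), i))
    ranked.sort(key=lambda item: -item[0])  # stable sort: ties keep sequence order
    best = ranked[:top]
    mean_fraction = sum(f for f, _ in best) / len(best)
    pairs = list(combinations([centers[i] for _, i in best], 2))
    if not pairs:  # assumption: one fragment has no distance to average
        return mean_fraction
    mean_distance = sum(abs(a - b) for a, b in pairs) / len(pairs)
    return mean_fraction / mean_distance
```

Under this reading, predicted fragments that cluster together raise the numerator (many positive neighbours) and lower the denominator (short distances), which matches the stated interpretation that a higher score means a denser distribution along the RNA.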
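2) Creating the RNA-small chemical molecule interaction prediction model
Because fragments in the data set that do not interact with small molecules far outnumber those that do, a balanced random forest (BRF) model is used, in which the negative samples are divided into multiple portions and each portion is matched with the positive samples. To narrow the gap in number between negative and positive samples in each portion as far as possible without making the model overly complex, the negative samples are divided into at most 10 portions. The random forest models are built with the R package randomForest.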
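The partitioning step of the balanced random forest can be sketched as follows. The patent builds the forests themselves with the R package randomForest; this Python sketch only illustrates how the negative samples might be divided and how per-model outputs might be combined. The rule for choosing the number of portions (enough to approach a 1:1 ratio, capped at 10), the random shuffling and the averaging of scores are all assumptions.

```python
import math
import random

def balanced_partitions(positives, negatives, max_parts=10, seed=0):
    """Split negatives into at most `max_parts` portions and pair each with all positives."""
    # Assumption: aim for roughly as many negatives as positives per portion.
    wanted = math.ceil(len(negatives) / max(1, len(positives)))
    parts = max(1, min(max_parts, wanted))
    shuffled = negatives[:]
    random.Random(seed).shuffle(shuffled)
    portions = [shuffled[i::parts] for i in range(parts)]
    return [(positives, portion) for portion in portions]

def ensemble_vote(scores):
    """Combine the scores of the per-portion models for one sample (assumed: mean)."""
    return sum(scores) / len(scores)
```

Each returned pair would be used to train one random forest, so a data set with 3 positive and 100 negative fragments yields 10 models, each seeing all positives and 10 negatives.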
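A random forest is an ensemble classifier. It is composed of a number of decision trees, each trained on a subset of the samples. In a tree, the paths from the root node to the leaf nodes specify how the value conditions θ(x_i) of the different features should be combined according to the weights w in order to classify the selected subset of samples. The random forest model finally integrates a series of decision trees to predict the classification vector y: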
Figure PCTCN2018090267-appb-000002
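Because the model integrates many features and the construction pipeline has many adjustable parameters, it is optimized step by step. First, because the RNA is fragmented, the effect of different RNA sequence fragment lengths on model performance is tested. Once the fragment length is set, feature selection is performed. In a trained random forest model, the importance of an individual feature is scored by its Gini importance (GI): the goodness of classification produced by each split κ on the feature in each tree is expressed as the Gini impurity i(κ), and the Gini impurity is then aggregated over all trees T to obtain the overall importance score of the feature: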
Figure PCTCN2018090267-appb-000003
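The equation image is not reproduced here. As a hedged illustration of how Gini importance is commonly computed (the exact form in the original figure may differ), the sketch below measures the drop in Gini impurity caused by each split on a feature and sums those drops over all splits in all trees. The representation of a split as a `(parent, left, right)` triple of label lists is an assumption made for this example.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity 1 - sum(p_c^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_decrease(parent, left, right):
    """Weighted drop in Gini impurity when `parent` is split into `left` and `right`."""
    n = len(parent)
    return (gini_impurity(parent)
            - len(left) / n * gini_impurity(left)
            - len(right) / n * gini_impurity(right))

def gini_importance(splits):
    """Sum the impurity decrease over every split made on one feature, across all trees."""
    return sum(impurity_decrease(p, l, r) for p, l, r in splits)
```

A split that separates two positives from two negatives perfectly removes all impurity (a decrease of 0.5), so features that often produce clean splits accumulate high importance.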
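The effect of different feature combinations on model performance was tested. The combinations include keeping all features, removing each group of RNA-related features in turn, and normalizing the small-molecule physicochemical properties by molecular weight and then either keeping or removing molecular weight. After a feature combination is chosen, the ratio of negative to positive fragments in the data set is adjusted. Because this ratio differs from one small molecule to another, the predictions may be biased. The negative fragments, which do not interact with the small molecule, are therefore manipulated so that the ratio of negative to positive fragments is the same for every small molecule, starting at 10:1 and doubling up to 640:1. When a small molecule has too few negative fragments, the gap is filled with pseudo-negative fragments generated by randomly sampling existing negative fragments and randomly mutating one nucleotide in each; apart from the sequence, the features of a pseudo-negative fragment are kept identical to those of the original negative fragment. When a small molecule has more negative fragments than needed, the CD-HIT tool is used to cluster the RNA sequence fragments, both among the negative fragments and between negative and positive fragments. Based on the clustering, negative fragments similar to positive fragments are preferentially retained while redundancy among the negative fragments is reduced, so that the retained negative fragments remain as representative as possible. Next, with the negative-to-positive ratio for each small molecule under control, the effect of different RNA sequence lengths on model performance is compared once more. Finally, the number of classification trees in the random forest model is varied from 100 to 1000 in steps of 100, and the best number of trees is selected by comparison.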
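The parameter grids and the pseudo-negative filling step described above can be sketched as follows. The function names and the AUCG alphabet are assumptions; the CD-HIT-based reduction for small molecules with surplus negative fragments is not shown, and the sketch returns such inputs unchanged.

```python
import random

RATIOS = [10 * 2 ** k for k in range(7)]   # negative:positive ratios 10:1 doubling to 640:1
TREE_COUNTS = list(range(100, 1001, 100))  # 100, 200, ..., 1000 trees

def pseudo_negative(fragment, rng, alphabet="AUCG"):
    """Copy a negative fragment and mutate exactly one nucleotide at random."""
    pos = rng.randrange(len(fragment))
    choices = [b for b in alphabet if b != fragment[pos]]
    return fragment[:pos] + rng.choice(choices) + fragment[pos + 1:]

def fill_negatives(negatives, n_positive, ratio, seed=0):
    """Top up `negatives` with pseudo-negatives until there are ratio * n_positive of them.

    Assumption: `negatives` is non-empty whenever filling is needed.
    """
    rng = random.Random(seed)
    filled = list(negatives)
    while len(filled) < ratio * n_positive:
        filled.append(pseudo_negative(rng.choice(negatives), rng))
    return filled
```

In the described pipeline the non-sequence features of each pseudo-negative would be copied from the fragment it was derived from; the sketch only handles the sequence.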
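4. Validating the created RNA-small chemical molecule interaction prediction method
5-fold cross-validation is performed on the training data set. Prediction performance is evaluated mainly with sensitivity, specificity and the Matthews correlation coefficient (MCC), which are defined as follows: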
Figure PCTCN2018090267-appb-000004
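The equation image is not reproduced in this text. The standard definitions of these three metrics, which the figure presumably shows, are given below, with $TP$, $TN$, $FP$ and $FN$ denoting the numbers of true positives, true negatives, false positives and false negatives:

```latex
\begin{aligned}
\mathrm{Sensitivity} &= \frac{TP}{TP + FN} \\
\mathrm{Specificity} &= \frac{TN}{TN + FP} \\
\mathrm{MCC} &= \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
\end{aligned}
```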
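Because these metrics depend on a particular classifier threshold, the ROC curve is also plotted and the area under the curve (AUC) is used to evaluate the predictor comprehensively.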
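A threshold-free AUC can be computed directly from scores without drawing the curve, as in the minimal sketch below. It uses the standard equivalence between the AUC and the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one; the function name and the 0/1 label encoding are choices made for this example, not taken from the source.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for illustration; a rank-based formulation would be preferable for large data sets.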
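The created method is run on the independent test data set to assess its accuracy.
All drug small-molecule structure data are downloaded from the DrugBank database (https://www.drugbank.ca/), and the models configured with different parameters during optimization are applied to screen the DrugBank drug library for small molecules that may interact with AK098656. Five positive and five negative predictions are selected for further biological validation. The predicted positive and negative results are validated on GE's BIACORE biomolecular interaction analyzer, which accepts a wide range of sample types (including small chemical molecules and RNA), requires no molecular labelling, works in real time and is highly sensitive (able to detect weak, transient molecular interactions).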
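Example 2:
1. Collecting and organizing RNA-small chemical molecule interaction data
A reliable set of confirmed RNA-small chemical molecule interaction data is the basis for creating a computational method for screening small chemical molecules that target RNA. To this end, the applicant downloads the relevant data from the PDB database and analyzes and organizes them as the training data set. In addition, to validate the proposed prediction method, new RNA-small chemical molecule interaction pairs not contained in the PDB are obtained from the SMMRNA (Small molecule modulators of RNA) database, and new experimentally confirmed RNA-small chemical molecule interaction pairs are retrieved manually from the published literature. The SMMRNA and literature results together form the independent test data set.
2. Calculating RNA-related features and small-molecule physicochemical properties
Based on the RNA-small chemical molecule interaction data of the training data set, RNA-related features are extracted from several perspectives, including sequence, structure and function, such as nucleotide type, functional sites, the nucleotide distance sum (NDS) curve, nucleotide frequency and pairing status. Structure files are extracted from the small chemical molecule structure data and physicochemical properties are calculated, including the number of hydrogen bond acceptors (HBA), the number of hydrogen bond donors (HBD), the octanol/water partition coefficient (logP), the molar refractivity (MR), the molecular weight (MW) and the topological polar surface area (TPSA).
3. Creating the RNA-small chemical molecule interaction prediction method
Because RNA fragments in the data set that do not interact with small chemical molecules far outnumber those that do, the computational method for predicting RNA-small chemical molecule interactions is created with a balanced random forest (BRF) model, in which the negative samples are divided into multiple portions and each portion is matched with the positive samples. In addition, because the model integrates many features and the construction pipeline has many adjustable parameters, it is optimized step by step.
4. Validating the RNA-small chemical molecule interaction prediction method
To verify the accuracy of the created RNA-small chemical molecule interaction prediction method, 5-fold cross-validation is performed on the training data set, and the prediction performance of the random forest model is evaluated using AUC values. The method is then run on the independent test data set and again evaluated using AUC values. Finally, the method is applied to the vascular smooth muscle-specific lncRNA AK098656 previously discovered by the applicant, and 5 positive and 5 negative predictions are selected for biological validation.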
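Example 3:
For research on computational methods for screening small-molecule chemical drugs that target RNA, the applicant has created miREnvironment, a miRNA-based research and development platform for environmental factors (most of which are small chemical molecules) (Cui* et al. Bioinformatics 2011). Small molecules generally act on a target protein at a functional site on the protein, so identifying the functional sites of an RNA is an important basis for intervening on a target RNA. The applicant has successively proposed RNA functional site prediction methods including Rsite and Rsite2 (Cui* et al. Scientific Reports 2015, 2016), SRAMP (Cui* et al. Nucleic Acids Res 2016; m6A methylation site prediction) and PPUS (Cui* et al. Bioinformatics 2015; pseudouridine site prediction). The applicant has shown that functional sites obtained from the RNA sequence and from the spatial structure are significantly consistent and correlated (Figure 1). This indicates that the RNA sequence carries information about the RNA spatial structure, and further suggests that, when RNA spatial structure data are extremely scarce and RNA force fields are unknown, RNA sequence features may make it possible to predict the small chemical molecules that interact with an RNA.
Example 4:
A vascular smooth muscle-specific lncRNA, AK098656, was validated (Figure 2). It is significantly elevated in the blood of hypertensive patients, and transferring the AK098656 gene into rats markedly raised their blood pressure (Jin L et al. Hypertension 2018, 71(2):262-272), indicating that AK098656 is a potential new RNA target for the prevention and treatment of hypertension (Figure 3).
Example 5:
The applicant has collected more than 300 RNA-small chemical molecule interaction pairs from the PDB database and obtained more than 100 pairs from the SMMRNA database and the literature. Preliminary analysis showed that some RNA sequence features are related to small chemical molecule interactions, such as triplet frequency and Rsite2 sites, and that some physicochemical properties of small molecules are also related to RNA interaction, such as the octanol/water partition coefficient and the topological polar surface area. The prediction method DRIP was preliminarily constructed based on random forests. 5-fold cross-validation gave an AUC of 0.818, and the AUC on the independent test data set drawn from SMMRNA and the literature reached 0.829 (Figure 4), indicating that the created method predicts RNA-small chemical molecule interactions with reasonable accuracy.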
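Industrial Applicability
The computer screening method for small-molecule chemical drugs targeting RNA of the present invention can be used in industry and has industrial applicability.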

Claims (21)

  1. 一种靶向RNA的化学小分子药物计算机筛选方法,其特征在于:包含以下步骤:(1)收集和整理数据集,(2)挖掘用于训练预测方法的特征,(3)创建预测方法与模型,(4)验证预测方法与模型。
  2. 如权利要求1所述的靶向RNA的化学小分子药物计算机筛选方法,其特征在于:所述步骤(1)收集和整理数据集,步骤包括:
    (a)从PDB数据库检索并获取仅由RNA和小分子组成的结构,并从这些结构当中提取相应的信息,作为训练数据集。
    (b)从SMMRNA数据库以及文献报道中搜集RNA与小分子相互作用的数据,作为测试数据集。
  3. 如权利要求2所述的靶向RNA的化学小分子药物计算机筛选方法,其特征在于:步骤(a)中所述的信息包括RNA与小分子相互作用的情况和RNA与小分子相互作用的具***置。
  4. 如权利要求2所述的靶向RNA的化学小分子药物计算机筛选方法,其特征在于:步骤(b)为从SMMRNA数据库以及文献报道中搜集PDB数据库之外的RNA与小分子相互作用的数据,作为测试数据集。
  5. 如权利要求1所述的靶向RNA的化学小分子药物计算机筛选方法,其特征在于:所述步骤(2)挖掘用于训练预测方法的特征,包括:
    (a)提取RNA的相关特征;
    (b)计算小分子的理化性质,包括氢键受体数量、氢键供体数量、辛醇/水分布系数、摩尔折射率、分子量、拓扑极性表面面积。
  6. 如权利要求5所述的靶向RNA的化学小分子药物计算机筛选方法,其特征在于:步骤(a)中所述相关特征包括序列、结构、功能。
  7. 如权利要求5所述的靶向RNA的化学小分子药物计算机筛选方法,其特征在于:步骤(b)中所述理化性质包括氢键受体数量、氢键供体数量、辛醇/水分布系数、摩尔折射率、分子量、拓扑极性表面面积。
  8. 如权利要求5所述的靶向RNA的化学小分子药物计算机筛选方法,其特征在于:步骤(a)中所述的相关特征包括:核苷酸类别、功能位点、核苷酸距离和NDS曲 线、核苷酸频率和配对状态。
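  9. The computer screening method for RNA-targeting chemical small-molecule drugs according to claim 1, characterized in that step (3), creating the prediction method and model, comprises: creating the computational method for predicting RNA-small molecule interactions with a balanced random forest model.
  10. The computer screening method for RNA-targeting chemical small-molecule drugs according to claim 9, characterized in that creating the computational method for predicting RNA-small molecule interactions with a balanced random forest model comprises: dividing the negative samples in the training dataset into multiple subsets so as to narrow the gap in number between each subset of negative samples and the positive samples, training a model on each subset matched with the positive samples, and aggregating the outputs of these models.
  11. The computer screening method for RNA-targeting chemical small-molecule drugs according to claim 1, characterized in that step (4), validating the prediction method and model, comprises: evaluating the performance of the model obtained in step (3).
  12. The computer screening method for RNA-targeting chemical small-molecule drugs according to claim 11, characterized in that the performance evaluation comprises: cross-validation using the training dataset.
  13. The computer screening method for RNA-targeting chemical small-molecule drugs according to claim 11, characterized in that the performance evaluation comprises: independent validation using the test dataset.
  14. The computer screening method for RNA-targeting chemical small-molecule drugs according to claim 11, characterized in that the performance evaluation comprises: selecting 5 positive predictions and 5 negative predictions for biological validation.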
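  15. Use of the computer screening method for RNA-targeting chemical small-molecule drugs according to claim 1 in a high-throughput screening platform.
  16. Use of the computer screening method for RNA-targeting chemical small-molecule drugs according to claim 1 in the computer screening of compounds that target RNA.
  17. Use of the computer screening method for RNA-targeting chemical small-molecule drugs according to claim 1 with the PDB database.
  18. Use of the computer screening method for RNA-targeting chemical small-molecule drugs according to claim 1 with the SMMRNA database.
  19. Use of the computer screening method for RNA-targeting chemical small-molecule drugs according to claim 1 in miREnvironment, the research and development platform for miRNA-based environmental factors.
  20. Use of the computer screening method for RNA-targeting chemical small-molecule drugs according to claim 1 in targeted drug screening.
  21. Use of the computer screening method for RNA-targeting chemical small-molecule drugs according to claim 1 in the prevention and treatment of major diseases.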
PCT/CN2018/090267 2018-06-06 2018-06-07 Computer screening method for RNA-targeting chemical small-molecule drugs WO2019232748A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810573816.1 2018-06-06
CN201810573816.1A CN108959843B (zh) 2018-06-06 2018-06-06 Computer screening method for RNA-targeting chemical small-molecule drugs

Publications (1)

Publication Number Publication Date
WO2019232748A1 (zh) 2019-12-12

Family

ID=64493024

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/090267 WO2019232748A1 (zh) 2018-06-06 2018-06-07 Computer screening method for RNA-targeting chemical small-molecule drugs

Country Status (2)

Country Link
CN (1) CN108959843B (zh)
WO (1) WO2019232748A1 (zh)

Also Published As

Publication number Publication date
CN108959843B (zh) 2021-07-06
CN108959843A (zh) 2018-12-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18921370

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18921370

Country of ref document: EP

Kind code of ref document: A1