CN106650317B - Method for discovering potential tumor gene targets by collaborative filtering of public databases - Google Patents


Info

Publication number
CN106650317B
Authority
CN
China
Prior art keywords
tumor type
target
gene
database
tumour
Prior art date
Legal status
Active
Application number
CN201610879877.1A
Other languages
Chinese (zh)
Other versions
CN106650317A (en)
Inventor
江经纬
孙媛媛
Current Assignee
Shuangyun biomedical technology (Suzhou) Co.,Ltd.
Original Assignee
Nanjing Double Transport Biotechnology Co Ltd
Priority date
2016-10-09
Filing date
2016-10-09
Publication date
2019-04-16
Application filed by Nanjing Double Transport Biotechnology Co Ltd
Priority to CN201610879877.1A
Publication of CN106650317A
Application granted
Publication of CN106650317B
Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B45/00 ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioethics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The present invention provides a method for discovering potential tumor gene targets by collaborative filtering of public databases. Starting from a database of known oncogene targets, the method computes the Jaccard coefficient between different tumor types and mines gene targets that have been found in one tumor type but not yet in another. Compared with traditional manual searching for potential tumor gene targets, the method can substantially improve research efficiency and the hit rate of research on potential tumor targets, makes the fullest use of the resources of multiple public databases, and saves considerable experimental cost.

Description

Method for discovering potential tumor gene targets by collaborative filtering of public databases
Technical field
The present invention relates to the technical field of using public oncogene databases, and more particularly to a method for discovering potential tumor gene targets by collaborative filtering of public databases.
Background art
Currently, oncogene targets are discovered mainly by reading the relevant scientific literature, formulating reasonable hypotheses, and verifying them experimentally. As laboratory methods and technology platforms keep advancing, large numbers of high-throughput sequencing experiments are also being used to discover oncogene targets. The results derived from this large volume of high-throughput sequencing data are therefore stored in tumor databases such as COSMIC and TCGA.
Tumor databases differ in emphasis: some cover gene mutations (e.g., COSMIC), some cover different tumor types and their clinical manifestations (e.g., TCGA), some cover tumor drug sensitivity (e.g., GDSC), and some cover tumor prognosis (e.g., CIViC). Searching many such databases purely by hand is usually slow and inefficient because of the sheer volume of data, which makes formulating reasonable hypotheses and designing the corresponding experiments very time-consuming. There is therefore an urgent need for a method that uses public databases to discover potential tumor gene targets.
The records of an oncogene target database consist of tumor types and their related gene targets, a data structure well suited to collaborative filtering recommendation algorithms. In general, collaborative filtering infers a user's latent interests from the similarity between users and then makes suitable recommendations; it works by building a bipartite graph of users and items. Such algorithms are widely used by e-commerce websites such as JD.com and Taobao to recommend products based on the similarity between customers. The same algorithm applies when users and items are replaced by tumor types and their gene targets: the similarity between different tumor types can be computed and used to recommend gene targets.
At present, a researcher using traditional manual search must find potential gene targets in databases whose volume and information content are growing explosively, then organize them and design experiments; this process currently takes about 3-6 months. Now and for the foreseeable future, the scientific literature is growing exponentially in both quantity and information content, so a database-based collaborative filtering recommendation method is needed to solve the problem that the traditional manual search process takes too long.
Summary of the invention
In view of the above problems, an object of the present invention is to provide a method for discovering potential tumor gene targets by collaborative filtering of public databases that greatly shortens the time researchers spend on the whole process of formulating reasonable hypotheses and designing experiments.
To achieve the above object, the technical solution adopted by the invention is as follows. Oncogene databases generally contain two kinds of information: tumor types and their associated gene mutations. Both are generally obtained by analyzing large numbers of tumor samples with GWAS and high-throughput sequencing, and carry a degree of genetic significance. However, because of tumor sample heterogeneity and similar factors, these methods generally identify only high-frequency mutated genes, while low-frequency mutations are hard to detect. In fact, different mutated genes in the same tumor type are genetically related to some extent, and different tumor types often share common metabolic pathways.
From an oncogene database we build a bipartite graph of tumor types and their corresponding mutated genes; performing collaborative filtering on this bipartite graph can uncover potential gene targets for particular tumor types.
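A minimal sketch of the bipartite graph, assuming the database has already been reduced to (tumor type, mutated gene) pairs. The function name `build_bipartite_graph` and the toy records are hypothetical; the edge rule follows step 1) below: edges join a tumor type only to its mutated genes.

```python
from collections import defaultdict


def build_bipartite_graph(records):
    """Build a tumor-type / mutated-gene bipartite graph from (tumor, gene) pairs.

    Returns two adjacency maps, tumor -> set of genes and gene -> set of tumors.
    Only tumor-gene edges exist: tumors are never linked to tumors, nor genes to genes.
    """
    tumor_to_genes = defaultdict(set)
    gene_to_tumors = defaultdict(set)
    for tumor, gene in records:
        tumor_to_genes[tumor].add(gene)
        gene_to_tumors[gene].add(tumor)
    return dict(tumor_to_genes), dict(gene_to_tumors)


# Hypothetical toy records; real input would come from a public database such as COSMIC.
records = [
    ("tumor_A", "g1"), ("tumor_A", "g2"),
    ("tumor_B", "g2"), ("tumor_B", "g3"),
    ("tumor_B", "g2"),  # duplicate pair: the sets collapse it into a single edge
]
tumor_to_genes, gene_to_tumors = build_bipartite_graph(records)
print(sorted(tumor_to_genes["tumor_B"]))  # ['g2', 'g3']
```

Keeping the two node kinds in separate maps enforces the bipartite property by construction.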
The method provided by the invention for discovering potential tumor gene targets by collaborative filtering of public databases comprises the following steps:
1) Using the principles of graph theory and the information in the oncogene database, build a bipartite graph of all tumor types in the oncogene database and their corresponding mutated genes. Two kinds of nodes are defined when building the bipartite graph: one kind is the tumor type and the other is the mutated gene. An edge is defined between tumor type X and each mutated gene that the oncogene database reports for tumor type X; no edge is defined between different tumor types, and no edge is defined between different mutated genes.
2) Select a specified tumor type A from the bipartite graph and treat every other tumor type as a target tumor type B. Compute the Jaccard value between the specified tumor type A and the target tumor type B as follows:
$$J(A,B)=\frac{|A\cap B|}{|A\cup B|}=\frac{|A\cap B|}{|A|+|B|-|A\cap B|}$$
where |A| is the number of mutated genes of tumor type A, |B| is the number of mutated genes of tumor type B, |A ∩ B| is the number of mutated genes shared by tumor types A and B, and |A ∪ B| is the number of distinct mutated genes present in tumor type A or tumor type B. The resulting Jaccard value is the similarity between the specified tumor type A and the target tumor type B.
3) Repeat step 2) to compute the Jaccard value between the specified tumor type A and every other target tumor type B in the oncogene database, and select the target tumor types $B_1, B_2, B_3, \ldots, B_n$ whose Jaccard value is greater than 0.
4) From the oncogene database, select the mutated genes that belong to the target tumor types $B_1, B_2, B_3, \ldots, B_n$ selected in step 3) but are not present in the specified tumor type A, namely $B_{11}', \ldots, B_{1i}'$; $B_{21}', \ldots, B_{2j}'$; $B_{31}', \ldots, B_{3k}'$; …; $B_{n1}', \ldots, B_{nm}'$. Assign the Jaccard value between target tumor type $B_i$ and the specified tumor type A to all of the related mutated genes $B_{i1}', \ldots, B_{iq}'$ of $B_i$.
5) Add up the Jaccard values of mutated genes with the same name among the genes $B_{11}', \ldots, B_{1i}'$; $B_{21}', \ldots, B_{2j}'$; $B_{31}', \ldots, B_{3k}'$; …; $B_{n1}', \ldots, B_{nm}'$ obtained in step 4).
6) Rank the mutated genes for the specified tumor type A by their summed Jaccard values, from high to low, and use the summed value to judge whether each mutated gene is a potential gene target of the specified tumor type.
7) Search the literature to determine whether each potential gene target from step 6) has not yet been studied in the field of the specified tumor type A.
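A minimal Python sketch of steps 2) to 6), assuming each tumor type has already been reduced to a set of mutated gene names (for example from the bipartite graph of step 1)). The function names and toy data are hypothetical, tie-breaking by gene name is an added assumption the text does not specify, and step 7) (literature search) is left out.

```python
def jaccard(a, b):
    """Step 2): Jaccard value |A ∩ B| / |A ∪ B| of two gene sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_targets(tumor_to_genes, specified):
    """Steps 2)-6): rank genes absent from `specified` by their summed Jaccard value."""
    genes_a = tumor_to_genes[specified]
    scores = {}
    for tumor_b, genes_b in tumor_to_genes.items():
        if tumor_b == specified:
            continue
        j = jaccard(genes_a, genes_b)  # steps 2) and 3): similarity of A and B
        if j <= 0:                     # keep only target tumor types with J > 0
            continue
        for gene in genes_b - genes_a:  # step 4): genes of B_i not present in A get J(A, B_i)
            scores[gene] = scores.get(gene, 0.0) + j  # step 5): sum per gene name
    # Step 6): highest summed Jaccard value first; ties broken alphabetically (assumption).
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy data, not taken from any real database.
tumor_to_genes = {
    "A": {"g1", "g2", "g3"},
    "B1": {"g2", "g3", "g4", "g5"},  # J(A, B1) = 2/5 = 0.4
    "B2": {"g3", "g4"},              # J(A, B2) = 1/4 = 0.25
    "B3": {"g6", "g7"},              # J(A, B3) = 0, excluded
}
ranking = recommend_targets(tumor_to_genes, "A")
for gene, score in ranking:
    print(gene, round(score, 3))
# g4 0.65
# g5 0.4
```

Here g4 ranks first because it is recommended by two similar tumor types, which is the intended effect of summing in step 5).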
In the calculation of step 2) of the present invention, the Jaccard value ranges from 0 to 1; the larger the Jaccard value, the more similar tumor type A and tumor type B are.
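As a worked illustration with hypothetical gene sets (not from any real database), suppose tumor type A has mutated genes {g1, g2, g3} and tumor type B has {g2, g3, g4, g5}:

```latex
|A \cap B| = 2, \qquad
|A \cup B| = |A| + |B| - |A \cap B| = 3 + 4 - 2 = 5, \qquad
J(A,B) = \frac{2}{5} = 0.4
```

The ends of the range correspond to tumor types with no shared mutated gene (J = 0) and tumor types with identical gene sets (J = 1).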
In step 6) of the present invention, the higher the summed Jaccard value, the higher the probability that the corresponding mutated gene is a potential gene target of the specified tumor type.
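Steps 4) and 5) can be condensed into one score per candidate gene g. The symbol $S_A(g)$ and the indicator notation are introduced here for illustration only and do not appear in the original text; each tumor type is treated as its set of mutated genes, and $B_1, \ldots, B_n$ are the target tumor types selected in step 3):

```latex
S_A(g) \;=\; \sum_{i=1}^{n} J(A,B_i)\,\mathbf{1}\!\left[\, g \in B_i \setminus A \,\right],
\qquad J(A,B_i) > 0 \ \text{for}\ i = 1, \ldots, n
```

With hypothetical values, a gene found in $B_1$ (J = 0.4) and $B_2$ (J = 0.25) but not in A scores 0.4 + 0.25 = 0.65 and ranks above a gene found only in $B_1$ (score 0.4).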
The oncogene database of the present invention is a public database containing all tumor types and their corresponding gene mutations.
The advantage of the present invention is that it replaces the traditional manual search method with a collaborative filtering method based on tumor databases for discovering potential tumor gene targets. Compared with traditional methods, the invention greatly reduces search time and supports rational experimental design.
Description of the drawings
Fig. 1 shows the tumor type-mutated gene bipartite network built according to the present invention, taking the COSMIC database as an example.
Specific embodiment
The present invention is described in further detail below with reference to the accompanying drawing and specific embodiments.
The oncogene database of the present invention may be any publicly available database that contains all tumor types and the information on their corresponding gene mutations.
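A sketch of turning a database export into the tumor-to-genes map the method works on. The two-column CSV layout and the column names `tumor_type` and `gene` are assumptions for illustration; real exports (for example from COSMIC or TCGA) use their own formats and would need their own parsing.

```python
import csv
import io
from collections import defaultdict


def load_tumor_gene_map(csv_text, tumor_col="tumor_type", gene_col="gene"):
    """Read (tumor type, mutated gene) rows from CSV text into tumor -> set of genes.

    The column names are hypothetical defaults. Blank cells are skipped and
    whitespace is trimmed so that 'TP53 ' and 'TP53' count as the same gene.
    """
    mapping = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        tumor = (row.get(tumor_col) or "").strip()
        gene = (row.get(gene_col) or "").strip()
        if tumor and gene:
            mapping[tumor].add(gene)
    return dict(mapping)


# Hypothetical sample export (illustrative rows only).
sample = """tumor_type,gene
NSCLC,EGFR
NSCLC,KRAS
melanoma,BRAF
melanoma,
"""
print(sorted(load_tumor_gene_map(sample)["NSCLC"]))  # ['EGFR', 'KRAS']
```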
Embodiment 1: a method for discovering potential tumor gene targets by collaborative filtering of public databases, comprising the following steps:
1) Using the principles of graph theory and the information in the oncogene database, build a bipartite graph of all tumor types in the oncogene database and their corresponding mutated genes. Two kinds of nodes are defined when building the bipartite graph: one kind is the tumor type and the other is the mutated gene. An edge is defined between tumor type X and each mutated gene that the oncogene database reports for tumor type X; no edge is defined between different tumor types, and no edge is defined between different mutated genes.
2) Select a specified tumor type A from the bipartite graph and treat every other tumor type as a target tumor type B. Compute the Jaccard value between the specified tumor type A and the target tumor type B as follows:
$$J(A,B)=\frac{|A\cap B|}{|A\cup B|}=\frac{|A\cap B|}{|A|+|B|-|A\cap B|}$$
where |A| is the number of mutated genes of tumor type A, |B| is the number of mutated genes of tumor type B, |A ∩ B| is the number of mutated genes shared by tumor types A and B, and |A ∪ B| is the number of distinct mutated genes present in tumor type A or tumor type B. The resulting Jaccard value is the similarity between the specified tumor type A and the target tumor type B. The Jaccard value ranges from 0 to 1; the larger the Jaccard value, the more similar tumor type A and tumor type B are.
3) Repeat step 2) to compute the Jaccard value between the specified tumor type A and every other target tumor type B in the oncogene database, and select the target tumor types $B_1, B_2, B_3, \ldots, B_n$ whose Jaccard value is greater than 0.
4) From the oncogene database, select the mutated genes that belong to the target tumor types $B_1, B_2, B_3, \ldots, B_n$ selected in step 3) but are not present in the specified tumor type A, namely $B_{11}', \ldots, B_{1i}'$; $B_{21}', \ldots, B_{2j}'$; $B_{31}', \ldots, B_{3k}'$; …; $B_{n1}', \ldots, B_{nm}'$. Assign the Jaccard value between target tumor type $B_i$ and the specified tumor type A to all of the related mutated genes $B_{i1}', \ldots, B_{iq}'$ of $B_i$.
5) Add up the Jaccard values of mutated genes with the same name among the genes $B_{11}', \ldots, B_{1i}'$; $B_{21}', \ldots, B_{2j}'$; $B_{31}', \ldots, B_{3k}'$; …; $B_{n1}', \ldots, B_{nm}'$ obtained in step 4).
6) Rank the mutated genes for the specified tumor type A by their summed Jaccard values, from high to low, and use the summed value to judge whether each mutated gene is a potential gene target of the specified tumor type.
7) Search the literature to determine whether each potential gene target from step 6) has not yet been studied in the field of the specified tumor type A.
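Step 7) is a literature check; one way to speed it up is to count PubMed records for each candidate gene together with tumor type A. The sketch below only builds a query URL for NCBI's E-utilities `esearch` endpoint and reads the record count from its JSON reply. The query wording (gene AND tumor name) and the helper names are assumptions, a count is only a rough proxy for "already studied", and live requests should follow NCBI's usage guidelines.

```python
import json
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(gene, tumor, min_year=None, max_year=None):
    """Build an esearch URL counting PubMed records that mention both gene and tumor."""
    params = {"db": "pubmed", "term": f"{gene} AND {tumor}", "retmode": "json", "retmax": 0}
    if min_year and max_year:
        # Restrict by publication date, e.g. 2015-2016 as in Embodiment 3.
        params.update({"datetype": "pdat", "mindate": str(min_year), "maxdate": str(max_year)})
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_hit_count(response_text):
    """Extract the record count from an esearch JSON response."""
    return int(json.loads(response_text)["esearchresult"]["count"])


url = build_pubmed_query("PIK3CA", "non-small cell lung cancer", 2015, 2016)
print(url)
# A live call could use urllib.request.urlopen(url).read().decode(); it is not run here.
print(parse_hit_count('{"esearchresult": {"count": "42"}}'))  # 42 (stub reply, not real data)
```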
Embodiment 2: as shown in Figure 1, using method of the invention, by taking COSMIC database as an example, the tumor type-of foundation Two subnetwork figure of mutated gene.
Wherein dark node is tumor type, white nodes are the corresponding mutated gene of tumor type;Dark node is bigger The mutated gene quantity that representative participates in the tumour is more, white nodes are bigger represents tumor type number relevant to the mutated gene It measures more.
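The sizing rule just described amounts to node degree in the bipartite graph: a tumor node's degree is its number of mutated genes, and a gene node's degree is its number of tumor types. A hedged sketch follows; the linear scaling `base + scale * degree` and the function name are assumptions, since the figure's actual scale is not stated.

```python
def node_sizes(tumor_to_genes, base=10.0, scale=2.0):
    """Return display sizes for tumor (dark) and gene (white) nodes from their degrees."""
    gene_degree = {}
    for genes in tumor_to_genes.values():
        for gene in genes:
            gene_degree[gene] = gene_degree.get(gene, 0) + 1  # number of tumor types per gene
    tumor_sizes = {t: base + scale * len(g) for t, g in tumor_to_genes.items()}
    gene_sizes = {g: base + scale * d for g, d in gene_degree.items()}
    return tumor_sizes, gene_sizes


# Hypothetical toy graph.
tumor_sizes, gene_sizes = node_sizes({"T1": {"g1", "g2", "g3"}, "T2": {"g2"}})
print(tumor_sizes)        # {'T1': 16.0, 'T2': 12.0}
print(gene_sizes["g2"])   # 14.0
```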
Embodiment 3: using the tumor type-mutated gene bipartite network built from the COSMIC database to find new potential gene targets for a specified tumor type (taking non-small cell lung cancer, NSCLC, as an example: recommendations were made from the 2014 COSMIC database, the top 10 were selected, and publication counts were taken from the NCBI PubMed database for 2015-2016).
As the table above shows, the present invention computes the similarity between different tumor types on the basis of a public tumor database and finds potential gene targets, greatly reducing the time cost of manual search. Taking the 2014 COSMIC database as an example, among the top 10 new potential NSCLC gene targets discovered by the invention, 3 had articles published between 2015 and 2016: PIK3CA (27 articles), MLH1 (1 article) and EP300 (3 articles). Compared with traditional manual search, the invention is more efficient and more accurate, while greatly saving experimental costs and reducing blind experimentation.
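Using only the figures reported above, the literature hit rate of the top 10 NSCLC recommendations can be computed directly. The variable names are illustrative; the seven candidates without 2015-2016 articles are not named in the text, so only the total of 10 is used.

```python
# Counts reported above for the top 10 NSCLC candidates (2014 COSMIC, PubMed 2015-2016).
top_n = 10
articles = {"PIK3CA": 27, "MLH1": 1, "EP300": 3}

hit_rate = len(articles) / top_n          # share of candidates with at least one article
total_articles = sum(articles.values())   # articles across the three hits

print(f"hit rate: {hit_rate:.0%}, total articles: {total_articles}")
# hit rate: 30%, total articles: 31
```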
It should be noted that the above are only preferred embodiments of the present invention and are not intended to limit its scope of protection; any combination or equivalent made on the basis of the above embodiments falls within the scope of protection of the present invention.

Claims (4)

1. A method for discovering potential tumor gene targets by collaborative filtering of public databases, characterized in that the method comprises the following steps:
1) using the principles of graph theory and the information in the oncogene database, building a bipartite graph of all tumor types in the oncogene database and their corresponding mutated genes,
wherein two kinds of nodes are defined when building the bipartite graph, one kind being the tumor type and the other the mutated gene; an edge is defined between tumor type X and each mutated gene that the oncogene database reports for tumor type X, no edge is defined between different tumor types, and no edge is defined between different mutated genes;
2) selecting a specified tumor type A from the bipartite graph, treating every other tumor type as a target tumor type B, and computing the Jaccard value between the specified tumor type A and the target tumor type B as follows:
$$J(A,B)=\frac{|A\cap B|}{|A\cup B|}=\frac{|A\cap B|}{|A|+|B|-|A\cap B|}$$
wherein |A| is the number of mutated genes of tumor type A, |B| is the number of mutated genes of tumor type B, |A ∩ B| is the number of mutated genes shared by tumor types A and B, and |A ∪ B| is the number of distinct mutated genes present in tumor type A or tumor type B,
the resulting Jaccard value being the similarity between the specified tumor type A and the target tumor type B;
3) repeating step 2) to compute the Jaccard value between the specified tumor type A and every other target tumor type B in the oncogene database, and selecting the target tumor types $B_1, B_2, B_3, \ldots, B_n$ whose Jaccard value is greater than 0;
4) selecting from the oncogene database the mutated genes that belong to the target tumor types $B_1, B_2, B_3, \ldots, B_n$ selected in step 3) but are not present in the specified tumor type A, namely $B_{11}', \ldots, B_{1i}'$; $B_{21}', \ldots, B_{2j}'$; $B_{31}', \ldots, B_{3k}'$; …; $B_{n1}', \ldots, B_{nm}'$, and assigning the Jaccard value between target tumor type $B_i$ and the specified tumor type A to all of the related mutated genes $B_{i1}', \ldots, B_{iq}'$ of $B_i$;
5) adding up the Jaccard values of mutated genes with the same name among the genes $B_{11}', \ldots, B_{1i}'$; $B_{21}', \ldots, B_{2j}'$; $B_{31}', \ldots, B_{3k}'$; …; $B_{n1}', \ldots, B_{nm}'$ obtained in step 4);
6) ranking the mutated genes for the specified tumor type A by their summed Jaccard values, from high to low, and using the summed value to judge whether each mutated gene is a potential gene target of the specified tumor type;
7) searching the literature to determine whether each potential gene target from step 6) has not yet been studied in the field of the specified tumor type A.
2. The method for discovering potential tumor gene targets by collaborative filtering of public databases according to claim 1, characterized in that, in the calculation of step 2), the Jaccard value ranges from 0 to 1, a larger Jaccard value indicating that tumor type A and tumor type B are more similar.
3. The method for discovering potential tumor gene targets by collaborative filtering of public databases according to claim 1, characterized in that, in step 6), the higher the summed Jaccard value, the higher the probability that the corresponding mutated gene is a potential gene target of the specified tumor type.
4. The method for discovering potential tumor gene targets by collaborative filtering of public databases according to claim 1, 2 or 3, characterized in that the oncogene database is a public database containing all tumor types and their corresponding gene mutations.
CN201610879877.1A 2016-10-09 2016-10-09 Method for discovering potential tumor gene targets by collaborative filtering of public databases Active CN106650317B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610879877.1A CN106650317B (en) 2016-10-09 2016-10-09 Method for discovering potential tumor gene targets by collaborative filtering of public databases


Publications (2)

Publication Number Publication Date
CN106650317A (en) 2017-05-10
CN106650317B (en) 2019-04-16

Family

ID=58853670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610879877.1A Active CN106650317B (en) 2016-10-09 2016-10-09 Method for discovering potential tumor gene targets by collaborative filtering of public databases

Country Status (1)

Country Link
CN (1) CN106650317B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663214A (en) * 2012-05-09 2012-09-12 Sichuan University Construction and prediction method of integrated drug target prediction system
CN105659087A (en) * 2013-06-13 2016-06-08 Biodesy, Inc. Method of screening candidate biochemical entities targeting a target biochemical entity

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9690844B2 (en) * 2014-01-24 2017-06-27 Samsung Electronics Co., Ltd. Methods and systems for customizable clustering of sub-networks for bioinformatics and health care applications


Also Published As

Publication number Publication date
CN106650317A (en) 2017-05-10

Similar Documents

Publication Publication Date Title
Nguyen et al. TIPP: taxonomic identification and phylogenetic profiling
Mirarab et al. ASTRAL: genome-scale coalescent-based species tree estimation
Newton et al. TumorMap: exploring the molecular similarities of cancer samples in an interactive portal
Ronen et al. netSmooth: Network-smoothing based imputation for single cell RNA-seq
Wu et al. GAERF: predicting lncRNA-disease associations by graph auto-encoder and random forest
Stegle et al. Predicting and understanding the stability of G-quadruplexes
Liu et al. deSALT: fast and accurate long transcriptomic read alignment with de Bruijn graph-based index
Zhang et al. Mining heterogeneous causal effects for personalized cancer treatment
Chen et al. Identifying protein complexes in protein–protein interaction networks by using clique seeds and graph entropy
Babaei et al. Detecting recurrent gene mutation in interaction network context using multi-scale graph diffusion
Herath et al. CoMet: a workflow using contig coverage and composition for binning a metagenomic sample with high precision
Yang et al. MDICC: novel method for multi-omics data integration and cancer subtype identification
Huang et al. Disease characterization using a partial correlation-based sample-specific network
Zhou et al. Maximum parsimony analysis of gene copy number changes
Kundu et al. Efficient Bayesian regularization for graphical model selection
Swanson et al. A Bayesian two-way latent structure model for genomic data integration reveals few pan-genomic cluster subtypes in a breast cancer cohort
Sonpatki et al. Recursive consensus clustering for novel subtype discovery from transcriptome data
Chen et al. Optimization of deep learning models for the prediction of gene mutations using unsupervised clustering
Höglund et al. The Lund taxonomy for bladder cancer classification–from gene expression clustering to cancer cell molecular phenotypes, and back again
Dong et al. Integrating single-cell datasets with ambiguous batch information by incorporating molecular network features
CN106650317B (en) Method for discovering potential tumor gene targets by collaborative filtering of public databases
Li et al. Assisted gene expression‐based clustering with AWNCut
Shi et al. Integration of Cancer Genomics Data for Tree‐based Dimensionality Reduction and Cancer Outcome Prediction
Qin et al. Gene biomarker prediction in glioma by integrating scRNA-seq data and gene regulatory network
Zhou et al. Analysis of gene copy number changes in tumor phylogenetics

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211123

Address after: 215100 Room 405, building A7, Suzhou nano Park, No. 218, Xinghu street, Suzhou Industrial Park, Jiangsu Province

Patentee after: Shuangyun biomedical technology (Suzhou) Co.,Ltd.

Address before: Room 306, building F7, No. 9, Weidi Road, Xianlin street, Qixia District, Nanjing, Jiangsu 210000

Patentee before: NANJING SHUANGYUN BIOTECHNOLOGY CO.,LTD.

TR01 Transfer of patent right