CN106650317B - Method for discovering potential tumor gene targets by collaborative filtering of public databases - Google Patents
- Publication number
- CN106650317B (application number CN201610879877.1A)
- Authority
- CN
- China
- Prior art keywords
- tumor type
- target
- gene
- database
- tumour
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B45/00—ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioethics (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Epidemiology (AREA)
- Evolutionary Computation (AREA)
- Public Health (AREA)
- Software Systems (AREA)
- Genetics & Genomics (AREA)
- Molecular Biology (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
The present invention provides a method for discovering potential tumor gene targets by collaborative filtering of public databases. Based on a database of known tumor gene targets and the Jaccard coefficient between different tumor types, the method mines gene targets that have been found in one tumor type but not yet in another. Compared with finding potential tumor gene targets by traditional manual retrieval, the method can substantially improve research efficiency and the hit rate for potential tumor targets, makes fuller use of multiple public databases, and saves considerable experimental cost.
Description
Technical field
The present invention relates to the technical field of using public tumor gene databases, and more particularly to a method for discovering potential tumor gene targets by collaborative filtering of public databases.
Background art
At present, tumor gene targets are discovered mainly by reading the relevant scientific literature to form reasonable hypotheses, followed by experimental verification. As experimental methods and technology platforms keep advancing, large numbers of high-throughput sequencing experiments are also applied to the discovery of tumor gene targets. The results of this high-throughput sequencing data are stored in corresponding tumor databases such as COSMIC and TCGA.
Tumor databases differ in emphasis: there are databases of gene mutations (e.g. COSMIC), of tumor types and clinical manifestations (e.g. TCGA), of tumor drug sensitivity (e.g. GDSC), and of tumor prognosis (e.g. CIViC). Searching this many databases purely by hand is hampered by the sheer volume of data, so forming reasonable hypotheses and designing the corresponding experiments is time-consuming and inefficient. A method that uses public databases to discover potential tumor gene targets is therefore urgently needed.
The objects in a tumor gene target database are tumor types and their associated gene targets, a structure well suited to collaborative filtering recommendation algorithms. In general, such an algorithm uses the similarity between users to infer a user's potential interests and make reasonable recommendations; it does so by building a bipartite graph of users and items. The algorithm is widely used on e-commerce sites such as JD.com and Taobao to recommend products based on the similarity between customers. The same algorithm can treat tumor types as the users and their gene targets as the items: similarity can be computed between different tumor types and used to recommend gene targets.
Currently, a researcher relying on traditional manual search must find potential gene targets in databases whose number and information content are growing explosively, then organize them and design experiments. This process currently takes about 3-6 months. The scientific literature is growing exponentially in both quantity and information content, now and for the foreseeable future, so a database-based collaborative filtering recommendation method is needed to solve the problem that traditional manual search takes too long.
Summary of the invention
In view of the above problems, the object of the present invention is to provide a method for discovering potential tumor gene targets by collaborative filtering of public databases that substantially shortens the time researchers need to form reasonable hypotheses and design experiments.
To achieve the above object, the technical solution adopted by the invention is as follows. Tumor gene databases generally contain two kinds of information, tumor types and associated gene mutations. This information is usually obtained by analyzing large numbers of tumor samples with GWAS and high-throughput sequencing, and carries a degree of genetic significance. However, owing to factors such as the heterogeneity of tumor samples, these methods generally find only some high-frequency mutated genes, while low-frequency mutations are hard to identify. In fact, the different mutated genes of the same tumor type are genetically related to some degree, and shared metabolic pathways are often found between different tumor types.
From a tumor gene database we build a bipartite graph of tumor types and their corresponding mutated genes. Collaborative filtering on this bipartite graph can reveal potential gene targets for particular tumor types.
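As an illustration only, the bipartite graph described here could be held in memory as two adjacency maps. The sketch below is not part of the patent: the function name `build_bipartite_graph`, the record layout of `(tumor_type, gene)` pairs, and the toy labels are all assumptions made for the example.

```python
from collections import defaultdict


def build_bipartite_graph(records):
    """Build both sides of a tumor type / mutated gene bipartite graph.

    `records` is an iterable of (tumor_type, gene) pairs (assumed layout).
    Edges exist only between a tumor type and a gene, never within a side.
    """
    tumor_to_genes = defaultdict(set)
    gene_to_tumors = defaultdict(set)
    for tumor_type, gene in records:
        tumor_to_genes[tumor_type].add(gene)
        gene_to_tumors[gene].add(tumor_type)
    return dict(tumor_to_genes), dict(gene_to_tumors)


# Hypothetical toy records; not taken from COSMIC or any real database.
records = [
    ("tumor_A", "GENE1"), ("tumor_A", "GENE2"),
    ("tumor_B", "GENE2"), ("tumor_B", "GENE3"),
    ("tumor_C", "GENE4"),
]
tumor_to_genes, gene_to_tumors = build_bipartite_graph(records)
```

Storing sets on both sides keeps duplicate database rows from creating duplicate edges and makes the later set intersections and unions direct.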
The invention provides a method for discovering potential tumor gene targets by collaborative filtering of public databases, comprising the following steps:
1) Using the principles of graph theory, build from the information in a tumor gene database a bipartite graph of all tumor types in the database and their corresponding mutated genes. Two kinds of nodes are defined when building the graph: one kind is tumor types and the other is mutated genes. By definition, there is an edge between tumor type X and each mutated gene that the tumor gene database reports for tumor type X; there are no edges between different tumor types; and there are no edges between different mutated genes.
2) Select a specified tumor type A from the bipartite graph and treat every other tumor type as a target tumor type B. Calculate the Jaccard value between the specified tumor type A and the target tumor type B using the formula:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
where |A| is the number of mutated genes of tumor type A, |B| is the number of mutated genes of tumor type B, |A ∩ B| is the number of mutated genes shared by tumor type A and tumor type B, and |A ∪ B| is the number of mutated genes present in tumor type A or tumor type B. The calculated Jaccard value gives the similarity between the specified tumor type A and the target tumor type B.
3) Repeat step 2) to calculate the Jaccard value between the specified tumor type A and every other target tumor type B in the tumor gene database, and select the target tumor types $B_1, B_2, B_3, \ldots, B_n$ whose Jaccard value is greater than 0.
4) From the tumor gene database, take for the target tumor types $B_1, B_2, B_3, \ldots, B_n$ selected in step 3) the mutated genes that are not present in the specified tumor type A: $B_{11}' \ldots B_{1i}'$, $B_{21}' \ldots B_{2j}'$, $B_{31}' \ldots B_{3j}'$, ..., $B_{n1}' \ldots B_{nm}'$. Assign the Jaccard value between target tumor type $B_i$ and the specified tumor type A to all of these associated mutated genes $B_{i1}' \ldots B_{iq}'$ of $B_i$.
5) Across the mutated genes $B_{11}' \ldots B_{1i}'$, $B_{21}' \ldots B_{2j}'$, $B_{31}' \ldots B_{3j}'$, ..., $B_{n1}' \ldots B_{nm}'$ from step 4), add together the Jaccard values assigned to mutated genes of the same name.
6) Rank the candidate mutated genes for the specified tumor type A by their summed Jaccard value, and use that value to judge whether each mutated gene is a potential gene target of the specified tumor type.
7) Search the literature to determine whether the potential gene targets from step 6) have not yet been studied in the field of the specified tumor type A.
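Steps 2) to 6) above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the patented implementation: the names `jaccard` and `recommend_targets`, the dictionary layout, and the toy gene labels are hypothetical, and the literature check of step 7) is left to the researcher.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard value of two gene sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_targets(tumor_to_genes, specified):
    """Rank genes absent from `specified` by their summed Jaccard value."""
    genes_a = tumor_to_genes[specified]
    scores = defaultdict(float)
    for tumor, genes_b in tumor_to_genes.items():
        if tumor == specified:
            continue
        j = jaccard(genes_a, genes_b)       # step 2: similarity of A and B
        if j <= 0:                          # step 3: keep only Jaccard > 0
            continue
        for gene in genes_b - genes_a:      # step 4: genes of B not in A
            scores[gene] += j               # step 5: sum per gene name
    # step 6: highest summed value first, gene name as a tie-breaker
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy data; the labels are placeholders, not real genes.
toy = {
    "A": {"G1", "G2", "G3"},
    "B1": {"G1", "G2", "G4"},   # shares 2 of 4 genes with A
    "B2": {"G3", "G4", "G5"},   # shares 1 of 5 genes with A
    "B3": {"G6"},               # shares nothing with A, so it is skipped
}
ranked = recommend_targets(toy, "A")
```

In this toy case `G4` ranks first because two similar tumor types both contribute to its score, while `G6` never appears because its only tumor type has a Jaccard value of 0 with A.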
In the calculation of step 2), the Jaccard value ranges from 0 to 1; the larger the Jaccard value, the more similar tumor type A and tumor type B are.
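A small worked example may make the range concrete. The gene labels $g_1 \ldots g_5$ below are hypothetical placeholders, and the notation $J(A, B)$ for the Jaccard value is used here for illustration.

```latex
% Hypothetical gene sets for two tumor types
A = \{g_1, g_2, g_3\}, \qquad B = \{g_2, g_3, g_4, g_5\}

A \cap B = \{g_2, g_3\}, \qquad A \cup B = \{g_1, g_2, g_3, g_4, g_5\}

J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{2}{5} = 0.4

% Boundary cases: no shared genes gives 0, identical gene sets give 1
A \cap B = \emptyset \;\Rightarrow\; J(A, B) = 0, \qquad
A = B \neq \emptyset \;\Rightarrow\; J(A, B) = 1
```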
In step 6), the higher the summed Jaccard value, the higher the probability that the corresponding mutated gene is a potential gene target of the specified tumor type.
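One way to write the summed value of steps 4) to 6) compactly is shown below. The symbol $S_A(g)$ is introduced here only for illustration and does not appear in the source.

```latex
% Summed Jaccard value of a candidate gene g for the specified tumor type A,
% taken over the target tumor types B_1, ..., B_n whose Jaccard value exceeds 0
S_A(g) = \sum_{\substack{i = 1 \\ g \in B_i \setminus A}}^{n} J(A, B_i),
\qquad J(A, B_i) = \frac{|A \cap B_i|}{|A \cup B_i|}
```

Because $S_A(g)$ is a sum over several tumor types it can exceed 1, so under this reading it serves as a ranking score for ordering candidates rather than as a probability in itself.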
The tumor gene database of the invention includes any public database of tumor types and their corresponding gene mutations.
The advantage of the invention is that it uses a collaborative filtering method based on tumor databases to discover potential tumor gene targets, replacing traditional manual search. Compared with the traditional approach, the invention greatly reduces search time and supports rational experimental design.
Description of the drawings
Fig. 1 is the tumor type-mutated gene bipartite network graph built by the invention, taking the COSMIC database as an example.
Specific embodiments
The present invention is described in further detail below with reference to the drawing and specific embodiments.
The tumor gene databases referred to in the invention are all publicly available databases that contain tumor types and their corresponding gene mutations.
Embodiment 1: a method for discovering potential tumor gene targets by collaborative filtering of public databases, comprising the following steps:
1) Using the principles of graph theory, build from the information in a tumor gene database a bipartite graph of all tumor types in the database and their corresponding mutated genes. Two kinds of nodes are defined when building the graph: one kind is tumor types and the other is mutated genes. By definition, there is an edge between tumor type X and each mutated gene that the tumor gene database reports for tumor type X; there are no edges between different tumor types; and there are no edges between different mutated genes.
2) Select a specified tumor type A from the bipartite graph and treat every other tumor type as a target tumor type B. Calculate the Jaccard value between the specified tumor type A and the target tumor type B using the formula:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
where |A| is the number of mutated genes of tumor type A, |B| is the number of mutated genes of tumor type B, |A ∩ B| is the number of mutated genes shared by tumor type A and tumor type B, and |A ∪ B| is the number of mutated genes present in tumor type A or tumor type B. The calculated Jaccard value gives the similarity between the specified tumor type A and the target tumor type B. The Jaccard value ranges from 0 to 1; the larger the Jaccard value, the more similar tumor type A and tumor type B are.
3) Repeat step 2) to calculate the Jaccard value between the specified tumor type A and every other target tumor type B in the tumor gene database, and select the target tumor types $B_1, B_2, B_3, \ldots, B_n$ whose Jaccard value is greater than 0.
4) From the tumor gene database, take for the target tumor types $B_1, B_2, B_3, \ldots, B_n$ selected in step 3) the mutated genes that are not present in the specified tumor type A: $B_{11}' \ldots B_{1i}'$, $B_{21}' \ldots B_{2j}'$, $B_{31}' \ldots B_{3j}'$, ..., $B_{n1}' \ldots B_{nm}'$. Assign the Jaccard value between target tumor type $B_i$ and the specified tumor type A to all of these associated mutated genes $B_{i1}' \ldots B_{iq}'$ of $B_i$.
5) Across the mutated genes $B_{11}' \ldots B_{1i}'$, $B_{21}' \ldots B_{2j}'$, $B_{31}' \ldots B_{3j}'$, ..., $B_{n1}' \ldots B_{nm}'$ from step 4), add together the Jaccard values assigned to mutated genes of the same name.
6) Rank the candidate mutated genes for the specified tumor type A by their summed Jaccard value, and use that value to judge whether each mutated gene is a potential gene target of the specified tumor type.
7) Search the literature to determine whether the potential gene targets from step 6) have not yet been studied in the field of the specified tumor type A.
Embodiment 2: Fig. 1 shows the tumor type-mutated gene bipartite network graph built with the method of the invention, taking the COSMIC database as an example.
Dark nodes are tumor types and white nodes are the mutated genes corresponding to the tumor types. The larger a dark node, the more mutated genes are involved in that tumor; the larger a white node, the more tumor types are associated with that mutated gene.
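The node sizes described for Fig. 1 correspond to node degrees in the bipartite graph. The following sketch shows one way those counts could be derived; the function name `node_degrees` and the toy data are assumptions for illustration, and no plotting library is implied by the source.

```python
def node_degrees(tumor_to_genes):
    """Return (tumor_degree, gene_degree) for a tumor type -> gene set mapping.

    tumor_degree[t] = number of mutated genes linked to tumor type t
    gene_degree[g]  = number of tumor types linked to mutated gene g
    """
    tumor_degree = {t: len(genes) for t, genes in tumor_to_genes.items()}
    gene_degree = {}
    for genes in tumor_to_genes.values():
        for gene in genes:
            gene_degree[gene] = gene_degree.get(gene, 0) + 1
    return tumor_degree, gene_degree


# Hypothetical toy graph; labels are placeholders, not real database entries.
graph = {
    "tumor_A": {"GENE1", "GENE2", "GENE3"},
    "tumor_B": {"GENE2", "GENE3"},
    "tumor_C": {"GENE3"},
}
tumor_degree, gene_degree = node_degrees(graph)
```

Here `tumor_A` would be drawn as the largest dark node (three genes) and `GENE3` as the largest white node (three tumor types).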
Embodiment 3: using the tumor type-mutated gene bipartite network graph built from the COSMIC database, potential new gene targets are recommended for a specified tumor type (taking non-small cell lung cancer, NSCLC, as an example; recommendations are based on the 2014 COSMIC database and the top 10 are chosen; publication counts come from the NCBI PubMed database for 2015-2016).
As the above table shows, the invention computes the similarity between different tumor types on the basis of public tumor databases and finds potential gene targets, substantially reducing the time cost of manual search. Taking the 2014 COSMIC database as an example, of the top 10 new potential NSCLC gene targets found with the invention, 3 appeared in articles published in 2015-2016: PIK3CA (27 articles), MLH1 (1 article) and EP300 (3 articles). Compared with traditional manual search, the invention is more efficient and more accurate, while saving substantial experimental cost and reducing blind experimentation.
It should be noted that the above are only preferred embodiments of the present invention and do not limit its scope of protection; any combination or equivalent transformation made on the basis of the above embodiments falls within the scope of protection of the present invention.
Claims (4)
1. A method for discovering potential tumor gene targets by collaborative filtering of public databases, characterized in that the method comprises the following steps:
1) using the principles of graph theory, building from the information in a tumor gene database a bipartite graph of all tumor types in the database and their corresponding mutated genes,
wherein two kinds of nodes are defined when building the bipartite graph, one kind being tumor types and the other being mutated genes; by definition, there is an edge between tumor type X and each mutated gene that the tumor gene database reports for tumor type X, there are no edges between different tumor types, and there are no edges between different mutated genes;
2) selecting a specified tumor type A from the bipartite graph, treating every other tumor type as a target tumor type B, and calculating the Jaccard value between the specified tumor type A and the target tumor type B using the formula:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
wherein |A| is the number of mutated genes of tumor type A, |B| is the number of mutated genes of tumor type B, |A ∩ B| is the number of mutated genes shared by tumor type A and tumor type B, and |A ∪ B| is the number of mutated genes present in tumor type A or tumor type B,
the calculated Jaccard value giving the similarity between the specified tumor type A and the target tumor type B;
3) repeating step 2) to calculate the Jaccard value between the specified tumor type A and every other target tumor type B in the tumor gene database, and selecting the target tumor types $B_1, B_2, B_3, \ldots, B_n$ whose Jaccard value is greater than 0;
4) from the tumor gene database, taking for the target tumor types $B_1, B_2, B_3, \ldots, B_n$ selected in step 3) the mutated genes that are not present in the specified tumor type A: $B_{11}' \ldots B_{1i}'$, $B_{21}' \ldots B_{2j}'$, $B_{31}' \ldots B_{3j}'$, ..., $B_{n1}' \ldots B_{nm}'$; and assigning the Jaccard value between target tumor type $B_i$ and the specified tumor type A to all of these associated mutated genes $B_{i1}' \ldots B_{iq}'$ of $B_i$;
5) across the mutated genes $B_{11}' \ldots B_{1i}'$, $B_{21}' \ldots B_{2j}'$, $B_{31}' \ldots B_{3j}'$, ..., $B_{n1}' \ldots B_{nm}'$ from step 4), adding together the Jaccard values assigned to mutated genes of the same name;
6) ranking the candidate mutated genes for the specified tumor type A by their summed Jaccard value, and using that value to judge whether each mutated gene is a potential gene target of the specified tumor type;
7) searching the literature to determine whether the potential gene targets from step 6) have not yet been studied in the field of the specified tumor type A.
2. The method for discovering potential tumor gene targets by collaborative filtering of public databases according to claim 1, characterized in that, in the calculation of step 2), the Jaccard value ranges from 0 to 1, and the larger the Jaccard value, the more similar tumor type A and tumor type B are.
3. The method for discovering potential tumor gene targets by collaborative filtering of public databases according to claim 1, characterized in that, in step 6), the higher the summed Jaccard value, the higher the probability that the corresponding mutated gene is a potential gene target of the specified tumor type.
4. The method for discovering potential tumor gene targets by collaborative filtering of public databases according to claim 1, 2 or 3, characterized in that the tumor gene database includes any public database of tumor types and their corresponding gene mutations.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610879877.1A CN106650317B (en) | 2016-10-09 | 2016-10-09 | Method for discovering potential tumor gene targets by collaborative filtering of public databases |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106650317A CN106650317A (en) | 2017-05-10 |
CN106650317B true CN106650317B (en) | 2019-04-16 |
Family
ID=58853670
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610879877.1A Active CN106650317B (en) | Method for discovering potential tumor gene targets by collaborative filtering of public databases | 2016-10-09 | 2016-10-09 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106650317B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102663214A (en) * | 2012-05-09 | 2012-09-12 | 四川大学 | Construction and prediction method of integrated drug target prediction system |
CN105659087A (en) * | 2013-06-13 | 2016-06-08 | 比奥德赛公司 | Method of screening candidate biochemical entities targeting a target biochemical entity |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9690844B2 (en) * | 2014-01-24 | 2017-06-27 | Samsung Electronics Co., Ltd. | Methods and systems for customizable clustering of sub-networks for bioinformatics and health care applications |
Also Published As
Publication number | Publication date |
---|---|
CN106650317A (en) | 2017-05-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20211123; Address after: 215100 Room 405, building A7, Suzhou nano Park, No. 218, Xinghu street, Suzhou Industrial Park, Jiangsu Province; Patentee after: Shuangyun biomedical technology (Suzhou) Co.,Ltd.; Address before: Room 306, building F7, No. 9, Weidi Road, Xianlin street, Qixia District, Nanjing, Jiangsu 210000; Patentee before: NANJING SHUANGYUN BIOTECHNOLOGY CO.,LTD. |