CN103778350B - Method for detecting the significance of somatic copy number alterations based on a two-dimensional statistical model - Google Patents
Method for detecting the significance of somatic copy number alterations based on a two-dimensional statistical model
- Publication number
- CN103778350B (application CN201410010002.9A)
- Authority
- CN
- China
- Prior art keywords
- scna
- statistic
- random
- dimensional
- amplitude
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Apparatus Associated With Microorganisms And Enzymes (AREA)
Abstract
A method for detecting the significance of somatic copy number alterations (SCNAs) based on a two-dimensional statistical model, comprising: S1, collecting SCNA data and preprocessing the SCNA data; S2, calculating the correlation coefficient between adjacent SCNA sites and dividing each chromosome into multiple relatively independent SCNA structural units; S3, calculating the statistic of each SCNA structural unit and performing two-dimensional random permutation across the whole genome; S4, for each length L of SCNA structural unit, calculating the statistics of random SCNA patterns of length L in the permuted samples to construct the length-specific null distribution $D_L$ in the two-dimensional space, comparing the statistic of the corresponding SCNA with $D_L$, and recording the result of the comparison as the p-value; if the p-value is below the set threshold, the corresponding SCNA is significant and has potential cancer function.
Description
Technical field
The present invention relates to a method for detecting the significance of somatic copy number alterations based on a two-dimensional statistical model.
Background art
Somatic copy number alteration (SCNA) is an important phenomenon in cancer genomes. It appears mainly in two states, amplification and deletion of copy number, and is closely linked to the occurrence and development of cancer cells. Systematic analysis of SCNAs therefore provides an important route to studying the pathogenesis of cancer at the molecular level, and its most fundamental and central problem is how to distinguish SCNA patterns that have cancer function from SCNAs that occur at random.
Numerous studies show that SCNA functional patterns are often hidden in regions that vary consistently across cancer genome samples. Establishing statistically grounded computational methods to detect how significantly an SCNA recurs across multiple samples therefore provides a direct and feasible technical means of identifying SCNA functional patterns and discovering potential cancer genes, which in turn provides important information for biomedical practitioners in the prediction and diagnosis of cancer. Establishing a reasonable and effective statistical test model is thus essential.
The density of high-throughput whole-genome SCNA sites and the complexity of their structure pose great challenges to building a statistical test model and detecting SCNA significance, mainly in the following two respects.
First, the difficulties of the problem itself:
a) the number of sites exceeds 1.8 million while the number of samples is usually small, giving high-dimensional, small-sample data;
b) SCNA sites are strongly correlated rather than independent, so the tested factors interact with one another;
c) an amplification or deletion state has two features, alteration frequency and alteration amplitude, which calls for a reasonable mechanism for balancing the two;
d) SCNA structural patterns differ in length, so SCNAs of different lengths must be given different background distributions.
Second, the theoretical and methodological challenges of solving the problem:
a) the data scale is large, so effectively controlling the time and space complexity of the computation is a challenge;
b) how to take full account of the correlation between SCNA sites and reduce the conservativeness of the significance estimates is a difficult point;
c) how to make the null hypothesis consistent with the distribution of the statistic, and thereby strengthen the statistical meaning of the significance estimates, is a key problem that has not yet been solved.
Summary of the invention
To solve the above problems, the present invention provides a method for detecting the significance of somatic copy number alterations based on a two-dimensional statistical model, characterized in that it comprises:
S1: collecting SCNA data and preprocessing the SCNA data;
S2: calculating the correlation coefficient between adjacent SCNA sites and dividing each chromosome into multiple relatively independent SCNA structural units;
S3: calculating the statistic of each SCNA structural unit and performing two-dimensional random permutation across the whole genome;
S4: for each length L of SCNA structural unit, calculating the statistics of random SCNA patterns of length L in the permuted samples to construct the length-specific null distribution $D_L$ in the two-dimensional space; comparing the statistic of the corresponding SCNA with $D_L$ and recording the result of the comparison as the p-value; if the p-value is below the set threshold, the corresponding SCNA is significant and has potential cancer function.
On the basis of the above technical solution, step S1 comprises:
processing the SCNA signal to obtain SCNA signals that are comparable across samples; using a segmentation algorithm to handle noise; and defining the SCNA amplification and deletion states.
On the basis of the above technical solution, step S2 comprises: using the Pearson formula to calculate the correlation coefficient between adjacent SCNA sites, and dividing each chromosome into multiple relatively independent SCNA structural units.
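Step S2 can be illustrated with a minimal sketch. The text does not state how the correlation is turned into unit boundaries, so the rule used here (start a new unit wherever the Pearson correlation between two adjacent sites falls below a cutoff) and the names `partition_by_adjacent_correlation` and `threshold` are assumptions made for illustration only.

```python
import numpy as np

def partition_by_adjacent_correlation(signal, threshold=0.5):
    """Split sites into structural units by adjacent-site Pearson correlation.

    signal: array of shape (n_samples, n_sites), one column per site.
    threshold: hypothetical cutoff; a boundary is placed between sites j and
    j + 1 when their correlation across samples is below it.
    Returns a list of (start, end) site-index pairs, end exclusive.
    """
    signal = np.asarray(signal, dtype=float)
    n_sites = signal.shape[1]
    units, start = [], 0
    for j in range(n_sites - 1):
        x, y = signal[:, j], signal[:, j + 1]
        if x.std() == 0 or y.std() == 0:
            r = 0.0  # assumption: a constant site is treated as uncorrelated
        else:
            r = np.corrcoef(x, y)[0, 1]
        if r < threshold:
            units.append((start, j + 1))
            start = j + 1
    units.append((start, n_sites))
    return units
```

Because only adjacent pairs are compared, the cost of this sketch grows linearly with the number of sites.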
On the basis of the above technical solution, step S3 comprises:
constructing a training set from known SCNA functional patterns, learning the weight $w_1$ of the frequency and the weight $w_2$ of the amplitude, and calculating the statistic
$S_{test} = w_1 f + w_2 a$
where $f$, $a$, and $S_{test}$ denote, respectively, the frequency, the amplitude, and the statistic of an SCNA functional pattern in the training set.
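As a worked illustration of the statistic, the sketch below computes $f$, $a$, and $S_{test}$ for one structural unit in the amplification case. The text does not define how frequency and amplitude are measured, so the definitions here are assumptions: a sample counts as altered when its mean log-ratio over the unit exceeds a hypothetical `call_threshold`, $f$ is the fraction of altered samples, and $a$ is the mean log-ratio among altered samples. The weights are taken as given; learning them from a training set is not shown.

```python
import numpy as np

def unit_statistic(unit_signal, w1, w2, call_threshold=0.3):
    """Return (f, a, S_test) for one structural unit (amplification case).

    unit_signal: array of shape (n_samples, n_sites_in_unit) of log-ratios.
    w1, w2: weights of frequency and amplitude, assumed already learned.
    call_threshold: hypothetical cutoff for calling a sample amplified.
    """
    unit_signal = np.asarray(unit_signal, dtype=float)
    per_sample = unit_signal.mean(axis=1)      # mean log-ratio per sample
    altered = per_sample > call_threshold      # which samples carry the SCNA
    f = altered.mean()                         # alteration frequency
    a = per_sample[altered].mean() if altered.any() else 0.0  # amplitude
    return f, a, w1 * f + w2 * a
```

For four samples with mean log-ratios 1.0, 0.0, 0.5, and 0.0, this gives $f = 0.5$ and $a = 0.75$, so with $w_1 = 0.6$ and $w_2 = 0.4$ the statistic is $0.6 \cdot 0.5 + 0.4 \cdot 0.75 = 0.6$.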
On the basis of the above technical solution, step S3 further comprises the two-dimensional random permutation, which proceeds as follows:
A) for the frequency with which SCNAs occur, randomly permute the positions at which they occur across the whole genome; for each permuted sample set, calculate the occurrence frequency of the random SCNAs and build the frequency-based null distribution $D_f$;
B) for the alteration amplitude of SCNAs, randomly permute the positions at which the amplitudes occur across the whole genome; for each permuted sample set, calculate the amplitude of the random SCNAs and build the amplitude-based null distribution $D_a$;
C) using the weights $w_1$ and $w_2$ obtained by supervised learning, construct the null distribution $D$ against which the significance of the statistic is tested, where
$D = w_1 D_f + w_2 D_a$.
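Steps A) to C) can be sketched as follows, under stated assumptions. The data are taken to be summarised per structural unit as a binary call matrix and an amplitude matrix (samples by units), positions are shuffled independently within each sample, and the random frequency and amplitude of a position are taken as column means over all samples. These choices, and the names `permute_rows` and `null_distribution`, are illustrative and are not specified in the text.

```python
import numpy as np

def permute_rows(matrix, rng):
    """Shuffle genomic positions independently within each sample (row)."""
    return np.array([rng.permutation(row) for row in matrix])

def null_distribution(calls, amps, w1, w2, n_perm=1000, seed=0):
    """Build D_f, D_a and the combined null D = w1 * D_f + w2 * D_a.

    calls: (n_samples, n_units) 0/1 matrix of SCNA occurrence.
    amps:  (n_samples, n_units) matrix of alteration amplitudes.
    """
    rng = np.random.default_rng(seed)
    calls = np.asarray(calls, dtype=float)
    amps = np.asarray(amps, dtype=float)
    d_f, d_a = [], []
    for _ in range(n_perm):
        # A) frequency of random SCNAs after permuting occurrence positions
        d_f.append(permute_rows(calls, rng).mean(axis=0))
        # B) amplitude of random SCNAs after permuting amplitude positions
        d_a.append(permute_rows(amps, rng).mean(axis=0))
    d_f = np.concatenate(d_f)
    d_a = np.concatenate(d_a)
    # C) combine the two dimensions with the learned weights
    return d_f, d_a, w1 * d_f + w2 * d_a
```

Permuting at the level of structural units rather than individual sites keeps each permutation small, which matches the complexity argument made later in the description.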
Compared with the prior art: the two features of copy number alteration, alteration frequency and alteration amplitude, both carry important biological meaning, so constructing the statistic and the statistical test model on both features helps to estimate the significance of copy number alterations objectively. The prior art mostly emphasizes alteration frequency alone and tends to overlook the importance of alteration amplitude. The present invention therefore builds a two-dimensional statistical test model on the feature space spanned by these two features and balances them through a supervised learning strategy so that the statistic is calculated in a reasonable way. This makes the hypothesis test model consistent with the statistic and strengthens both the statistical and the biological meaning of the significance estimates.
Brief description of the drawings
Fig. 1 is the flow chart of the present invention.
Detailed description
Referring to Fig. 1, a method for detecting the significance of somatic copy number alterations based on a two-dimensional statistical model is characterized in that it comprises:
S1: collecting SCNA data and preprocessing the SCNA data;
S2: calculating the correlation coefficient between adjacent SCNA sites and dividing each chromosome into multiple relatively independent SCNA structural units;
S3: calculating the statistic of each SCNA structural unit and performing two-dimensional random permutation across the whole genome;
S4: for each length L of SCNA structural unit, calculating the statistics of random SCNA patterns of length L in the permuted samples to construct the length-specific null distribution $D_L$ in the two-dimensional space; comparing the statistic of the corresponding SCNA with $D_L$ and recording the result of the comparison as the p-value; if the p-value is below the set threshold, the corresponding SCNA is significant and has potential cancer function.
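The comparison in step S4 can be sketched as an empirical p-value against a length-specific null. The text does not give the formula, so the one-sided estimate below, including the add-one correction that keeps the p-value above zero, is a common convention adopted here as an assumption. The names `empirical_p_value`, `call_significant`, and `alpha` are hypothetical.

```python
import numpy as np

def empirical_p_value(statistic, null_values):
    """One-sided empirical p-value of a statistic against null values D_L."""
    null_values = np.asarray(null_values, dtype=float)
    exceed = np.count_nonzero(null_values >= statistic)
    # add-one correction (assumption): the p-value is never exactly zero
    return (exceed + 1) / (null_values.size + 1)

def call_significant(units, nulls_by_length, alpha=0.05):
    """Test each unit against the null built for its own length L.

    units: list of (length, statistic) pairs.
    nulls_by_length: dict mapping a length L to its array of null values D_L.
    Returns a list of (length, statistic, p_value, is_significant).
    """
    results = []
    for length, stat in units:
        p = empirical_p_value(stat, nulls_by_length[length])
        results.append((length, stat, p, p < alpha))
    return results
```

Keying the null distributions by length is what lets SCNAs of different lengths be judged against different backgrounds, as the background section requires.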
On the basis of the above technical solution, step S1 comprises:
processing the SCNA signal to obtain SCNA signals that are comparable across samples; using a segmentation algorithm to handle noise; and defining the SCNA amplification and deletion states. SCNA signal preprocessing means standardizing and log-transforming the signal: for each cancer sample, its copy number signal is compared with the copy number signal of the matched normal tissue, and a reference sample is built from the set of samples under analysis so that all samples can be standardized. This weakens the batch effects that exist between samples and at the same time removes the influence of the germline on the SCNA signal.
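The preprocessing described above can be sketched as a log-ratio against the matched normal followed by a standardization step. How the reference sample is built is not specified, so this sketch swaps in a simpler stand-in, centering each sample on its own median log-ratio; that substitution and the function name `preprocess` are assumptions.

```python
import numpy as np

def preprocess(tumor, normal):
    """Log2 ratio of tumor to matched normal, median-centered per sample.

    tumor, normal: arrays of shape (n_samples, n_sites) of positive copy
    number signals; row i of normal is the matched normal of row i of tumor.
    """
    tumor = np.asarray(tumor, dtype=float)
    normal = np.asarray(normal, dtype=float)
    # dividing by the matched normal removes germline copy number differences
    log_ratio = np.log2(tumor / normal)
    # stand-in for reference-sample standardization (assumption)
    return log_ratio - np.median(log_ratio, axis=1, keepdims=True)
```

Segmentation for noise handling would follow this step and is not shown here.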
On the basis of the above technical solution, step S2 comprises: using the Pearson formula to calculate the correlation coefficient between adjacent SCNA sites, and dividing each chromosome into multiple relatively independent SCNA structural units.
On the basis of the above technical solution, step S3 comprises:
constructing a training set from known SCNA functional patterns, learning the weight $w_1$ of the frequency and the weight $w_2$ of the amplitude, and calculating the statistic
$S_{test} = w_1 f + w_2 a$
where $f$, $a$, and $S_{test}$ denote, respectively, the frequency, the amplitude, and the statistic of an SCNA functional pattern in the training set.
On the basis of the above technical solution, step S3 further comprises the two-dimensional random permutation, which proceeds as follows:
A) for the frequency with which SCNAs occur, randomly permute the positions at which they occur across the whole genome; for each permuted sample set, calculate the occurrence frequency of the random SCNAs and build the frequency-based null distribution $D_f$;
B) for the alteration amplitude of SCNAs, randomly permute the positions at which the amplitudes occur across the whole genome; for each permuted sample set, calculate the amplitude of the random SCNAs and build the amplitude-based null distribution $D_a$;
C) using the weights $w_1$ and $w_2$ obtained by supervised learning, construct the null distribution $D$ against which the significance of the statistic is tested, where
$D = w_1 D_f + w_2 D_a$.
The invention also evaluates the performance of the algorithm in three respects: a) whether the algorithm achieves a high true positive rate (TPR) while the false positive rate (FPR) is kept under control; b) whether the algorithm estimates p-values accurately (type I error rate), that is, whether its statistical model is statistically sound; c) the computational complexity of the algorithm. To this end, we intend to take normal-cell copy number measured with the Affymetrix genome-wide SNP 6.0 array as the background and, on the basis of probability theory and a non-stationary model, build a Markov SCNA simulation model, simulate large-scale SCNA data, and test the performance of the method of the invention. Regarding c), in theory the number of SCNA structural units is much smaller than the number of sites, so a permutation strategy based on structural units takes far less computing time than one based on sites, and the time complexity of the algorithm is therefore relatively low.
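Criterion a) can be made concrete with a short sketch that computes TPR and FPR on simulated data where the truly functional SCNAs are known. The simulation itself is not shown, and the function name `tpr_fpr` and the cutoff `alpha` are assumptions for illustration.

```python
import numpy as np

def tpr_fpr(p_values, is_true_scna, alpha=0.05):
    """True and false positive rates of calls made at significance alpha.

    p_values: p-value of each simulated structural unit.
    is_true_scna: booleans marking units simulated as functional SCNAs.
    """
    p_values = np.asarray(p_values, dtype=float)
    truth = np.asarray(is_true_scna, dtype=bool)
    called = p_values < alpha
    tp = np.count_nonzero(called & truth)    # functional and called
    fp = np.count_nonzero(called & ~truth)   # random but called
    tpr = tp / truth.sum() if truth.any() else float("nan")
    fpr = fp / (~truth).sum() if (~truth).any() else float("nan")
    return tpr, fpr
```

For criterion b), if the p-values are well calibrated, the FPR measured this way on purely random units should stay close to `alpha`.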
The above is only a preferred embodiment of the invention and does not limit its scope of protection; all equivalent changes and modifications made according to the claims and the description of the invention fall within the scope covered by this patent.
Claims (3)
1. A method for detecting the significance of somatic copy number alterations based on a two-dimensional statistical model, characterized in that it comprises:
S1: collecting SCNA data and preprocessing the SCNA data;
S2: calculating the correlation coefficient between adjacent SCNA sites and dividing each chromosome into multiple relatively independent SCNA structural units;
S3: calculating the statistic of each SCNA structural unit and performing two-dimensional random permutation across the whole genome; constructing a training set from known SCNA functional patterns, learning the weight $w_1$ of the frequency and the weight $w_2$ of the amplitude, and calculating the statistic
$S_{test} = w_1 f + w_2 a$
where $f$, $a$, and $S_{test}$ denote, respectively, the frequency, the amplitude, and the statistic of an SCNA functional pattern in the training set;
the two-dimensional random permutation proceeds as follows:
A) for the frequency with which SCNAs occur, randomly permute the positions at which they occur across the whole genome; for each permuted sample set, calculate the occurrence frequency of the random SCNAs and build the frequency-based null distribution $D_f$;
B) for the alteration amplitude of SCNAs, randomly permute the positions at which the amplitudes occur across the whole genome; for each permuted sample set, calculate the amplitude of the random SCNAs and build the amplitude-based null distribution $D_a$;
C) using the weights $w_1$ and $w_2$ obtained by supervised learning, construct the null distribution $D$ against which the significance of the statistic is tested, where
$D = w_1 D_f + w_2 D_a$;
S4: for each length L of SCNA structural unit, calculating the statistics of random SCNA patterns of length L in the permuted samples to construct the length-specific null distribution $D_L$ in the two-dimensional space; comparing the statistic of the corresponding SCNA with $D_L$ and recording the result of the comparison as the p-value; if the p-value is below the set threshold, the corresponding SCNA is significant and has potential cancer function.
2. The method for detecting the significance of somatic copy number alterations based on a two-dimensional statistical model according to claim 1, characterized in that step S1 comprises:
preprocessing the SCNA signal to obtain SCNA signals that are comparable across samples; using a segmentation algorithm to handle noise; and defining the SCNA amplification and deletion states.
3. The method for detecting the significance of somatic copy number alterations based on a two-dimensional statistical model according to claim 1, characterized in that step S2 comprises: using the Pearson formula to calculate the correlation coefficient between adjacent SCNA sites, and dividing each chromosome into multiple relatively independent SCNA structural units.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410010002.9A CN103778350B (en) | 2014-01-09 | 2014-01-09 | Method for detecting the significance of somatic copy number alterations based on a two-dimensional statistical model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410010002.9A CN103778350B (en) | 2014-01-09 | 2014-01-09 | Method for detecting the significance of somatic copy number alterations based on a two-dimensional statistical model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103778350A CN103778350A (en) | 2014-05-07 |
CN103778350B true CN103778350B (en) | 2016-10-05 |
Family
ID=50570578
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410010002.9A Expired - Fee Related CN103778350B (en) | Method for detecting the significance of somatic copy number alterations based on a two-dimensional statistical model | 2014-01-09 | 2014-01-09 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103778350B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016025892A1 (en) * | 2014-08-15 | 2016-02-18 | Life Technologies Corporation | Methods and systems for detecting minor variants in a sample of genetic material |
CN105760712B (en) * | 2016-03-01 | 2019-03-26 | 西安电子科技大学 | A kind of copy number mutation detection method based on new-generation sequencing |
CN106682455B (en) * | 2016-11-24 | 2019-03-26 | 西安电子科技大学 | A kind of Statistical Identifying Method of multisample copy number consistency variable region |
CN106650312B (en) * | 2016-12-29 | 2022-05-17 | 浙江安诺优达生物科技有限公司 | Device for detecting copy number variation of circulating tumor DNA |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5919624A (en) * | 1997-01-10 | 1999-07-06 | The United States Of America As Represented By The Department Of Health & Human Services | Methods for detecting cervical cancer |
CN102103750A (en) * | 2011-01-07 | 2011-06-22 | 杭州电子科技大学 | Vision significance detection method based on Weber's law and center-periphery hypothesis |
CN103093119A (en) * | 2013-01-24 | 2013-05-08 | 南京大学 | Method for recognizing significant biologic pathway through utilization of network structural information |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7822555B2 (en) * | 2002-11-11 | 2010-10-26 | Affymetrix, Inc. | Methods for identifying DNA copy number changes |
Non-Patent Citations (4)
Title |
---|
E. S. Venkatraman et al. "A faster circular binary segmentation algorithm for the analysis of CGH data". Original paper. 2007-01-18, vol. 23, no. 6, pp. 657-663. * |
Li Ping et al. "Improved gene copy number variation detection algorithm". Computer Engineering. 2013-01-31, vol. 39, no. 1, pp. 309-312. * |
Vonn Walter et al. "DiNAMIC: A method to identify recurrent DNA copy number aberrations in tumors". Bioinformatics. 2010, vol. 27, no. 5, pp. 678-685. * |
Xiguo Yuan et al. "TAG: A method to identify significant consensus events of copy number alterations in cancer". PLoS ONE. 2012, vol. 7, no. 7, pp. 1-10. * |
Also Published As
Publication number | Publication date |
---|---|
CN103778350A (en) | 2014-05-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20161005 |