CN101539967A - Method for detecting mononucleotide polymorphism - Google Patents

Method for detecting mononucleotide polymorphism Download PDF

Info

Publication number
CN101539967A
CN101539967A CN200810218343A CN200810218343A CN101539967A CN 101539967 A CN101539967 A CN 101539967A CN 200810218343 A CN200810218343 A CN 200810218343A CN 200810218343 A CN200810218343 A CN 200810218343A CN 101539967 A CN101539967 A CN 101539967A
Authority
CN
China
Prior art keywords
order
checking
probability
site
gene group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN200810218343A
Other languages
Chinese (zh)
Other versions
CN101539967B (en
Inventor
李英睿
李瑞强
方晓东
李松岗
余昶
王俊
杨焕明
汪建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WUHAN BGI MEDICAL LABORATORY CO., LTD.
Original Assignee
BGI Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BGI Shenzhen Co Ltd filed Critical BGI Shenzhen Co Ltd
Priority to CN200810218343XA priority Critical patent/CN101539967B/en
Publication of CN101539967A publication Critical patent/CN101539967A/en
Application granted granted Critical
Publication of CN101539967B publication Critical patent/CN101539967B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Abstract

The invention is applicable to the field of biological engineering and provides a method for detecting mononucleotide polymorphism. The method comprises the following steps: sequencing fragments obtained by high throughput sequencing technology are compared on a referenced genome sequence; the likelihood ratio of various genotypes of the corresponding sites on the genome to be tested is obtained according to the sequencing mass fraction of each basic group in the genome to be tested and obtained by sequencing; the posterior probability of each genotype of each site on the referenced genome is calculated according to the likelihood ratio and the prior probability preset for each genotype, and the genotype which has the highest posterior probability is determined as the most likely right genotype of the corresponding sites on the genome to be tested to obtain the consistent sequence of the genome to be tested; and the sites of the genome to be tested, which are inconsistent with the sequence of the referenced genome in the consistent sequence are detected to obtain the polymorphism sites of the genome to be tested. The embodiment of the invention can achieve a more accurate test result as the influence of the prior probability on mononucleotide polymorphism test result is considered in advance.

Description

A kind of method for detecting single nucleotide polymorphism
Technical field
The invention belongs to bioengineering field, relate in particular to a kind of method for detecting single nucleotide polymorphism and system.
Background technology
(Single Nucleotide Polymorphism SNP), promptly refers to the polymorphism of the nucleotide sequence that the change owing to single nucleotide base causes to single nucleotide polymorphism.In the nucleotide sequence in same chromosome of Different Individual or same site, most nucleotide sequences are consistent and have only a phenomenon that base is different, are exactly SNP.Because SNP quantity in human genome is more, occurrence frequency is higher, therefore be considered to the science of heredity mark of new generation after little satellite, very high researching value and application prospect all arranged in research fields such as Medical Genetics, pharmacogenetics, disease genetic, medical diagnosis on disease and human evolutions.
Since two more than ten years, Sanger method order-checking and fluorescence electrophoresis technology always in the dna sequencing field in occupation of leading position, use the pcr amplification method of shotgun strategy at random or target area to carry out a large amount of SNP detections.All identify by these methods in most of SNP site in the existing dbSNP database.It is that order-checking fragment and genome reference sequences are compared that shotgun sequencing carries out the standard method that SNP detects, and gives a mark according to the quality of base and to filter out low-quality order-checking mistake, obtains comparatively believable SNP result.Use is directly checked order from the pcr amplification sequence of dliploid sample, again by chromatogram is analyzed, detects the pleomorphism site of heterozygosis, also is common method, and main software has SNPdetector, novoSNP, PolyPhred and PolyScan.
Compare with the order-checking of traditional capillary electrophoresis, new-generation sequencing technology such as Illumina GenomeAnalyzer (GA), AB SOLiD, and Roche 454FLX system improved sequencing throughput significantly, greatly reduced cost.Order-checking fragment that can to produce about 40,000,000 length be 50bp that Illumina GA once moves.The sequencing throughput of this superelevation makes the sequencing technologies of a new generation be particularly suitable for carrying out the extensive individual preface of resurveying on the basis of known reference genome sequence, thereby carries out the research of genetic mutation.By current, use new sequencing technologies to finish two people's gene order-checking: the individual gene order-checking of James Watson (Roche 454FLX) and first Asian gene order-checking (Illumina GA).In addition, international thousand people's gene batch totals are drawn executive committee and are also determined to use the sort sequencer technology to checking order from global 1000 individual genomes, obtain the most detailed human genome variation collection of illustrative plates.
Development along with new sequencing technologies, corresponding SNP detection method has also had good development, yet, because the order-checking fragment that new sequencing technologies produces is compared with the past significant difference is arranged, so existing SNP sequence measurement is difficult to satisfy the requirement of high throughput sequencing technologies.
Summary of the invention
The object of the present invention is to provide a kind of method for detecting single nucleotide polymorphism, be intended to solve the problem that existing method for detecting single nucleotide polymorphism is difficult to satisfy the requirement of high throughput sequencing technologies.
The present invention is achieved in that a kind of method for detecting single nucleotide polymorphism, and described method comprises the steps:
The order-checking fragment that high throughput sequencing technologies is obtained is compared with reference to genome sequence and is listed;
The sequencing quality mark of each base in the testing gene group that obtains according to order-checking obtains the likelihood probability of the range gene type in corresponding site on the testing gene group;
According to described likelihood probability be the default prior probability of every kind of genotype, calculating is with reference to every kind of genotypic posterior probability on each site on the genome, and the genotype that posterior probability is the highest is defined as the most possible correct genotype in the corresponding site of testing gene group, obtains the concensus sequence of testing gene group;
With the inconsistent site of reference genome sequence, obtain the pleomorphism site in the testing gene group in the concensus sequence of detection testing gene group.
In embodiments of the present invention, calculate the likelihood probability of the range gene type in corresponding site on the testing gene group according to the massfraction of order-checking fragment, again according to likelihood probability be the default prior probability of every kind of genotype, calculating is with reference to every kind of genotypic posterior probability on each site on the genome, and the genotype that posterior probability is the highest is defined as the most possible correct genotype in the corresponding site of testing gene group, obtain the concensus sequence of testing gene group, with the inconsistent site of reference genome sequence, obtain the pleomorphism site in the mononucleotide in the testing gene group in the concensus sequence of detection testing gene group.The embodiment of the invention has been considered the influence of prior probability to testing result when detecting single nucleotide polymorphism, thereby makes the testing result of the embodiment of the invention more accurate, is applicable to high throughput sequencing technologies.
Description of drawings
Fig. 1 is the realization flow figure of the method for detecting single nucleotide polymorphism that provides of the embodiment of the invention;
Fig. 2 is the realization flow figure of the method for detecting single nucleotide polymorphism that provides of another embodiment of the present invention;
Fig. 3 a to Fig. 3 d is the accuracy and the uniqueness synoptic diagram of order-checking fragment comparison;
The inaccuracy of the sequencing quality mark that Fig. 4 a and Fig. 4 b embodiment of the invention provide and the skewed popularity synoptic diagram of mispairing;
Fig. 5 is that the sequencing quality mark that the embodiment of the invention provides is proofreaied and correct the distribution schematic diagram of counting out that rises of front and back;
Fig. 6 is the coverage and the error rate synoptic diagram in full genome coverage, Genotyping site under the different quality threshold values that the embodiment of the invention provides.
Embodiment
In order to make purpose of the present invention, technical scheme and advantage clearer,, the present invention is further elaborated below in conjunction with drawings and Examples.Should be appreciated that specific embodiment described herein only in order to explanation the present invention, and be not used in qualification the present invention.
In embodiments of the present invention, the order-checking fragment that high throughput sequencing technologies obtains is compared on the reference genome, the massfraction of each base in the testing gene group that order-checking is obtained, calculate the likelihood probability of range gene type on each position of testing gene group, according to this likelihood probability be that the default prior probability of every kind of genotype calculates every kind of genotypic posterior probability, the genotype that posterior probability is the highest is defined as the most possible correct genotype in this site of testing gene group, obtain the concensus sequence of testing gene group, by in the concensus sequence that detects the testing gene group with the inconsistent site of reference genome sequence, can detect the pleomorphism site in the testing gene group.
Fig. 1 shows the realization flow of the method for detecting single nucleotide polymorphism that the embodiment of the invention provides, and details are as follows:
In step S101, the order-checking fragment that high throughput sequencing technologies is obtained is compared with reference to genome sequence and is listed.
In embodiments of the present invention, high throughput sequencing technologies can be Illumina GA sequencing technologies, also can be existing other high throughput sequencing technologies.Can be by any short sequence mapping program, as mapping programs such as soap, the order-checking fragment that high throughput sequencing technologies is obtained is compared with reference to genome sequence and is listed.
In step S102, the sequencing quality mark of each base is converted to the base mispairing rate according to the definition of PHRED mark in the testing gene group that order-checking is obtained, and long-pending according to mispairing rate to all order-checking fragment bases of comparing each site on the reference genome, obtain the likelihood probability of the range gene type in corresponding site on the testing gene group.
In step S103, according to likelihood probability be the default prior probability of every kind of genotype, calculating is with reference to every kind of genotypic posterior probability on each site on the genome, the genotype that posterior probability is the highest is defined as the most possible correct genotype in the corresponding site of testing gene group, obtains the concensus sequence of testing gene group.
Wherein every kind of genotypic prior probability can rule of thumb be worth and sets in advance, and as a given available reference genome, can estimate that testing gene group sequence is with respect to reference to genomic mutation rate.For example, the probability that most positions per generations undergos mutation on the human genome is 10 -8The order of magnitude, and two individualities and its common ancestor's difference approximately will form through the time in 10000 generations.The haplochromosome SNP probability of calculating two people that come out in view of the above is approximately 0.001.Each monoploid probably has the difference of 1000 nucleotide with reference sequences.Suppose that people's reference genome sequence shows 0.00001 error rate, then this error rate can be ignored with respect to the incidence of true pleomorphism site.Comprehensive these data can estimate that the isozygoty probability of SNP of amphiploid is 0.0005, and the probability of generation heterozygosis SNP is 0.001.
To the research of NCBI dbSNPs, the frequency that transition mutations takes place was 4 times of transversion according in the past, but but almost was that equifrequent occurs between all types of in these two kinds of variations.Thus, in our SNP detection model, these ratios have been used.For example, if certain loci gene type is G on the reference sequences, monoploid base type may be A, the prior probability of C and T all is 6.67E-4, and the prior probability of G is 0.999, and be 0.9985 for the probability that GG combination appears in amphiploid, AA is 3.33E-4, and TT is 8.33E-5, and AC and AT are 1.11E-7, GC and GT are 1.67E-4, AG is 6.67E-4, and CT is 2.78E-8, shown in table 1.1.
A C G T
A 3.33E-4 1.11E-7 6.67E-4 1.11E-7
C 8.33E-5 1.67E-4 2.78E-8
G 0.9985 1.67E-4
T 8.33E-5
According to above-mentioned analysis, can be that every kind of genotype is provided with prior probability according to the testing gene group to prior probability.
According to likelihood probability be the default prior probability of every kind of genotype, calculating is with reference on each site on the genome during every kind of genotypic posterior probability, according to likelihood probability be the default prior probability of every kind of genotype, adopt Bayesian formula to calculate with reference to every kind of genotypic posterior probability on each site on the genome, the genotype that posterior probability is the highest obtains the concensus sequence of testing gene group as the most possible correct genotype in the corresponding site of testing gene group.Wherein the probability that this genotype is correct is its posterior probability shared ratio in all genotype posterior probability sums.
In step S104, by in the concensus sequence that detects the testing gene group with the inconsistent site of reference genome sequence, detect the pleomorphism site in the mononucleotide in the testing gene group.
In embodiments of the present invention, with in the concensus sequence of testing gene group with the inconsistent site of reference genome sequence as potential pleomorphism site, thereby detect the polymorphism of mononucleotide in the testing gene group.
In order to make the SNP testing result more accurate, in another embodiment of the present invention, this method also comprises the steps:
The probability that genotype among the step S103 is correct is converted to massfraction, and by massfraction is provided with threshold values, filter out the pleomorphism site of massfraction in the detected potential pleomorphism site less than the threshold values that is provided with, thereby from detected potential pleomorphism site, filter out the bigger pleomorphism site of assurance, thereby optimize the accuracy of the pleomorphism site that is extracted.
In order further to optimize the accuracy of the pleomorphism site that is extracted, in another embodiment of the present invention, this method also comprises the steps:
Number to the order-checking fragment of supporting polymorphism base type is set threshold values, filters out the pleomorphism site of the number of the order-checking fragment of supporting polymorphism base type less than pre-set threshold value, thereby filters out the mismatched bases type of the overwhelming majority.Illustrate as follows, if the number of order-checking fragment of supporting detected potential pleomorphism site then filters out this potential pleomorphism site less than default threshold values.
Because the mistake in the order-checking process and the comparison mistake of short-movie section all may cause the appearance of difference between sequence to be measured and the reference sequences, and the order-checking fragment length that high throughput sequencing technologies obtains is very short, have more order-checking segment so and the comparison mistake occurs, wrong SNP therefore just occurred.In embodiments of the present invention, by the minimum number of the order-checking fragment of supporting polymorphism base type is set, can filter out the mismatched bases type of the overwhelming majority.
Below with test figure explanation, by the minimum number to the order-checking fragment of supporting polymorphism base type is set, filter out the mismatched bases type of the overwhelming majority.
Error rate by calculating base from simulated data is to detect the SNPs of wrong identification.At least 95% wrong base has all only occurred once in the order-checking fragment (35bp) of unidirectional order-checking and the order-checking fragment of two-way order-checking (inserting fragment 200bp).Based on these data, by all low-frequency wrong bases (being made as 4 usually) of a threshold value filtering are set.Search SNPs with this method on genome of Yan Di and Huang Di, two legendary rulers of remote antiquity, the result has only about 0.036% wrong allele not to be detected.In the simulation of random dna fragment, 36 times of sequencing datas have 0.008% order-checking mistake greatly down approximately not by filtering.In order to distinguish mistake with high frequency base type and real heterozygosis pleomorphism site, we utilize binomial distribution (P=0.0001) to detect the difference of these two kinds of base types, find in the remaining wrong base type 87.3% all by filtering.Altogether 99.93% since the wrong base type that causes of order-checking mistake of order-checking fragment all by filtering.
Contain the insertion deletion in the order-checking fragment owing to cause the factor of SNP mistake also to comprise, this mistake is relevant with the comparison mode of order-checking fragment.Because 5~10 times of the insertion of SNPs quantity the chances are small fragment deletion have been paid the utmost attention to the comparison situation that does not add gap in the comparison, this class contains the order-checking fragment of inserting deletion may compare failure.For fear of the problems referred to above, in another embodiment of the present invention, this method also comprises the steps:
Distance between the more detected potential pleomorphism site, and filter out the pleomorphism site of distance less than pre-set threshold value, contain the order-checking fragment of inserting deletion really thereby filter out.
We have simulated 10000 small fragments and have inserted the potential impact that the SNPs detection is estimated in deletion, found that 0.6% the reads that inserts deletion that comprises does not detect.If we require at least 4 reads to support just believable words, have only 3 (0.03%) individual wrong SNP allele to be detected.
Fig. 2 shows the realization flow of the method for detecting single nucleotide polymorphism that another embodiment of the present invention provides, and details are as follows:
In step S201, the order-checking fragment that high throughput sequencing technologies is obtained is compared with reference to genome sequence and is listed.
In step S202, the sequencing quality mark of each base is proofreaied and correct in the testing gene group that order-checking is obtained, and obtains the likelihood probability of the range gene type in corresponding site on the testing gene group according to the sequencing quality mark after proofreading and correct.The method that the sequencing quality mark of each base is proofreaied and correct in wherein a kind of testing gene group that order-checking is obtained is to proofread and correct the sequencing quality mark by four-matrix, obtains likelihood probability, and its detailed process is as follows:
A, according to unique order-checking fragment of comparing with reference to genome sequence, add up under specific sequencing quality mark and the sequencing sequence coordinate, mispairing ratio between per two kinds of bases, with of the estimation of this mispairing ratio as the mispairing probability, be recorded in the four-dimensional probability matrix, as the basis of parameters in the statistical models.
Wherein the dimension of four-dimensional probability matrix is respectively sequence coordinate, original base and the observed base on sequencing quality mark, the order-checking fragment.Add up under specific sequencing quality mark and the sequencing sequence coordinate, mispairing ratio between per two kinds of bases is meant the order-checking fragment of comparing on the reference sequences according to unique, statistics is observed the probability of order-checking base from original base under the sequence coordinate on sequencing quality mark, the order-checking fragment.
In embodiments of the present invention, when the order-checking fragment is compared the position that has minimum base mispairing on the reference sequences, think optimum matching, if an order-checking fragment lists at the reference genome sequence and has only a best match position, then thinking order-checking, fragment is unique compares with reference to genome sequence, if the order-checking fragment has listed a plurality of best match position at the reference genome sequence, think that then the order-checking fragment repeats to compare with reference to genome sequence.
In order to estimate accuracy and the uniqueness that the order-checking fragment covers, with the mankind with reference to No. 12 chromosomes of genome as the reference sequence, the order-checking fragment of simulation different length (SNP and the order-checking that have comprised 0.001 ratio are wrong) is compared the order-checking fragment of these simulations in the full genome reference sequences of the mankind then again.From the order-checking fragment of these simulations, can calculate number percent with unique comparison order-checking fragment.
For the order-checking fragment in the unique comparison of unidirectional order-checking, the ratio of its unique comparison changes greatly when 15bp rises to 25bp in order-checking length, and when increasing order-checking length more afterwards, the ratio of unique comparison only has variation seldom.For length is the order-checking fragment of 25bp, has 78.6% order-checking fragment can compare uniquely on the full genome reference sequences of the mankind.For length is the order-checking fragment of 50bp, has 91.5% order-checking fragment can compare uniquely on the full genome reference sequences of the mankind.As the Illumina sequencing technologies length that typically checks order is 35bp, in comparison result, find, there is not wrong order-checking fragment to have 85.7% can compare uniquely on the genome, the order-checking fragment of 1 mispairing has 86.3% can compare uniquely on the genome, and the order-checking fragment of two mispairing has 85.9% can compare on the genome uniquely.A genome actual measurement of the comparison result of simulated data and Yan Di and Huang Di, two legendary rulers of remote antiquity comparison result is similar.From analyze, can find, with two-way sequencing technologies can be very big the ratio of order-checking fragment of the unique comparison of raising.Two-way sequencing technologies inserts fragment length in 100bp to 10kbp (± 10%) scope, and order-checking fragment ratio that can unique comparison increases along with the increase of inserting fragment length.Wherein, when being 200bp, fragment has in the comparison that 95.4% paired order-checking fragment can be unique inserting.
Go to compare with short sequence order-checking fragment, the SNP that order-checking fragment the inside comprises and some order-checking mistakes fragment that all might cause checking order is compared on the incorrect position.For the order-checking fragment of simulation, owing to knowing in advance original position, so can estimate to compare the content of correct order-checking fragment.Length is the unidirectional order-checking order-checking fragment of 25bp, and 2.3% 2 the wrong order-checking fragments of order-checking that have with the wrong order-checking fragment of 1 order-checking and 3.5% are failed on the correct coupling, referring to Fig. 3 c.If order-checking length is 50bp, these two ratios have dropped to 0.6% and 0.8% respectively.In like manner simulated and contained a two-way order-checking order-checking fragment that order-checking is wrong, inserted fragment, 0.4% and 0.06% order-checking fragment comparison mistake has been arranged respectively from 100bp-10kbp; Have the wrong order-checking fragment of 2 order-checkings have under the same insertion fragment 0.3% and 0.06% can not than on.Referring to Fig. 3 d.
B, for each site on the reference genome, collect the base of comparison all order-checking fragments on this site, and write down its base type, sequencing quality mark and the sequence coordinate on the order-checking fragment, from four-dimensional probability matrix, find the probability that four kinds of bases are observed the order-checking base.
The probability that c, basis are observed each independent base from each true genotype is long-pending, obtains the likelihood probability of the range gene type in corresponding site on the testing gene group.
Wherein observe the probability of all bases that cover this site from each true genotype, long-pending for the probability of observing each independent base, find and observe in the four-dimensional probability matrix that the probability of each independent base can set up from step a from each true genotype.Like this, just obtained the likelihood probability that each potential possible genotype obtains the observation base in this site.
For the monoploid genome, its true genotypic possibility has four kinds of A, T, C, G, and for the diploid gene group, its true genotypic possibility has 10 kinds, and wherein homozygous genotype is 4 kinds: AA, CC, GG, TT; 6 kinds on heterozygous genes type: AC, AG, AT, CG, CT, GT.
The Illumina sequencing technologies comes the calibrating quality mark by each order-checking round-robin data.With actual mispairing rate certain difference is arranged through the massfraction after the calibration, and this difference be along with the order-checking fragment coordinate difference and up and down the fluctuation (Fig. 4 a), in embodiments of the present invention, the mark that the correction of massfraction is remained after proofreading and correct by comparison information and original series information or through Illumina gets.
Except sequencing error impacted massfraction along with the order-checking round-robin constantly increases, order-checking instrument itself had also influenced massfraction to the detection of base.The Illumina sequencing technologies utilizes four kinds of bases that were labeled of laser radiation of two kinds of different frequencies, A, and C represents with a kind of laser, G, T represents with another kind of laser.So
Figure A20081021834300131
The wrong frequency that occurs of order-checking just is higher than other mispairing.We find
Figure A20081021834300132
The probability of mispairing exceeds 58% and 72% respectively than the situation that we simulate comparison, simultaneously
Figure A20081021834300133
The mispairing probability approximately will reduce by 36% (Fig. 4 b).For example, massfraction is in the base of 10 (theoretical error rates 0.1), can observed mispairing between order-checking fragment and the reference sequences
Figure A20081021834300134
Figure A20081021834300135
Be respectively 4.62%, 5.27%, 5.29%, 4.62%, and other mispairing type only has 1.62%~2.48%.Given this result in embodiments of the present invention, has also proofreaied and correct massfraction by the mispairing type.
The another kind of sequencing quality mark bearing calibration that the embodiment of the invention provides is to filter out known SNP site earlier, again according to unique order-checking fragment of comparing with reference to genome sequence, add up under specific sequencing quality mark and the sequencing sequence coordinate, mispairing ratio between per two kinds of bases, with of the estimation of this mispairing ratio, be recorded in the four-dimensional probability matrix as the mispairing probability.Adopt this four-dimension probability matrix to proofread and correct the sequencing quality mark again, obtain the likelihood probability of the range gene type in corresponding site on the testing gene group.
Fragment 3 ' end error rate will be far above 5 ' end because the wrong accumulation of order-checking causes checking order.When adopting the Illumina sequencing technologies that the testing gene group is checked order, can provide a sequencing quality mark, but this sequencing quality mark calculates by signal intensity, can not accurately represent the generation of error rate.In order to correct this problem, assess by result a gene order-checking of Yan Di and Huang Di, two legendary rulers of remote antiquity, in embodiments of the present invention, by before statistics mispairing rate, known SNP site is foreclosed, thus interference in correction in order to avoid SNPs to cause as much as possible.
Another sequencing quality mark bearing calibration that the embodiment of the invention provides, a plurality of identical order-checking fragments are arranged in its order-checking fragment on comparing the reference genome, and when the location is identical on the reference genome, a plurality of herein fragments may be the redundancies that the PCR reaction causes in the order-checking test, therefore after the step b of above-mentioned sequencing quality mark bearing calibration, also comprise the steps:
When article one order-checking fragment is compared with reference to genomic certain site, to from four-dimensional probability matrix, find four kinds of bases observe the probability of order-checking base long-pending carry out probability normalization after, obtain the likelihood probability of the range gene type in corresponding site on the testing gene group.
When second order-checking fragment is compared with reference to genomic above-mentioned site, after will from four-dimensional probability matrix, finding four kinds of bases and observing long-pending behind the probability * correction factor of order-checking base and carry out probability normalization, obtain the likelihood probability of the range gene type in corresponding site on the testing gene group.Wherein the value of correction factor is generally 0.9.
When N bar order-checking fragment is compared with reference to genomic above-mentioned site, then will from four-dimensional probability matrix, find probability * correction factor that four kinds of bases are observed the order-checking base (N-1)After long-pending carry out probability normalization after, obtain the likelihood probability of the range gene type in corresponding site on the testing gene group.
Because each locational candidate's allelotype D can observe out from the order-checking fragment of comparing fully with reference sequences.The possibility of supposing every kind of base type Ti is P (D|Ti), and influencing P (D|Ti) principal element has four, is respectively the position on base type, sequencing quality, the order-checking fragment and the number of times of appearance.And in embodiments of the present invention, when calculating likelihood probability, all considered the influence that above-mentioned factor detects SNP, thereby improved the accuracy that SNP detects.
Because DNA sets up in the process of library and used pcr amplification, the initial segment of the DNA of given very few number can produce a large amount of copies, therefore can obtain the dna fragmentation of a large amount of consistent length.Yet the segment of these massive duplications has a significant impact the randomness in the order-checking process, and the order-checking degree of depth in some zone is not very desirable.This also may cause various easily and the appearance of the mistake obscured of heterozygosis allele site.Particularly the damage of DNA is by the redundant sequence coverage that may bring after the pcr amplification, and these are repeatably wrong to be difficult to and pleomorphism site distinguishes.Therefore, be provided with the point penalty rule that repeats at amplification in embodiments of the present invention.If DNA library and order-checking process all are at random, the distribution of the starting point of sequence also should be to obey Poisson distribution so.The degree of depth of a gene order-checking of Yan Di and Huang Di, two legendary rulers of remote antiquity is 36 times, and using the order-checking fragment length is 35bp, and the position on about 0.39% the chromosome has the fragments of the order-checking above 6 initial; Yet this ratio should be only to have 0.07% in theory.Therefore, in embodiments of the present invention, can reduce the influence that the order-checking of common reference position fragment is arranged according to an experimental formula.As use Illumina 1MBeadChip that sample of Yan Di and Huang Di, two legendary rulers of remote antiquity is done Genotyping, and checked the site of isozygotying.In theory, meet Poisson distribution through adjusting the wrong frequency that occurs of back order-checking, referring to Fig. 5.
Test above method with individual genome 36X sequencing data of Yan Di and Huang Di, two legendary rulers of remote antiquity, on Illumina 1MBeadChip, used same dna sample that concensus sequence is carried out Genotyping.Suppose that all Genotypings all are correct, discern and owe to discern classify Genotyping and the different site of order-checking by crossing.What is called is owed identification and looked for a heterozygosis site exactly less in the allele search procedure, and identification just is meant and has looked for an incorrect base type more excessively.
In step S203, according to this likelihood probability be that the default prior probability of every kind of genotype calculates every kind of genotypic posterior probability, the genotype that posterior probability is the highest is defined as the most possible correct genotype in this site of testing gene group, obtains the concensus sequence of testing gene group.Its detailed process does not repeat them here as mentioned above.
In step S204, by in the concensus sequence that detects the testing gene group with the inconsistent site of reference genome sequence, detect the pleomorphism site in the mononucleotide in the testing gene group.
What Fig. 6 showed is the coverage of concensus sequence on reference sequences, the coverage in Illumina 1M BeadChip somatotype site and the error rate of concensus sequence.If there is not quality to filter, the coverage of whole genome will be lower than the coverage in Genotyping site.Because Genotyping site and genomic feature are differentiated.Increase the quality limitations of Q0~Q40, the coverage in Genotyping site has small minimizing: drop to 98.75% from 98.98%; But when critical value establish higher the time, the coverage in Genotyping site reduces just apparent in viewly.This can explain that the result is low much with respect to other site massfraction with the prior probability in SNP site.Cross the probability discerning and owe to discern along with the increase of quality threshold (is owed identification: 0.046%Q0~0.024%Q20, crossed identification: 0.096%Q0~0.067%Q20) and continuous minimizing.Obtaining a quality threshold according to these data is Q20, with the balance coverage with owe to discern the identification probability.In embodiments of the present invention, used five other filtration steps with removing insecure part in some consensus sequences: 1) require minimum two the order-checking fragments of monoploid, minimum 4 the order-checking fragments of amphiploid chromosome cover; 2) overall depth comprises the covering of the order-checking fragment that repeats to compare, must be less than 100; 3) repeat number that is listed on the genome of local sequence is less than 2; 4) at least one two-way order-checking fragment is supported; 5) SNPs is at interval at least more than the 5bp.
Remove in some consensus sequences after insecure part by adopting above-mentioned filtration step, for the monoploid chromosome x, we check order to the male genome that only contains an X chromosome, thus the concensus sequence of X chromosome to search with the monoploid genome be identical.There are four kinds of different possible genotype in each site.In 37933 Illumina 1M BeadChip Genotyping sites of X chromosome, 99.59% site can well be covered, and Genotyping and order-checking have 99.96% common point, sees also table 1.2.On the X chromosome of concensus sequence in the reference genome 88.07% coverage is arranged.The chromosomal region of covering not yet in effect is highly to repeat, and almost do not have can unique comparison the order-checking fragment.Y chromosome mainly is made up of repetitive sequence and reference sequences is assembled to such an extent that be not fine, so we do not discuss to it.
For the amphiploid autosome, in order to estimate the accuracy that concensus sequence and SNP detect, we contrast concensus sequence and NCBI reference sequences that all autosomes of Yan Di and Huang Di, two legendary rulers of remote antiquity assemble, find that concensus sequence has 92.25% covering at the autosome of reference sequences, on Illumina 1M BeadChip Genotyping site, 99.22% covering is arranged, wherein nearly 99.92% site is consistent, sees also table 1.2.Homozygous gene somatotype site has 0.062% quilt order-checking to think heterozygote.
Table 1.2
Figure A20081021834300171
Be the identification accuracy of assessment this method, cross at part and use traditional Sanger sequencing technologies to measure again after the SNPs that discerns adopts pcr amplification SNP.In 57 tests, 49 (86.0%) individual base types are consistent with the result that chip draws.
In sum, the SNP detection method that the embodiment of the invention provides and the result of gene typing chips have very high consistance, and for the site of crossing identification, GA sequencing technologies degree of accuracy is higher.
The above only is preferred embodiment of the present invention, not in order to restriction the present invention, all any modifications of being done within the spirit and principles in the present invention, is equal to and replaces and improvement etc., all should be included within protection scope of the present invention.

Claims (10)

1, a kind of method for detecting single nucleotide polymorphism is characterized in that, described method comprises the steps:
The order-checking fragment that high throughput sequencing technologies is obtained is compared with reference to genome sequence and is listed;
The sequencing quality mark of each base in the testing gene group that obtains according to order-checking obtains the likelihood probability of the range gene type in corresponding site on the testing gene group;
According to described likelihood probability be the default prior probability of every kind of genotype, calculating is with reference to every kind of genotypic posterior probability on each site on the genome, and the genotype that posterior probability is the highest is defined as the most possible correct genotype in the corresponding site of testing gene group, obtains the concensus sequence of testing gene group;
With the inconsistent site of reference genome sequence, obtain the pleomorphism site in the testing gene group in the concensus sequence of detection testing gene group.
2, the method for claim 1 is characterized in that, the sequencing quality mark of each base in the described testing gene group that obtains according to order-checking, and the step of likelihood probability that obtains the range gene type in corresponding site on the testing gene group is specially:
The sequencing quality mark of each base is converted to the base mispairing rate in the testing gene group that order-checking is obtained, and long-pending according to mispairing rate to all order-checking fragment bases of comparing each site on the reference genome, obtain the likelihood probability of the range gene type in corresponding site on the testing gene group.
3, the method for claim 1, it is characterized in that, the sequencing quality mark of each base in the described testing gene group that obtains according to order-checking obtains before the step of likelihood probability of range gene type in corresponding site on the testing gene group, and described method also comprises the steps:
The sequencing quality mark of each base is proofreaied and correct in the testing gene group that order-checking is obtained.
4, method as claimed in claim 3 is characterized in that, the step that the sequencing quality mark of each base is proofreaied and correct in the described testing gene group that order-checking is obtained is specially:
According to unique order-checking fragment of comparing with reference to genome sequence, to add up under specific sequencing quality mark and the sequencing sequence coordinate, the mispairing ratio between per two kinds of bases with the estimation of described mispairing ratio as the mispairing rate, is recorded in the four-dimensional probability matrix;
For each site on the reference genome, collect the base of comparison all order-checking fragments on this site, and write down its base type, sequencing quality mark and the sequence coordinate on the order-checking fragment, from four-dimensional probability matrix, find the probability that four kinds of bases are observed the order-checking base.
5, method as claimed in claim 4, it is characterized in that, described according to unique order-checking fragment of comparing with reference to genome sequence, add up under specific sequencing quality mark and the sequencing sequence coordinate, mispairing ratio between per two kinds of bases, with of the estimation of described mispairing ratio as the mispairing rate, be recorded in before the step in the four-dimensional probability matrix, described method also comprises the steps:
Filter out known SNP site.
6, method as claimed in claim 4 is characterized in that, N identical order-checking fragment is arranged in the order-checking fragment on comparing the reference genome, and each identical order-checking fragment of described N locatees on the reference genome when identical, and described method also comprises:
When N bar order-checking fragment is compared with reference to genomic site, will from four-dimensional probability matrix, find probability * correction factor that four kinds of bases are observed the order-checking base (N-1)After long-pending carry out probability normalization after, obtain the likelihood probability of the range gene type in corresponding site on the testing gene group.
7, method as claimed in claim 4 is characterized in that, the sequencing quality mark of each base in the described testing gene group that obtains according to order-checking, and the step of likelihood probability that obtains the range gene type in corresponding site on the testing gene group is specially:
The probability of observing each independent base from four-dimensional probability matrix according to each true genotype of finding is long-pending, obtains the likelihood probability of the range gene type in corresponding site on the testing gene group.
8, as the described method of the arbitrary claim of claim 1 to 7, it is characterized in that described method also comprises the steps:
When the genotype that posterior probability is the highest was defined as the most possible correct genotype in the corresponding site of testing gene group, the probability that described genotype is correct was described genotypic posterior probability shared ratio in the every kind of genotypic posterior probability sum in corresponding site;
The probability that described genotype is correct is converted to massfraction, and by massfraction is provided with threshold values, filters out in the detected potential pleomorphism site massfraction less than the pleomorphism site of the threshold values that is provided with.
9, as the described method of the arbitrary claim of claim 1 to 7, it is characterized in that described method also comprises the steps:
Number to the order-checking fragment of supporting polymorphism base type is set threshold values, filters out the pleomorphism site of the number of the order-checking fragment of supporting polymorphism base type less than pre-set threshold value.
10, as the described method of the arbitrary claim of claim 1 to 7, it is characterized in that described method also comprises the steps:
Distance between the more detected pleomorphism site, and filter out the pleomorphism site of distance less than pre-set threshold value.
CN200810218343XA 2008-12-12 2008-12-12 Method for detecting mononucleotide polymorphism Active CN101539967B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200810218343XA CN101539967B (en) 2008-12-12 2008-12-12 Method for detecting mononucleotide polymorphism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200810218343XA CN101539967B (en) 2008-12-12 2008-12-12 Method for detecting mononucleotide polymorphism

Publications (2)

Publication Number Publication Date
CN101539967A true CN101539967A (en) 2009-09-23
CN101539967B CN101539967B (en) 2010-12-01

Family

ID=41123153

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200810218343XA Active CN101539967B (en) 2008-12-12 2008-12-12 Method for detecting mononucleotide polymorphism

Country Status (1)

Country Link
CN (1) CN101539967B (en)

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101914628A (en) * 2010-09-02 2010-12-15 深圳华大基因科技有限公司 Method and system for detecting polymorphism locus of genome target region
CN102982251A (en) * 2012-11-14 2013-03-20 北京诺禾致源生物信息科技有限公司 Processing method and device of genetic map construction
CN103020490A (en) * 2011-09-26 2013-04-03 深圳华大基因科技有限公司 Quality control locus selection method and device for sequencing of target area
CN103617256A (en) * 2013-11-29 2014-03-05 北京诺禾致源生物信息科技有限公司 Method and device for processing file needing mutation detection
CN104169927A (en) * 2012-02-28 2014-11-26 皇家飞利浦有限公司 Compact next generation sequencing database and efficient sequence processing using same
CN104424398A (en) * 2013-09-03 2015-03-18 三星Sds株式会社 System and method for base sequence alignment
WO2016078067A1 (en) * 2014-11-21 2016-05-26 深圳华大基因研究院 Individual single nucleotide polymorphisms locus genotyping method and apparatus
WO2016090585A1 (en) * 2014-12-10 2016-06-16 深圳华大基因研究院 Sequencing data processing apparatus and method
CN106503488A (en) * 2016-11-04 2017-03-15 成都鑫云解码科技有限公司 The acquisition methods and device in the mutational site of the corresponding gene of digestive system
CN106503490A (en) * 2016-11-04 2017-03-15 成都鑫云解码科技有限公司 The acquisition methods and device in the mutational site of urinary system gene corresponding with reproductive system
CN106503489A (en) * 2016-11-04 2017-03-15 成都鑫云解码科技有限公司 The acquisition methods and device in the mutational site of the corresponding gene of cardiovascular system
CN106529209A (en) * 2016-11-04 2017-03-22 成都鑫云解码科技有限公司 Method and device for acquiring gene mutation site corresponding to immune system
CN106570350A (en) * 2015-12-17 2017-04-19 复旦大学 Single nucleotide polymorphisms site parting algorithm
CN106951732A (en) * 2010-05-25 2017-07-14 加利福尼亚大学董事会 BAMBAM:The parallel comparative analysis of high-flux sequence data
CN107058489A (en) * 2017-01-10 2017-08-18 湖南农业大学 A kind of molecular labeling related to potato cold resistance and application
CN107153777A (en) * 2017-05-03 2017-09-12 武汉菲沙基因信息有限公司 A kind of method for the diplodization degree for estimating tetraploid species gene group
CN108241792A (en) * 2016-12-23 2018-07-03 深圳华大基因科技服务有限公司 A kind of method and apparatus for integrating multi-platform genotypic results
CN108629156A (en) * 2017-03-21 2018-10-09 深圳华大基因科技服务有限公司 The method, apparatus and computer readable storage medium of three generations's sequencing data error correction
CN108647495A (en) * 2018-05-21 2018-10-12 广州金域医学检验中心有限公司 Personal status relationship identification method, device, equipment and storage medium
CN108694304A (en) * 2018-05-21 2018-10-23 广州金域医学检验中心有限公司 A kind of personal status relationship identification method, device, equipment and storage medium
US10242155B2 (en) 2010-05-25 2019-03-26 The Regents Of The University Of California BAMBAM: parallel comparative analysis of high-throughput sequencing data
CN109785906A (en) * 2019-02-14 2019-05-21 北京安智因生物技术有限公司 A kind of anticoagulant medication genetic test report automatic generating method and system
CN110211631A (en) * 2018-02-07 2019-09-06 深圳先进技术研究院 A kind of whole-genome association method, system and electronic equipment
CN110570908A (en) * 2019-11-05 2019-12-13 南京医基云医疗数据研究院有限公司 Sequencing sequence polymorphic identification method and device, storage medium and electronic equipment
CN110997936A (en) * 2017-09-08 2020-04-10 深圳华大生命科学研究院 Method and device for genotyping based on low-depth genome sequencing and application of method and device
CN111344794A (en) * 2017-07-20 2020-06-26 华为技术有限公司 Apparatus and method for identifying haplotypes
CN112669903A (en) * 2020-12-29 2021-04-16 北京旌准医疗科技有限公司 HLA typing method and device based on Sanger sequencing
CN114649055A (en) * 2022-04-15 2022-06-21 北京贝瑞和康生物技术有限公司 Methods, devices, and media for detecting single nucleotide variations and indels
CN115631789A (en) * 2022-10-25 2023-01-20 哈尔滨工业大学 Pangenome-based group joint variation detection method
CN110997936B (en) * 2017-09-08 2024-05-10 深圳华大生命科学研究院 Method, device and application of genotyping based on low-depth genome sequencing

Cited By (64)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10825551B2 (en) 2010-05-25 2020-11-03 The Regents Of The University Of California Bambam: parallel comparative analysis of high-throughput sequencing data
CN106951732B (en) * 2010-05-25 2020-03-10 加利福尼亚大学董事会 Genome sequence analysis system based on computer
US11164656B2 (en) 2010-05-25 2021-11-02 The Regents Of The University Of California Bambam: parallel comparative analysis of high-throughput sequencing data
US11158397B2 (en) 2010-05-25 2021-10-26 The Regents Of The University Of California Bambam: parallel comparative analysis of high-throughput sequencing data
US10242155B2 (en) 2010-05-25 2019-03-26 The Regents Of The University Of California BAMBAM: parallel comparative analysis of high-throughput sequencing data
US10706956B2 (en) 2010-05-25 2020-07-07 The Regents Of The University Of California Bambam: parallel comparative analysis of high-throughput sequencing data
US10726945B2 (en) 2010-05-25 2020-07-28 The Regents Of The University Of California BAMBAM: parallel comparative analysis of high-throughput sequencing data
US10268800B2 (en) 2010-05-25 2019-04-23 The Regents Of The University Of California BAMBAM: parallel comparative analysis of high-throughput sequencing data
US10878937B2 (en) 2010-05-25 2020-12-29 The Regents Of The University Of California BamBam: parallel comparative analysis of high-throughput sequencing data
US10971248B2 (en) 2010-05-25 2021-04-06 The Regents Of The University Of California BamBam: parallel comparative analysis of high-throughput sequencing data
US10249384B2 (en) 2010-05-25 2019-04-02 The Regents Of The University Of California Bambam: parallel comparative analysis of high-throughput sequencing data
US11152080B2 (en) 2010-05-25 2021-10-19 The Regents Of The University Of California BAMBAM: parallel comparative analysis of high-throughput sequencing data
US11133085B2 (en) 2010-05-25 2021-09-28 The Regents Of The University Of California BAMBAM: parallel comparative analysis of high-throughput sequencing data
US10825552B2 (en) 2010-05-25 2020-11-03 The Regents Of The University Of California BAMBAM: parallel comparative analysis of high-throughput sequencing data
US10991451B2 (en) 2010-05-25 2021-04-27 The Regents Of The University Of California BamBam: parallel comparative analysis of high-throughput sequencing data
CN106951732A (en) * 2010-05-25 2017-07-14 加利福尼亚大学董事会 BAMBAM:The parallel comparative analysis of high-flux sequence data
CN101914628B (en) * 2010-09-02 2013-01-09 深圳华大基因科技有限公司 Method and system for detecting polymorphism locus of genome target region
CN101914628A (en) * 2010-09-02 2010-12-15 深圳华大基因科技有限公司 Method and system for detecting polymorphism locus of genome target region
CN103020490B (en) * 2011-09-26 2015-11-25 深圳华大基因科技服务有限公司 Quality Control site choosing method and device in the order-checking of target area
CN103020490A (en) * 2011-09-26 2013-04-03 深圳华大基因科技有限公司 Quality control locus selection method and device for sequencing of target area
CN104169927B (en) * 2012-02-28 2018-05-29 皇家飞利浦有限公司 Compact next-generation sequencing data collection and the efficient sequencing processing using it
CN104169927A (en) * 2012-02-28 2014-11-26 皇家飞利浦有限公司 Compact next generation sequencing database and efficient sequence processing using same
CN102982251A (en) * 2012-11-14 2013-03-20 北京诺禾致源生物信息科技有限公司 Processing method and device of genetic map construction
CN104424398A (en) * 2013-09-03 2015-03-18 三星Sds株式会社 System and method for base sequence alignment
CN103617256B (en) * 2013-11-29 2018-01-02 北京诺禾致源科技股份有限公司 The processing method and processing device of file needing mutation detection
CN103617256A (en) * 2013-11-29 2014-03-05 北京诺禾致源生物信息科技有限公司 Method and device for processing file needing mutation detection
CN107075565A (en) * 2014-11-21 2017-08-18 深圳华大基因研究院 Individual mononucleotide polymorphism site classifying method and device
CN107075565B (en) * 2014-11-21 2021-07-27 深圳华大生命科学研究院 Individual single nucleotide polymorphism site typing method and device
WO2016078067A1 (en) * 2014-11-21 2016-05-26 深圳华大基因研究院 Individual single nucleotide polymorphisms locus genotyping method and apparatus
CN107077538A (en) * 2014-12-10 2017-08-18 深圳华大基因研究院 Sequencing data processing unit and method
CN107077538B (en) * 2014-12-10 2020-08-07 深圳华大生命科学研究院 Sequencing data processing device and method
WO2016090585A1 (en) * 2014-12-10 2016-06-16 深圳华大基因研究院 Sequencing data processing apparatus and method
CN106570350B (en) * 2015-12-17 2019-04-05 复旦大学 Mononucleotide polymorphic site parting algorithm
CN106570350A (en) * 2015-12-17 2017-04-19 复旦大学 Single nucleotide polymorphisms site parting algorithm
CN106503488A (en) * 2016-11-04 2017-03-15 成都鑫云解码科技有限公司 The acquisition methods and device in the mutational site of the corresponding gene of digestive system
CN106503490A (en) * 2016-11-04 2017-03-15 成都鑫云解码科技有限公司 The acquisition methods and device in the mutational site of urinary system gene corresponding with reproductive system
CN106503489A (en) * 2016-11-04 2017-03-15 成都鑫云解码科技有限公司 The acquisition methods and device in the mutational site of the corresponding gene of cardiovascular system
CN106529209A (en) * 2016-11-04 2017-03-22 成都鑫云解码科技有限公司 Method and device for acquiring gene mutation site corresponding to immune system
CN108241792A (en) * 2016-12-23 2018-07-03 深圳华大基因科技服务有限公司 A kind of method and apparatus for integrating multi-platform genotypic results
CN107058489B (en) * 2017-01-10 2020-12-11 湖南农业大学 Molecular marker related to cold resistance of potatoes and application of molecular marker
CN107058489A (en) * 2017-01-10 2017-08-18 湖南农业大学 A kind of molecular labeling related to potato cold resistance and application
CN108629156A (en) * 2017-03-21 2018-10-09 深圳华大基因科技服务有限公司 The method, apparatus and computer readable storage medium of three generations's sequencing data error correction
CN108629156B (en) * 2017-03-21 2020-08-28 深圳华大基因科技服务有限公司 Method, device and computer readable storage medium for correcting error of third generation sequencing data
CN107153777B (en) * 2017-05-03 2020-01-21 武汉菲沙基因信息有限公司 Method for estimating doubling degree of tetraploid species genome
CN107153777A (en) * 2017-05-03 2017-09-12 武汉菲沙基因信息有限公司 A kind of method for the diplodization degree for estimating tetraploid species gene group
CN111344794A (en) * 2017-07-20 2020-06-26 华为技术有限公司 Apparatus and method for identifying haplotypes
CN111344794B (en) * 2017-07-20 2024-04-23 华为技术有限公司 Apparatus and method for identifying haplotypes
CN110997936B (en) * 2017-09-08 2024-05-10 深圳华大生命科学研究院 Method, device and application of genotyping based on low-depth genome sequencing
CN110997936A (en) * 2017-09-08 2020-04-10 深圳华大生命科学研究院 Method and device for genotyping based on low-depth genome sequencing and application of method and device
CN110211631B (en) * 2018-02-07 2024-02-09 深圳先进技术研究院 Whole genome association analysis method, system and electronic equipment
CN110211631A (en) * 2018-02-07 2019-09-06 深圳先进技术研究院 A kind of whole-genome association method, system and electronic equipment
CN108647495B (en) * 2018-05-21 2020-04-10 广州金域医学检验中心有限公司 Identity relationship identification method, device, equipment and storage medium
CN108694304A (en) * 2018-05-21 2018-10-23 广州金域医学检验中心有限公司 A kind of personal status relationship identification method, device, equipment and storage medium
CN108694304B (en) * 2018-05-21 2020-03-24 广州金域医学检验中心有限公司 Identity relationship identification method, device, equipment and storage medium
CN108647495A (en) * 2018-05-21 2018-10-12 广州金域医学检验中心有限公司 Personal status relationship identification method, device, equipment and storage medium
CN109785906A (en) * 2019-02-14 2019-05-21 北京安智因生物技术有限公司 A kind of anticoagulant medication genetic test report automatic generating method and system
CN109785906B (en) * 2019-02-14 2021-05-11 北京安智因生物技术有限公司 Automatic generation method of anticoagulation drug gene detection report
CN110570908B (en) * 2019-11-05 2020-03-27 南京医基云医疗数据研究院有限公司 Sequencing sequence polymorphic identification method and device, storage medium and electronic equipment
CN110570908A (en) * 2019-11-05 2019-12-13 南京医基云医疗数据研究院有限公司 Sequencing sequence polymorphic identification method and device, storage medium and electronic equipment
CN112669903B (en) * 2020-12-29 2024-04-02 北京旌准医疗科技有限公司 HLA typing method and equipment based on Sanger sequencing
CN112669903A (en) * 2020-12-29 2021-04-16 北京旌准医疗科技有限公司 HLA typing method and device based on Sanger sequencing
CN114649055A (en) * 2022-04-15 2022-06-21 北京贝瑞和康生物技术有限公司 Methods, devices, and media for detecting single nucleotide variations and indels
CN115631789A (en) * 2022-10-25 2023-01-20 哈尔滨工业大学 Pangenome-based group joint variation detection method
CN115631789B (en) * 2022-10-25 2023-08-15 哈尔滨工业大学 Group joint variation detection method based on pan genome

Also Published As

Publication number Publication date
CN101539967B (en) 2010-12-01

Similar Documents

Publication Publication Date Title
CN101539967B (en) Method for detecting mononucleotide polymorphism
CN103403182B (en) The heredity relevant to cancer or the detection of molecular distortion
CN105441432B (en) Composition and its purposes in sequencing and variation detection
Huang et al. High-throughput genotyping by whole-genome resequencing
Matthies et al. Population structure revealed by different marker types (SSR or DArT) has an impact on the results of genome-wide association mapping in European barley cultivars
US20040210400A1 (en) Analysis methods for individual genotyping
CN110459265B (en) Method for improving prediction accuracy of whole genome
CN106715711A (en) Method for determining the sequence of a probe and method for detecting genomic structural variation
Crawford et al. Assessing the accuracy and power of population genetic inference from low-pass next-generation sequencing data
CN106834474A (en) Using gene order-checking diagnosing fetal chromosomal aneuploidy
Wang Effects of genotyping errors on parentage exclusion analysis
WO2019222560A1 (en) Methods and reagents for resolving nucleic acid mixtures and mixed cell populations and associated applications
CN111091869A (en) Genetic relationship identification method using SNP as genetic marker
Ben Halim et al. Estimation of recent and ancient inbreeding in a small endogamous tunisian community through genomic runs of homozygosity
US8321138B2 (en) Method of characterizing quality of hybridized CGH arrays
US20120191364A1 (en) Methods and arrays for dna sequencing
CN117095746A (en) GBS whole genome association analysis method for buffalo
CN103184275A (en) Novel method for gene identification of rice genome
CN107194206A (en) A kind of screening technique in the chromosome abnormality site based on big data
EP1939778A2 (en) Analyzing CGH data to identify aberrations
US20080090735A1 (en) Methods and systems for removing offset bias in chemical array data
JP7446343B2 (en) Systems, computer programs and methods for determining genome ploidy
Cornforth et al. Inferences regarding the numbers and locations of QTLs under multiple-QTL models using interval mapping and composite interval mapping
Barroso et al. Predicting performance of soybean populations using genetic distances estimated with RAPD markers
Thompson et al. Association mapping of sponge cake volume in US Pacific Northwest elite soft white wheat (Triticum aestivum L.)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Free format text: FORMER OWNER: BGI-SHENZHEN CO., LTD.

Owner name: WUHAN HUADA GENE TECHNOLOGY CO., LTD.

Free format text: FORMER OWNER: BGI-SHENZHEN

Effective date: 20110922

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 518083 SHENZHEN, GUANGDONG PROVINCE TO: 430075 WUHAN, HUBEI PROVINCE

TR01 Transfer of patent right

Effective date of registration: 20110922

Address after: 430075 No. 666, hi tech Avenue, East Lake New Technology Development Zone, Hubei, Wuhan

Patentee after: Wuhan Huada Gene Technology Co.,Ltd.

Address before: Beishan Industrial Zone Building in Yantian District of Shenzhen city of Guangdong Province in 518083

Co-patentee before: BGI-Shenzhen Co., Ltd.

Patentee before: BGI-Shenzhen

ASS Succession or assignment of patent right

Owner name: WUHAN HUADA MEDICAL TESTING INSTITUTE CO., LTD.

Free format text: FORMER OWNER: WUHAN HUADA GENE TECHNOLOGY CO., LTD.

Effective date: 20121205

C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20121205

Address after: 430075, building, B, C and D District, Wuhan hi tech Avenue, East Lake Development Zone, Wuhan, Hubei Province, B2

Patentee after: WUHAN BGI MEDICAL LABORATORY CO., LTD.

Address before: 430075 No. 666, hi tech Avenue, East Lake New Technology Development Zone, Hubei, Wuhan

Patentee before: Wuhan Huada Gene Technology Co.,Ltd.