CN110699436A - Method and system for determining whether number seven exon deletion exists in SMN1 gene of sample to be detected - Google Patents

Method and system for determining whether number seven exon deletion exists in SMN1 gene of sample to be detected

Info

Publication number
CN110699436A
Authority
CN
China
Prior art keywords
gene
exon
smn1
sample
sequencing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810749278.7A
Other languages
Chinese (zh)
Other versions
CN110699436B (en)
Inventor
郭凤禹
宋立洁
孙隽
王亚玲
范林林
彭智宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bgi Guangzhou Medical Laboratory Co ltd
Huada Biotechnology Wuhan Co ltd
Shenzhen Huada Medical Laboratory
Tianjin Medical Laboratory Bgi
BGI Shenzhen Co Ltd
Original Assignee
Guangzhou Huada Gene Medical Laboratory Co Ltd
Shenzhen Huada Clinical Laboratory Center
Tianjin Huada Medical Laboratory Co Ltd
BGI Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Huada Gene Medical Laboratory Co Ltd, Shenzhen Huada Clinical Laboratory Center, Tianjin Huada Medical Laboratory Co Ltd, BGI Shenzhen Co Ltd
Priority to CN201810749278.7A
Publication of CN110699436A
Application granted
Publication of CN110699436B
Legal status: Active
Anticipated expiration

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844 Nucleic acid amplification reactions
    • C12Q1/6858 Allele-specific amplification

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Analytical Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Biotechnology (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a method and a system for determining whether exon 7 of the SMN1 gene is deleted in a sample to be tested. Compared with the prior art, the method and the system detect deletion of exon 7 of the SMN1 gene with markedly higher sensitivity and specificity, and can effectively distinguish heterozygous deletion samples from homozygous deletion samples. When normal samples (2 copies of exon 7 of both the SMN1 gene and the SMN2 gene) make up only a small proportion of a batch, selecting a control sample set greatly improves detection accuracy.

Description

Method and system for determining whether number seven exon deletion exists in SMN1 gene of sample to be detected
Technical Field
The invention relates to the field of bioinformatics, in particular to a method and a system for determining whether exon 7 of the SMN1 gene is deleted in a sample to be tested.
Background
Spinal muscular atrophy (SMA) is an autosomal recessive genetic disease. It is a group of disorders that can begin in infancy, childhood or adolescence, characterized by skeletal muscle atrophy caused by progressive degeneration of the anterior horn cells of the spinal cord and the motor nuclei of the brain stem, without affecting the patient's intelligence. The clinical manifestation is progressive, symmetric lower-motor-neuron muscle weakness and atrophy, more severe proximally than distally and more severe in the lower limbs than in the upper limbs. The incidence is 1/6,000 to 1/10,000, ranking second among lethal autosomal recessive genetic diseases, and no effective treatment exists at present. The carrier frequency is 1/62 in China and 1/30 to 1/40 worldwide.
Two highly homologous survival motor neuron genes, the SMN1 gene and the SMN2 gene, are believed to be associated with spinal muscular atrophy; their sequence similarity is as high as 99%. The SMN1 gene is the main functional determinant: its homozygous deletion or mutation causes spinal muscular atrophy, while the copy number of the SMN2 gene is related to disease severity. The SMN1 gene contains 9 exons (exons 1, 2a, 2b and 3-8) and encodes a 38 kD SMN protein of 294 amino acid residues, whose function is not completely understood but which is essential for normal motor neurons. Because of a splicing defect in the SMN2 gene, most of the pre-mRNA produced from SMN2 is alternatively spliced to yield a truncated, non-functional protein; only about 15% of SMN2 products are normal, functional SMN protein. Current molecular diagnosis of SMA shows that about 95% to 98% of affected individuals are homozygous for deletion or truncation of exon 7 of the SMN1 gene. Therefore, patients with spinal muscular atrophy can be diagnosed quickly and simply in the clinic by detecting deletion of exon 7 of the SMN1 gene.
The current mainstream gene detection method is target region capture combined with high-throughput sequencing (NGS). This technology can capture and sequence multiple genes from multiple samples simultaneously, which effectively reduces cost and turnaround time for screening products that cover multiple genes and diseases. Spinal muscular atrophy has a high carrier frequency, and expanded carrier screening guidelines explicitly list it as a disease that should be screened. However, analysing high-throughput sequencing data for the SMN1 and SMN2 genes is difficult because the two genes are highly homologous. At present, the screening products launched by many companies rely on supplementary methods, such as quantitative PCR (polymerase chain reaction) or multiplex ligation-dependent probe amplification (MLPA), to detect spinal muscular atrophy, which adds to the cost of the products. Although the literature reports some methods for detecting deletion of exon 7 of the SMN1 gene by target region capture and high-throughput sequencing, shortcomings of those methods have kept them out of mature commercial products.
Disclosure of Invention
The present application is based on the discovery and recognition by the inventors of the following facts and problems:
In 2015, Larson et al. published an article entitled "Validation of a high resolution NGS method for detecting spinal muscular atrophy carriers among phase 3 participants in the 1000 Genomes Project", which proposed a method for detecting exon 7 deletion in SMN1 using high-throughput sequencing data (Larson et al. BMC Medical Genetics (2015) 16:100. DOI 10.1186/s12881-015-0246-2). In that article, all samples in a batch are used to select a control gene set, and a weighted average of the SMN genes relative to the control gene set (hereinafter referred to as the scaling coefficient) is calculated. The aim is to determine the copy number of each sample by treating the average depth of all samples in the batch over the SMN1 gene, the SMN2 gene and the control gene set as the behaviour of a normal sample. However, the inventors found that the article assumes that normal samples (samples with 2 copies of both SMN1 and SMN2) make up the majority of the screened samples; in abnormal situations, namely when samples with deleted or duplicated SMN1 or SMN2 copies account for a large proportion, the risk of erroneous detection with the method of the article is high.
The present invention aims to solve, at least to some extent, one of the technical problems in the related art. To address the above problem, the present application introduces the concept of a control sample set: samples with 2 copies of both SMN1 and SMN2 are first selected as a reference. When historical data from samples tested under the same experimental conditions are available, samples known to have 2 copies of both SMN1 and SMN2 can be selected directly. Otherwise, samples in the batch whose ratio of SMN1 depth to SMN1 plus SMN2 depth (calculated from the number of reads covering specific sites) lies between 0.43 and 0.57 are selected as the control sample set. Testing shows that this approach effectively solves the problem that, when the total number of samples in a batch is small or the number of positive samples is too large, the detection results contain many gray-zone samples (samples that cannot be called) or false-negative and false-positive samples.
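The in-batch selection rule described above can be sketched as follows. This is a minimal illustration, not the applicants' implementation: the function name, the input layout (a mapping from sample ID to SMN1 and SMN2 read counts) and the inclusive treatment of the 0.43 and 0.57 bounds are assumptions.

```python
def select_control_samples(read_counts, low=0.43, high=0.57):
    """Return IDs of samples whose SMN1 / (SMN1 + SMN2) read ratio lies in [low, high].

    read_counts maps a sample ID to (smn1_reads, smn2_reads), the numbers of
    reads covering the SMN1- and SMN2-specific sites (hypothetical layout).
    """
    controls = []
    for sample_id, (smn1, smn2) in read_counts.items():
        total = smn1 + smn2
        if total == 0:
            continue  # no informative reads, so no ratio can be computed
        if low <= smn1 / total <= high:
            controls.append(sample_id)
    return controls


# Hypothetical batch: S2 and S3 look like deletion samples and are excluded.
batch = {"S1": (510, 490), "S2": (330, 670), "S3": (0, 1000), "S4": (560, 440)}
print(select_control_samples(batch))
```

Only the samples that pass this filter would then serve as the reference for normalising the remaining samples in the batch.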
In a first aspect, the invention provides a method for determining whether exon 7 of the SMN1 gene is deleted in a sample to be tested. According to an embodiment of the invention, the method comprises:
(1) sequencing a plurality of nucleic acid samples from a total sample set respectively, wherein the total sample set comprises the sample to be tested and at least one control sample, the plurality of nucleic acid samples each contain SMN coding genes and at least one control gene, and the SMN coding genes comprise: the SMN1 exon 7 coding sequence; the SMN2 exon 7 coding sequence; the coding sequences adjacent to the left and right of SMN1 exon 7; and the coding sequences adjacent to the left and right of SMN2 exon 7;
(2) selecting, for each sample in the total sample set, the sequencing reads derived from the SMN coding genes and the at least one control gene based on the sequencing results of step (1);
(3) determining an SMN1 exon 7 parameter for the sample to be tested, the SMN1 exon 7 parameter being positively correlated with the number of sequencing reads derived from the SMN1 exon 7 coding sequence;
(4) correcting the SMN1 exon 7 parameter of the sample to be tested based on the sequencing reads of the at least one control gene in the at least one control sample;
(5) predicting, based on the corrected SMN1 exon 7 parameter, the probability that a sequencing read derived from the SMN coding genes belongs to the SMN1 exon 7 coding sequence; and
(6) determining, based on the probability, whether exon 7 of the SMN1 gene is deleted in the sample to be tested.
The method according to the embodiment of the invention detects deletion of exon 7 of the SMN1 gene with markedly higher sensitivity and specificity than the prior art, and can effectively distinguish heterozygous deletion samples from homozygous deletion samples. When normal samples (2 copies of exon 7 of both the SMN1 gene and the SMN2 gene) make up only a small proportion of a batch, selecting a control sample set greatly improves detection accuracy.
According to an embodiment of the present invention, the method may further include at least one of the following additional technical features:
According to an embodiment of the invention, the SMN1 exon 7 coding sequence comprises a first mutation site located at chr5:70247773; the coding sequence adjacent to the left of SMN1 exon 7 comprises a second mutation site located at chr5:70247724; the coding sequence adjacent to the right of SMN1 exon 7 comprises a third mutation site located at chr5:70247921; the SMN2 exon 7 coding sequence comprises a fourth mutation site located at chr5:69372353; the coding sequence adjacent to the left of SMN2 exon 7 comprises a fifth mutation site located at chr5:69372304; and the coding sequence adjacent to the right of SMN2 exon 7 comprises a sixth mutation site located at chr5:69372501.
According to an embodiment of the present invention, in step (3), the SMN1 exon 7 parameter is determined for the sample to be tested by the following steps: (3-1) determining, based on the sequencing result of the sample to be tested, the number of sequencing reads carrying each of the first to sixth mutation sites; (3-2) determining first to third ratios based on the read counts obtained in step (3-1), wherein the first ratio $y = B/b$, where $B$ represents the number of sequencing reads carrying the first mutation site and $b$ represents the number of sequencing reads carrying the first or fourth mutation site; the second ratio $x = A/a$, where $A$ represents the number of sequencing reads carrying the second mutation site and $a$ represents the number of sequencing reads carrying the second or fifth mutation site; and the third ratio $x = M/m$, where $M$ represents the number of sequencing reads carrying the third mutation site and $m$ represents the number of sequencing reads carrying the third or sixth mutation site; (3-3) determining parameters $R$ and $r$ based on the first to third ratios, wherein $R$ constitutes the SMN1 exon 7 parameter: when at least one of the absolute difference between the first and second ratios and the absolute difference between the first and third ratios exceeds 0.1, $R = B$ and $r = b$; when neither the absolute difference between the first and second ratios nor the absolute difference between the first and third ratios exceeds 0.1, $R = A + B + M$ and $r = a + b + m$.
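Steps (3-2) and (3-3) can be sketched as a short function. The function and argument names are hypothetical, and the sketch assumes all three denominators are non-zero; it simply restates the 0.1 agreement rule above in code.

```python
def smn1_exon7_parameter(A, a, B, b, M, m, tolerance=0.1):
    """Return (R, r) from site-level read counts.

    B, A, M: reads carrying the first, second and third (SMN1) sites.
    b, a, m: reads carrying the first-or-fourth, second-or-fifth and
             third-or-sixth sites (SMN1 plus SMN2).
    """
    first = B / b    # exon 7 site
    second = A / a   # left flanking site
    third = M / m    # right flanking site
    if abs(first - second) > tolerance or abs(first - third) > tolerance:
        return B, b  # a flanking site disagrees: use the exon 7 site alone
    return A + B + M, a + b + m  # all three agree: pool the counts


print(smn1_exon7_parameter(A=100, a=200, B=105, b=200, M=95, m=200))   # pooled
print(smn1_exon7_parameter(A=100, a=200, B=60, b=200, M=100, m=200))   # exon 7 only
```

Pooling the flanking sites when they agree increases the read count behind the estimate, while falling back to the exon 7 site keeps a discordant flank from distorting it.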
According to an embodiment of the invention, the method further comprises: (3-4) determining whether the sample to be tested is qualified based on the parameter $r$.
According to an embodiment of the present invention, a parameter $r$ of less than 200 indicates that the sample to be tested is unqualified.
According to a specific embodiment of the present invention, the method further comprises: (3-5a) determining a fourth ratio $q$ based on the parameters $R$ and $r$, wherein $q = R/r$; (3-5b) judging whether the control sample is qualified, wherein a fourth ratio $q$ within the range of 0.43 to 0.57 indicates that the control sample is qualified; or preliminarily determining, based on the fourth ratio $q$ being within the range of 0.43 to 0.57, that exon 7 of the SMN1 gene of the sample to be tested is not deleted.
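The quality check on the total read count and the preliminary check on the fourth ratio can be combined in one small helper. The function name, the returned dictionary layout and the inclusive bounds are assumptions made for illustration.

```python
def assess_sample(R, r, min_reads=200, low=0.43, high=0.57):
    """Apply the read-count check (3-4) and the fourth-ratio check (3-5)."""
    if r < min_reads:
        # Too few SMN reads: the sample is unqualified and q is not evaluated.
        return {"qualified": False, "q": None, "in_normal_range": False}
    q = R / r
    return {"qualified": True, "q": q, "in_normal_range": low <= q <= high}


print(assess_sample(300, 600))   # qualified, q = 0.5, inside 0.43 to 0.57
print(assess_sample(60, 150))    # unqualified because r < 200
```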
According to an embodiment of the invention, the at least one control gene is determined by: (a) selecting a plurality of candidate genes based on the sequencing results of the at least one control sample, the sequencing depth of the plurality of candidate genes in at least a portion of the control samples being above a predetermined threshold; (b) calculating, for each of the plurality of candidate genes, a fifth ratio $z_{k,i} = s_i / H_{k,i}$ in each of the at least one control sample, wherein $k$ represents the candidate gene number, $i$ represents the sample number, $s_i$ represents the sequencing depth of the SMN gene in sample $i$, and $H_{k,i}$ represents the sequencing depth of candidate gene $k$ in sample $i$; and (c) determining the at least one control gene based on the fifth ratio.
According to an embodiment of the invention, in step (c), the control gene meets at least one of the following criteria: (c-1) across the at least one control sample, the coefficient of variation of the sequencing depth of the control gene ranks among the 10 smallest; and (c-2) across the at least one control sample, the coefficient of variation of the fifth ratio ranks among the 10 smallest.
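Criterion (c-2) can be sketched as a ranking of candidate genes by the coefficient of variation of the fifth ratio across control samples. The function name and input layout are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, pstdev


def rank_control_genes(smn_depth, gene_depth, top_n=10):
    """Return the top_n genes with the smallest CV of z = s_i / H_ki.

    smn_depth:  {sample: s_i}, SMN sequencing depth per control sample.
    gene_depth: {gene: {sample: H_ki}}, candidate gene depth per sample.
    """
    cv_by_gene = {}
    for gene, depths in gene_depth.items():
        ratios = [smn_depth[s] / depths[s] for s in smn_depth]
        cv_by_gene[gene] = pstdev(ratios) / mean(ratios)
    return sorted(cv_by_gene, key=cv_by_gene.get)[:top_n]


smn = {"C1": 100, "C2": 200, "C3": 150}
genes = {
    "G1": {"C1": 50, "C2": 100, "C3": 75},    # ratio constant across samples
    "G2": {"C1": 50, "C2": 50, "C3": 50},     # ratio varies strongly
    "G3": {"C1": 100, "C2": 210, "C3": 150},  # ratio varies slightly
}
print(rank_control_genes(smn, genes, top_n=2))
```

A gene whose depth tracks the SMN depth closely across normal samples gives a nearly constant ratio, which is what makes it useful as a normaliser.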
According to an embodiment of the invention, the predetermined threshold is determined by: arranging, based on the sequencing result of the at least one control sample, the sequencing depths of at least a portion of all genes of the sample in ascending order; and determining the predetermined threshold based on the ranking, wherein the threshold is not less than the sequencing depth of the gene at the 5% position.
According to a specific embodiment of the invention, the threshold is the sequencing depth of the gene at the 5% position.
According to an embodiment of the invention, the sequencing depth of the candidate gene is greater than the predetermined threshold in at least 90% of the at least one control sample. In other words, the sequencing depth of the candidate gene is not greater than the predetermined threshold in at most 10% of the at least one control sample.
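The threshold and the 90% rule can be sketched together. Several details here are assumptions rather than statements from the text: the threshold is computed separately for each control sample, the "5% position" is taken as the element at index floor(0.05 × n) of the ascending ranking, and the function names and input layout are hypothetical.

```python
import math


def depth_threshold(depths, fraction=0.05):
    """Depth of the gene at the given fraction of the ascending ranking."""
    ordered = sorted(depths)
    index = min(len(ordered) - 1, math.floor(fraction * len(ordered)))
    return ordered[index]


def candidate_genes(gene_depth, min_fraction=0.9):
    """Genes whose depth exceeds the sample threshold in >= min_fraction of samples.

    gene_depth: {sample: {gene: depth}} for the control samples.
    """
    thresholds = {s: depth_threshold(list(d.values())) for s, d in gene_depth.items()}
    shared = set.intersection(*(set(d) for d in gene_depth.values()))
    kept = []
    for gene in sorted(shared):
        passing = sum(gene_depth[s][gene] > thresholds[s] for s in gene_depth)
        if passing / len(gene_depth) >= min_fraction:
            kept.append(gene)
    return kept


# Hypothetical control sample with 20 genes at depths 10, 20, ..., 200.
sample = {f"g{j}": 10 * (j + 1) for j in range(20)}
print(depth_threshold(list(sample.values())))
print(len(candidate_genes({"C1": sample, "C2": dict(sample)})))
```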
According to an embodiment of the present invention, in step (4), the correction is performed by multiplying the SMN1 exon 7 parameter by a correction factor $\theta_i$, wherein the correction factor is determined by the following formula:
[Formula: correction factor $\theta_i$]
wherein $Z_k$ represents the ratio of the sequencing depth of the SMN gene to the sequencing depth of gene $k$ in the sample to be tested, $K$ represents the total number of control genes in the control gene set, and $\bar{z}_k$ represents the average of the fifth ratio of gene $k$ in the control sample set.
According to yet another embodiment of the present invention, $\bar{z}_k$ is determined by the following formula:
$$\bar{z}_k = \frac{1}{N} \sum_{i=1}^{N} z_{k,i}$$
where $N$ represents the total number of samples in the control sample set, $i$ represents the sample number, and $k$ represents the gene number.
According to an embodiment of the invention, when the value calculated by the formula exceeds 1.5, the correction factor is set to 1.5. The inventors found that in the prior art the maximum value of $\theta_i$ is set to 1, which increases the proportions of gray-zone samples and false-positive samples. According to a particular embodiment of the invention, the maximum value of $\theta_i$ is set to 1.5, which markedly reduces the proportions of gray-zone and false-positive samples without producing false-negative results.
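The capping rule can be illustrated with a short sketch. The formula for the correction factor is not reproduced in this text, so the form used below, the mean over the $K$ control genes of $Z_k / \bar{z}_k$, is an assumption inferred from the symbol definitions and should not be read as the patented formula. The function name and input layout are also hypothetical.

```python
def correction_factor(sample_ratios, control_ratios, cap=1.5):
    """Capped correction factor for one test sample.

    sample_ratios:  {gene: Z_k}, SMN depth / gene depth in the test sample.
    control_ratios: {gene: [z_k,i, ...]}, the same ratio in each control sample.
    ASSUMED form: theta = (1/K) * sum_k (Z_k / mean_i z_k,i), then min(theta, cap).
    """
    terms = []
    for gene, z_sample in sample_ratios.items():
        z_bar = sum(control_ratios[gene]) / len(control_ratios[gene])
        terms.append(z_sample / z_bar)
    theta = sum(terms) / len(terms)
    return min(theta, cap)  # the 1.5 cap comes from the text above


# Test sample with 3/4 of the normal total SMN depth relative to both genes.
print(correction_factor({"G1": 1.5, "G2": 3.0}, {"G1": [2.0, 2.0], "G2": [4.0, 4.0]}))
# A value above the cap is truncated to 1.5.
print(correction_factor({"G1": 4.0}, {"G1": [2.0]}))
```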
According to an embodiment of the invention, in step (5), the number of sequencing reads of the exon 7 coding sequence corresponding to the corrected SMN1 exon 7 parameter is taken to follow a binomial distribution, and a Bayesian model is used to calculate the probability $p_i$ that sequencing reads derived from the SMN coding genes belong to the SMN1 exon 7 coding sequence. In particular, the corrected number of sequencing reads of the exon 7 coding sequence follows the binomial distribution $R_i' = \theta_i R_i \sim \mathrm{Bin}(r_i, p_i)$, where $p_i$ denotes the probability that a sequencing read aligned to SMN actually comes from SMN1, $R_i'$ denotes the corrected number of sequencing reads of the SMN1 exon 7 coding sequence, $R_i$ denotes the uncorrected number of sequencing reads of the SMN1 exon 7 coding sequence, and $r_i$ denotes the total number of sequencing reads aligned to exon 7 of the SMN genes. Since the Beta distribution is the conjugate prior of the binomial distribution, a prior $p_i \sim \mathrm{Beta}(1, 1)$ is assumed, and the posterior distribution is $p_i \sim \mathrm{Beta}(1 + R_i', 1 + r_i - R_i')$.
According to a further embodiment of the invention, in step (6), whether exon 7 of the SMN1 gene is deleted in the sample to be tested is determined based on the 95% confidence interval $[a', b']$ of $p_i$: $a' > 0.38$ indicates that exon 7 of the SMN1 gene is not deleted; $b' < 0.38$ indicates that exon 7 of the SMN1 gene is deleted; and when $a' \le 0.38$ and $b' \ge 0.38$, it cannot be determined whether the sample to be tested carries the deletion. Specifically, consider the case of 1 copy of SMN1 and 2 copies of SMN2, for which the theoretical value of $p_i$ is 1/3 (in all other deletion cases it is less than 1/3). Allowing a type I error of 0.05, the threshold is finally set at 0.38. The deletion status of SMN1 exon 7 is then judged from the relation between the 95% confidence interval $[a', b']$ of $p_i$ and 0.38: when $a' > 0.38$, the sample is negative (no exon 7 deletion); when $b' < 0.38$, the sample is positive (exon 7 deletion present); and when $a' \le 0.38 \le b'$, the sample falls in the gray zone and cannot be called.
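The interval-based call can be sketched with the standard library alone. This is an illustration under stated assumptions: the posterior is taken as the standard conjugate update of a Beta(1, 1) prior, Beta(1 + R', 1 + r - R'); the 95% interval is read as an equal-tailed interval; and the quantiles are approximated by numerical integration on a grid rather than by an exact inverse CDF. All function names are hypothetical.

```python
import math


def beta_interval(alpha, beta, level=0.95, grid=20000):
    """Approximate equal-tailed interval of Beta(alpha, beta), alpha, beta >= 1."""
    xs = [(j + 0.5) / grid for j in range(grid)]  # midpoints in (0, 1)
    logs = [(alpha - 1) * math.log(x) + (beta - 1) * math.log(1 - x) for x in xs]
    peak = max(logs)
    weights = [math.exp(v - peak) for v in logs]  # unnormalised density values
    total = sum(weights)
    lo_q, hi_q = (1 - level) / 2, 1 - (1 - level) / 2
    acc, lower, upper = 0.0, None, None
    for x, w in zip(xs, weights):
        acc += w / total                          # running CDF estimate
        if lower is None and acc >= lo_q:
            lower = x
        if acc >= hi_q:
            upper = x
            break
    return lower, upper


def call_deletion(corrected_R, r, threshold=0.38):
    """Return 'negative', 'positive' or 'gray' for the SMN1 exon 7 deletion call."""
    if not 0 <= corrected_R <= r:
        raise ValueError("corrected read count must lie between 0 and r")
    a, b = beta_interval(1 + corrected_R, 1 + r - corrected_R)
    if a > threshold:
        return "negative"
    if b < threshold:
        return "positive"
    return "gray"


print(call_deletion(500, 1000), call_deletion(250, 1000), call_deletion(38, 100))
```

The third example shows why low read counts produce gray-zone calls: with only 100 reads, the interval around 0.38 is too wide to fall on either side of the threshold.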
According to an embodiment of the present invention, when exon 7 of the SMN1 gene is deleted in the sample to be tested, the method further comprises determining the copy number of the SMN1 gene in the sample to be tested by the formula
[Figure BDA0001725135680000061: formula for $c_{1,i}$]
and determining the copy number of the SMN2 gene in the sample to be tested by the corresponding formula for $c_{2,i}$, wherein:
$c_{1,i}$ or $c_{2,i}$ not more than 0.1 indicates that the SMN1 or SMN2 gene copy number is 0;
$c_{1,i}$ or $c_{2,i}$ greater than 0.1 but less than 0.5 indicates that the SMN1 or SMN2 gene copy number is between 0 and 1;
$c_{1,i}$ or $c_{2,i}$ not less than 0.5 but less than 1.485 indicates that the SMN1 or SMN2 gene copy number is 1;
$c_{1,i}$ or $c_{2,i}$ not less than 1.485 but less than 2.324 indicates that the SMN1 or SMN2 gene copy number is 2;
$c_{1,i}$ or $c_{2,i}$ not less than 2.324 but less than 2.743 indicates that the SMN1 or SMN2 gene copy number is between 2 and 3;
$c_{1,i}$ or $c_{2,i}$ not less than 2.743 indicates that the SMN1 or SMN2 gene copy number is not less than 3.
According to an embodiment of the invention, an SMN1 gene copy number of 0 indicates homozygous deletion of exon 7 of the SMN1 gene; an SMN1 gene copy number of not less than 1 indicates heterozygous deletion of exon 7 of the SMN1 gene; and an SMN1 gene copy number between 0 and 1 indicates a gray-zone result for deletion of exon 7 of the SMN1 gene.
In the prior art, it is only possible to judge whether SMN1 is deleted; the copy numbers of SMN1 and SMN2 cannot be determined. By contrast, the method according to the embodiment of the invention can calculate the copy numbers of SMN1 and SMN2, distinguish heterozygous from homozygous deletion samples, and further compile population statistics on SMN1 and SMN2 copy numbers, laying a foundation for research on autosomal recessive genetic diseases such as spinal muscular atrophy (SMA).
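The binning of a computed value $c_{1,i}$ or $c_{2,i}$ into a copy-number call, and the zygosity reading for a sample already called positive, can be sketched as follows. The cut-offs are those stated in the text; the function names and the string labels are hypothetical, and how the value itself is computed is outside this sketch.

```python
def copy_number_call(c):
    """Map a computed copy-number value to the categories given in the text."""
    if c <= 0.1:
        return "0"
    if c < 0.5:
        return "0-1"
    if c < 1.485:
        return "1"
    if c < 2.324:
        return "2"
    if c < 2.743:
        return "2-3"
    return ">=3"


def smn1_zygosity(c1):
    """Interpret the SMN1 call for a sample already called deletion-positive."""
    call = copy_number_call(c1)
    if call == "0":
        return "homozygous deletion"
    if call == "0-1":
        return "gray zone"
    return "heterozygous deletion"  # copy number not less than 1


print(copy_number_call(1.0), copy_number_call(2.5), smn1_zygosity(0.05))
```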
In a second aspect, the invention provides a system for determining whether exon 7 of the SMN1 gene is deleted in a sample to be tested. According to an embodiment of the invention, the system comprises:
a sequencing device for sequencing a plurality of nucleic acid samples from a total sample set respectively, wherein the total sample set comprises the sample to be tested and at least one control sample, the plurality of nucleic acid samples each contain SMN coding genes and at least one control gene, and the SMN coding genes comprise: the SMN1 exon 7 coding sequence; the SMN2 exon 7 coding sequence; the coding sequences adjacent to the left and right of SMN1 exon 7; and the coding sequences adjacent to the left and right of SMN2 exon 7;
a device for selecting the SMN coding genes and the control gene, connected to the sequencing device and used for selecting, for each sample in the total sample set, the sequencing reads derived from the SMN coding genes and the at least one control gene based on the sequencing results of the sequencing device;
a device for determining the SMN1 exon 7 parameter, connected to the device for selecting the SMN coding genes and the control gene and used for determining the SMN1 exon 7 parameter for the sample to be tested, the SMN1 exon 7 parameter being positively correlated with the number of sequencing reads derived from the SMN1 exon 7 coding sequence;
a correction device, connected to the device for determining the SMN1 exon 7 parameter and used for correcting the SMN1 exon 7 parameter of the sample to be tested based on the sequencing reads of the at least one control gene in the at least one control sample;
a prediction attribution device, connected to the correction device and used for predicting, based on the corrected SMN1 exon 7 parameter, the probability that a sequencing read derived from the SMN coding genes belongs to the SMN1 exon 7 coding sequence; and
a determining device, connected to the prediction attribution device and used for determining, based on the probability, whether exon 7 of the SMN1 gene is deleted in the sample to be tested.
The system according to the embodiment of the invention detects deletion of exon 7 of the SMN1 gene with markedly higher sensitivity and specificity than the prior art, and can effectively distinguish heterozygous deletion samples from homozygous deletion samples. When normal samples (2 copies of exon 7 of both the SMN1 gene and the SMN2 gene) make up only a small proportion of a batch, selecting a control sample set greatly improves detection accuracy.
The system according to the embodiment of the present invention is suitable for performing the method for determining whether there is an exon seven deletion in the SMN1 gene of the sample to be tested according to the embodiment of the present invention, and the advantages, effects and additional technical features of the method are as described above and will not be described herein again.
Drawings
FIG. 1 is a schematic structural diagram of a system for determining whether there is an exon seven deletion in the SMN1 gene of a sample to be tested according to an embodiment of the present invention; and
FIG. 2 is a schematic structural diagram of a system for determining whether there is an exon seven deletion in the SMN1 gene of a test sample according to another embodiment of the present invention.
Detailed Description
The system for determining whether exon 7 of the SMN1 gene is deleted in a sample to be tested according to the embodiments of the invention is described in further detail below with reference to the accompanying drawings. It is to be understood that the embodiments described below in conjunction with the drawings are exemplary and intended to illustrate the present invention, and are not to be construed as limiting the present invention.
The invention provides a system for determining whether exon 7 of the SMN1 gene is deleted in a sample to be tested. According to an embodiment of the invention, with reference to FIG. 1, the system comprises:
a sequencing apparatus 100 for sequencing a plurality of nucleic acid samples from a total sample set, respectively, the total sample set including a sample to be tested and at least one control sample, the plurality of nucleic acid samples each containing an SMN-encoding gene and at least one control gene, the SMN-encoding gene comprising:
an SMN1 seven exon coding sequence, wherein the SMN1 seven exon coding sequence comprises a first mutation site, and the first mutation site is located in chr5: 70247773;
an SMN2 seven exon coding sequence, wherein the SMN2 seven exon coding sequence comprises a fourth mutation site, and the fourth mutation site is located in chr5: 69372353;
left and right adjacent coding sequences of the seven exon of SMN1, wherein the left adjacent coding sequence of the seven exon of SMN1 comprises a second mutation site, the second mutation is located at chr5:70247724, the right adjacent coding sequence of the seven exon of SMN1 comprises a third mutation site, and the third mutation is located at chr5: 70247921; and
left and right adjacent coding sequences of the seven exon of SMN2, wherein the left adjacent coding sequence of the seven exon of SMN2 comprises a fifth mutation site, the fifth mutation is located at chr5:69372304, the right adjacent coding sequence of the seven exon of SMN2 comprises a sixth mutation site, and the sixth mutation is located at chr5: 69372501;
a means 200 for selecting SMN-encoding and control genes, said means 200 for selecting SMN-encoding and control genes being connected to said sequencing means 100 for selecting, for each sample in said total set of samples, sequencing reads derived from said SMN-encoding gene and said at least one control gene based on the sequencing results of said sequencing means;
a means 300 for determining the parameters of exon seven of SMN1, said means 300 being connected to said means 200 for selecting SMN-encoding and control genes and used for determining the SMN1 exon seven parameter for said sample to be tested, said SMN1 exon seven parameter being positively correlated with the number of sequencing reads derived from the SMN1 exon seven coding sequence,
the means 300 for determining the parameters of exon seven of SMN1 is adapted to perform the following operations:
(3-1) determining the number of the sequencing reads carrying the first to sixth mutation sites, respectively, based on the sequencing result of the sample to be tested;
(3-2) determining a first to third ratio based on the number of the sequencing reads of the first to sixth mutation sites obtained in step (3-1), wherein,
(i) the first ratio $y = B/b$, wherein $B$ represents the number of sequencing reads carrying the first mutation site and $b$ represents the number of sequencing reads carrying the first or fourth mutation site,
(ii) the second ratio $x = A/a$, wherein $A$ represents the number of sequencing reads carrying the second mutation site and $a$ represents the number of sequencing reads carrying the second or fifth mutation site, and
(iii) the third ratio $x = M/m$, wherein $M$ represents the number of sequencing reads carrying the third mutation site and $m$ represents the number of sequencing reads carrying the third or sixth mutation site;
(3-3) determining parameters $R$ and $r$ based on the first to third ratios, wherein $R$ constitutes the SMN1 exon seven parameter:
when at least one of the absolute difference between the first and second ratios and the absolute difference between the first and third ratios exceeds 0.1, $R = B$ and $r = b$;
when neither the absolute difference between the first and second ratios nor the absolute difference between the first and third ratios exceeds 0.1, $R = A + B + M$ and $r = a + b + m$;
(3-4) determining whether the sample to be tested is qualified based on the parameter $r$, wherein a parameter $r$ of less than 200 indicates that the sample to be tested is unqualified;
(3-5a) determining a fourth ratio $q$ based on the parameters $R$ and $r$, wherein $q = R/r$;
(3-5b) judging whether the control sample is qualified, wherein a fourth ratio $q$ within the range of 0.43 to 0.57 indicates that the control sample is qualified; or
preliminarily determining, based on the fourth ratio $q$ being within the range of 0.43 to 0.57, that exon seven of the SMN1 gene of the sample to be tested is not deleted;
a correction device 400, said correction device 400 connected to said SMN1 exon seven parameter determining device 300, for correcting said SMN1 exon seven parameter based on the sequencing reads of said at least one control gene in said at least one control sample for said sample to be tested;
a prediction attribution device 500, wherein the prediction attribution device 500 is connected with the correcting device 400 and is used for predicting the probability that the sequencing reading segment derived from the SMN coding gene is attributed to the SMN1 exon seven coding sequence based on the corrected SMN1 exon seven parameter,
the prediction attribution device 500 is adapted to perform the following operations: the number of sequencing reads of the exon seven coding sequence corresponding to the corrected SMN1 exon seven parameter is assumed to follow a binomial distribution, and a Bayesian model is used to calculate the probability $p_i$ that a sequencing read derived from the SMN coding gene belongs to the SMN1 exon seven coding sequence; and
a determining device 600, wherein the determining device 600 is connected to the prediction attribution device 500, and is used for determining whether the SMN1 gene of the sample to be tested has the seven exon deletion or not based on the probability,
the determination means 600 is adapted to perform the following operations:
determining, based on the 95% confidence interval $[a', b']$ of $p_i$, whether the SMN1 gene of the sample to be tested has the exon seven deletion,
wherein $a'$ greater than 0.38 is an indication that the SMN1 gene of the sample to be tested has no exon seven deletion, and $b'$ less than 0.38 is an indication that the SMN1 gene of the sample to be tested has the exon seven deletion; and when $a'$ is not more than 0.38 and $b'$ is not less than 0.38, it cannot be determined whether the exon seven deletion exists in the sample to be tested.
According to a particular embodiment of the invention, said at least one control gene is determined by:
(a) selecting a plurality of candidate genes based on sequencing results of the at least one control sample, the plurality of candidate genes having a sequencing depth in at least a portion of the control sample above a predetermined threshold;
(b) calculating, for each of the plurality of candidate genes, a fifth ratio in each of the at least one control sample, $z_{k,i} = s_i / H_{k,i}$, wherein $k$ represents the candidate gene number, $i$ represents the sample number, $s_i$ represents the sequencing depth of the SMN gene in sample $i$, and $H_{k,i}$ represents the sequencing depth of candidate gene $k$ in sample $i$; and
(c) based on the fifth ratio, determining the at least one control gene that satisfies at least one of the following criteria:
(c-1) among the at least one control sample, the coefficient of variation of the sequencing depth of the control gene is among the 10 smallest; and
(c-2) among the at least one control sample, the coefficient of variation of the fifth ratio is among the 10 smallest.
According to a specific embodiment of the present invention, the predetermined threshold is determined by: arranging the sequencing depths of at least a part of all genes of the sample from small to large according to the sequencing result of the at least one control sample; and determining the predetermined threshold based on the ordering, wherein the threshold is not less than the sequencing depth of the gene at the 5% position of the ordering (the 5th percentile).
According to a further embodiment of the invention, the threshold is the sequencing depth of the gene at the 5% position of the ordering.
According to a specific embodiment of the invention, the sequencing depth of the candidate gene is greater than the predetermined threshold in at least 90% of the at least one control sample. In other words, the sequencing depth of the candidate gene is at or below the predetermined threshold in no more than 10% of the at least one control sample.
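The depth filter described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the data layout (`depths[sample][gene]`), the function names, the use of a separate threshold per control sample, and the choice of index `int(0.05 * n)` in the sorted depth list as the "5% position" are all assumptions made for the example.

```python
def depth_threshold(gene_depths, fraction=0.05):
    """Depth found at the given fractional position of one sample's sorted gene depths.

    gene_depths: {gene: mean sequencing depth} for a single control sample.
    The index convention int(fraction * n) is an assumption of this sketch.
    """
    ordered = sorted(gene_depths.values())
    index = min(int(fraction * len(ordered)), len(ordered) - 1)
    return ordered[index]


def candidate_genes(depths, min_sample_fraction=0.90):
    """Genes whose depth exceeds the per-sample threshold in enough control samples.

    depths: {sample_id: {gene: mean sequencing depth}} (hypothetical layout).
    """
    thresholds = {sample: depth_threshold(genes) for sample, genes in depths.items()}
    shared = set.intersection(*(set(genes) for genes in depths.values()))
    selected = []
    for gene in sorted(shared):
        passing = sum(depths[s][gene] > thresholds[s] for s in depths)
        if passing / len(depths) >= min_sample_fraction:
            selected.append(gene)
    return selected
```

With 20 genes per sample, the threshold in this sketch is the second-smallest depth, so the two shallowest genes are excluded from the candidate list.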
According to a specific embodiment of the present invention, the correction device is adapted to perform the following operations, wherein the correction is performed by multiplying the parameter of exon seven of SMN1 by a correction coefficient, wherein the correction coefficient is determined by the following formula:
Figure BDA0001725135680000101
wherein
$Z_k$ represents the ratio of the sequencing depth of the SMN gene to the sequencing depth of the k-th gene in the sample to be tested,
$K$ represents the total number of control genes in the set of control genes, and
$\bar{z}_k$ represents the average of the fifth ratio of the k-th gene in the control sample set.
According to a particular embodiment of the present invention, $\bar{z}_k$ is determined by the following formula:
$\bar{z}_k = \frac{1}{n}\sum_{i=1}^{n} z_{k,i}$,
wherein $n$ represents the total number of samples in the control sample set, $i$ represents the sample number, and $k$ represents the gene number.
According to an embodiment of the present invention, when the value calculated by the formula
Figure BDA0001725135680000105
exceeds 1.5, the correction coefficient is set to 1.5.
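The correction step can be illustrated with a short Python sketch. The formula for the correction coefficient appears only as an image in this text, so its exact form is not reproduced here. The sketch ASSUMES that the coefficient is the mean, over the K control genes, of the ratio between the sample's depth ratio and the control-set average of the fifth ratio for that gene. This is one reading consistent with the symbol definitions above, not a confirmed formula. Only the averaging over control samples and the cap at 1.5 are taken directly from the text; the function and variable names are hypothetical.

```python
def mean_fifth_ratio(z_control):
    """Average of the fifth ratio z_{k,i} over the control samples, per gene.

    z_control: {gene: [z_{k,i} for each control sample i]} (hypothetical layout).
    """
    return {gene: sum(values) / len(values) for gene, values in z_control.items()}


def correction_coefficient(z_sample, z_bar, cap=1.5):
    """ASSUMED form: mean over control genes of Z_k / z_bar_k, capped at 1.5.

    z_sample: {gene: Z_k} for the sample to be tested.
    z_bar:    {gene: control-set average of the fifth ratio}.
    """
    ratios = [z_sample[gene] / z_bar[gene] for gene in z_bar]
    theta = sum(ratios) / len(ratios)
    return min(theta, cap)  # values above the cap are replaced by the cap


def corrected_parameter(R, theta):
    """Corrected SMN1 exon seven parameter: R multiplied by the coefficient."""
    return R * theta
```

Under this assumed form, a sample whose SMN depth ratio is half the control average for every control gene gets a coefficient of 0.5, and its SMN1 read count is scaled down accordingly.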
According to still another embodiment of the present invention, referring to fig. 2, the system further comprises a copy number determining device 700 for SMN1 gene, the copy number determining device 700 for SMN1 gene is connected to the determining device 600, the copy number determining device 700 for SMN1 gene is adapted to perform the following operations:
when the SMN1 gene of the sample to be tested has the exon seven deletion,
determining the copy number of the SMN1 gene in the sample to be tested by the formula
Figure BDA0001725135680000106
and determining the copy number of the SMN2 gene in the sample to be tested by the formula
Figure BDA0001725135680000107
wherein
$c_{1,i}$ or $c_{2,i}$ not more than 0.1 is an indication that the SMN1 gene or SMN2 gene copy number is 0,
$c_{1,i}$ or $c_{2,i}$ greater than 0.1 but less than 0.5 is an indication that the SMN1 gene or SMN2 gene copy number is between 0 and 1,
$c_{1,i}$ or $c_{2,i}$ not less than 0.5 but less than 1.485 is an indication that the SMN1 gene or SMN2 gene copy number is 1,
$c_{1,i}$ or $c_{2,i}$ not less than 1.485 but less than 2.324 is an indication that the SMN1 gene or SMN2 gene copy number is 2,
$c_{1,i}$ or $c_{2,i}$ not less than 2.324 but less than 2.743 is an indication that the SMN1 gene or SMN2 gene copy number is between 2 and 3, and
$c_{1,i}$ or $c_{2,i}$ not less than 2.743 is an indication that the SMN1 gene or SMN2 gene copy number is not less than 3.
According to a specific embodiment of the invention, an SMN1 gene copy number of 0 is an indication of a homozygous deletion of exon 7 of the SMN1 gene; an SMN1 gene copy number of not less than 1 is an indication of a heterozygous deletion of exon 7 of the SMN1 gene; and an SMN1 gene copy number between 0 and 1 is an indication that the sample falls in the gray zone for exon 7 deletion of the SMN1 gene.
The system provided by the embodiment of the invention is suitable for executing the method for determining whether the SMN1 gene of the sample to be detected has the exon seven deletion or not according to the embodiment of the invention, the sensitivity and the specificity of the detection on the exon 7 deletion of the SMN1 gene are obviously improved compared with the prior art, and the heterozygous deletion sample and the homozygous deletion sample can be effectively distinguished; when the proportion of normal samples (2 copies of the No. 7 exons of the SMN1 gene and the SMN2 gene) in a batch is small, the detection precision is greatly improved by selecting a control sample set.
The method for determining whether the SMN1 gene of the sample to be tested has seven exon deletions is further illustrated by the following specific examples. The following described embodiments are exemplary and are intended to be illustrative of the invention and are not to be construed as limiting the invention.
The analysis workflow for estimating the copy number of exon 7 of the SMN1/2 genes from raw NGS output data is as follows:
Example 1: Raw sequencing data processing
Step A: Filter low-quality sequencing reads (reads) from the raw data using filtering software (SOAPnuke, version 1.5.2).
Step B: Align the reads to the human reference genome (hg19) using alignment software (BWA, version 0.7.12), mark duplicate reads using Picard (a basic sequence processing software, version 1.87), and perform base quality score recalibration and realignment using GATK (a software package for second-generation resequencing data analysis, version 3.2).
Step C: Obtain the read coverage information of the capture region using the DepthOfCoverage tool of GATK. The per-site read count file and the per-region average coverage file (suffix: sample_interval_summary) must be output.
Example 2: Estimation of the exon 7 copy number of the SMN1/2 genes
Step A: Calculate the numbers of reads aligned to the SMN1 gene and the SMN2 gene. The exon 7 sequences of the SMN1 and SMN2 genes differ at only one site (chr5:70247773 in SMN1; chr5:69372353 in SMN2; denoted site b), and the introns on either side of exon 7 each differ at one site (chr5:70247724 in SMN1 and chr5:69372304 in SMN2, denoted site a; chr5:70247921 in SMN1 and chr5:69372501 in SMN2, denoted site c). Assuming the batch contains N samples, calculate for site b the ratio of the number of SMN1 reads to the total number of SMN reads, $y_i = B_i/b_i$, where $B_i$ is the number of reads aligned to site b of the SMN1 gene in the i-th sample and $b_i$ is the total number of reads aligned to site b of the SMN genes. Likewise, for sites a and c, calculate $x_i = A_i/a_i$ and $z_i = C_i/c_i$. Let $R_i$ denote the number of reads aligned to exon 7 of SMN1 and $r_i$ the total number of reads aligned to exon 7 of the SMN genes. If $|x_i - y_i| > 0.1$ or $|z_i - y_i| > 0.1$, then $R_i = B_i$ and $r_i = b_i$; otherwise $R_i = A_i + B_i + C_i$ and $r_i = a_i + b_i + c_i$. Calculate the proportion of reads aligned to SMN1 among all SMN reads, $p_i = R_i/r_i$. In addition, a sample with $r_i < 200$ is treated as a failed sample.
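Step A can be sketched in Python as follows. The thresholds 0.1 and 200 come from the text; the function name, the argument order, and the returned dictionary are assumptions made for illustration, and the sketch assumes that the total read counts at all three sites are positive.

```python
def smn1_exon7_counts(A, a, B, b, C, c, tolerance=0.1, min_total=200):
    """Combine per-site read counts into R, r and the SMN1 proportion p.

    A, B, C: reads supporting SMN1 at sites a, b, c.
    a, b, c: total SMN (SMN1 + SMN2) reads at sites a, b, c (assumed > 0).
    """
    x, y, z = A / a, B / b, C / c
    if abs(x - y) > tolerance or abs(z - y) > tolerance:
        # The flanking sites disagree with the exon 7 site: use site b only.
        R, r = B, b
    else:
        # All three sites agree: pool them to increase the read count.
        R, r = A + B + C, a + b + c
    return {"R": R, "r": r, "p": R / r, "qualified": r >= min_total}
```

For example, counts of 50/100, 52/100, and 49/100 at sites a, b, and c are pooled into R = 151 and r = 300, whereas 80/100 at site a against 50/100 at site b triggers the fallback to site b alone, which leaves r = 100 and marks the sample as failed.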
Step B: Selection of the control sample set. Select the samples in the batch with $0.43 \le p_i \le 0.57$ (typically samples with 2 copies each of exon 7 of the SMN1 and SMN2 genes) as the control sample set. If historical data obtained under the same experimental conditions are available, samples whose test results show 2 copies each of exon 7 of the SMN1 and SMN2 genes can also be selected as the control sample set.
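A minimal Python sketch of the in-batch selection follows. The bounds 0.43 and 0.57 are taken from the text; the function name and the mapping from sample identifiers to proportions are hypothetical.

```python
def select_control_samples(p_by_sample, low=0.43, high=0.57):
    """Return the samples whose SMN1 proportion lies in the closed range [low, high].

    p_by_sample: {sample_id: proportion of SMN reads assigned to SMN1}.
    """
    return [sample for sample, p in p_by_sample.items() if low <= p <= high]
```

A sample at exactly 0.43 or 0.57 is kept in this sketch, because the text describes the range as inclusive.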
Step C: Selection of the control gene set. Among genes of sufficient depth (depth above the 5th percentile of all genes in at least 90% of the samples), calculate $z_{k,i} = s_i/H_{k,i}$, where $s_i$ is the average depth of the SMN gene and $H_{k,i}$ is the average depth of the k-th gene in the i-th sample. Select the control gene set according to the following two conditions: (1) the average depth varies little across the control sample set; (2) $z_{k,i}$ varies little across the control sample set. The degree of variation is measured by the coefficient of variation cv. For each condition, the 10 genes with the smallest cv are selected, and the union of the two selections is taken as the control gene set.
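The selection by coefficient of variation can be sketched in Python as follows. The text does not say whether cv uses the sample or the population standard deviation; this sketch assumes the sample standard deviation divided by the mean, and it needs at least two control samples. The function names and the data layout are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed definition of cv)."""
    return statistics.stdev(values) / statistics.mean(values)


def select_control_genes(gene_depths, smn_depths, top_n=10):
    """Union of the top_n genes with the smallest cv under each of the two conditions.

    gene_depths: {gene: [H_{k,i} for each control sample i]}.
    smn_depths:  [s_i for each control sample i], in the same sample order.
    """
    cv_depth, cv_ratio = {}, {}
    for gene, depths in gene_depths.items():
        ratios = [s / h for s, h in zip(smn_depths, depths)]  # z_{k,i} = s_i / H_{k,i}
        cv_depth[gene] = coefficient_of_variation(depths)
        cv_ratio[gene] = coefficient_of_variation(ratios)
    by_depth = sorted(cv_depth, key=cv_depth.get)[:top_n]
    by_ratio = sorted(cv_ratio, key=cv_ratio.get)[:top_n]
    return set(by_depth) | set(by_ratio)
```

Because the result is a union, the control gene set contains between `top_n` and `2 * top_n` genes, depending on how much the two rankings overlap.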
Step D: Calculate the weighted average $\theta_i$ of the SMN gene relative to the control gene set, where K represents the number of genes in the control gene set and
Figure BDA0001725135680000122
serves as the baseline of the normal samples at the k-th gene. If $\theta_i > 1.5$, $\theta_i$ is output as 1.5. Calculate $R_i' = R_i \times \theta_i$, the number of SMN1 reads after scaling. Use a Bayesian model to calculate the probability $p_i$ that reads aligned to the SMN gene come from SMN1. Assuming the prior distribution $p_i \sim \mathrm{Beta}(\alpha, \beta)$ with the standard prior $\alpha = \beta = 1$, the posterior distribution of $p_i$ is again a beta distribution:
$p_i \mid R_i', r_i \sim \mathrm{Beta}(\alpha + R_i',\; r_i - R_i' + \beta)$.
Using the cumulative distribution function of the posterior distribution, calculate $P(p_i \le 0.38 \mid R_i', r_i)$ as the probability that the i-th sample is a carrier of the SMN1 exon 7 deletion. Calculate the 95% confidence interval of $p_i$. If the confidence interval lies entirely above 0.38, the sample is negative; if it lies entirely below 0.38, the sample is positive; if 0.38 lies within the confidence interval, the result is judged to be a gray zone.
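The posterior calculation can be sketched in Python using only the standard library. The sketch approximates the Beta posterior on a fine grid instead of calling a statistics package, takes the 95% interval as equal-tailed (the text does not specify the interval type), and follows the decision rule of the claims, in which an interval entirely above 0.38 indicates no deletion and an interval entirely below 0.38 indicates a deletion. The function names, return format, and grid size are assumptions.

```python
import math


def beta_grid(a, b, n=20000):
    """Midpoints of n equal cells on (0, 1) and the approximate Beta(a, b) mass per cell."""
    xs = [(j + 0.5) / n for j in range(n)]
    log_w = [(a - 1) * math.log(x) + (b - 1) * math.log(1 - x) for x in xs]
    top = max(log_w)  # subtract the maximum to avoid underflow
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return xs, [v / total for v in w]


def call_exon7_deletion(R_corr, r, alpha=1.0, beta=1.0, cutoff=0.38, level=0.95):
    """Posterior Beta(alpha + R', r - R' + beta): carrier probability, interval, call."""
    if not 0 <= R_corr <= r:
        raise ValueError("this sketch assumes 0 <= R' <= r")
    xs, mass = beta_grid(alpha + R_corr, r - R_corr + beta)
    tail = (1 - level) / 2
    cum, p_carrier, low, high = 0.0, 0.0, None, None
    for x, m in zip(xs, mass):
        cum += m
        if x <= cutoff:
            p_carrier = cum  # running estimate of P(p <= cutoff)
        if low is None and cum >= tail:
            low = x
        if high is None and cum >= 1 - tail:
            high = x
    if low > cutoff:
        call = "no deletion"
    elif high < cutoff:
        call = "deletion"
    else:
        call = "gray zone"
    return {"p_carrier": p_carrier, "interval": (low, high), "call": call}
```

With 250 corrected SMN1 reads out of 500, the interval lies around 0.5 and the call is "no deletion"; with 125 out of 500, it lies around 0.25 and the call is "deletion"; with 38 out of 100, the interval straddles 0.38 and the call is "gray zone".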
Step E: Calculate the initial copy number of the SMN1 gene and the initial copy number of the SMN2 gene
Figure BDA0001725135680000132
The predicted copy number is then
Figure BDA0001725135680000133
Here, 0.5 denotes a copy number between 0 and 1 that is not further subdivided, 2.5 denotes a copy number between 2 and 3 that is not further subdivided, and 3 denotes 3 or more copies.
The final call for deletion of exon 7 of the SMN1 gene in a sample is based on the result of step D. When the result of step D is positive, homozygous and heterozygous deletions are distinguished using the result of step E: if the predicted SMN1 gene copy number n from step E is 0, the sample is judged to be a homozygous deletion; if n = 0.5, the sample is judged to be a deletion gray-zone sample; and if n ≥ 1, the sample is judged to be a heterozygous deletion.
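The mapping from an initial copy number to a predicted copy number, and from there to the deletion type, can be sketched in Python as follows. The cut points (0.1, 0.5, 1.485, 2.324, 2.743) and the codes 0.5, 2.5, and 3 are taken from the description; the function names and the returned labels are hypothetical, and the calculation of the initial copy number itself is not shown because its formula is given only as an image.

```python
def predicted_copy_number(c):
    """Map an initial copy number c (for SMN1 or SMN2) to the predicted copy number."""
    if c <= 0.1:
        return 0
    if c < 0.5:
        return 0.5   # between 0 and 1 copies, not subdivided
    if c < 1.485:
        return 1
    if c < 2.324:
        return 2
    if c < 2.743:
        return 2.5   # between 2 and 3 copies, not subdivided
    return 3         # 3 or more copies


def deletion_type(step_d_positive, smn1_copy_number):
    """Combine the step D call with the predicted SMN1 copy number from step E."""
    if not step_d_positive:
        return None  # the deletion type is only assigned to positive samples
    if smn1_copy_number == 0:
        return "homozygous deletion"
    if smn1_copy_number == 0.5:
        return "deletion gray zone"
    return "heterozygous deletion"
```

For instance, a positive sample with an initial SMN1 copy number of 0.9 maps to a predicted copy number of 1 and is reported as a heterozygous deletion.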
In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, such terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, various embodiments or examples, and features of different embodiments or examples, described in this specification can be combined by one skilled in the art provided they do not contradict one another.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (34)

1. A method for determining whether a deletion of exon seven exists in SMN1 gene of a sample to be detected, which is characterized by comprising the following steps:
(1) sequencing a plurality of nucleic acid samples from a total sample set respectively, wherein the total sample set comprises a sample to be tested and at least one control sample, the plurality of nucleic acid samples all contain SMN coding genes and at least one control gene, and the SMN coding genes comprise:
the coding sequence of exon seven of SMN1;
the coding sequence of exon seven of SMN2;
the coding sequences adjacent to the left and right of exon seven of SMN1; and
the coding sequences adjacent to the left and right of exon seven of SMN2;
(2) Selecting, for each sample in the total sample set, based on the sequencing results of step (1), sequencing reads derived from the SMN-encoding gene and the at least one control gene;
(3) determining an SMN1 exon seven parameter for the sample to be tested, the SMN1 exon seven parameter positively correlating to the number of sequencing reads derived from the SMN1 exon seven coding sequence;
(4) correcting the SMN1 exon seven parameter for the test sample based on sequencing reads of the at least one control gene in the at least one control sample;
(5) predicting, based on the corrected exon seven parameter of SMN1, a probability that a sequenced read derived from the SMN-encoding gene will be assigned to an SMN1 exon seven coding sequence; and
(6) determining, based on the probability, whether the SMN1 gene of the sample to be tested has the exon seven deletion.
2. The method of claim 1,
the coding sequence of SMN1 exon seven comprises a first mutation site, and the first mutation site is located at chr5:70247773;
the coding sequence adjacent to the left side of SMN1 exon seven comprises a second mutation site, and the second mutation site is located at chr5:70247724;
the coding sequence adjacent to the right side of SMN1 exon seven comprises a third mutation site, and the third mutation site is located at chr5:70247921;
the coding sequence of SMN2 exon seven comprises a fourth mutation site, and the fourth mutation site is located at chr5:69372353;
the coding sequence adjacent to the left side of SMN2 exon seven comprises a fifth mutation site, and the fifth mutation site is located at chr5:69372304; and
the coding sequence adjacent to the right side of SMN2 exon seven comprises a sixth mutation site, and the sixth mutation site is located at chr5:69372501.
3. The method of claim 2, wherein in step (3), the SMN1 exon seven parameter is determined for the test sample by:
(3-1) determining the number of the sequencing reads carrying the first to sixth mutation sites, respectively, based on the sequencing result of the sample to be tested;
(3-2) determining a first to third ratio based on the number of the sequencing reads of the first to sixth mutation sites obtained in step (3-1), wherein,
(i) the first ratio $y = B/b$, wherein $B$ represents the number of sequencing reads carrying the first mutation site and $b$ represents the number of sequencing reads carrying the first or fourth mutation site,
(ii) the second ratio $x = A/a$, wherein $A$ represents the number of sequencing reads carrying the second mutation site and $a$ represents the number of sequencing reads carrying the second or fifth mutation site, and
(iii) the third ratio $z = M/m$, wherein $M$ represents the number of sequencing reads carrying the third mutation site and $m$ represents the number of sequencing reads carrying the third or sixth mutation site;
(3-3) determining parameters $R$ and $r$ based on the first to third ratios as follows, wherein $R$ constitutes the SMN1 exon seven parameter:
when at least one of the absolute difference between the first ratio and the second ratio and the absolute difference between the first ratio and the third ratio exceeds 0.1, $R = B$ and $r = b$;
when neither the absolute difference between the first ratio and the second ratio nor the absolute difference between the first ratio and the third ratio exceeds 0.1, $R = A + B + M$ and $r = a + b + m$.
4. The method of claim 3, further comprising:
(3-4) determining whether the sample to be tested is qualified based on the parameter $r$.
5. The method of claim 4, wherein a parameter $r$ of less than 200 is an indication that the sample to be tested is unqualified.
6. The method of claim 3, further comprising:
(3-5a) determining a fourth ratio $q$ based on the parameters $R$ and $r$, wherein the fourth ratio $q = R/r$;
(3-5b) determining whether the control sample is qualified, wherein a fourth ratio $q$ within the range of 0.43 to 0.57 is an indication that the control sample is qualified; or
preliminarily determining, based on the fourth ratio $q$ being within the range of 0.43 to 0.57, that the SMN1 gene of the sample to be tested has no exon seven deletion.
7. The method of claim 1 or 6, wherein the at least one control gene is determined by:
(a) selecting a plurality of candidate genes based on sequencing results of the at least one control sample, the plurality of candidate genes having a sequencing depth in at least a portion of the control sample above a predetermined threshold;
(b) calculating, for each of the plurality of candidate genes, a fifth ratio in each of the at least one control sample, $z_{k,i} = s_i / H_{k,i}$, wherein $k$ represents the candidate gene number, $i$ represents the sample number, $s_i$ represents the sequencing depth of the SMN gene in sample $i$, and $H_{k,i}$ represents the sequencing depth of candidate gene $k$ in sample $i$; and
(c) determining the at least one control gene based on the fifth ratio.
8. The method according to claim 7, wherein in step (c), the control gene meets at least one of the following criteria:
(c-1) among the at least one control sample, the coefficient of variation of the sequencing depth of the control gene is among the 10 smallest; and
(c-2) among the at least one control sample, the coefficient of variation of the fifth ratio is among the 10 smallest.
9. The method of claim 7, wherein the predetermined threshold is determined by:
arranging the sequencing depths of at least a part of all genes of the sample from small to large according to the sequencing result of the at least one control sample; and
determining the predetermined threshold based on the ordering, wherein the threshold is not less than the sequencing depth of the gene at the 5% position of the ordering;
optionally, the threshold is the sequencing depth of the gene at the 5% position of the ordering.
10. The method of claim 9,
the sequencing depth of the candidate gene is greater than the predetermined threshold in at least 90% of the at least one control sample.
11. The method of claim 7, wherein in step (4), the correction is performed by multiplying the SMN1 exon seven parameter by a correction coefficient, wherein the correction coefficient is determined by the following formula:
wherein $Z_k$ represents the ratio of the sequencing depth of the SMN gene to the sequencing depth of the k-th gene in the sample to be tested,
$K$ represents the total number of control genes in the set of control genes, and
Figure FDA0001725135670000032
represents an average of said fifth ratio of the k-th numbered gene in said control sample set.
12. The method of claim 11,
Figure FDA0001725135670000041
is determined by the following formula:
Figure FDA0001725135670000042
wherein $n$ represents the total number of samples in the control sample set, $i$ represents the sample number, and $k$ represents the gene number.
13. The method of claim 11, wherein when the value calculated by the formula
Figure FDA0001725135670000046
exceeds 1.5, the correction coefficient is set to 1.5.
14. The method of claim 1, wherein in step (5), the number of sequencing reads of the exon seven coding sequence corresponding to the corrected SMN1 exon seven parameter is assumed to follow a binomial distribution, and a Bayesian model is used to calculate the probability $p_i$ that a sequencing read derived from the SMN-encoding gene belongs to the SMN1 exon seven coding sequence.
15. The method of claim 14, wherein in step (6), whether the SMN1 gene of the sample to be tested has the exon seven deletion is determined based on the 95% confidence interval $[a', b']$ of $p_i$,
wherein $a' > 0.38$ is an indication that the SMN1 gene of the sample to be tested has no exon seven deletion,
$b' < 0.38$ is an indication that the SMN1 gene of the sample to be tested has the exon seven deletion;
and when $a' \le 0.38 \le b'$, it cannot be determined whether the exon seven deletion exists in the sample to be tested.
16. The method of claim 15, further comprising, when the SMN1 gene of the sample to be tested has the exon seven deletion, determining the copy number of the SMN1 gene in the sample to be tested by the formula
Figure FDA0001725135670000044
and determining the copy number of the SMN2 gene in the sample to be tested by the formula
Figure FDA0001725135670000045
wherein
$c_{1,i}$ or $c_{2,i}$ not more than 0.1 is an indication that the SMN1 gene or SMN2 gene copy number is 0,
$c_{1,i}$ or $c_{2,i}$ greater than 0.1 but less than 0.5 is an indication that the SMN1 gene or SMN2 gene copy number is between 0 and 1,
$c_{1,i}$ or $c_{2,i}$ not less than 0.5 but less than 1.485 is an indication that the SMN1 gene or SMN2 gene copy number is 1,
$c_{1,i}$ or $c_{2,i}$ not less than 1.485 but less than 2.324 is an indication that the SMN1 gene or SMN2 gene copy number is 2,
$c_{1,i}$ or $c_{2,i}$ not less than 2.324 but less than 2.743 is an indication that the SMN1 gene or SMN2 gene copy number is between 2 and 3, and
$c_{1,i}$ or $c_{2,i}$ not less than 2.743 is an indication that the SMN1 gene or SMN2 gene copy number is not less than 3.
17. The method of claim 16,
an SMN1 gene copy number of 0 is an indication of a homozygous deletion of exon 7 of the SMN1 gene;
an SMN1 gene copy number of not less than 1 is an indication of a heterozygous deletion of exon 7 of the SMN1 gene; and
an SMN1 gene copy number between 0 and 1 is an indication that the sample falls in the gray zone for exon 7 deletion of the SMN1 gene.
18. A system for determining whether a deletion of exon seven exists in SMN1 gene of a sample to be tested, comprising:
a sequencing device for sequencing a plurality of nucleic acid samples from a total sample set, respectively, the total sample set including a sample to be tested and at least one control sample, the plurality of nucleic acid samples each containing an SMN-encoding gene and at least one control gene, the SMN-encoding gene comprising:
the coding sequence of exon seven of SMN1;
the coding sequence of exon seven of SMN2;
the coding sequences adjacent to the left and right of exon seven of SMN1; and
the coding sequences adjacent to the left and right of exon seven of SMN2;
means for selecting SMN-encoding genes and control genes, said means for selecting SMN-encoding genes and control genes being connected to said sequencing means for selecting, for each sample in said total sample set, a sequencing read derived from said SMN-encoding genes and said at least one control gene based on the sequencing results of said sequencing means;
a device for determining parameters of the seven exon of SMN1, wherein the device for determining parameters of the seven exon of SMN1 is connected with the device for selecting the SMN coding gene and the control gene, and is used for determining parameters of the seven exon of SMN1 aiming at the sample to be tested, and the parameters of the seven exon of SMN1 are positively correlated with the number of sequencing reads derived from the coding sequence of the seven exon of SMN 1;
a correcting device connected with the SMN1 exon seven parameter determining device and used for correcting the SMN1 exon seven parameter based on the sequencing reading of the at least one control gene in the at least one control sample aiming at the sample to be tested;
a prediction attribution device connected with the correcting device and used for predicting the probability that the sequencing reading segment derived from the SMN coding gene belongs to the SMN1 exon seven coding sequence based on the corrected SMN1 exon seven parameter; and
a determining device connected with the prediction attribution device and used for determining, based on the probability, whether the SMN1 gene of the sample to be tested has the exon seven deletion.
19. The system of claim 18,
the coding sequence of SMN1 exon seven comprises a first mutation site, and the first mutation site is located at chr5:70247773;
the coding sequence adjacent to the left side of SMN1 exon seven comprises a second mutation site, and the second mutation site is located at chr5:70247724;
the coding sequence adjacent to the right side of SMN1 exon seven comprises a third mutation site, and the third mutation site is located at chr5:70247921;
the coding sequence of SMN2 exon seven comprises a fourth mutation site, and the fourth mutation site is located at chr5:69372353;
the coding sequence adjacent to the left side of SMN2 exon seven comprises a fifth mutation site, and the fifth mutation site is located at chr5:69372304; and
the coding sequence adjacent to the right side of SMN2 exon seven comprises a sixth mutation site, and the sixth mutation site is located at chr5:69372501.
20. The system of claim 19 wherein the means for determining the parameters of exon seven of SMN1 is adapted to:
(3-1) determining the number of the sequencing reads carrying the first to sixth mutation sites, respectively, based on the sequencing result of the sample to be tested;
(3-2) determining a first to third ratio based on the number of the sequencing reads of the first to sixth mutation sites obtained in step (3-1), wherein,
(i) the first ratio $y = B/b$, wherein $B$ represents the number of sequencing reads carrying the first mutation site and $b$ represents the number of sequencing reads carrying the first or fourth mutation site,
(ii) the second ratio $x = A/a$, wherein $A$ represents the number of sequencing reads carrying the second mutation site and $a$ represents the number of sequencing reads carrying the second or fifth mutation site, and
(iii) the third ratio $z = M/m$, wherein $M$ represents the number of sequencing reads carrying the third mutation site and $m$ represents the number of sequencing reads carrying the third or sixth mutation site;
(3-3) determining parameters $R$ and $r$ based on the first to third ratios as follows, wherein $R$ constitutes the SMN1 exon seven parameter:
when at least one of the absolute difference between the first ratio and the second ratio and the absolute difference between the first ratio and the third ratio exceeds 0.1, $R = B$ and $r = b$;
when neither the absolute difference between the first ratio and the second ratio nor the absolute difference between the first ratio and the third ratio exceeds 0.1, $R = A + B + M$ and $r = a + b + m$.
21. The system of claim 20 wherein the means for determining the parameters of exon seven of SMN1 is further adapted to:
(3-4) determining whether the sample to be tested is qualified based on the parameter $r$.
22. The system of claim 21, wherein a parameter $r$ of less than 200 is an indication that the sample to be tested is unqualified.
23. The system of claim 22 wherein the means for determining the parameters of exon seven of SMN1 is further adapted to:
(3-5a) determining a fourth ratio $q$ based on the parameters $R$ and $r$, wherein the fourth ratio $q = R/r$;
(3-5b) determining whether the control sample is qualified, wherein a fourth ratio $q$ within the range of 0.43 to 0.57 is an indication that the control sample is qualified; or
preliminarily determining, based on the fourth ratio $q$ being within the range of 0.43 to 0.57, that the SMN1 gene of the sample to be tested has no exon seven deletion.
24. The system of claim 18 or 23, wherein the at least one control gene is determined by:
(a) selecting a plurality of candidate genes based on sequencing results of the at least one control sample, the plurality of candidate genes having a sequencing depth in at least a portion of the control sample above a predetermined threshold;
(b) calculating, for each of the plurality of candidate genes, a fifth ratio z_{k,i} = s_i / H_{k,i} in each of the at least one control sample, wherein k represents the candidate gene number, i represents the sample number, s_i represents the sequencing depth of the SMN gene in sample i, and H_{k,i} represents the sequencing depth of candidate gene k in sample i; and
(c) determining the at least one control gene based on the fifth ratio.
25. The system of claim 24, wherein in step (c), the control gene meets at least one of the following criteria:
(c-1) across the at least one control sample, the coefficient of variation of the sequencing depth of the control gene ranks among the 10 smallest; and
(c-2) across the at least one control sample, the coefficient of variation of the fifth ratio ranks among the 10 smallest.
26. The system of claim 24, wherein the predetermined threshold is determined by:
arranging, based on the sequencing result of the at least one control sample, the sequencing depths of at least a portion of all genes of the sample in ascending order; and
determining the predetermined threshold based on the ordering, wherein the threshold is not less than the sequencing depth of the gene at the 5% position;
optionally, the threshold is the sequencing depth of the gene at the 5% position.
27. The system of claim 26, wherein the sequencing depth of the candidate gene is greater than the predetermined threshold in at least 90% of the at least one control sample.
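Claims 24 to 27 together describe a selection procedure that can be sketched as follows. The sketch makes several assumptions that the claims leave open: the threshold is computed per control sample, the "5% position" is taken as index `int(0.05 * n_genes)` of the ascending depth list, only criterion (c-2) is applied, and the population standard deviation is used for the coefficient of variation. The names `select_control_genes`, `gene_depths` and `smn_depths` are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


def select_control_genes(gene_depths, smn_depths, top_n=10, pct=0.05, min_frac=0.9):
    """gene_depths: {gene: [depth in sample 0, sample 1, ...]};
    smn_depths: [SMN depth in sample 0, sample 1, ...]."""
    n_samples = len(smn_depths)
    genes = list(gene_depths)
    # Per-sample threshold: depth of the gene at the 5% position (ascending).
    thresholds = []
    for i in range(n_samples):
        ordered = sorted(gene_depths[g][i] for g in genes)
        thresholds.append(ordered[int(pct * len(ordered))])
    cv_by_gene = {}
    for g in genes:
        depths = gene_depths[g]
        passed = sum(d > t for d, t in zip(depths, thresholds))
        # Candidate genes must exceed the threshold in at least 90% of samples;
        # zero depths are also skipped to keep the ratio defined.
        if passed / n_samples < min_frac or min(depths) <= 0:
            continue
        z = [s / h for s, h in zip(smn_depths, depths)]  # fifth ratio z_{k,i}
        cv_by_gene[g] = coefficient_of_variation(z)
    # Keep the genes whose fifth ratio varies least across control samples.
    return sorted(cv_by_gene, key=cv_by_gene.get)[:top_n]
```

With real data the number of genes would be large enough for the 5% position to exclude only the weakest-covered genes; in a toy input with three genes it simply excludes the gene with the lowest depth.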
28. The system of claim 24, wherein the corrective device is adapted to perform the correction by multiplying the SMN1 exon seven parameter by a correction factor, wherein the correction factor is determined by the following equation:
Figure FDA0001725135670000081
wherein
Z_k represents the ratio of the sequencing depth of the SMN gene to the sequencing depth of the k-th gene in the sample to be tested,
K represents the total number of control genes in the set of control genes, and
Figure FDA0001725135670000082
represents an average of said fifth ratio of the k-th numbered gene in said control sample set.
29. The system of claim 28, wherein the average of the fifth ratio is determined by the following formula:
Figure FDA0001725135670000084
wherein n represents the total number of samples in the control sample set, i represents the sample number, and k represents the gene number.
30. The system of claim 28, wherein, when the value calculated by the formula
Figure FDA0001725135670000085
exceeds 1.5, the correction factor is set to 1.5.
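The parts of claims 28 to 30 that survive in the text can be illustrated as below. The correction-factor formula itself appears only as an image reference here, so the sketch does not attempt to reproduce it and instead takes the uncapped factor as an input. It also assumes that the average of the fifth ratio is a plain arithmetic mean over the n control samples, which matches the symbols defined in claim 29 but is not confirmed by the text. All function names are hypothetical.

```python
def mean_fifth_ratio(smn_depths, gene_depths):
    """Arithmetic mean of z_{k,i} = s_i / H_{k,i} over control samples (assumed form)."""
    ratios = [s / h for s, h in zip(smn_depths, gene_depths)]
    return sum(ratios) / len(ratios)


def capped_correction_factor(raw_factor, cap=1.5):
    """Limit the correction factor to 1.5, as stated in claim 30."""
    return min(raw_factor, cap)


def corrected_parameter(exon7_parameter, raw_factor):
    """Multiply the SMN1 exon seven parameter by the capped correction factor."""
    return exon7_parameter * capped_correction_factor(raw_factor)
```

For instance, an uncapped factor of 2.0 is reduced to 1.5, so an exon seven parameter of 100 is corrected to 150.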
31. The system according to claim 18, characterized in that said prediction attribution means are adapted to perform the following operations: treating the number of sequencing reads of the exon seven coding sequence corresponding to the corrected SMN1 exon seven parameter as following a binomial distribution, and using a Bayesian model to calculate the probability p_i that a sequencing read from the SMN coding gene belongs to the SMN1 exon seven coding sequence.
32. The system according to claim 31, characterized in that said determining means are adapted to perform the following operations:
determining, based on the 95% confidence interval [a', b'] of said p_i, whether the SMN1 gene of the sample to be tested has an exon seven deletion,
wherein a' > 0.38 is an indication that no exon seven deletion exists in the SMN1 gene of the sample to be tested,
b' < 0.38 is an indication that an exon seven deletion exists in the SMN1 gene of the sample to be tested, and
a' ≤ 0.38 ≤ b' is an indication that it cannot be determined whether exon seven is deleted in the sample to be tested.
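One way to picture claims 31 and 32 is a Beta-binomial model, sketched below with the Python standard library only. The claims do not specify the prior or how the interval is computed, so the uniform Beta(1, 1) prior, the Monte Carlo estimate of the interval, and the function names are all assumptions. The sketch also reads the branch in which the whole interval lies below 0.38 as indicating a deletion, which is an interpretation of the claim.

```python
import random


def smn1_fraction_interval(r, R, n_draws=20000, seed=0):
    """Approximate 95% interval for p, with r SMN1-assigned reads out of R.

    Assumes a uniform prior, so the posterior is Beta(r + 1, R - r + 1);
    the interval is estimated from sorted posterior draws.
    """
    rng = random.Random(seed)
    draws = sorted(rng.betavariate(r + 1, R - r + 1) for _ in range(n_draws))
    return draws[int(0.025 * n_draws)], draws[int(0.975 * n_draws)]


def call_exon7_deletion(r, R, cutoff=0.38):
    """Compare the interval [a', b'] with the 0.38 cutoff."""
    low, high = smn1_fraction_interval(r, R)
    if low > cutoff:
        return "no deletion"
    if high < cutoff:
        return "deletion"
    return "undetermined"
```

With 500 of 1000 reads assigned to SMN1 the interval sits well above 0.38, with 100 of 1000 it sits well below, and with 38 of 100 it straddles the cutoff, so no call is made.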
33. The system of claim 18, further comprising means for determining the copy number of the SMN1 gene, the means for determining the copy number of the SMN1 gene being coupled to the means for determining, the means for determining the copy number of the SMN1 gene being adapted to:
when the SMN1 gene of the sample to be tested has an exon seven deletion,
determining the copy number of the SMN1 gene in the sample to be tested by the formula
Figure FDA0001725135670000086
and determining the copy number of the SMN2 gene in the sample to be tested by the formula
Figure FDA0001725135670000087
wherein c_{1,i} or c_{2,i} not more than 0.1 is an indication that the SMN1 gene or SMN2 gene copy number is 0,
c_{1,i} or c_{2,i} greater than 0.1 but less than 0.5 is an indication that the SMN1 gene or SMN2 gene copy number is between 0 and 1,
c_{1,i} or c_{2,i} not less than 0.5 but less than 1.485 is an indication that the SMN1 gene or SMN2 gene copy number is 1,
c_{1,i} or c_{2,i} not less than 1.485 but less than 2.324 is an indication that the SMN1 gene or SMN2 gene copy number is 2,
c_{1,i} or c_{2,i} not less than 2.324 but less than 2.743 is an indication that the SMN1 gene or SMN2 gene copy number is between 2 and 3, and
c_{1,i} or c_{2,i} not less than 2.743 is an indication that the SMN1 gene or SMN2 gene copy number is not less than 3.
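The threshold table of claim 33 maps directly onto a chain of comparisons. The following sketch takes an already computed value c (either c_{1,i} or c_{2,i}; the formulas that produce it are image references in this text and are not reproduced) and returns the corresponding copy-number call. The function name and the returned labels are hypothetical.

```python
def copy_number_call(c):
    """Map a c value to the copy-number category listed in claim 33."""
    if c <= 0.1:
        return "0"
    if c < 0.5:
        return "between 0 and 1"
    if c < 1.485:
        return "1"
    if c < 2.324:
        return "2"
    if c < 2.743:
        return "between 2 and 3"
    return "3 or more"
```

The boundaries follow the claim wording: a value of exactly 0.1 is still called 0, while exactly 0.5, 1.485, 2.324 and 2.743 each fall into the higher category.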
34. The system of claim 33,
an SMN1 gene copy number of 0 is an indication of a homozygous deletion of exon 7 of the SMN1 gene;
an SMN1 gene copy number of not less than 1 is an indication of a heterozygous deletion of exon 7 of the SMN1 gene;
an SMN1 gene copy number between 0 and 1 is an indication that the exon 7 deletion status of the SMN1 gene falls in a gray zone.
CN201810749278.7A 2018-07-10 2018-07-10 Method and system for determining whether seven-exon deletion exists in SMN1 gene of sample to be tested Active CN110699436B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810749278.7A CN110699436B (en) 2018-07-10 2018-07-10 Method and system for determining whether seven-exon deletion exists in SMN1 gene of sample to be tested

Publications (2)

Publication Number Publication Date
CN110699436A true CN110699436A (en) 2020-01-17
CN110699436B CN110699436B (en) 2023-07-21

Family

ID=69192858

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810749278.7A Active CN110699436B (en) 2018-07-10 2018-07-10 Method and system for determining whether seven-exon deletion exists in SMN1 gene of sample to be tested

Country Status (1)

Country Link
CN (1) CN110699436B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105648045A (en) * 2014-11-13 2016-06-08 天津华大基因科技有限公司 Method and apparatus for determining fetus target area haplotype
US20160188793A1 (en) * 2014-12-29 2016-06-30 Counsyl, Inc. Method For Determining Genotypes in Regions of High Homology
US20180129778A1 (en) * 2015-05-28 2018-05-10 Genepeeks, Inc. Systems and methods for providing improved prediction of carrier status for spinal muscular atrophy
WO2018112249A1 (en) * 2016-12-15 2018-06-21 Illumina, Inc. Methods and systems for determining paralogs
CN106834502A (en) * 2017-03-06 2017-06-13 明码(上海)生物科技有限公司 A kind of spinal muscular atrophy related gene copy number detection kit and method based on gene trap and two generation sequencing technologies
CN107267613A (en) * 2017-06-28 2017-10-20 安吉康尔(深圳)科技有限公司 Sequencing data processing system and SMN gene detection systems
CN107526941A (en) * 2017-09-22 2017-12-29 至本医疗科技(上海)有限公司 Copy number variation detection pretreatment unit, detection means, decision maker and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JESSICA L. LARSON等: "Validation of a high resolution NGS method for detecting spinal muscular atrophy carriers among phase 3 participants in the 1000 Genomes Project", 《BMC MEDICAL GENETICS》 *
王佶等: "脊髓性肌萎缩症SMN1和SMN2基因拷贝数变异分析", 《中国循证儿科杂志》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111292804A (en) * 2020-04-08 2020-06-16 北京智因东方转化医学研究中心有限公司 Method and system for detecting SMN1 gene mutation by means of high-throughput sequencing
WO2021204205A1 (en) * 2020-04-08 2021-10-14 北京智因东方转化医学研究中心有限公司 Method and system for detecting smn1 gene mutation by means of high-throughput sequencing
CN111292804B (en) * 2020-04-08 2021-11-26 北京智因东方诊断科技有限公司 Method and system for detecting SMN1 gene mutation by means of high-throughput sequencing
CN112435710A (en) * 2020-10-16 2021-03-02 赛福解码(北京)基因科技有限公司 Method for detecting single-sample SMN gene copy number in WES data
CN112435710B (en) * 2020-10-16 2024-05-03 赛福解码(北京)基因科技有限公司 Method for detecting single sample SMN gene copy number in WES data

Also Published As

Publication number Publication date
CN110699436B (en) 2023-07-21

Similar Documents

Publication Publication Date Title
CN106834502B (en) A kind of spinal muscular atrophy related gene copy number detection kit and method based on gene trap and two generation sequencing technologies
CN109887548B (en) ctDNA ratio detection method and detection device based on capture sequencing
CN111440884A (en) Intestinal flora for diagnosing sarcopenia and application thereof
CN111081315B (en) Homologous pseudogene mutation detection method
WO2018054254A1 (en) Method and system for identifying tumor load in sample
CN111534579B (en) Capture probe, kit and detection method for large fragment rearrangement detection based on capture sequencing
CN110699436A (en) Method and system for determining whether number seven exon deletion exists in SMN1 gene of sample to be detected
JP4414008B2 (en) SCA7 gene and method of use
CN111292804B (en) Method and system for detecting SMN1 gene mutation by means of high-throughput sequencing
CN110993029A (en) Method and system for detecting chromosome abnormality
CN110846393A (en) MYBPC 3G 1831T mutation affecting diagnosis and treatment of human hypertrophic cardiomyopathy and application thereof
CN117079723A (en) Biomarker and diagnostic model related to amyotrophic lateral sclerosis and application of biomarker and diagnostic model
CN112048548A (en) Method for detecting SMN gene copy number by taking SMNP as control
CN116312779A (en) Method and apparatus for detecting sample contamination and identifying sample mismatch
CN110993024B (en) Method and device for establishing fetal concentration correction model and method and device for quantifying fetal concentration
KR101915701B1 (en) Method for measuring mutation rate
KR101289134B1 (en) Sbf1 (mtmr5) as a causative gene responsible for a inherited charcot-marie-tooth peripheral neuropathy type cmt4b3 and diagnosis method and composition for the disease
CN109785899B (en) Genotype correction device and method
Høyer et al. Hereditary peripheral neuropathies diagnosed by next-generation sequencing
CN114420214A (en) Quality evaluation method and screening method of nucleic acid sequencing data
KR101929165B1 (en) Kit for Diagnosing Charcot-Marie-Tooth
CN109192243B (en) Method, apparatus and medium for correcting chromosome proportion
CN111334513A (en) Non-syndromic cleft lip related low-frequency/rare mutation and detection method thereof
CN112435710A (en) Method for detecting single-sample SMN gene copy number in WES data
CN115662507B (en) Sequencing sample homology detection method and system based on small sample SNPs linear fitting

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210111

Address after: Room 201-1, Building 3, East District, Airport Business Park, No. 80 Huanbei Road, Free Trade Zone (Airport Economic Zone) of Tianjin Binhai New Area, 300308

Applicant after: TIANJIN MEDICAL LABORATORY, BGI

Applicant after: BGI-GUANGZHOU MEDICAL LABORATORY Co.,Ltd.

Applicant after: BGI SHENZHEN Co.,Ltd.

Applicant after: SHENZHEN HUADA CLINIC EXAMINATION CENTER

Applicant after: Huada Biotechnology (Wuhan) Co.,Ltd.

Address before: Room 201-1, Building 3, East District, Airport Business Park, No. 80 Huanbei Road, Free Trade Zone (Airport Economic Zone) of Tianjin Binhai New Area, 300308

Applicant before: TIANJIN MEDICAL LABORATORY, BGI

Applicant before: BGI-GUANGZHOU MEDICAL LABORATORY Co.,Ltd.

Applicant before: BGI SHENZHEN Co.,Ltd.

Applicant before: SHENZHEN HUADA CLINIC EXAMINATION CENTER

GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: Room 201-1, Building 3, East District, Airport Business Park, No. 80 Huanbei Road, Free Trade Zone (Airport Economic Zone) of Tianjin Binhai New Area, 300308

Patentee after: TIANJIN MEDICAL LABORATORY, BGI

Patentee after: BGI-GUANGZHOU MEDICAL LABORATORY Co.,Ltd.

Patentee after: BGI SHENZHEN Co.,Ltd.

Patentee after: Shenzhen Huada Medical Laboratory

Patentee after: Huada Biotechnology (Wuhan) Co.,Ltd.

Address before: Room 201-1, Building 3, East District, Airport Business Park, No. 80 Huanbei Road, Free Trade Zone (Airport Economic Zone) of Tianjin Binhai New Area, 300308

Patentee before: TIANJIN MEDICAL LABORATORY, BGI

Patentee before: BGI-GUANGZHOU MEDICAL LABORATORY Co.,Ltd.

Patentee before: BGI SHENZHEN Co.,Ltd.

Patentee before: SHENZHEN HUADA CLINIC EXAMINATION CENTER

Patentee before: Huada Biotechnology (Wuhan) Co.,Ltd.
