WO2021204205A1 - Method and system for detecting smn1 gene mutation by means of high-throughput sequencing - Google Patents

Method and system for detecting SMN1 gene mutation by means of high-throughput sequencing

Info

Publication number
WO2021204205A1
Authority
WO
WIPO (PCT)
Prior art keywords
smn1
sequencing
sma
exon
reads
Prior art date
Application number
PCT/CN2021/085974
Other languages
French (fr)
Chinese (zh)
Inventor
谷为岳
Original Assignee
北京智因东方转化医学研究中心有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京智因东方转化医学研究中心有限公司
Publication of WO2021204205A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/50 Mutagenesis
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/10 Ploidy or copy number detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers

Definitions

  • The invention belongs to the field of gene detection and analysis. Specifically, the present invention relates to a device, method and system for detecting SMN1 gene mutations through high-throughput sequencing and special analysis methods, in particular for detecting homozygous deletion of exon 7 of the SMN1 gene.
  • The present invention also relates to the use of the device, method and system of the present invention to diagnose spinal muscular atrophy (SMA) or to differentially diagnose SMA from other diseases that are easily confused with the SMA phenotype, and to machine-readable media and devices on which the method of the present invention is stored.
  • Spinal muscular atrophy (SMA, OMIM#253300) is a neuromuscular disease caused by the loss of motor neurons in the brain stem and anterior horn of the spinal cord. It is an autosomal recessive genetic disease, caused in most patients by homozygous deletion of the survival motor neuron 1 (SMN1) gene. In the Chinese population, the probability of carrying an SMA-related SMN1 heterozygous deletion is about 1 in 42 (Sheng-Yuan, Z., et al., Molecular characterization of SMN copy number derived from carrier screening and from core families with SMA in a Chinese population. Eur J Hum Genet, 2010. 18(9): pages 978-84).
  • The carrier rate of SMA pathogenic genes in Taiwan, China is about 1 to 3%, and the incidence rate is about 1/17,000 (Chien, YH, et al., Presymptomatic Diagnosis of Spinal Muscular Atrophy Through Newborn Screening. J Pediatr, 2017), similar to mainland China.
  • SMN1 survival motor neuron gene 1
  • the SMN1 gene plays a vital role in various physiological processes such as the growth of motor neuron axons in the anterior horn of the spinal cord and the formation of synapses in neuromuscular junctions (Yang Lan, Song Fang, Research Progress in the Treatment of Spinal Muscular Atrophy, Chinese Journal of Pediatrics, 2016. 54(8): pages 634-637).
  • the deletion of the encoded protein due to this gene defect is associated with a variety of cross-system diseases (Singh, RN, et al., Diverse role of survival motor neuron protein. Biochim Biophys Acta, 2017. 1860(3): pages 299-315). In addition, the clinical characteristics of some patients are atypical, which creates the need for clinical differential diagnosis.
  • a suspected diagnosis can be made based on the characteristics of SMA.
  • Specific methods include electromyography, muscle biopsy histochemical staining, and serum creatine phosphokinase detection.
  • the aforementioned examination methods are either not suitable for infants and young children or place high demands on testing conditions, so diagnosis and differential diagnosis also depend on special molecular tests. Since Steege and his colleagues used restriction fragment length polymorphism PCR (PCR-RFLP) technology for the diagnosis of SMA in 1995, multiplex ligation-dependent probe amplification (MLPA) and real-time fluorescent quantitative PCR (qPCR) have subsequently appeared.
  • NGS second-generation genome sequencing technology
  • WES Whole Exome Sequencing
  • because the detection range of WES covers all the coding regions of about 20,000 genes in the human genome, it can not only detect pathogenic SMN1 homozygous deletions but also help identify cases whose clinical phenotype resembles the SMA spectrum but whose genetic cause is different, which gives it unique advantages in the accurate diagnosis of neuromuscular diseases.
  • there is also a centromeric copy of the SMN gene in the human body, namely SMN2, or centromeric SMN.
  • Due to the omission of exon 7 during transcription, SMN2 only encodes a very small amount of full-length SMN protein and a large amount of the truncated form SMNΔ7. Because the SMN1 and SMN2 genes are highly homologous, the conventional data analysis methods of WES have difficulty distinguishing between the two, so WES has been considered unsuitable for the molecular diagnosis of SMA.
  • the present invention provides a device, method and system for detecting SMN1 gene mutations, especially SMN1 homozygous deletion mutations, which are the most common pathogenic mutations in SMA. They rely on high-throughput sequencing and use special algorithms to analyze the high-throughput sequencing results, and can achieve a detection rate and accuracy for SMN1 gene mutations that are no lower than those of MLPA, the traditional "gold standard" detection technology.
  • the sequencing results can also contain information about genes related to other neuromuscular diseases, so they can be used not only to diagnose SMA but also to achieve differential diagnosis of other neuromuscular diseases whose clinical features are similar to those of SMA.
  • the method and system of the present invention eliminate the problem that high-throughput sequencing, such as whole-exome sequencing, is not suitable for detecting SMN1 homozygous deletion mutations in the prior art through the use of special algorithms, thereby completing the present invention.
  • the present invention relates to an analysis device for detecting homozygous mutations in the SMN1 gene of a subject, wherein the analysis device includes:
  • a calculation module that calculates the ratio of (the number of reads at base C at position 840 of exon 7 of SMN1)/(the total number of reads at position 840 of SMN1 exon 7),
  • a determination module that, when the ratio is equal to or close to 0, determines that the subject is a positive subject with a homozygous deletion of SMN1 exon 7, and otherwise determines that the subject is a negative subject without a homozygous deletion of SMN1 exon 7.
  • the calculation module in the analysis device filters out read sequences with an average quality value of 20 or less, and preferably filters out read sequences with an average quality value of 25 or less.
  • the calculation module in the analysis device filters out reads with a quality value of less than 10 at position 840 of exon 7 of SMN1 before performing the calculation.
  • the calculation module filters out reads with a quality value of less than 20 at position 840 of exon 7 of SMN1 before performing the calculation. More preferably, the calculation module filters out reads with a quality value of less than 25 at position 840 of exon 7 of SMN1 before performing the calculation.
  • the calculation module in the analysis device removes duplicate sequences generated by PCR amplification before performing the calculation.
  • the ratio close to 0 is a ratio less than 0.1.
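The calculation and determination steps described above can be sketched roughly as follows. This is a minimal illustration, not the applicant's implementation: the function names, the `(base, quality)` input format and the default per-base quality cutoff of 20 are assumptions chosen for the example; only the ratio definition and the 0.1 threshold come from the text.

```python
from typing import Iterable, Tuple


def smn1_c_ratio(observations: Iterable[Tuple[str, int]],
                 min_base_quality: int = 20) -> float:
    """Return (reads with base C at position 840) / (all reads at position 840).

    `observations` holds one (base, phred_quality) pair per read covering
    position 840 of SMN1 exon 7 (hypothetical input format). Reads whose
    quality at that position is below `min_base_quality` are dropped first.
    """
    kept = [base.upper() for base, quality in observations
            if quality >= min_base_quality]
    if not kept:
        raise ValueError("no reads left at position 840 after quality filtering")
    return kept.count("C") / len(kept)


def is_homozygous_deletion(ratio: float, threshold: float = 0.1) -> bool:
    """A ratio equal to or close to 0 (here: below 0.1) is called positive."""
    return ratio < threshold


# Example: 50 high-quality T reads and 3 low-quality C reads.
example = [("T", 30)] * 50 + [("C", 5)] * 3
example_ratio = smn1_c_ratio(example)
example_call = is_homozygous_deletion(example_ratio)
```

In this example the three C reads fall below the quality cutoff, so the ratio is computed from the 50 remaining reads only.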
  • the analysis device is used for the diagnosis of SMA.
  • the SMA is genetically related to a mutation of SMN.
  • the SMA is selected from the group consisting of type SMA-I, type SMA-II, type SMA-III, and type SMA-IV.
  • the analysis device is used for the differential diagnosis of SMA and other diseases with similar phenotypes to SMA.
  • the disease having a similar phenotype to SMA is a neuromuscular disease.
  • the present invention relates to a system for detecting homozygous mutations in the SMN1 gene of a subject, wherein the system includes:
  • a sequencing device that sequences a plurality of amplicons obtained by amplifying nucleic acids in a sample from a subject, the amplicons containing position 840 of exon 7 of the SMN1 gene, wherein the sequencing generates a plurality of reads containing position 840 of exon 7 of the SMN1 gene; and the analysis device described in the first aspect.
  • the sequencing of the present invention is high-throughput sequencing.
  • the high-throughput sequencing used in the present invention is selected from the group consisting of: sequencing of the SMN1 gene, sequencing of SMN1 exon 7, sequencing of a panel containing the SMN1 gene or its exon 7, Whole Genome Sequencing (WGS), Whole Exome Sequencing (WES), or Clinical Exome Sequencing (CES).
  • the high-throughput sequencing of the present invention is whole exome sequencing or clinical exome sequencing.
  • the number of reads at position 840 of SMN1 exon 7 in the information read by the analysis device of the present invention is 100,000 to 1 million.
  • the number of reads in the information read by the analysis device of the present invention is at least 10, at least 50, at least 100, at least 1,000, at least 10,000, at least 100,000, or at least 1 million.
  • the system further includes an amplification device that amplifies a nucleic acid-containing sample from the subject to generate a plurality of amplicons containing position 840 of exon 7 of the SMN1 gene; the multiple amplicons are used in the sequencing device.
  • the analysis device or system of the present invention is used to diagnose spinal muscular atrophy (SMA).
  • the SMA is genetically related to a mutation of SMN.
  • the SMA is selected from the group consisting of type SMA-I, type SMA-II, type SMA-III, and type SMA-IV.
  • the analysis device or system is used for the differential diagnosis of SMA and other diseases with similar phenotypes to SMA.
  • the disease having a similar phenotype to SMA is a neuromuscular disease.
  • the analysis device or system of the present invention is used to differentially diagnose SMA and diseases similar to the SMA phenotype.
  • the disease similar to the SMA phenotype is selected from: Becker muscular dystrophy, Bethlem myopathy, Kleefstra syndrome, merosin-deficient congenital muscular dystrophy, Ullrich congenital muscular dystrophy, X-linked myotubular myopathy, X-linked centronuclear myopathy, YWHAE gene Miller-Dieker syndrome, congenital disorder of glycosylation type 1A (OMIM: 212065), congenital myasthenic syndrome type 4A, congenital myopathy (early onset, with cardiomyopathy), megalencephalic leukoencephalopathy with subcortical cysts type 1, autosomal dominant lower extremity spinal muscular atrophy, autosomal recessive muscular sclerosis, autosomal recessive distal spinal muscular atrophy type 2 (OMIM: 605726), Duch
  • the modules in the analysis device of the present invention may be connected via wired and wireless connections.
  • the present invention relates to a method for detecting a homozygous mutation in the SMN1 gene of a subject, the method comprising:
  • the information includes multiple reads containing the 840th position of exon 7 of the SMN1 gene;
  • the subject is determined to be a positive subject with a homozygous deletion of SMN1 exon 7; otherwise the subject is determined to be a negative subject without a homozygous deletion of SMN1 exon 7.
  • the present invention relates to a machine-readable medium, which contains a machine-readable code that, when implemented by a machine, performs the following operations to detect the presence of a homozygous mutation in the SMN1 gene of a subject:
  • the information includes multiple reads containing the 840th position of exon 7 of the SMN1 gene;
  • the subject is determined to be a positive subject with a homozygous deletion of SMN1 exon 7; otherwise the subject is determined to be a negative subject without a homozygous deletion of SMN1 exon 7.
  • reads with a quality value of less than 10 at position 840 of exon 7 of SMN1 are filtered out.
  • the reads with a quality value of less than 20 at position 840 of exon 7 of SMN1 are filtered out.
  • duplicate sequences generated by PCR amplification are removed before the calculation in step (2) is performed.
  • the ratio close to 0 is a ratio less than 0.1.
  • the method or the machine-readable medium is used for the diagnosis of SMA.
  • the SMA is genetically related to a mutation of SMN.
  • the SMA is selected from the group consisting of type SMA-I, type SMA-II, type SMA-III, and type SMA-IV.
  • the method or the machine-readable medium is used for the differential diagnosis of SMA and other diseases with similar phenotypes to SMA.
  • the disease having a similar phenotype to SMA is a neuromuscular disease.
  • the disease having a similar phenotype to SMA is selected from: Becker muscular dystrophy, Bethlem myopathy, Kleefstra syndrome, merosin-deficient congenital muscular dystrophy, Ullrich congenital muscular dystrophy, X-linked myotubular myopathy, X-linked centronuclear myopathy, YWHAE gene Miller-Dieker syndrome, congenital disorder of glycosylation type 1A (OMIM: 212065), congenital myasthenic syndrome type 4A, congenital myopathy (early onset, with cardiomyopathy), megalencephalic leukoencephalopathy with subcortical cysts type 1, autosomal dominant lower extremity spinal muscular atrophy, autosomal recessive muscular sclerosis, autosomal recessive distal spinal muscular atrophy type 2 (OMIM: 605726), Duchenne muscular dystrophy/progressive Duchenne muscular dystrophy, amyotrophic lateral sclerosis (ALS
  • the number of reads at position 840 of SMN1 exon 7 in the information from the sequencing is 100,000 to 1 million, or, for example, at least 10, at least 50, at least 100, at least 1,000, at least 10,000, at least 100,000, or at least 1 million.
  • the sequencing is high-throughput sequencing.
  • the high-throughput sequencing used in the present invention is selected from the group consisting of: sequencing of the SMN1 gene, sequencing of SMN1 exon 7, sequencing of a panel containing the SMN1 gene or its exon 7, Whole Genome Sequencing (WGS), Whole Exome Sequencing (WES) or Clinical Exome Sequencing (CES).
  • the high-throughput sequencing of the present invention is whole exome sequencing or clinical exome sequencing.
  • the present invention relates to a device that includes the machine-readable medium according to the fourth aspect of the present invention.
  • the present invention relates to the use of the analysis device described in the first aspect of the present invention, the system described in the second aspect, the medium of the fourth aspect, and the equipment of the fifth aspect for diagnosing SMA or for differentially diagnosing SMA from other diseases with similar phenotypes to SMA.
  • the present invention relates to a method for diagnosing SMA, which includes: (1) obtaining sequencing information of a subject, the sequencing information including a plurality of reads containing position 840 of exon 7 of the SMN1 gene; (2) calculating the ratio of (the number of reads with base C at position 840 of SMN1 exon 7)/(the total number of reads containing position 840 of SMN1 exon 7); when the ratio is equal to or close to 0, the subject is determined to be a positive subject with a homozygous deletion of SMN1 exon 7 and is diagnosed with SMA.
  • the sequencing information of the subject comes from high-throughput sequencing.
  • the high-throughput sequencing is selected from the group consisting of: sequencing of the SMN1 gene, sequencing of SMN1 exon 7, sequencing of a panel containing the SMN1 gene or its exon 7, whole genome sequencing, whole exome sequencing or clinical exome sequencing.
  • the SMA is genetically related to a mutation of SMN.
  • the SMA is selected from the group consisting of type SMA-I, type SMA-II, type SMA-III, and type SMA-IV.
  • the method is used to differentially diagnose SMA from other diseases that have a similar phenotype to SMA.
  • the disease having a similar phenotype to SMA is a neuromuscular disease.
  • reads with a quality value of less than 10 at position 840 of exon 7 of SMN1 are filtered out.
  • reads with a quality value of less than 20 at position 840 of SMN1 exon 7 are filtered out, and more preferably, reads with a quality value of less than 25 at position 840 of SMN1 exon 7 are filtered out.
  • before the calculation of step (2), duplicate sequences generated by PCR amplification are removed.
  • the ratio close to 0 is a ratio less than 0.1.
  • Figure 1 shows, by way of example, how the read sequences obtained after genome sequencing of a sample are randomly assigned to SMN1 and SMN2 by the Burrows-Wheeler alignment algorithm.
  • the table in Figure 2 shows that, when the present invention was used for differential diagnosis, 21 of the subjects determined by the present invention not to have an SMN1 homozygous deletion were further diagnosed with other neuromuscular diseases. Figure 2 lists the ages, mutation types, and diagnosed diseases of these 21 subjects.
  • subject refers to vertebrates, preferably mammals, such as rodents, primates, and more preferably humans.
  • the term "gene" refers to nucleic acids (e.g., DNA, such as genomic DNA and cDNA) and their corresponding nucleotide sequences encoding RNA transcripts.
  • with respect to genomic DNA, the term includes intervening non-coding regions and regulatory regions, and may include 5' and 3' ends.
  • the term includes transcribed sequences, including 5' and 3' untranslated regions (5'-UTR and 3'-UTR), exons and introns.
  • the transcribed region will contain an "open reading frame" encoding the polypeptide.
  • a “gene” contains only the coding sequence necessary to encode a polypeptide (e.g., an "open reading frame” or “coding region”). In some cases, genes do not encode polypeptides, such as ribosomal RNA genes (rRNA) and transfer RNA (tRNA) genes. In some cases, the term “gene” includes not only transcribed sequences, but also non-transcribed regions, including upstream and downstream regulatory regions, enhancers and promoters. A gene can refer to an "endogenous gene” or a natural gene in its natural location in the genome of an organism. Genes can refer to "foreign genes” or non-natural genes.
  • Non-native genes may refer to genes that are not normally found in the host organism but are introduced into the host organism by gene transfer.
  • Non-natural genes can also refer to genes that are not in their natural locations in the genome of an organism.
  • Non-natural genes can also refer to naturally occurring nucleic acid or polypeptide sequences that contain mutations, insertions, and/or deletions (e.g., non-natural sequences).
  • nucleotide generally refers to a base-sugar-phosphate combination. Nucleotides may include synthetic nucleotides. Nucleotides may include synthetic nucleotide analogs. Nucleotides may be monomeric units of nucleic acid sequences (e.g., deoxyribonucleic acid (DNA) and ribonucleic acid (RNA)).
  • nucleotide may include ribonucleoside triphosphates such as adenosine triphosphate (ATP), uridine triphosphate (UTP), cytidine triphosphate (CTP) and guanosine triphosphate (GTP), and deoxyribonucleoside triphosphates such as dATP, dCTP, dITP, dUTP, dGTP, dTTP or derivatives thereof.
  • derivatives may include, for example, [αS]dATP, 7-deaza-dGTP and 7-deaza-dATP, and nucleotide derivatives that confer nuclea
  • nucleotide as used herein may refer to dideoxyribonucleoside triphosphate (ddNTP) and its derivatives.
  • Illustrative examples of dideoxyribonucleoside triphosphates may include, but are not limited to, ddATP, ddCTP, ddGTP, ddITP, and ddTTP.
  • Nucleotides can be unlabeled or detectably labeled by well-known techniques. Marking can also be done with quantum dots. Detectable labels can include, for example, radioisotopes, fluorescent labels, chemiluminescent labels, bioluminescent labels, and enzyme labels.
  • polynucleotide refers to nucleotides, deoxyribonucleotides or ribonucleotides, or their analogues, of any length in polymerized form. It is in single-stranded, double-stranded or multi-stranded form.
  • the polynucleotide can be exogenous or endogenous to the cell.
  • the polynucleotide may exist in a cell-free environment.
  • the polynucleotide can be a gene or a fragment thereof.
  • the polynucleotide may be DNA.
  • the polynucleotide may be RNA.
  • a polynucleotide can have any three-dimensional structure and can perform any function, known or unknown.
  • a polynucleotide may contain one or more analogs (e.g., altered backbone, sugar or nucleobases).
  • SMA spinal muscular atrophy Ranked among 121 diseases.
  • SMA has an early onset and is a pediatric neurodegenerative disease. In severe cases, it is generally difficult to survive beyond 2 years of age. Broadly speaking, SMA also includes some rarer types that are not related to the SMN gene. Therefore, in the context of the present invention, "SMA" refers to the type caused by a homozygous deletion point mutation of the SMN gene, unless specifically indicated otherwise.
  • the SMA associated with homozygous deletion mutations in the SMN gene on chromosome 5 is further divided into four subtypes: type I, type II, type III, and type IV. There are differences in clinical manifestations. In general, the earlier the onset, the more severe the subtype. There are also opinions that the copy number of SMN2 may have an impact on the severity of SMA ( M et al., Am J Hum Genet 2002; 70:358-368).
  • SMA-I is also known as Werdnig-Hoffmann disease.
  • SMA-I is the most serious subtype. Its onset time is early. In some cases, fetal movement may be weakened and lessened even as early as the fetal period. Other cases will also develop within a few months after birth, and die from respiratory failure within one year after the onset, and generally cannot survive beyond 2 years of age.
  • the clinical manifestations of patients with SMA-I include symmetrical muscle weakness, reduced gross motor activity, and inability to sit unaided after 6 months; muscle flaccidity, with decreased or absent tendon reflexes; muscle atrophy, which is difficult to detect because of abundant fat in infants; intercostal muscle paralysis; motor cranial nerve damage, etc.
  • Type SMA-II is also called intermediate type or chronic SMA.
  • Type SMA-IV is an adult-onset type of SMA, which is usually diagnosed between the ages of 20 and 40, especially in the 30s. Patients with type SMA-IV usually have 4 to 6 copies of the SMN2 gene, which can partially compensate for the lack of SMN protein caused by SMN1 homozygous deletion.
  • the clinical manifestations of SMA according to the present invention include, for example, muscle weakness, low muscle tone, muscle flaccidity, muscle atrophy, muscle paralysis, scoliosis and spinal curvature, as well as the resulting mobility problems, breathing problems, eating and swallowing problems, delays in gross motor development, etc.
  • SMA patients will also show abnormalities in electromyography, showing signs of denervation. These findings can help the clinical diagnosis and identification of SMA, but because other diseases present with similar symptoms, molecular testing is usually needed to diagnose SMA clinically.
  • the detection method of the present invention uses the results of high-throughput sequencing, so it can contain molecular diagnostic information related to other genetic diseases, especially molecular diagnostic information related to other genetic diseases with similar clinical manifestations of SMA.
  • high-throughput sequencing is also called massively parallel sequencing or next-generation sequencing (NGS); it is characterized in that it can obtain multiple non-repetitive sequence reads from the same position in the genome, thereby increasing the depth of the sequencing data.
  • Next-generation sequencing is generally considered to include 454 sequencing using pyrosequencing and DNA polymerase, Solexa sequencing using sequencing by synthesis and DNA polymerase, SOLiD sequencing using sequencing by ligation and DNA ligase, Ion Torrent sequencing using semiconductor sequencing and DNA polymerase, and other sequencing methods.
  • the high-throughput sequencing method refers to a sequencing method capable of realizing deep sequencing, which can obtain multiple sequencing reads at specific sites in the genome of a sample.
  • reads refer to the sequences generated by each reaction in the high-throughput sequencing process, and the original sequencing data are formed by reading these sequences.
  • Contigs can be obtained by splicing overlapping reads, and this process is usually completed by sequence assembly software. Through the analysis of the contigs, the overlapping parts can be further matched and the order of the contigs in the genome can be determined, yielding a longer scaffold composed of contigs whose order is known.
  • the high-throughput sequencing of the present invention is not limited to specific sequencing principles, methods, instruments and/or reagents.
  • the information in the reading module of the present invention is such raw data composed of reads.
  • the raw data composed of reads is obtained by the sequencing device included in the system of the present invention.
  • each base is assigned a quality value (Q) to describe the accuracy of the sequencing result.
  • the quality value of a certain base Q20 means that in the process of base calling, the error rate given by the recognition result of the base is 10 to the minus 2 power, that is, the error rate is 1% and the correct rate is 99%.
  • the quality value of Q30 means that the error rate is 0.1% and the correct rate is 99.9%, and so on. Therefore, the higher the quality value, the lower the probability of the base being sequenced incorrectly.
  • the meaning of Q20 being greater than or equal to 90% is that for a certain amount of sequencing data, the quality value of 90% of the base data can reach Q20 or better.
  • Q20 is greater than or equal to 90%, and Q30 is greater than or equal to 85%.
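The relationship between quality value and error rate described above matches the standard Phred definition, Q = -10·log10(P). The following is a minimal sketch of that conversion and of the "Q20 ≥ 90%, Q30 ≥ 85%" check; the function names and the list-of-integers input are illustrative assumptions, not part of the described system.

```python
def phred_to_error_rate(q: float) -> float:
    """Convert a Phred quality value Q into a base-calling error probability."""
    return 10 ** (-q / 10)


def fraction_at_or_above(qualities, threshold):
    """Fraction of bases whose quality value is at least `threshold` (20 for Q20)."""
    if not qualities:
        raise ValueError("no quality values supplied")
    return sum(q >= threshold for q in qualities) / len(qualities)


def meets_quality_requirements(qualities, q20_min=0.90, q30_min=0.85):
    """Check the 'Q20 >= 90% and Q30 >= 85%' requirement for a set of base qualities."""
    return (fraction_at_or_above(qualities, 20) >= q20_min
            and fraction_at_or_above(qualities, 30) >= q30_min)
```

For example, `phred_to_error_rate(20)` gives an error rate of 0.01 and `phred_to_error_rate(30)` gives 0.001, in line with the 1% and 0.1% figures quoted above.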
  • Average quality value refers to the overall average quality value in terms of base positions included in the entire genome.
  • the calculation module filters out read sequences with an average quality value of 20 or less, preferably 25 or less, and more preferably 30 or less.
  • the calculation module preferably filters out reads in the original data whose quality value at position 840 of SMN1 exon 7 is less than 20. This filtering removes reads whose sequencing accuracy at position 840 of SMN1 exon 7 is less than 99%.
  • the calculation module removes the repetitive sequence amplified by PCR before performing the calculation.
  • “coverage” refers to the proportion of the size of the genome sequence obtained after the sequencing results are assembled to the size of the entire genome. In sequencing, it is often impossible to obtain sequencing results covering 100% of the genome sequence. This is caused by the inherent composition of the genome and the insufficiency of sequencing methods. For example, the genome contains some complex structures such as high GC content regions and repetitive sequences.
  • depth or “sequencing depth” refers to the ratio of the total number of bases sequenced during sequencing to the size of the genome to be tested.
  • a sequencing depth of 10X means that the total amount of data obtained is ten times that of the entire genome, and each single base in the genome has been sequenced or read 10 times on average.
  • the depth of sequencing as a whole is 50x or more, preferably 60x or more, more preferably 70x or more, even more preferably 80x or more, even more preferably 90x or more, and even more preferably 100x or more.
  • "depth" can also mean the number of reads in the sequencing result that cover a specific site. Therefore, in a specific embodiment, the sequencing depth of a single sequenced sample is more than 80x.
  • the 10x coverage in the information obtained by sequencing described in the present invention is greater than 85%, which means that in terms of the overall sequence to be sequenced, 85% of the regions have at least 10x coverage.
  • the 10x coverage of the information obtained by the sequencing is greater than 85%, preferably greater than 90%, and more preferably greater than 95%.
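The depth and 10x-coverage definitions above can be sketched as follows. This is a minimal illustration under stated assumptions: the per-base depths are supplied as a plain list, and the function names are hypothetical rather than taken from the described system.

```python
def mean_depth(total_bases_sequenced: int, target_length: int) -> float:
    """Sequencing depth = total number of sequenced bases / size of the target region."""
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    return total_bases_sequenced / target_length


def coverage_at(per_base_depths, min_depth=10):
    """Fraction of positions covered by at least `min_depth` reads (10 for 10x coverage)."""
    if not per_base_depths:
        raise ValueError("no positions supplied")
    return sum(d >= min_depth for d in per_base_depths) / len(per_base_depths)
```

A run would satisfy the preferred criterion quoted above when `coverage_at(depths, 10)` exceeds 0.95.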
  • the sequencing depth means the number of reads containing position 840 in the exon 7 region of the SMN1 gene of interest.
  • the high-throughput sequencing of the region containing the 7th exon of the SMN1 gene can be used to obtain deep sequencing data of the 7th exon region of the SMN1 gene.
  • for sequence read data of repeated fragments, the total number of reads in this region is generally not less than 10X, 15X, 20X, 30X, 40X, 50X, 60X, 70X, 80X, 90X, or 100X.
  • the sequencing of the region has an average Q20 ≥ 90% and Q30 ≥ 85%.
  • the homozygous deletion of exon 7 of SMN1 described herein includes, but is not limited to, three situations: 1) homozygous deletion, in which exon 7 is missing from both alleles of the SMN1 gene at the absolute coordinate position of its chromosomal locus; 2) homozygous point mutation, in which the base C at c.840 in exon 7 is changed to T on both alleles of the SMN1 gene, which is equivalent to exon 7 being missing from both alleles (i.e., homozygous deletion); 3) compound heterozygosity for a deletion and a point mutation, in which exon 7 of one allele is missing at the absolute coordinate position of the chromosomal locus and the base C at c.840 in exon 7 of the other allele is changed to T. All three situations are regarded as homozygous deletion of exon 7 of SMN1.
  • the detection method herein also judges a ratio R close to 0 as positive for SMN1 homozygous deletion.
  • high-throughput sequencing has a certain probability of sequencing errors. For example, if a T at position c.840 is erroneously detected as C, R is in fact 0 but the value computed from the sequencing data is not 0.
  • the probability of sequencing errors is currently low. For example, the probability of sequencing errors on Illumina's HiSeq series or NovaSeq series sequencers is about one in a thousand.
  • "close to 0" may specifically be a value selected from the following group: less than or equal to 0.05 (5%), 0.03 (3%), 0.02 (2%), 0.01 (1%), 0.005 (0.5%), 0.003 (0.3%), 0.002 (0.2%), or even 0.001 (0.1%).
  • the case where C/(C+T) is close to 0 is defined as less than or equal to 0.1 (10%), 0.05 (5%), 0.01 (1%), 0.005 (0.5%), 0.003 (0.3%), 0.002 (0.2%), or even 0.001 (0.1%); in these cases, the test result is judged to be positive for SMN1 homozygous deletion. Therefore, an R value close to 0 in the present invention can also mean a value less than or equal to the systematic error of the sequencing system.
  • the copy number is predicted by calculating the ratio rather than by checking whether the absolute number of SMN1 reads at c.840 is zero. The advantage is that this avoids the situation in which errors in the detection process, such as sequencing errors, make the SMN1 read count non-zero, so that a true homozygous deletion is classified as not being a homozygous deletion and is missed. For example, if homozygous deletion were judged by whether the number of SMN1 reads at c.840 is zero, then with a sequencing error rate of one in a thousand, a sequencing error would occur at c.840 about once for every 1000X of sequencing depth.
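The arithmetic behind this argument can be sketched as follows. This is a simplified illustration, assuming independent errors and treating every error at the site as a T read as C (which overstates the true T-to-C rate); the function names are hypothetical.

```python
def expected_errors(depth: int, error_rate: float = 0.001) -> float:
    """Expected number of erroneous base calls at one site for a given depth."""
    return depth * error_rate


def prob_at_least_one_error(depth: int, error_rate: float = 0.001) -> float:
    """Probability that at least one of `depth` reads carries an error at the site."""
    return 1 - (1 - error_rate) ** depth


def observed_ratio(c_reads: int, t_reads: int) -> float:
    """R = C / (C + T) at c.840."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no reads cover the site")
    return c_reads / total
```

Under these assumptions, a single erroneous C among 1000 reads makes the absolute C count non-zero, while `observed_ratio(1, 999)` is 0.001, which still falls within the "close to 0" values listed above.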
  • the panel sequencing containing the SMN1 gene or its exon 7 described herein refers to sequencing a combination of more than one gene (i.e., a panel) that contains the SMN1 gene or its exon 7.
  • WGS Whole Genome Sequencing
  • WES Whole Exome Sequencing
  • CES clinical exome sequencing
  • medical exome sequencing refers to the strategy of sequencing multiple known disease-causing genes.
  • the MLPA mentioned herein refers to multiplex ligation-dependent probe amplification.
  • MLPA was first proposed by the Dutch scholar Dr. Schouten in 2002 as a technique for the qualitative and quantitative analysis of target sequences in the nucleic acid under test. The principle is to hybridize a simple probe to the target DNA sequence, ligate and amplify by PCR, separate the products by capillary electrophoresis and collect the data, and finally analyze the collected data with software to draw conclusions. It can detect copy number changes of up to 50 nucleotide sequences in the same reaction tube, and can identify deletions and insertions of dozens of genes or sites at the same time.
  • sampling device can be integrated together, or can be physically independent devices. When they are in a physically independent state, there is no limitation on the distance between these devices, as long as these devices can realize the functions they undertake in the system or method of the present invention.
  • the preparation of the genomic DNA of the subject sample for sequencing of the present invention can be carried out by methods and/or kits known to those skilled in the art.
  • the subject sample may be body fluids, cells, tissues, etc., preferably blood.
  • the method and system of the present invention can also be used for differential diagnosis.
  • "Differential diagnosis” in the context herein means to diagnose and determine which of the multiple diseases the subject is suffering from when multiple diseases have similar clinical manifestations.
  • the diseases that can be differentially diagnosed from SMA by the method and system of the present invention are genetic diseases that can be diagnosed by gene sequencing and that share one or more clinical manifestations with SMA, including but not limited to muscle weakness, low muscle tone, muscle relaxation, muscle atrophy, muscle paralysis, and the resulting impaired mobility, breathing problems, eating and swallowing problems, delayed gross motor development, scoliosis and spinal curvature, etc.
  • the disease with a clinical manifestation similar to that of SMA is usually also a motor neuron disease, especially a lower extremity motor neuron disease.
  • Specific examples include, but are not limited to, Becker muscular dystrophy, Bethlem myopathy, Kleefstra syndrome, merosin-deficient congenital muscular dystrophy, Ullrich congenital muscular dystrophy, X-linked myotubular myopathy, X-linked centronuclear myopathy, YWHAE gene Miller-Dieker syndrome, congenital disorder of glycosylation type 1A (OMIM: 212065), congenital myasthenic syndrome type 4A, congenital myopathy (early onset, with cardiomyopathy), megalencephalic leukoencephalopathy with subcortical cysts type 1, autosomal dominant lower extremity spinal muscular atrophy, autosomal recessive muscle sclerosis, autosomal recessive inherited distal spinal muscular atrophy type 2 (
  • a commonly used NGS strategy for diagnosing genetic diseases, whole exome sequencing (WES), is combined with the NGS data analysis method of the present invention to detect whether the SMN1 gene has a homozygous deletion of exon 7 (Example 1) and whether there are pathogenic mutations in other neuromuscular disease genes with phenotypes similar to SMA, which is then used for differential diagnosis (Example 2). In the following, this method is collectively referred to as whole exome sequencing or WES.
  • samples from patients clinically diagnosed with SMA were also tested by MLPA, the "gold standard" method; the MLPA result was used as the diagnostic reference and compared with the result obtained using the method and system of the present invention (Example 3).
  • This example relates to the use of the detection system of the present invention to detect SMN1 homozygous deletion in subjects who are clinically diagnosed as SMA.
  • the subjects of this embodiment are patients who were treated in the hospital from June 2015 to July 2018, and peripheral whole blood biological samples were obtained from these patients during the period of treatment.
  • the enrolled cases all have the characteristic phenotype of neuromuscular disease, and the sending physician provides a description of the clinical characteristics and various special examination results corresponding to each patient (private information such as the patient's name has been hidden).
  • written informed consent has been obtained from the patient or guardian and family members participating in the study before the start of the study.
  • the subjects of this example are shown in Table 1. 240 subjects were clinically suspected of having SMA, of whom 140 were male (58.3%) and 100 were female (41.7%). The vast majority of the affected subjects were children.
  • Genomic DNA was obtained from a blood sample from each patient (the required DNA concentration is greater than 50 ng/μl, and the total amount is 1 μg).
  • the obtained genomic DNA was fragmented by ultrasound, and adapters (Illumina, San Diego, CA) were ligated to both ends.
  • the adapters carry the index sequence of the sample, and after PCR amplification the target sequence is captured by hybridization with biotin-labeled probes.
  • Use NimbleGen SeqCap EZ v2 Enrichment Kit (47Mbp) enrichment chip and SeqCap EZ Choice Kits (capture a custom region up to 7Mbp, including SMN1 and SMN2 genes) for DNA capture.
  • Sequencing was performed using the Illumina HiSeq 2500 high-throughput sequencer. During whole exome sequencing, ensure that the single-sample sequencing depth (total number of sequenced bases / length of the above-mentioned customized region) is more than 80x, the average sequencing Q20 ≥ 90%, Q30 ≥ 85%, the PE+SE percentage ≥ 95%, and the coverage at 10x or more is ≥ 95%. Data analysis uses a calling method to annotate variants.
  • SMN1 and SMN2 genes are highly similar homologous genes, with a total of 5 base differences, of which one is located in exon 7, one is located in exon 8, and the other three are located in introns. Exon 7 contains a stop codon, and exon 8 does not encode an amino acid. Therefore, there is only one base difference between the two coding regions, that is, the difference in exon 7.
  • the SMN1 gene chromosome coordinate chr5:70247773 (NM_000344.3:c.840) is C
  • the SMN2 gene chromosome coordinate position chr5:69372353 (NM_017411.3:c.840) is T.
  • the whole exome sequencing range does not include intron regions, so the algorithm of the present invention uses this single site for copy number calculation.
  • the short-read alignment algorithm used comes from the Burrows-Wheeler aligner software, which is a mismatch-tolerant alignment algorithm, so reads that actually originate from SMN1 and SMN2 are randomly aligned and allocated to these two genes (as shown in Figure 1).
  • the reads segment under SMN2 represents the number or depth of reads allocated to SMN2 by the algorithm
  • the reads segment under SMN1 represents the number or depth of reads allocated to SMN1 by the algorithm.
  • the reads allocated to SMN2 actually comprise reads that truly carry T (that is, reads that belong to SMN2), whose number or depth is called T2, and reads that truly carry C but were incorrectly allocated to the SMN2 gene, whose number or depth is called C2.
  • likewise, the reads allocated to the SMN1 gene actually comprise reads that truly carry C, whose number or depth is called C1, and reads that truly carry T but were incorrectly allocated to the SMN1 gene, whose number or depth is called T1.
  • Filtering criteria include: filter out reads with an average quality value of less than 20 in the original data, remove PCR-amplified duplicate sequences using samtools, filter out reads whose base sequencing quality at c.840 is less than Q20, and finally obtain the reads supporting C and the reads supporting T.
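The filtering criteria above can be sketched in plain Python as follows. This is an illustration only: the `SiteRead` record is a hypothetical simplification in which each read has already been reduced to its base call at c.840, and duplicate marking is assumed to have been done upstream (the text uses samtools for that step).

```python
from typing import NamedTuple


class SiteRead(NamedTuple):
    base: str            # base called at c.840 in this read
    base_quality: int    # Phred quality of that base
    mean_quality: float  # average Phred quality over the whole read
    is_duplicate: bool   # True if flagged as a PCR duplicate upstream


def count_supporting_reads(reads, min_mean_q=20, min_base_q=20):
    """Return (C, T): numbers of filtered reads supporting C and T at c.840."""
    c = t = 0
    for r in reads:
        # Drop PCR duplicates, low-quality reads, and low-quality calls at the site.
        if r.is_duplicate or r.mean_quality < min_mean_q or r.base_quality < min_base_q:
            continue
        if r.base == "C":
            c += 1
        elif r.base == "T":
            t += 1
    return c, t
```

Reads pooled from both the SMN1 and SMN2 alignments can be passed in together, since the aligner distributes them between the two genes at random.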
  • the algorithm sets C:(C+T) ≤ 0.1 or C deduplication depth ≤ 3 as the judgment threshold of SMN1 homozygous deletion (SMA positive); otherwise the sample is judged as having no SMN1 homozygous deletion (SMA negative).
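The decision rule above can be sketched as follows. Two points are assumptions rather than statements from the text: that the pooled counts are C = C1 + C2 and T = T1 + T2, and that both thresholds are inclusive ("less than or equal"). The function names are hypothetical.

```python
def smn1_ratio(c1: int, c2: int, t1: int, t2: int) -> float:
    """R = C / (C + T), pooling reads aligned to either SMN1 or SMN2.

    Assumption: C = C1 + C2 and T = T1 + T2, because the aligner distributes
    reads between the two paralogs at random.
    """
    c = c1 + c2
    t = t1 + t2
    if c + t == 0:
        raise ValueError("no filtered reads cover c.840")
    return c / (c + t)


def call_smn1_homozygous_deletion(c1, c2, t1, t2, ratio_max=0.1, c_depth_max=3):
    """Return True (SMA positive) if C/(C+T) <= ratio_max or C depth <= c_depth_max.

    Assumption: both comparisons are inclusive.
    """
    c = c1 + c2
    return smn1_ratio(c1, c2, t1, t2) <= ratio_max or c <= c_depth_max
```

For instance, a sample with one deduplicated C read and 85 T reads would be called positive, while a sample with 58 C reads and 52 T reads would be called negative.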
  • the diagnostic data obtained according to the above method is shown in Table 1.
  • 122 subjects were diagnosed with SMN1 homozygous deletion.
  • the statistical software SPSS 16.0 is used for all count data in the examples of this application; the group t test is used to test statistical significance, and p < 0.05 is defined as statistically significant.
  • the general method of detecting genetic diseases by whole exome sequencing was used to analyze whether subjects (especially the 118 subjects identified in Example 1 as not having SMN1 homozygous deletion) have other neuromuscular diseases with phenotypes similar to SMA, so as to achieve differential diagnosis.
  • Raw data output statistics: remove adapter contamination, filter out reads with an average quality value of less than 20, and trim bases with a quality value of less than 20 from the ends of the reads.
  • Variant false-positive filtering: according to sequencing depth and variant quality, filter the detected single nucleotide variants (SNVs) and insertions/deletions (indels) to obtain high-quality, reliable variants: variant depth of at least 2x, mutation rate > 10%, and variant quality value > 20.
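The false-positive filter above can be sketched as a single predicate. This is an illustration under one labeled assumption: "mutation rate" is interpreted here as the fraction of reads at the position that support the variant. The function and parameter names are hypothetical.

```python
def passes_variant_filter(alt_depth: int, total_depth: int, quality: float,
                          min_alt_depth: int = 2, min_fraction: float = 0.10,
                          min_quality: float = 20) -> bool:
    """Keep a variant only if depth >= 2x, variant fraction > 10%, and quality > 20.

    Assumption: the 'mutation rate' is alt_depth / total_depth.
    """
    if total_depth <= 0:
        return False
    return (alt_depth >= min_alt_depth
            and alt_depth / total_depth > min_fraction
            and quality > min_quality)
```

A variant supported by 5 of 20 reads with quality 50 passes, whereas one supported by 3 of 40 reads fails on the fraction criterion.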
  • the described detection method of the present invention detected 56 subjects carrying SMN1 homozygous deletion mutations (see Table 2).
  • the inventors used the gold standard detection technology of the prior art, multiplex ligation-dependent probe amplification (MLPA), to verify all subject samples, and compared the results with those obtained by the detection method of the present invention.
  • the MLPA kit is the P060 product of MRC-Holland (the Netherlands), containing 30 pairs of probes, which can specifically detect the copy number of exons 7 and 8 of the SMN1 and SMN2 genes (exon 7 of SMN1 determines the functional integrity of this gene, so its copy number equals the number of alleles); four probes in the kit detect SMN1 or SMN2 gene sequences (Table 3), and the other probes detect other chromosomes as references.
  • the probe for specifically detecting exon 7 of SMN1 gene is located at the position of 183 nt, and the detected loss of heterozygosity indicates that SMA is carried.
  • the probe for specifically detecting SMN1 gene exon 8 is located at 218 nt, and 95% of the copy number changes of exon 7 can be detected (only the detection of SMN1 gene exon 8 deletion does not mean that SMA is carried).
  • the kit includes probes for detecting exon 7 (282nt) and exon 8 (301nt) of SMN2 gene and 17 pairs of internal control probes.
  • Hybridization: take 5 μl of DNA (final concentration 30 ng/μl) into an EP tube, denature at 98°C for 5 min, cool to 25°C, add 1.50 μl of multiplex probe and 1.50 μl of buffer dropwise, denature at 95°C for 1 min, and hybridize at 60°C for 16-24 h.
  • Ligation: add 32 μl of the ligation mixture dropwise, incubate at 54°C for 15 min, and inactivate the ligase at 98°C for 5 min.
  • Amplification: take 10 μl of the ligated product, add 4 μl of PCR buffer and 26 μl of ddH2O, add 10 μl of amplification reaction solution at 72°C, and start the PCR reaction.
  • the reaction conditions were denaturation at 95°C for 30s, annealing at 60°C for 30s, extension at 72°C for 1 min, a total of 35 cycles, and finally extension at 72°C for 20 min.
  • MLPA data analysis is performed as follows. Use the GeneMapper 3.0 program to analyze the results of capillary electrophoresis separation, and export the pattern and data. Divide the peak area of each target fragment by the sum of all internal reference peak areas to obtain the relative peak area (RPA) of the target fragment. The RPA of the SMA group is then compared with the average RPA of the normal control group (that is, the mean of the RPAs of 20 normal controls) to obtain the copy number ratio, from which the copy number of the target fragment can be calculated.
  • a copy number ratio of 0.40-0.65 corresponds to 1 copy, 0.80-1.20 to 2 copies, 1.30-1.65 to 3 copies, and 1.75-2.15 to 4 copies. If a fragment has no peak signal, the fragment is missing. When the copy number ratio is close to the boundary of a range, the measurement is repeated to ensure that the results are accurate.
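The RPA calculation and the ratio ranges above can be sketched as follows. This is an illustrative sketch, not the kit vendor's analysis software; the function names are hypothetical, and returning `None` for ratios that fall between ranges is an assumption standing in for "repeat the measurement".

```python
def relative_peak_area(target_area: float, reference_areas) -> float:
    """RPA = peak area of the target fragment / sum of all internal reference peak areas."""
    total = sum(reference_areas)
    if total <= 0:
        raise ValueError("reference peak areas must sum to a positive value")
    return target_area / total


def copy_number_ratio(sample_rpa: float, control_rpas) -> float:
    """Ratio of the sample RPA to the mean RPA of the normal controls."""
    mean_control = sum(control_rpas) / len(control_rpas)
    return sample_rpa / mean_control


# Ratio ranges quoted in the text, mapped to copy numbers.
RANGES = [(0.40, 0.65, 1), (0.80, 1.20, 2), (1.30, 1.65, 3), (1.75, 2.15, 4)]


def copy_number(ratio: float):
    """Return the copy number for a ratio, 0 if there is no signal, None if ambiguous."""
    if ratio == 0:
        return 0  # no peak signal: the fragment is missing
    for low, high, copies in RANGES:
        if low <= ratio <= high:
            return copies
    return None  # outside every quoted range: the measurement should be repeated
```

A ratio of 0.5 maps to 1 copy and 1.0 to 2 copies, while a ratio of 0.7 falls between ranges and is left undetermined.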
  • the method of the present invention can obtain a comprehensive diagnosis result, especially a differential diagnosis that MLPA cannot provide, and thus demonstrates a more powerful diagnosis capability than MLPA.
  • the subjects can be divided into four groups according to the final diagnosis results (as shown in Example 3, namely "diagnosed", "misdiagnosed", "etiology unknown", and "missed diagnosis"), and the clinical characteristics of each group are summarized.
  • Misdiagnosed subjects: all patients share the feature of weakened limb muscle strength. Their remaining clinical phenotypes are not unique to SMA but can be attributed to the characteristic phenotypic spectra of diseases associated with variants in other pathogenic genes; for example, patients with pseudohypertrophic muscular dystrophy (also known as Duchenne muscular dystrophy or DMD) have pseudohypertrophy and increased muscle tone, and other gene mutations are associated with facial deformities and developmental delay.
  • Subjects with unknown etiology: most have weakened limb muscle strength together with non-specific clinical features, including seizures, abnormal gait, developmental delay, increased or decreased limb muscle tone, and abnormal brain imaging findings.
  • Missed subjects: infants showed a weak cry and dyspnea; all other patients showed weakened limb muscle strength and normal or decreased (but not increased) muscle tone, and none showed any obvious abnormality on brain imaging.

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Analytical Chemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Genetics & Genomics (AREA)
  • Medical Informatics (AREA)
  • Organic Chemistry (AREA)
  • Biotechnology (AREA)
  • Molecular Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Biochemistry (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Public Health (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

An apparatus, method and system for detecting SMN1 gene mutation, especially detecting the homozygous deletion of the seventh exon of SMN1 gene, by analyzing high-throughput sequencing results. The apparatus, method and system can be used to diagnose spinal muscular atrophy (SMA) or differentially diagnose SMA and other diseases that are easily confused with the phenotype of SMA.

Description

A method and system for detecting SMN1 gene mutation by means of high-throughput sequencing

Technical field
The invention belongs to the field of gene detection and analysis. Specifically, the present invention relates to a device, method and system for detecting SMN1 gene mutations, in particular homozygous deletion of exon 7 of the SMN1 gene, by means of high-throughput sequencing and a special analysis method. The present invention also relates to the use of the device, method and system of the present invention to diagnose spinal muscular atrophy (SMA) or to differentially diagnose SMA and other diseases that are easily confused with the SMA phenotype, and to machine-readable media and equipment on which the method of the present invention is stored.
Background art
Spinal muscular atrophy (SMA; OMIM #253300) is a neuromuscular disorder caused by the loss of motor neurons in the brainstem and the anterior horn of the spinal cord. It is an autosomal recessive genetic disease, and the vast majority of cases are caused by homozygous deletion of the survival motor neuron 1 (SMN1) gene. In the Chinese population, the probability of carrying an SMA-related heterozygous SMN1 deletion is about 1/42 (Sheng-Yuan, Z. et al., Molecular characterization of SMN copy number derived from carrier screening and from core families with SMA in a Chinese population. Eur J Hum Genet, 2010. 18(9): pp. 978-84). According to the latest statistics, the carrier rate of SMA pathogenic genes in Taiwan, China is about 1-3% and the incidence is about 1/17,000 (Chien, Y.H., et al., Presymptomatic Diagnosis of Spinal Muscular Atrophy Through Newborn Screening. J Pediatr, 2017), similar to mainland China. Epidemiological investigations have found that in up to 98% of SMA patients the genetic cause is homozygous deletion of survival motor neuron gene 1 (SMN1) in the 5q13 region of chromosome 5 (Sangare, M. et al., Genetics of low spinal muscular atrophy carrier frequency in sub-Saharan Africa. Ann Neurol, 2014. 75(4): pp. 525-32; Rad, I.A., Mutation Spectrum of Survival Motor Neuron Gene in Spinal Muscular Atrophy. J Down Syndr Chr Abnorm, 2017. 3(1): pp. 1-2). The SMN1 gene plays a vital role in various physiological processes such as axon growth of motor neurons in the anterior horn of the spinal cord and synapse formation at neuromuscular junctions (Yang Lan, Song Fang, Research Progress in the Treatment of Spinal Muscular Atrophy, Chinese Journal of Pediatrics, 2016. 54(8): pp. 634-637). Loss of the encoded protein caused by this gene defect is associated with a variety of multi-system disorders (Singh, R.N. et al., Diverse role of survival motor neuron protein. Biochim Biophys Acta, 2017. 1860(3): pp. 299-315), and because the clinical features of some patients are atypical, there is a need for clinical differential diagnosis.
Clinically, a suspected diagnosis can be made on the basis of the characteristics of SMA; specific methods include electromyography, histochemical staining of muscle biopsies, and serum creatine phosphokinase testing. However, because the clinical manifestations of SMA vary widely between individuals, resemble those of many other diseases, and lack specificity, and because these examinations are either unsuitable for infants and young children or place high demands on testing conditions, confirmation and differential diagnosis still depend on special molecular tests. Since Steege and colleagues applied restriction fragment length polymorphism PCR (PCR-RFLP) to SMA diagnosis in 1995, multiplex ligation-dependent probe amplification (MLPA) and real-time fluorescent quantitative PCR (qPCR) have followed, and these can identify SMN1/2 deletions and duplications at the exon level (Arkblad, E.L., et al., Multiplex ligation-dependent probe amplification improves diagnostics in spinal muscular atrophy. Neuromuscul Disord, 2006. 16(12): pp. 830-8). However, these techniques each diagnose a single disease. When used for diagnosis, their results can only reveal whether the subject's disease is SMA; when the result is negative, they cannot answer the question of which disease it is if not SMA. This limits their application in the differential diagnosis of SMA and diseases with similar phenotypes.
In recent years, next-generation sequencing (NGS), represented by whole exome sequencing (WES), has been increasingly widely used in the diagnosis of genetic diseases because of its high throughput and cost-effectiveness. Because WES covers the coding regions of all of the approximately 20,000 genes in the human genome, it can not only detect pathogenic homozygous SMN1 deletions but also help to distinguish etiologically those cases whose clinical phenotype spectrum is close to that of SMA but whose genetic cause is different, which gives it a unique advantage in the precise diagnosis of neuromuscular diseases. However, the human body also has another, centromeric copy of the SMN gene, namely SMN2, also called centromeric SMN. Because exon 7 is skipped during transcription, SMN2 encodes only a very small amount of full-length SMN protein and a large amount of the truncated form SMNΔ7. Because the SMN1 and SMN2 genes are highly homologous, conventional WES data analysis methods have difficulty distinguishing the two, so WES has been considered unsuitable for the molecular diagnosis of SMA.
Therefore, there is still a need in the art for a cost-effective, accurate and comprehensive diagnostic method that identifies homozygous deletion of exon 7 of SMN1 to diagnose SMA and that can further distinguish SMA from diseases with similar phenotypes.
Summary of the invention
The present invention provides a device, method and system for detecting SMN1 gene mutations, in particular the SMN1 homozygous deletion that is the most common pathogenic mutation in SMA. By using high-throughput sequencing and analyzing the sequencing results with a special algorithm, it achieves a detection rate and accuracy for this SMN1 mutation that are no lower than those of MLPA, the traditional "gold standard" detection technology. In addition, because the device, method and system of the present invention use high-throughput sequencing, the sequencing results can also contain information on genes related to other neuromuscular diseases, so they can be used not only to diagnose SMA but also to differentially diagnose other neuromuscular diseases with clinical features similar to SMA. By means of the special algorithm, the method and system of the present invention overcome the problems that previously led the art to regard high-throughput sequencing, such as whole exome sequencing, as unsuitable for detecting SMN1 homozygous deletion, and the present invention was thereby completed.
In a first aspect, the present invention relates to an analysis device for detecting a homozygous mutation in the SMN1 gene of a subject, wherein the analysis device comprises:
a reading module for reading information obtained by sequencing, the information comprising a plurality of reads that contain position 840 of exon 7 of the SMN1 gene;
a calculation module that calculates the ratio (number of reads in which position 840 of SMN1 exon 7 is base C)/(total number of reads containing position 840 of SMN1 exon 7); and
a determination module that, when the ratio is equal to 0 or close to 0, determines the subject to be positive for a homozygous deletion of SMN1 exon 7, and otherwise determines the subject to be negative for a homozygous deletion of SMN1 exon 7.
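The calculation and determination modules described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `SiteCall` and `call_smn1_exon7` are hypothetical, and the sketch assumes that an upstream step has already extracted, for each read covering position 840 of exon 7, the base observed at that position. The 0.1 cut-off for "close to 0" is the value given in the embodiments.

```python
from dataclasses import dataclass
from typing import Iterable

# "Close to 0" is defined in the embodiments as a ratio below 0.1.
NEAR_ZERO_THRESHOLD = 0.1


@dataclass
class SiteCall:
    """Result of the ratio test at position 840 of exon 7 (hypothetical type)."""
    c_reads: int               # reads showing base C (SMN1-type) at the site
    total_reads: int           # all reads covering the site
    ratio: float               # c_reads / total_reads
    homozygous_deletion: bool  # True when the ratio is 0 or below the threshold


def call_smn1_exon7(bases_at_840: Iterable[str],
                    threshold: float = NEAR_ZERO_THRESHOLD) -> SiteCall:
    """Compute the C-read ratio and apply the determination rule."""
    bases = [b.upper() for b in bases_at_840]
    total = len(bases)
    if total == 0:
        # Without coverage the ratio is undefined, so no call is made.
        raise ValueError("no reads cover position 840; the ratio is undefined")
    c_reads = sum(1 for b in bases if b == "C")
    ratio = c_reads / total
    return SiteCall(c_reads, total, ratio, ratio < threshold)
```

A sample whose covering reads all show T at the site would be reported as positive, while a sample with a substantial share of C reads would be reported as negative.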
In an embodiment of the first aspect, before performing the calculation, the calculation module of the analysis device filters out reads with an average quality value of 20 or less, preferably reads with an average quality value of 25 or less.
In an embodiment of the first aspect, before performing the calculation, the calculation module of the analysis device filters out reads whose quality value at position 840 of SMN1 exon 7 is less than 10. Preferably, the calculation module filters out reads whose quality value at that position is less than 20. More preferably, the calculation module filters out reads whose quality value at that position is less than 25.
In an embodiment of the first aspect, before performing the calculation, the calculation module of the analysis device removes duplicate sequences arising from PCR amplification.
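The three pre-calculation filters in the embodiments above (average read quality, quality at the site, and PCR duplicate removal) can be sketched as follows. This is an illustration under stated assumptions rather than the patent's code: the `Read` type and `filter_reads` function are hypothetical, alignments are assumed to be ungapped so that a read offset maps directly to a reference coordinate, and duplicates are assumed to have been flagged by an upstream duplicate-marking step.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Read:
    """Simplified aligned read (hypothetical type, ungapped alignment assumed)."""
    name: str
    start: int                  # 0-based reference coordinate of the first base
    sequence: str
    qualities: Sequence[int]    # Phred quality value per base
    is_duplicate: bool = False  # assumed to be set by an upstream dedup step


def filter_reads(reads: Sequence[Read], site: int,
                 min_mean_q: float = 20, min_site_q: int = 10) -> List[Read]:
    """Keep reads that cover `site` and pass all three filters."""
    kept = []
    for r in reads:
        offset = site - r.start
        if not 0 <= offset < len(r.sequence):
            continue  # read does not cover the site
        if r.is_duplicate:
            continue  # PCR duplicate
        if sum(r.qualities) / len(r.qualities) <= min_mean_q:
            continue  # average quality of 20 or less (default)
        if r.qualities[offset] < min_site_q:
            continue  # quality at the site below 10 (default)
        kept.append(r)
    return kept
```

The stricter preferred embodiments correspond to calling `filter_reads` with `min_mean_q=25` and `min_site_q=20` or `25`.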
In an embodiment of the first aspect, the ratio close to 0 is a ratio of less than 0.1.
In an embodiment of the first aspect, the analysis device is used for the diagnosis of SMA. In a further embodiment, the SMA is genetically associated with a mutation of SMN. In a still further embodiment, the SMA is selected from types SMA-I, SMA-II, SMA-III, and SMA-IV. Additionally or alternatively, the analysis device is used for the differential diagnosis of SMA from other diseases that have a phenotype similar to SMA. In a preferred embodiment, the disease having a phenotype similar to SMA is a neuromuscular disease.
In a second aspect, the present invention relates to a system for detecting a homozygous mutation in the SMN1 gene of a subject, wherein the system comprises:
a sequencing device that sequences a plurality of amplicons, the amplicons being obtained by amplifying nucleic acid in a sample from the subject and containing position 840 of exon 7 of the SMN1 gene, and the sequencing producing a plurality of reads that contain position 840 of exon 7 of the SMN1 gene; and the analysis device of the first aspect.
In one embodiment, the sequencing of the present invention is high-throughput sequencing. In a preferred embodiment, the high-throughput sequencing used in the present invention is selected from the group consisting of: sequencing of the SMN1 gene, sequencing of SMN1 exon 7, panel sequencing covering the SMN1 gene or its exon 7, whole genome sequencing (WGS), whole exome sequencing (WES), and clinical exome sequencing (CES). In a more preferred embodiment, the high-throughput sequencing is whole exome sequencing or clinical exome sequencing.
In one embodiment, the information read by the analysis device of the present invention contains from 10 to 1,000,000 reads covering position 840 of SMN1 exon 7. For example, the number of reads in the information read by the analysis device is at least 10, at least 50, at least 100, at least 1,000, at least 10,000, at least 100,000, or at least 1,000,000.
In a further embodiment, the system also comprises an amplification device that amplifies a nucleic acid-containing sample from the subject to produce a plurality of amplicons containing position 840 of exon 7 of the SMN1 gene, the amplicons then being supplied to the sequencing device.
In one embodiment, the analysis device or system of the present invention is used to diagnose spinal muscular atrophy (SMA). In a further embodiment, the SMA is genetically associated with a mutation of SMN. In a still further embodiment, the SMA is selected from types SMA-I, SMA-II, SMA-III, and SMA-IV. Additionally or alternatively, the analysis device or system is used for the differential diagnosis of SMA from other diseases that have a phenotype similar to SMA. In a preferred embodiment, the disease having a phenotype similar to SMA is a neuromuscular disease.
In one embodiment, the analysis device or system of the present invention is used for the differential diagnosis of SMA from diseases with a phenotype similar to SMA. In a specific embodiment, the disease with a phenotype similar to SMA is selected from: Becker muscular dystrophy, Bethlem myopathy, Kleefstra syndrome, merosin-deficient congenital muscular dystrophy, Ullrich congenital muscular dystrophy, X-linked myotubular myopathy, X-linked centronuclear myopathy, Miller-Dieker syndrome (YWHAE gene), congenital disorder of glycosylation type 1A (OMIM: 212065), congenital myasthenic syndrome type 4A, congenital myopathy (early onset, with cardiomyopathy), megalencephalic leukoencephalopathy with subcortical cysts type 1, autosomal dominant lower-extremity spinal muscular atrophy, autosomal recessive myosclerosis, autosomal recessive distal spinal muscular atrophy type 2 (OMIM: 605726), Duchenne muscular dystrophy/progressive pseudohypertrophic muscular dystrophy, amyotrophic lateral sclerosis (ALS) type 16 (OMIM: 614373), limb-girdle muscular dystrophy type 2J (OMIM: 608807), agenesis of the corpus callosum with peripheral neuropathy, hereditary myopathy with early respiratory failure, and hereditary motor and sensory neuropathy type VI.
In one embodiment, the modules of the analysis device of the present invention may be connected to one another by wired or wireless connections.
In a third aspect, the present invention relates to a method for detecting a homozygous mutation in the SMN1 gene of a subject, the method comprising:
(1) reading information from sequencing, the information comprising a plurality of reads that contain position 840 of exon 7 of the SMN1 gene;
(2) calculating the ratio (number of reads in which position 840 of SMN1 exon 7 is base C)/(total number of reads containing position 840 of SMN1 exon 7); and
(3) when the ratio is equal to 0 or close to 0, determining the subject to be positive for a homozygous deletion of SMN1 exon 7, and otherwise determining the subject to be negative for a homozygous deletion of SMN1 exon 7.
In a fourth aspect, the present invention relates to a machine-readable medium containing machine-readable code that, when executed by a machine, performs the following operations to detect the presence of a homozygous mutation in the SMN1 gene of a subject:
(1) reading information from sequencing, the information comprising a plurality of reads that contain position 840 of exon 7 of the SMN1 gene;
(2) calculating the ratio (number of reads in which position 840 of SMN1 exon 7 is base C)/(total number of reads containing position 840 of SMN1 exon 7); and
(3) when the ratio is equal to 0 or close to 0, determining the subject to be positive for a homozygous deletion of SMN1 exon 7, and otherwise determining the subject to be negative for a homozygous deletion of SMN1 exon 7.
In embodiments of the third and fourth aspects, before the calculation of step (2) is performed, reads whose quality value at position 840 of SMN1 exon 7 is less than 10 are filtered out. Preferably, reads whose quality value at that position is less than 20 are filtered out. More preferably, reads whose quality value at that position is less than 25 are filtered out.
In embodiments of the third and fourth aspects, before the calculation of step (2) is performed, duplicate sequences arising from PCR amplification are removed.
In embodiments of the third and fourth aspects, the ratio close to 0 is a ratio of less than 0.1.
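The decision rule shared by the third and fourth aspects can be written compactly as follows. The symbols N_C, N_total, and R are notation introduced here for illustration only, and the read counts in the worked examples are hypothetical rather than taken from the application's data.

```latex
% N_C     : number of reads with base C at position 840 of SMN1 exon 7
% N_total : total number of reads containing that position (after filtering)
R = \frac{N_C}{N_{\text{total}}}, \qquad
\text{call} =
\begin{cases}
\text{positive (homozygous deletion of SMN1 exon 7)}, & R = 0 \ \text{or} \ R < 0.1,\\
\text{negative}, & \text{otherwise.}
\end{cases}
% Hypothetical example 1: N_C = 2,  N_total = 80 gives R = 0.025 < 0.1, a positive call.
% Hypothetical example 2: N_C = 38, N_total = 80 gives R = 0.475,       a negative call.
```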
In embodiments of the third and fourth aspects, the method or the machine-readable medium is used for the diagnosis of SMA. In a further embodiment, the SMA is genetically associated with a mutation of SMN. In a still further embodiment, the SMA is selected from types SMA-I, SMA-II, SMA-III, and SMA-IV. Additionally or alternatively, the method or the machine-readable medium is used for the differential diagnosis of SMA from other diseases that have a phenotype similar to SMA. In a preferred embodiment, the disease having a phenotype similar to SMA is a neuromuscular disease. In a specific embodiment, the disease having a phenotype similar to SMA is selected from: Becker muscular dystrophy, Bethlem myopathy, Kleefstra syndrome, merosin-deficient congenital muscular dystrophy, Ullrich congenital muscular dystrophy, X-linked myotubular myopathy, X-linked centronuclear myopathy, Miller-Dieker syndrome (YWHAE gene), congenital disorder of glycosylation type 1A (OMIM: 212065), congenital myasthenic syndrome type 4A, congenital myopathy (early onset, with cardiomyopathy), megalencephalic leukoencephalopathy with subcortical cysts type 1, autosomal dominant lower-extremity spinal muscular atrophy, autosomal recessive myosclerosis, autosomal recessive distal spinal muscular atrophy type 2 (OMIM: 605726), Duchenne muscular dystrophy/progressive pseudohypertrophic muscular dystrophy, amyotrophic lateral sclerosis (ALS) type 16 (OMIM: 614373), limb-girdle muscular dystrophy type 2J (OMIM: 608807), agenesis of the corpus callosum with peripheral neuropathy, hereditary myopathy with early respiratory failure, and hereditary motor and sensory neuropathy type VI.
In embodiments of the third and fourth aspects, the information from sequencing contains from 10 to 1,000,000 reads covering position 840 of SMN1 exon 7, for example at least 10, at least 50, at least 100, at least 1,000, at least 10,000, at least 100,000, or at least 1,000,000. The sequencing is high-throughput sequencing. In a preferred embodiment, the high-throughput sequencing used in the present invention is selected from the group consisting of: sequencing of the SMN1 gene, sequencing of SMN1 exon 7, panel sequencing covering the SMN1 gene or its exon 7, whole genome sequencing (WGS), whole exome sequencing (WES), and clinical exome sequencing (CES). In a more preferred embodiment, the high-throughput sequencing is whole exome sequencing or clinical exome sequencing.
In a fifth aspect, the present invention relates to an apparatus comprising the machine-readable medium of the fourth aspect of the invention.
In a sixth aspect, the present invention relates to the use of the analysis device of the first aspect, the system of the second aspect, the medium of the fourth aspect, and the apparatus of the fifth aspect for diagnosing SMA, or for the differential diagnosis of SMA from other diseases that have a phenotype similar to SMA.
In a seventh aspect, the present invention relates to a method of diagnosing SMA, comprising: (1) obtaining sequencing information of a subject, the sequencing information comprising a plurality of reads that contain position 840 of exon 7 of the SMN1 gene; and (2) calculating the ratio (number of reads in which position 840 of SMN1 exon 7 is base C)/(total number of reads containing position 840 of SMN1 exon 7) and, when the ratio is equal to 0 or close to 0, determining the subject to be positive for a homozygous deletion of SMN1 exon 7 and diagnosing the subject as having SMA. In one embodiment, the sequencing information of the subject comes from high-throughput sequencing, preferably selected from the group consisting of: sequencing of the SMN1 gene, sequencing of SMN1 exon 7, panel sequencing covering the SMN1 gene or its exon 7, whole genome sequencing, whole exome sequencing, and clinical exome sequencing. In one embodiment, the SMA is genetically associated with a mutation of SMN. In a still further embodiment, the SMA is selected from types SMA-I, SMA-II, SMA-III, and SMA-IV.
In one embodiment, the method is used for the differential diagnosis of SMA from other diseases that have a phenotype similar to SMA. In a preferred embodiment, the disease having a phenotype similar to SMA is a neuromuscular disease.
In one embodiment, before the calculation of step (2) is performed, reads whose quality value at position 840 of SMN1 exon 7 is less than 10 are filtered out. Preferably, reads whose quality value at that position is less than 20 are filtered out, and more preferably reads whose quality value at that position is less than 25 are filtered out.
In one embodiment, before the calculation of step (2) is performed, duplicate sequences arising from PCR amplification are removed.
In one embodiment, the ratio close to 0 is a ratio of less than 0.1.
Description of the drawings
Figure 1 illustrates how, after a sample has undergone genome sequencing, the Burrows-Wheeler alignment algorithm randomly assigns the resulting reads to SMN1 and SMN2.
The table in Figure 2 relates to use of the present invention for differential diagnosis. Among the subjects whom the invention determined not to carry a homozygous SMN1 deletion, 21 underwent further diagnosis and were confirmed to have other neuromuscular diseases. Figure 2 lists the age, mutation type, and diagnosed disease of each of these 21 subjects.
Detailed description of the invention
Unless otherwise indicated, the practice of some of the methods disclosed herein employs conventional techniques of immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics, and recombinant DNA, which are within the skill of the art. See, for example, Sambrook and Green, Molecular Cloning: A Laboratory Manual, 4th Edition (2012); the series Current Protocols in Molecular Biology (F. M. Ausubel, et al. eds.); the series Methods in Enzymology (Academic Press, Inc.); PCR 2: A Practical Approach (M. J. MacPherson, B. D. Hames and G. R. Taylor eds. (1995)); Harlow and Lane, eds. (1988) Antibodies, A Laboratory Manual; and Culture of Animal Cells: A Manual of Basic Technique and Specialized Applications, 6th Edition (R. I. Freshney, ed. (2010)).
The term "about" or "approximately" means within an acceptable error range for the particular value as determined by a person of ordinary skill in the art, which will depend in part on how the value is measured or determined, that is, on the limitations of the measurement system. For example, "about" can mean within 1 or more than 1 standard deviation, per the practice in the art. Alternatively, "about" can mean a range of up to 20%, up to 10%, up to 5%, or up to 1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term can mean within an order of magnitude of a value, preferably within 5-fold, and more preferably within 2-fold. Where particular values are described in the application and claims, unless otherwise stated, the term "about" should be assumed to mean within an acceptable error range for the particular value.
The terms "subject", "individual", and "patient" are used interchangeably in the context of the present invention and refer to a vertebrate, preferably a mammal such as a rodent or a primate, and more preferably a human.
The term "gene" as used herein refers to a nucleic acid (for example DNA, such as genomic DNA and cDNA) and its corresponding nucleotide sequence that encodes an RNA transcript. As used herein with reference to genomic DNA, the term includes intervening non-coding regions as well as regulatory regions, and can include 5' and 3' ends. In some uses, the term encompasses the transcribed sequences, including 5' and 3' untranslated regions (5'-UTR and 3'-UTR), exons, and introns. In some genes, the transcribed region contains an "open reading frame" that encodes a polypeptide. In some uses of the term, a "gene" comprises only the coding sequence necessary for encoding a polypeptide (for example, an "open reading frame" or "coding region"). In some cases, a gene does not encode a polypeptide, for example ribosomal RNA (rRNA) genes and transfer RNA (tRNA) genes. In some cases, the term "gene" includes not only the transcribed sequences but also non-transcribed regions, including upstream and downstream regulatory regions, enhancers, and promoters. A gene can refer to an "endogenous gene", that is, a native gene in its natural location in the genome of an organism. A gene can also refer to an "exogenous gene" or non-native gene. A non-native gene can refer to a gene that is not normally found in the host organism but is introduced into it by gene transfer. A non-native gene can also refer to a gene that is not in its natural location in the genome of an organism. A non-native gene can further refer to a naturally occurring nucleic acid or polypeptide sequence that contains mutations, insertions, and/or deletions (for example, a non-native sequence).
The term "nucleotide" as used herein generally refers to a base-sugar-phosphate combination. A nucleotide can be a synthetic nucleotide or a synthetic nucleotide analog. Nucleotides can be the monomeric units of a nucleic acid sequence (for example, deoxyribonucleic acid (DNA) and ribonucleic acid (RNA)). The term nucleotide can include the ribonucleoside triphosphates adenosine triphosphate (ATP), uridine triphosphate (UTP), cytosine triphosphate (CTP), and guanosine triphosphate (GTP), and deoxyribonucleoside triphosphates such as dATP, dCTP, dITP, dUTP, dGTP, and dTTP, or derivatives thereof. Such derivatives can include, for example, [αS]dATP, 7-deaza-dGTP, and 7-deaza-dATP, as well as nucleotide derivatives that confer nuclease resistance on the nucleic acid molecules containing them. The term nucleotide as used herein can also refer to dideoxyribonucleoside triphosphates (ddNTPs) and their derivatives. Illustrative examples of dideoxyribonucleoside triphosphates include, but are not limited to, ddATP, ddCTP, ddGTP, ddITP, and ddTTP. A nucleotide can be unlabeled or detectably labeled by well-known techniques. Labeling can also be carried out with quantum dots. Detectable labels can include, for example, radioactive isotopes, fluorescent labels, chemiluminescent labels, bioluminescent labels, and enzyme labels.
The terms "polynucleotide", "oligonucleotide", and "nucleic acid" are used interchangeably and refer to a polymeric form of nucleotides of any length, whether deoxyribonucleotides or ribonucleotides or analogs thereof, in single-stranded, double-stranded, or multi-stranded form. A polynucleotide can be exogenous or endogenous to a cell. A polynucleotide can exist in a cell-free environment. A polynucleotide can be a gene or a fragment thereof. A polynucleotide can be DNA. A polynucleotide can be RNA. A polynucleotide can have any three-dimensional structure and can perform any function, known or unknown. A polynucleotide can comprise one or more analogs (for example, an altered backbone, sugar, or nucleobase).
"Spinal muscular atrophy" (SMA) as described herein has SMN1 as its main causative gene. According to the current consensus in medical genetics, SMA is closely associated with two highly homologous genes, SMN1 and SMN2, which are distinguished mainly by two sites located in exon 7 and exon 8 (Qu YJ et al., PMID: 27425821, J Mol Diagn, 2016). SMN1 and SMN2 differ by five base pairs, none of which changes the amino acid sequence. However, one of these differences, a C-to-T change in exon 7 of SMN2, affects splicing of that exon and greatly reduces the amount of full-length transcript obtained from SMN2. Most normal individuals carry 2 copies of the SMN1 gene and 2 copies of the SMN2 gene. Because exon 7 is skipped, the SMN2 gene produces only a small amount of full-length SMN mRNA, so its compensatory effect is limited. Therefore, if both copies of SMN1 in an individual lose function, that is, if a homozygous deletion of SMN1 occurs, the individual will inevitably develop SMA. In contrast, an individual with a heterozygous deletion of SMN1 still has one normal SMN1 allele and therefore does not show the SMA phenotype; such an individual is called a "carrier" of the SMN1 deletion mutation.
One of the differences between the SMN1 gene described herein and its homolog SMN2 is the base at position c.840 (C in SMN1, T in SMN2). Because this position lies in exon 7, it is also a difference between the two exon 7 sequences: exon 7 of SMN1 is characterized by base C at c.840, whereas exon 7 of SMN2 is characterized by base T at c.840. If base C at position c.840 in exon 7 of SMN1 is mutated to T, the SMN1 sequence becomes the SMN2 sequence, so this point mutation can be regarded as equivalent to the absence of a normal copy of SMN1 exon 7.
Spinal muscular atrophy is one of the 121 diseases in the first national list of rare diseases, issued jointly in May 2018 by five Chinese authorities: the National Health Commission, the Ministry of Science and Technology, the Ministry of Industry and Information Technology, the National Medical Products Administration, and the National Administration of Traditional Chinese Medicine. SMA has an early onset and is a pediatric neurodegenerative disease; severely affected patients generally do not survive beyond 2 years of age. In the broad sense, SMA also includes some rarer types that are unrelated to the SMN gene. Therefore, in the context of the present invention, "SMA" refers to the type caused by a homozygous deletion or point mutation of the SMN gene, unless specifically indicated otherwise. Clinically, according to age at onset and severity, SMA associated with homozygous deletion mutations in the SMN gene on chromosome 5 is further divided into four subtypes, type I, type II, type III, and type IV, whose clinical manifestations differ. In general, the earlier the onset, the more severe the subtype. It has also been suggested that the copy number of SMN2 may influence the severity of SMA (Figure PCTCN2021085974-appb-000001 M et al., Am J Hum Genet 2002; 70:358-368).
SMA-I, also known as Werdnig-Hoffmann disease, is the most severe subtype. Onset is early, and in some cases fetal movement is already weakened and reduced during the fetal period. Other cases develop within a few months after birth and end in death from respiratory failure within a year of onset; patients generally do not survive beyond 2 years of age. The clinical manifestations of SMA-I include symmetrical muscle weakness, reduced gross motor activity, and inability to sit unsupported after 6 months of age; muscle flaccidity with diminished or absent tendon reflexes; muscle atrophy, which is easily overlooked because infants have abundant fat; intercostal muscle paralysis; and damage to motor cranial nerves. SMA-II, also called intermediate or chronic SMA, has a slightly later onset than SMA-I, usually between 7 months and 18 months of age, and progresses more slowly than type I. Clinical manifestations include severe muscle weakness; most affected children can sit unsupported but cannot walk independently. SMA-III, also known as Kugelberg-Welander disease, is the pediatric form of SMA with the latest onset and the mildest presentation. Symptoms include muscle weakness and muscle atrophy. SMA-IV is adult-onset SMA, usually diagnosed between the ages of 20 and 40 and especially in the thirties. Patients with SMA-IV usually have 4 to 6 copies of the SMN2 gene, which can partly compensate for the lack of SMN protein caused by the homozygous deletion of SMN1.
Taken together, the clinical manifestations of SMA referred to in the present invention include, for example, muscle weakness, low muscle tone, muscle flaccidity, muscle atrophy, muscle paralysis, scoliosis and spinal curvature, and the resulting impaired mobility, breathing problems, feeding and swallowing problems, and delayed gross motor development. In addition, SMA patients show abnormalities on electromyography that indicate denervation. These findings can help to diagnose and identify SMA clinically, but because other diseases present with similar symptoms and signs, molecular testing is usually still required to confirm SMA. The detection means of the present invention uses the results of high-throughput sequencing, so it can simultaneously provide molecular diagnostic information related to other genetic diseases, in particular other genetic diseases with clinical manifestations similar to those of SMA. This makes it possible to determine at the same time whether a subject has SMA or one of these other genetic diseases, and, when it is confirmed that the patient's disease is not SMA, still to provide information relevant to confirming the diagnosis, thereby achieving "differential diagnosis". In the context of the present invention, "differential diagnosis" therefore means a diagnosis that determines the disease the patient has and excludes other diseases with similar symptoms and clinical manifestations.
High-throughput sequencing as described herein is also called massively parallel sequencing or next-generation sequencing (NGS). It is characterized in that multiple non-duplicate sequencing reads can be obtained for the same position in the genome, which increases the depth of the sequencing data. In the context of the present invention, "high-throughput sequencing", "second-generation sequencing", "next-generation sequencing", and "NGS" are used interchangeably. Second-generation sequencing is generally considered to include 454 sequencing (pyrosequencing with DNA polymerase), Solexa sequencing (sequencing by synthesis with DNA polymerase), SOLiD sequencing (sequencing by ligation with DNA ligase), Ion Torrent sequencing (semiconductor sequencing with DNA polymerase), and similar methods. In the context of the present invention, the high-throughput sequencing method refers to a sequencing method capable of deep sequencing, that is, one that yields multiple sequencing reads for a specific site in the genome of a sample. As commonly understood in the art, "reads" are the sequences produced by each reaction during high-throughput sequencing; reading these sequences forms the raw sequencing data. Overlapping reads can be assembled into contigs, a step usually performed by sequence assembly software. Analysis of the contigs can further match their overlapping parts and determine the order of the contigs in the genome, yielding a longer scaffold composed of contigs of known order. Provided that multiple reads can be obtained, the high-throughput sequencing of the present invention is not limited to any particular sequencing principle, method, instrument, and/or reagent. In one embodiment, the information in the reading module of the present invention is such raw data composed of reads. In one embodiment, the raw data composed of reads is obtained by the sequencing device included in the system of the present invention.
In the context of sequencing, in particular high-throughput or second-generation sequencing, each base is assigned a quality value (Q) that describes the accuracy of the sequencing result. For example, a quality value of Q20 for a base means that the base-calling process reports an error rate of 10^-2 for that call, that is, an error rate of 1% and an accuracy of 99%; Q30 means an error rate of 0.1% and an accuracy of 99.9%, and so on. A higher quality value therefore means a lower probability that the base was called incorrectly. "Q20 greater than or equal to 90%" means that, for a given amount of sequencing data, 90% of the bases have a quality value of Q20 or better. In the present invention, in terms of overall sequencing quality, Q20 is greater than or equal to 90% and Q30 is greater than or equal to 85%. "Average quality value" refers to the overall average quality value over the base positions included in the entire genome.
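The relationship between a quality value and its error rate described above can be sketched in a few lines of Python. This is only an illustration under the standard Phred definition (error probability = 10^(-Q/10)); the function names `phred_to_error_prob` and `fraction_at_or_above` are hypothetical and do not come from the specification.

```python
from typing import Sequence


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-style quality value Q to a base-calling error probability."""
    return 10 ** (-q / 10)


def fraction_at_or_above(qualities: Sequence[int], threshold: int) -> float:
    """Fraction of bases whose quality value is at least `threshold`.

    With threshold=20 this corresponds to the "Q20 percentage" of a data set.
    """
    if not qualities:
        raise ValueError("no quality values supplied")
    passing = sum(1 for q in qualities if q >= threshold)
    return passing / len(qualities)


if __name__ == "__main__":
    for q in (20, 30):
        print(f"Q{q}: error probability {phred_to_error_prob(q):.4f}")
    # Toy quality values, purely illustrative
    print(fraction_at_or_above([10, 20, 30, 40], 20))
```

Under this definition a data set meets the stated requirement when `fraction_at_or_above(q, 20) >= 0.90` and `fraction_at_or_above(q, 30) >= 0.85`.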
For the analysis device of the present invention, before performing the calculation the calculation module filters out reads with an average quality value below 20, preferably below 25, and more preferably below 30. Alternatively or in addition, before performing the calculation the calculation module preferably filters out reads in the raw data whose quality value at position 840 of SMN1 exon 7 is below 20. This filtering removes reads for which the accuracy of the base call at position 840 of SMN1 exon 7 is lower than 99%.
Alternatively or in addition, before performing the calculation the calculation module removes duplicate sequences produced by PCR amplification.
In the context of sequencing, in particular high-throughput or second-generation sequencing, "coverage" refers to the proportion of the whole genome that is represented in the genome sequence obtained after the sequencing results are assembled. Sequencing often cannot yield results covering 100% of the genome sequence. This is due both to the inherent composition of the genome and to limitations of sequencing methods; for example, the genome contains complex structures such as regions of high GC content and repetitive sequences.
In the context of sequencing, in particular high-throughput or second-generation sequencing, "depth" or "sequencing depth" refers to the ratio of the total number of bases sequenced to the size of the genome being sequenced. For example, a sequencing depth of 10x means that the total amount of data obtained is ten times the size of the genome, so that each base in the genome has been sequenced, or read, 10 times on average. In embodiments of the present invention, the overall sequencing depth is 50x or more, preferably 60x or more, more preferably 70x or more, even more preferably 80x or more, still more preferably 90x or more, and most preferably 100x or more. In specific embodiments, "depth" also means the number of reads in the sequencing result that contain a specific site. Thus, in one specific embodiment, the per-sample sequencing depth is 80x or more.
Accordingly, a 10x coverage greater than 85% in the information obtained by the sequencing of the present invention means that, over the whole of the sequenced regions, more than 85% of the regions are covered at a depth of at least 10x. In specific embodiments of the present invention, the 10x coverage of the information obtained by the sequencing is greater than 85%, preferably greater than 90%, and more preferably greater than 95%.
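The two metrics defined above, mean sequencing depth and the fraction of positions covered at 10x or more, can be illustrated with a minimal Python sketch. It assumes that a per-base depth list for the target region is already available (for example from an external depth tool); the function names `mean_depth` and `coverage_at` are hypothetical.

```python
from typing import Sequence


def mean_depth(total_bases_sequenced: int, target_length: int) -> float:
    """Sequencing depth = total sequenced bases / size of the target (genome or region)."""
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    return total_bases_sequenced / target_length


def coverage_at(per_base_depth: Sequence[int], min_depth: int = 10) -> float:
    """Fraction of target positions covered by at least `min_depth` reads."""
    if not per_base_depth:
        raise ValueError("no depth values supplied")
    covered = sum(1 for d in per_base_depth if d >= min_depth)
    return covered / len(per_base_depth)


if __name__ == "__main__":
    # Toy numbers only: 800 sequenced bases over a 10-base target, i.e. 80x
    print(mean_depth(800, 10))
    # Three of the five positions reach 10x
    print(coverage_at([0, 5, 10, 12, 30], min_depth=10))
```

In these terms, the embodiment's requirements correspond to `mean_depth(...) >= 80` and `coverage_at(depths, 10)` greater than 0.85, preferably greater than 0.95.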
In addition, when the sequencing depth of the exon 7 region of the SMN1 gene is discussed specifically, sequencing depth means the number of reads that contain position 840 of the SMN1 exon 7 region of interest. High-throughput sequencing that includes the exon 7 region of the SMN1 gene, as described herein, means sequencing that yields deep sequencing data for that region, that is, multiple non-duplicate sequencing reads for the region. The total number of reads for the region is generally not less than 10x, 15x, 20x, 30x, 40x, 50x, 60x, 70x, 80x, 90x, or 100x. In a preferred embodiment, the sequencing of the region has an average Q20 ≥ 90% and Q30 ≥ 85%.
The homozygous deletion of SMN1 exon 7 described herein includes, but is not limited to, three situations: 1) homozygous deletion, in which, at the absolute coordinates of the chromosomal locus of the SMN1 gene, the exon 7 copy is absent from both alleles of SMN1; 2) homozygous point mutation, in which base C at position c.840 in exon 7 is changed to T in both alleles of SMN1, which is equivalent to loss of exon 7 from both alleles (that is, a homozygous deletion); and 3) compound heterozygosity for a deletion and a point mutation, in which, at the absolute coordinates of the chromosomal locus of the SMN1 gene, the exon 7 copy is absent from one allele of SMN1 and base C at position c.840 in exon 7 of the other allele is changed to T. All three situations are treated as a homozygous deletion of SMN1 exon 7, that is, no normal copy of SMN1 is present.
For the homozygous deletion of SMN1 exon 7 described herein, whichever of the three situations applies, the NGS data show an absence of base C at position c.840 of the SMN1 gene: every base at that position is T. Consequently, the ratio of the number of reads with base C at this position to the number of reads with any base at this position (that is, the sum of the reads with C and the reads with T) is 0. This ratio is hereinafter also denoted "R", where R = (number of reads with base C at position 840 of SMN1 exon 7) / (total number of reads containing position 840 of SMN1 exon 7).
In addition, in view of the technical limitations of high-throughput sequencing, the detection means described herein also classifies a result in which the ratio R is close to 0 as positive for SMN1 homozygous deletion. Specifically, high-throughput sequencing has a certain probability of sequencing error. If, for example, a T at position c.840 of the SMN1 gene is erroneously called as C, R will be non-zero according to the sequencing data although it should in fact be 0. With advances in sequencing technology the probability of sequencing error is now low; for example, the error probability of Illumina HiSeq series or NovaSeq series sequencers is approximately one in a thousand. Therefore, even when rare sequencing errors occur, the larger the total number of reads at position c.840, the larger the denominator of R and the smaller the effect of the errors on R; R is not 0 as it should be, but its value remains close to 0. In the context of the present invention, "close to 0" may specifically be a value selected from the following group: less than or equal to 0.05 (5%), 0.03 (3%), 0.02 (2%), 0.01 (1%), 0.005 (0.5%), 0.003 (0.3%), 0.002 (0.2%), or even 0.001 (0.1%). Specifically, when sequencing is performed on an Illumina HiSeq series or NovaSeq series sequencer, or on a sequencer with a comparable base error rate, a value of C/(C+T) close to 0 is defined as less than or equal to 0.1 (10%), 0.05 (5%), 0.01 (1%), 0.005 (0.5%), 0.003 (0.3%), 0.002 (0.2%), or even 0.001 (0.1%); in these cases the result is likewise classified as positive for SMN1 homozygous deletion. An R value close to 0 in the present invention can therefore also mean a value less than or equal to the systematic error of the sequencing system.
The method of the present invention predicts copy number by calculating this ratio rather than by testing whether the absolute number of reads with C at position c.840 of SMN1 is zero. The advantage is that non-zero counts caused by errors in the detection process, such as sequencing errors, are not classified as absence of an SMN1 homozygous deletion, which would lead to missed detections. For example, suppose homozygous deletion were judged by whether the number of reads with C at position c.840 of SMN1 is zero. With a sequencing error rate of one in a thousand, one sequencing error occurs at position c.840 for every 1,000x of depth. Assuming each such error happens to produce a C, about 10 reads with C may appear when the site is sequenced to 10,000x, and about 100 when it is sequenced to 100,000x. Sequencing errors can thus make the absolute number of C reads large, whereas R = C / (total number of reads) remains very low, close to the systematic error of sequencing. Homozygous deletion should therefore be judged by R and not by the absolute number of C reads.
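The arithmetic in this example can be reproduced with a short Python sketch. It assumes, as the paragraph does, an error rate of one per thousand reads and the worst case in which every error yields a C; the helper name `c_reads_from_errors` is hypothetical.

```python
def c_reads_from_errors(depth: int, errors_per_thousand: int = 1) -> int:
    """Expected number of spurious C reads at a site that is truly all T.

    Worst-case assumption (as in the text): every sequencing error at the
    site happens to produce a C.
    """
    return depth * errors_per_thousand // 1000


if __name__ == "__main__":
    for depth in (1_000, 10_000, 100_000):
        c = c_reads_from_errors(depth)
        r = c / depth
        # The absolute C count grows with depth, while R stays at the error rate
        print(f"depth={depth:>7}  spurious C reads={c:>4}  R={r:.4f}")
```

The absolute count of C reads scales with depth and so cannot serve as a fixed "zero or not" test, whereas R stays at the level of the error rate regardless of depth.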
Panel sequencing that includes the SMN1 gene or its exon 7, as described herein, refers to sequencing a combination (that is, a panel) of more than one gene that includes the SMN1 gene or its exon 7.
The terms "whole genome sequencing (WGS)" and "whole exome sequencing (WES)" as used herein are to be interpreted according to their usual meaning in the art. Whole exome sequencing is an NGS strategy commonly used to diagnose genetic diseases: probes capture and enrich the DNA sequences of exonic regions, which are then subjected to high-throughput sequencing to find gene mutations associated with protein variation. Exome sequencing targets the protein-coding regions of the genome, which account for less than 2% of the whole genome. Compared with genome sequencing, exome sequencing therefore reduces experimental and analysis costs and allows lower prices, shorter sequencing times, and deeper coverage.
Clinical exome sequencing (CES), also called medical exome sequencing, as described herein refers to a strategy in which multiple known disease-causing genes are sequenced.
MLPA as described herein refers to multiplex ligation-dependent probe amplification, a technique first proposed by the Dutch scientist Dr. Schouten in 2002 for the qualitative and quantitative analysis of target sequences in a nucleic acid sample. Simple probes are hybridized to the target DNA sequence and then ligated and amplified by PCR; the products are separated by capillary electrophoresis and the data collected; finally, software is used to analyze the collected data and draw conclusions. The method detects copy number changes of up to 50 nucleotide sequences in a single reaction tube and can identify deletions and insertions in dozens of genes or loci simultaneously. It is a sensitive technique that quantifies nucleic acid sequences quickly and efficiently. It is performed in many laboratories worldwide and can be used to detect gene copy number changes (such as deletions or duplications), to determine the methylation status of DNA, to detect single nucleotide polymorphisms (SNPs) and point mutations, and to quantify mRNA. It is therefore applied in many research and diagnostic fields, such as cytogenetics, cancer research, and human genetics. MLPA has been the mainstream method for determining the copy number of SMN1 exon 7.
The sampling device, amplification device, sequencing device, and analysis device of the present invention may be integrated together or may be physically separate devices. When they are physically separate, there is no limitation on the distance between them, provided that each device can perform the function assigned to it in the system or method of the present invention.
Genomic DNA from a subject sample for use in the sequencing of the present invention can be prepared by methods and/or kits known to those skilled in the art. The subject sample may be a body fluid, cells, tissue, or the like, and is preferably blood.
The method and system of the present invention can also be used for differential diagnosis. "Differential diagnosis" in the present context means diagnosing and determining, where several diseases have similar clinical presentations, which of those diseases the subject has. The diseases that can be differentially diagnosed from SMA by the method and system of the present invention are genetic diseases that can be diagnosed by gene sequencing and that share one or more clinical manifestations with SMA, including but not limited to muscle weakness, low muscle tone, muscle flaccidity, muscle atrophy, muscle paralysis, and the resulting impaired mobility, breathing problems, feeding and swallowing problems, delayed gross motor development, and scoliosis and spinal curvature. In some preferred embodiments, the diseases with a clinical presentation similar to that of SMA are usually also motor neuron diseases, in particular lower extremity motor neuron diseases. Specific examples include, but are not limited to, Becker muscular dystrophy, Bethlem myopathy, Kleefstra syndrome, merosin-deficient congenital muscular dystrophy, Ullrich congenital muscular dystrophy, X-linked myotubular myopathy, X-linked centronuclear myopathy, Miller-Dieker syndrome (YWHAE gene), congenital disorder of glycosylation type 1A (OMIM: 212065), congenital myasthenic syndrome type 4A, congenital myopathy (early onset, with cardiomyopathy), megalencephalic leukoencephalopathy with subcortical cysts type 1, autosomal dominant lower extremity spinal muscular atrophy, autosomal recessive myosclerosis, autosomal recessive distal spinal muscular atrophy type 2 (OMIM: 605726), Duchenne muscular dystrophy (progressive pseudohypertrophic muscular dystrophy), amyotrophic lateral sclerosis (ALS) type 16 (OMIM: 614373), limb-girdle muscular dystrophy type 2J (OMIM: 608807), agenesis of the corpus callosum with peripheral neuropathy, hereditary myopathy with early respiratory failure, and hereditary motor and sensory neuropathy type VI.
Detailed Description of Embodiments
For a fuller understanding and application of the present invention, the invention is described in detail below with reference to examples and the accompanying drawings. The examples are intended only to illustrate the invention and not to limit its scope. The scope of the invention is defined by the appended claims.
The examples of the present invention use whole exome sequencing (WES), an NGS strategy commonly used to diagnose genetic diseases, in combination with the NGS data analysis method of the present invention, to detect whether the SMN1 gene carries a homozygous deletion of exon 7 (Example 1) and whether genes for other neuromuscular diseases with phenotypes similar to SMA carry pathogenic mutations, for the purpose of differential diagnosis (Example 2). This approach is referred to collectively below as whole exome sequencing or WES. In addition, to verify the accuracy with which the method and system of the present invention detect SMA by high-throughput sequencing (specifically WES) and data analysis, samples from patients with a clinically suspected diagnosis of SMA were also tested by MLPA, the "gold standard" method for SMA diagnosis. The MLPA results served as the diagnostic reference and were compared with the results obtained with the method and system of the present invention (Example 3).
Example 1. Detection of SMN1 homozygous deletion in subjects with a clinically suspected diagnosis of SMA
This example concerns the use of the detection system of the present invention to detect SMN1 homozygous deletion in subjects with a clinically suspected diagnosis of SMA.
Subject selection
The subjects in this example were patients who attended hospital between June 2015 and July 2018; peripheral whole blood samples were collected from them during their visits. All enrolled cases had a phenotype characteristic of neuromuscular disease. For each patient, the referring physician provided a description of the clinical features and the results of the relevant special examinations (private information such as patient names was concealed). Written informed consent to the use of the test results for clinical research and data publication was obtained from the patients or their guardians, and from participating family members, before the study began.
The subjects in this example are shown in Table 1. All 240 subjects had a clinically suspected diagnosis of SMA; 140 were male (58.3%) and 100 were female (41.7%), and the great majority were children.
DNA extraction and whole exome sequencing (WES)
Genomic DNA was obtained from patient blood samples (required DNA concentration greater than 50 ng/µL, total amount of at least 1 µg). The genomic DNA was fragmented by sonication, adapters (Illumina, San Diego, CA) were ligated to both ends, and index sequences identifying each sample were added. After PCR amplification, the library was hybridized with biotin-labeled probes to capture the target sequences. DNA capture used the NimbleGen SeqCap EZ v2 Enrichment Kit (47 Mbp) and SeqCap EZ Choice Kits (which capture a custom region of up to 7 Mbp, here including the SMN1 and SMN2 genes). Sequencing was performed on an Illumina HiSeq 2500 high-throughput sequencer. For the whole exome sequencing runs, the following were ensured: a per-sample sequencing depth (total number of sequenced bases / length of the custom region above) of 80x or more, average Q20 ≥ 90%, Q30 ≥ 85%, PE+SE percentage ≥ 95%, and coverage at 10x or more ≥ 95%. For data analysis, variants were called using a calling method and then annotated.
Detection of SMN1 homozygous deletion using the algorithm of the present invention
SMN1 and SMN2 are highly similar homologous genes that differ at a total of 5 bases: one in exon 7, one in exon 8, and three in introns. Exon 7 contains the stop codon and exon 8 encodes no amino acids, so the coding regions of the two genes differ at only one base, the one in exon 7. Specifically, the base at SMN1 chromosomal coordinate chr5:70247773 (NM_000344.3:c.840) is C, and the base at SMN2 chromosomal coordinate chr5:69372353 (NM_017411.3:c.840) is T.
Whole exome sequencing does not cover intronic regions, so the algorithm of the present invention uses this single site to calculate copy number. The short-read alignment algorithm used comes from the Burrows-Wheeler Aligner software and tolerates mismatches, so reads that actually originate from SMN1 and SMN2 are assigned at random between the two genes (as shown in Figure 1). In Figure 1, the reads under SMN2 represent the number of reads, or depth, assigned to SMN2 by the aligner, and the reads under SMN1 represent the number of reads, or depth, assigned to SMN1. Because this assignment is random and mismatch-tolerant, the reads under SMN2 comprise both reads that genuinely belong to SMN2 (carrying T at the site), whose number or depth is denoted T2, and reads that actually carry C but were wrongly assigned to SMN2, denoted C2. The reads assigned to SMN1 are in the same position: they comprise reads that genuinely carry C, denoted C1, and reads that actually carry T but were wrongly assigned to SMN1, denoted T1. It is therefore necessary to count read depth at both sites: the number, or depth, of all reads that genuinely carry C (C = C1 + C2) and of all reads that genuinely carry T (T = T1 + T2). Because the measured value of C is proportional to the actual copy number of SMN1, and the measured value of T is proportional to the actual copy number of SMN2, the ratio R = C / (C + T) can be used to infer whether SMN1 is homozygously deleted. When SMN1 is homozygously deleted, the sequencing depth of base C is 0 and the ratio R = C / (C + T) is 0; if SMN2 is also homozygously deleted, then C = T = 0.
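The pooling of C and T reads across both paralog positions can be sketched as follows. This is a minimal illustration that assumes the four counts C1, T1 (reads aligned at the SMN1 site) and C2, T2 (reads aligned at the SMN2 site) have already been extracted from the alignment; the function name `smn1_c_ratio` is hypothetical and is not the applicant's implementation.

```python
from typing import Optional


def smn1_c_ratio(c1: int, c2: int, t1: int, t2: int) -> Optional[float]:
    """Return R = C / (C + T) with C = C1 + C2 and T = T1 + T2.

    c1, t1: reads with C / T aligned at the SMN1 c.840 position
    c2, t2: reads with C / T aligned at the SMN2 c.840 position
    Returns None when no reads cover either site (C = T = 0), because the
    ratio is undefined in that case.
    """
    c = c1 + c2
    t = t1 + t2
    if c + t == 0:
        return None
    return c / (c + t)


if __name__ == "__main__":
    # Toy counts only: no C reads at either position gives R = 0
    print(smn1_c_ratio(c1=0, c2=0, t1=40, t2=45))
    # Equal numbers of C and T reads after pooling gives R = 0.5
    print(smn1_c_ratio(c1=30, c2=20, t1=25, t2=25))
```

Pooling before taking the ratio means the result does not depend on how the aligner happened to split the reads between SMN1 and SMN2.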
Because library construction, capture, PCR and sequencing can each introduce base errors, the statistics are computed only after unreliable reads have been filtered out. The filtering criteria are: discard reads whose mean quality value in the raw data is below 20; remove PCR duplicates with the samtools software; and discard reads whose base sequencing quality at the c.840 site is below Q20. The reads that remain are those supporting C or T. In addition, to avoid missed detections caused by systematic error, the algorithm uses C:(C+T)<0.1 or a deduplicated C depth <3 as the threshold for calling SMN1 homozygous deletion (SMA positive); otherwise the sample is called as having no SMN1 homozygous deletion (SMA negative).
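The filtering and threshold rule above can be sketched as follows. This is a simplified illustration, not the applicant's code: the `SiteRead` record and its field names are hypothetical, duplicate marking is assumed to have been done upstream (for example by samtools), and "below 20" is interpreted as strictly less than 20.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SiteRead:
    """One read covering c.840 (hypothetical record layout)."""
    base: str            # base observed at c.840
    mean_quality: float  # mean base quality of the whole read
    site_quality: int    # base quality at c.840
    is_duplicate: bool   # PCR duplicate flag set by an upstream tool


def call_smn1_deletion(reads: Iterable[SiteRead]) -> str:
    """Apply the read filters, then the R < 0.1 or C depth < 3 rule."""
    kept = [
        r for r in reads
        if r.mean_quality >= 20 and r.site_quality >= 20 and not r.is_duplicate
    ]
    c = sum(1 for r in kept if r.base == "C")
    t = sum(1 for r in kept if r.base == "T")
    ratio_low = (c + t) > 0 and c / (c + t) < 0.1
    if ratio_low or c < 3:
        return "SMA positive"
    return "SMA negative"
```

Under the rule as stated, a sample with no usable C reads is called positive through the C depth < 3 condition even when the ratio itself cannot be computed.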
The diagnostic data obtained by the above method are shown in Table 1. Of the 240 subjects, 122 were diagnosed with SMN1 homozygous deletion.
All count data in the examples of this application were analyzed with the statistical software SPSS 16.0. Statistical significance was tested with the grouped (two-sample) t-test, and p<0.05 was defined as statistically significant.
Table 1
Figure PCTCN2021085974-appb-000002
*ns: not significant
Example 2. Differential diagnosis of subjects without SMN1 homozygous deletion
The general whole-exome method for detecting genetic diseases was used to analyze whether subjects (in particular the 118 subjects identified in Example 1 as not having SMN1 homozygous deletion) had other neuromuscular diseases with phenotypes similar to SMA, so as to achieve a differential diagnosis.
The specific steps of the general method include:
1) Raw data statistics: remove adapter contamination, discard reads with a mean quality value below 20, and trim bases with a quality value below 20 from the ends of reads.
2) Alignment: align the data to the reference sequence and compute alignment statistics (alignment software: BWA); the hg19 genome is used as the reference genome.
3) Variant detection: use GATK to perform realignment and quality recalibration on the alignment results, then call variants with the GATK HaplotypeCaller algorithm.
4) False-positive filtering: filter the detected single nucleotide variants (SNVs) and insertions/deletions (indels) by sequencing depth and variant quality to obtain reliable, high-quality variants, namely those with a variant depth of at least 2x, a variant frequency >10% and a variant quality value >20.
5) Variant annotation: based on the position of each SNV and indel in the gene, determine the effect of amino acid changes, the effect on splicing, and the effect of UTR and intronic variants.
6) Prediction of the effect of the selected variants on protein function: use the pathogenicity prediction tools Provean, SIFT, Polyphen2_HDIV, Polyphen2_HVAR, MutationTaster, M-CAP and REVEL, whose algorithms are based on homology alignment, conservation of protein structure and the like, to predict the effect of the selected variants on the protein.
7) Use the MaxEntScan software to predict the splicing impact of variants near splice sites.
8) Cross-reference dbSNP, 1000 Genomes variant frequencies, the ExAC database, OMIM and the Swiss-var database; annotate reported disease genes and reported pathogenic sites, and annotate the MAF of reported variants.
9) Classify the variants according to the 2015 ACMG international guidelines and select class 1 to 3 variants (Pathogenic / Likely Pathogenic / VUS). Make a genetic assessment by combining the gene in which each variant lies with the inheritance mode of the associated OMIM disease, and select the variants whose inheritance pattern supports pathogenicity.
10) Match the subject's clinical phenotype against the OMIM disease phenotypes supported by the inheritance pattern, take the diseases that match the patient's phenotype as candidate diseases, and reach the final molecular diagnosis in combination with the judgment of the treating clinician.
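The false-positive filter in step 4 above can be written as a simple predicate. The sketch below is illustrative only: the `Variant` record and its field names are hypothetical, and "variant depth" is assumed to mean the number of reads supporting the variant allele.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Minimal variant record (hypothetical field names)."""
    alt_depth: int    # reads supporting the variant allele
    total_depth: int  # all reads covering the position
    quality: float    # variant quality value


def passes_filter(v: Variant) -> bool:
    """Step 4: variant depth >= 2, variant frequency > 10%, quality > 20."""
    if v.total_depth <= 0:
        return False
    frequency = v.alt_depth / v.total_depth
    return v.alt_depth >= 2 and frequency > 0.10 and v.quality > 20
```

A variant supported by 5 of 20 reads with quality 50 passes, while one supported by 2 of 40 reads fails on frequency.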
Using the above method, 21 of the 118 patients determined in Example 1 not to have SMN1 homozygous deletion were diagnosed with other neuromuscular diseases; the specific diagnoses are given in the table in Figure 2. In other words, these 21 subjects had all been misdiagnosed at the initial provisional diagnosis (see Table 1).
Example 3. Detection of SMN1 homozygous deletion in subjects not clinically suspected of SMA
As of August 2018, among all subjects who had clinical features of neuromuscular disease but in whom SMA was not suspected at preliminary diagnosis, and for whom a genetic cause could not be excluded, the inventors detected 56 subjects carrying an SMN1 homozygous deletion using the detection method of the present invention as described in Example 1 (see Table 2).
Table 2
Figure PCTCN2021085974-appb-000003
Figure PCTCN2021085974-appb-000004
*ns: not significant
Taking the results of Examples 1 to 3 together, the method of the present invention can provide the following comprehensive diagnostic information for subjects in different situations.
1) Differential diagnosis of the 240 patients with a preliminary clinical diagnosis of SMA:
A. 122 cases confirmed as SMA (122/240, 50.8%): of the 240 patients in Example 1 with a preliminary clinical diagnosis of SMA, 122 tested positive by WES for SMN1 homozygous mutation, that is, WES found a homozygous deletion of exon 7 of the SMN1 gene, and MLPA verification was positive in all of them, that is, 0 copies of the SMN1 gene;
B. 22 cases in which a misdiagnosis of SMA was avoided and another disease was diagnosed (22/240, 9.2%): of the 240 patients with a preliminary clinical diagnosis of SMA, 22 tested negative by WES for SMN1 homozygous mutation, that is, WES found no homozygous deletion of exon 7 of the SMN1 gene, MLPA verification was negative in all of them, that is, more than 0 copies of the SMN1 gene, and WES detected other pathogenic gene variants responsible for the patient's neuromuscular disorder (for details see Figure 2);
C. 96 cases in which SMA caused by homozygous deletion of SMN1 exon 7 was excluded (96/240, 40.0%): of the 240 patients with a preliminary clinical diagnosis of SMA, 96 tested negative by WES for SMN1 homozygous mutation, that is, WES found no homozygous deletion of exon 7 of the SMN1 gene, MLPA verification was negative in all of them, that is, more than 0 copies of the SMN1 gene, and WES detected no other pathogenic variant responsible for the patient's neuromuscular disorder.
2) Avoiding missed diagnoses of SMA in patients in whom SMA was not considered clinically but who actually have SMA:
D. Avoiding missed SMA diagnoses: 56 subjects had a clinical phenotype of neuromuscular disease, were not clinically suspected of SMA before testing, were found by WES to carry an SMN1 homozygous deletion, and were confirmed by subsequent MLPA verification.
Example 4. Verification of the detection results of the present invention by MLPA
To verify the accuracy of the method of the present invention, the inventors tested all subject samples with multiplex ligation-dependent probe amplification (MLPA), the gold-standard detection technique in the prior art, and compared the results with those obtained by the detection method of the present invention.
Multiplex ligation-dependent probe amplification was used to detect SMN1/SMN2 copy number variation as a means of verifying the WES results. In each experiment, blood samples from 3 healthy individuals were used as controls; their age and sex distributions did not differ significantly from those of the enrolled subjects. The MLPA kit was the P060 product from MRC-Holland (the Netherlands), which contains 30 probe pairs and can specifically detect the copy numbers of exons 7 and 8 of the SMN1 and SMN2 genes (because exon 7 of SMN1 determines the functional integrity of this gene, its copy number is equivalent to the number of alleles). Four probes in the kit detect SMN1 or SMN2 gene sequences (Table 3); all other probes detect other chromosomes as references. The probe specific for exon 7 of the SMN1 gene is at the 183 nt position, and a heterozygous deletion detected by it indicates SMA carrier status. The probe specific for exon 8 of the SMN1 gene is at the 218 nt position and can detect 95% of exon 7 copy number changes (detection of an SMN1 exon 8 deletion alone does not indicate SMA carrier status). The kit also includes probes for exon 7 (282 nt) and exon 8 (301 nt) of the SMN2 gene and 17 pairs of internal control probes.
The specific experimental procedure is as follows.
1) Hybridization: add 5 μl DNA (final concentration 30 ng/μl) to an EP tube, denature at 98°C for 5 min, cool to 25°C, add 1.50 μl multiplex probe mix and 1.50 μl Buffer, denature at 95°C for 1 min, then hybridize at 60°C for 16-24 h.
2) Ligation: add 32 μl ligation mix, incubate at 54°C for 15 min, then inactivate the ligase at 98°C for 5 min.
3) Amplification: take 10 μl of the ligation product, add 4 μl PCR Buffer and 26 μl ddH2O, add 10 μl amplification reaction mix at 72°C and start the PCR. The reaction conditions are denaturation at 95°C for 30 s, annealing at 60°C for 30 s and extension at 72°C for 1 min, for 35 cycles in total, followed by a final extension at 72°C for 20 min.
4) Separation: add 1 μl amplification product to 8.7 μl Hi-Di formamide (ABI, USA) and 0.30 μl LIZ-500 Marker (ABI, USA), denature at 95°C for 5 min, and separate by capillary electrophoresis on a Genetic Analyzer-3130 (ABI, USA).
MLPA data analysis was performed as follows. The capillary electrophoresis results were analyzed with the Genemapper 3.0 program, and the traces and data were exported. The peak area of each target fragment was divided by the sum of all internal reference peak areas to give the relative peak area (RPA) of that fragment. The RPA of the SMA group was then compared with the mean RPA of the normal control group (that is, the mean of the 20 normal control RPAs) to obtain the copy number ratio, from which the copy number of the target fragment was calculated. According to the copy number definitions provided on the official website of MRC-Holland (http://www.mlpa.com), a copy number ratio of 0.40-0.65 corresponds to 1 copy, 0.80-1.20 to 2 copies, 1.30-1.65 to 3 copies, and 1.75-2.15 to 4 copies. The absence of a peak signal for a fragment indicates that the fragment is deleted. When a copy number ratio fell close to the boundary of a range, the test was repeated to ensure that the result was accurate.
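The MLPA calculation above can be sketched as three small functions. The function names are hypothetical. The sketch also assumes that a ratio of exactly 0 stands for "no peak signal" and returns `None` for ratios that fall between the published ranges, where the text calls for repeat testing.

```python
from typing import Optional, Sequence


def relative_peak_area(target_area: float, reference_areas: Sequence[float]) -> float:
    """RPA = target peak area / sum of all internal reference peak areas."""
    return target_area / sum(reference_areas)


def copy_ratio(sample_rpa: float, control_rpas: Sequence[float]) -> float:
    """Ratio of the sample RPA to the mean RPA of the normal controls."""
    return sample_rpa / (sum(control_rpas) / len(control_rpas))


def copy_number(ratio: float) -> Optional[int]:
    """Map a copy number ratio to a copy number using the ranges in the text."""
    if ratio == 0:
        return 0  # no peak signal: fragment deleted
    ranges = [(0.40, 0.65, 1), (0.80, 1.20, 2), (1.30, 1.65, 3), (1.75, 2.15, 4)]
    for low, high, copies in ranges:
        if low <= ratio <= high:
            return copies
    return None  # between ranges: the text calls for repeat testing
```

For instance, a sample RPA of 0.25 against controls averaging 0.5 gives a ratio of 0.5, which maps to 1 copy.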
Table 3. MLPA primer sequences
Figure PCTCN2021085974-appb-000005
Analysis of the WES sequencing results with the algorithm of the present invention as described in Example 1 found, among the 296 patients involved in Examples 1 and 3, 178 cases with SMN1 homozygous deletion (for the specific values see Table 5) and 118 cases without SMN1 homozygous deletion. Comparison with the results of the MLPA testing described above showed that all of these results agreed with those obtained by MLPA (Table 4), a concordance rate of 100%. This indicates that the diagnostic accuracy, sensitivity and specificity of the algorithm of the present invention for SMN1 homozygous deletion are comparable to those of MLPA, the gold-standard technique in the prior art.
Table 4
Figure PCTCN2021085974-appb-000006
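The concordance figures reported for this comparison can be reproduced with a short calculation. The sketch below treats MLPA as the reference and uses the counts stated in the text (178 WES-positive and 118 WES-negative subjects, all in agreement with MLPA); the function name `concordance` is a hypothetical choice.

```python
from typing import Dict


def concordance(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy against a reference test."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Counts from the text: every WES call agreed with MLPA.
result = concordance(tp=178, fp=0, fn=0, tn=118)
```

With no discordant calls, all three measures equal 1.0, which corresponds to the 100% concordance rate stated above.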
In addition, as summarized in Example 3, the method of the present invention yields comprehensive diagnostic results and in particular provides a differential diagnosis that MLPA cannot, and thus offers greater diagnostic capability than MLPA.
Table 5. R values of subjects with homozygous deletion
Figure PCTCN2021085974-appb-000007
Figure PCTCN2021085974-appb-000008
Figure PCTCN2021085974-appb-000009
Example 5. Analysis of the clinical characteristics of the subjects
Based on the data of the study subjects of the present invention, the subjects can be grouped according to the final diagnosis (four groups as shown in Example 3, namely "confirmed", "misdiagnosed", "cause unknown" and "missed"), and the clinical characteristics of each group are summarized below.
Confirmed subjects: young infants presented with a weak cry, dyspnea and respiratory failure. All other patients had symmetrical limb muscle weakness (mostly proximal and predominantly in the lower limbs), reduced muscle tone and corresponding impairment of motor function. Some patients also had clinical features such as muscle fasciculation, diminished reflexes and limb muscle atrophy.
Misdiagnosed subjects: all patients shared the feature of reduced limb muscle strength. Their other clinical phenotypes are not specific to SMA and belong to the characteristic phenotypic spectrum of disorders associated with other pathogenic gene variants, such as the pseudohypertrophy and increased muscle tone seen in patients with pseudohypertrophic muscular dystrophy (also called Duchenne muscular dystrophy, DMD), and the facial dysmorphism and developmental delay associated with other gene mutations.
Subjects with unknown cause: most had reduced limb muscle strength; the remaining features were non-specific, including seizures, abnormal gait, developmental delay, increased or decreased limb muscle tone, and abnormal brain imaging findings.
Missed subjects: young infants presented with a weak cry and dyspnea. All other patients had reduced limb muscle strength and normal or reduced muscle tone, with no increase in tone; none of those who underwent brain imaging showed obvious abnormalities.
In all of the above groups, the cases with whom the treating physician was able to stay in contact were followed up 6-12 months after the etiological diagnosis had been established by WES/MLPA testing. In every case, follow-up confirmed that the clinical features had not changed in a way that would affect the type of disease finally diagnosed.
The above summary of clinical characteristics shows that the groups share some identical or similar clinical features. This further illustrates the need for the method of the present invention, that is, a diagnostic method that is more accurate than diagnosis by clinical features and more comprehensive than MLPA.

Claims (24)

  1. An analysis device for detecting a homozygous mutation in the SMN1 gene of a subject, wherein the analysis device comprises:
    a reading module for reading information obtained by sequencing, the information comprising a plurality of reads containing position 840 of exon 7 of the SMN1 gene;
    a calculation module that calculates the ratio R of (the number of reads in which position 840 of SMN1 exon 7 is base C)/(the total number of reads containing position 840 of SMN1 exon 7),
    a determination module that, when the ratio is equal to or close to 0, determines the subject to be a positive subject having a homozygous deletion of SMN1 exon 7, and otherwise determines the subject to be a negative subject not having a homozygous deletion of SMN1 exon 7.
  2. The analysis device of claim 1, wherein the calculation module, before performing the calculation, filters out reads with a mean quality value below 20.
  3. The analysis device of claim 1 or 2, wherein the calculation module, before performing the calculation, filters out reads whose quality value at position 840 of SMN1 exon 7 is less than 20.
  4. The analysis device of any one of claims 1 to 3, wherein the calculation module, before performing the calculation, removes PCR duplicates.
  5. The analysis device of any one of claims 1 to 4, wherein when the ratio is less than 0.1, the subject is determined to be a positive subject having a homozygous deletion of SMN1 exon 7, and otherwise the subject is determined to be a negative subject not having a homozygous deletion of SMN1 exon 7.
  6. The analysis device of any one of claims 1 to 5, which is used for the diagnosis of spinal muscular atrophy (SMA).
  7. The analysis device of claim 6, wherein the SMA is SMA type I, SMA type II, SMA type III and SMA type IV.
  8. The analysis device of any one of claims 1 to 7, which is used for the differential diagnosis of SMA from other diseases having a phenotype similar to SMA.
  9. The analysis device of claim 8, wherein the disease having a phenotype similar to SMA is a neuromuscular disease.
  10. The analysis device of any one of claims 1 to 5, wherein the information obtained by sequencing and read by the reading module contains 100,000 to 1 million reads.
  11. A system for detecting a homozygous mutation in the SMN1 gene of a subject, wherein the system comprises:
    a sequencing device that sequences a plurality of amplicons, the amplicons being obtained by amplifying nucleic acid in a sample from the subject and containing position 840 of exon 7 of the SMN1 gene, the sequencing producing a plurality of reads containing position 840 of exon 7 of the SMN1 gene; and
    the analysis device of any one of claims 1 to 10.
  12. The system of claim 11, wherein the sequencing is high-throughput sequencing.
  13. The system of claim 12, wherein the high-throughput sequencing is selected from the group consisting of: sequencing of the SMN1 gene, sequencing of SMN1 exon 7, panel sequencing that includes the SMN1 gene or its exon 7, whole genome sequencing (WGS), whole exome sequencing (WES) and clinical exome sequencing (CES).
  14. A machine-readable medium comprising machine-readable code which, when executed by a machine, performs the following operations to detect the presence of a homozygous mutation in the SMN1 gene of a subject:
    (1) reading information from sequencing, the information comprising a plurality of reads containing position 840 of exon 7 of the SMN1 gene;
    (2) calculating the ratio of (the number of reads in which position 840 of SMN1 exon 7 is base C)/(the total number of reads containing position 840 of SMN1 exon 7); and
    (3) when the ratio is equal to or close to 0, determining the subject to be a positive subject having a homozygous deletion of SMN1 exon 7, and otherwise determining the subject to be a negative subject not having a homozygous deletion of SMN1 exon 7.
  15. The machine-readable medium of claim 14, wherein before the calculation of step (2) is performed, reads with a mean quality value below 20 are filtered out.
  16. The machine-readable medium of claim 14 or 15, wherein before the calculation of step (2) is performed, reads whose quality value at position 840 of SMN1 exon 7 is less than 20 are filtered out.
  17. The machine-readable medium of any one of claims 14 to 16, wherein the calculation module, before performing the calculation, removes PCR duplicates.
  18. The machine-readable medium of any one of claims 14 to 17, wherein when the ratio is less than 0.1, the subject is determined to be a positive subject having a homozygous deletion of SMN1 exon 7, and otherwise the subject is determined to be a negative subject not having a homozygous deletion of SMN1 exon 7.
  19. The machine-readable medium of any one of claims 14 to 18, which is used for the diagnosis of spinal muscular atrophy (SMA).
  20. The machine-readable medium of claim 19, wherein the SMA is SMA type I, SMA type II, SMA type III and SMA type IV.
  21. The machine-readable medium of any one of claims 14 to 20, which is used for the differential diagnosis of SMA from other diseases having a phenotype similar to SMA.
  22. The machine-readable medium of claim 21, wherein the disease having a phenotype similar to SMA is a neuromuscular disease.
  23. The machine-readable medium of any one of claims 14 to 20, wherein the information obtained by sequencing and read by the reading module contains 100,000 to 1 million reads.
  24. An apparatus comprising the machine-readable medium of any one of claims 14 to 23.
PCT/CN2021/085974 2020-04-08 2021-04-08 Method and system for detecting smn1 gene mutation by means of high-throughput sequencing WO2021204205A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010271033.5A CN111292804B (en) 2020-04-08 2020-04-08 Method and system for detecting SMN1 gene mutation by means of high-throughput sequencing
CN202010271033.5 2020-04-08

Publications (1)

Publication Number Publication Date
WO2021204205A1 true WO2021204205A1 (en) 2021-10-14

Family

ID=71027665

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/085974 WO2021204205A1 (en) 2020-04-08 2021-04-08 Method and system for detecting smn1 gene mutation by means of high-throughput sequencing

Country Status (2)

Country Link
CN (1) CN111292804B (en)
WO (1) WO2021204205A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111292804B (en) * 2020-04-08 2021-11-26 北京智因东方诊断科技有限公司 Method and system for detecting SMN1 gene mutation by means of high-throughput sequencing
CN112201306B (en) * 2020-09-21 2024-06-04 广州金域医学检验集团股份有限公司 True and false gene mutation analysis method based on high-throughput sequencing and application thereof

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100233688A1 (en) * 2009-03-16 2010-09-16 Kaohsiung Medical University Method for diagnosing spinal muscular atrophy
CN106834502A (en) * 2017-03-06 2017-06-13 明码(上海)生物科技有限公司 A kind of spinal muscular atrophy related gene copy number detection kit and method based on gene trap and two generation sequencing technologies
CN110699436A (en) * 2018-07-10 2020-01-17 天津华大医学检验所有限公司 Method and system for determining whether number seven exon deletion exists in SMN1 gene of sample to be detected
CN111292804A (en) * 2020-04-08 2020-06-16 北京智因东方转化医学研究中心有限公司 Method and system for detecting SMN1 gene mutation by means of high-throughput sequencing

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8283116B1 (en) * 2007-06-22 2012-10-09 Ptc Therapeutics, Inc. Methods of screening for compounds for treating spinal muscular atrophy using SMN mRNA translation regulation
WO2017020024A2 (en) * 2015-07-29 2017-02-02 Progenity, Inc. Systems and methods for genetic analysis
CN105112541B (en) * 2015-09-22 2017-12-15 山东山大附属生殖医院有限公司 Human embryos spinal muscular atrophy mutator detection kit
CN107267613B (en) * 2017-06-28 2020-10-27 安吉康尔(深圳)科技有限公司 Sequencing data processing system and SMN gene detection system
US20190112640A1 (en) * 2017-10-13 2019-04-18 Genomic Vision Method for mapping spinal muscular atrophy (“sma”) locus and other complex genomic regions using molecular combing
CN108048548A (en) * 2017-11-07 2018-05-18 北京华瑞康源生物科技发展有限公司 People's spinal muscular atrophy Disease-causing gene copy number detects PCR kit for fluorescence quantitative
CN110066860A (en) * 2018-01-21 2019-07-30 刘维亮 DNA sequencing detects SMA short-cut method and process
CN108456726A (en) * 2018-04-19 2018-08-28 深圳会众生物技术有限公司 Spinal muscular atrophy genetic test probe, primer and kit
CN109486938A (en) * 2018-12-18 2019-03-19 武汉艾迪康医学检验所有限公司 Detect method, primer and the application of SMN1 and SMN2 gene mutation

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117409856A (en) * 2023-10-25 2024-01-16 北京博奥医学检验所有限公司 Mutation detection method, system and storable medium based on single sample to be detected targeted gene region second generation sequencing data
CN117409856B (en) * 2023-10-25 2024-03-29 北京博奥医学检验所有限公司 Mutation detection method, system and storable medium based on single sample to be detected targeted gene region second generation sequencing data
CN117904286A (en) * 2024-03-20 2024-04-19 北京致谱医学检验实验室有限公司 System and method for determining that SMN gene mutation is located in SMN1 gene
CN117904286B (en) * 2024-03-20 2024-06-04 北京致谱医学检验实验室有限公司 System and method for determining that SMN gene mutation is located in SMN1 gene

Also Published As

Publication number Publication date
CN111292804A (en) 2020-06-16
CN111292804B (en) 2021-11-26

Similar Documents

Publication Publication Date Title
WO2021204205A1 (en) Method and system for detecting smn1 gene mutation by means of high-throughput sequencing
JP6522554B2 (en) Determination of nucleic acid sequence imbalance
EP2562268B1 (en) Noninvasive diagnosis of fetal aneuploidy by sequencing
TWI445854B (en) Size-based genomic analysis
JP5881420B2 (en) Autism-related genetic markers
US20140228231A1 (en) Method to estimate age of individual based on epigenetic markers in biological sample
US20130338012A1 (en) Genetic risk factors of sick sinus syndrome
EP1910569A2 (en) Genemap of the human genes associated with longevity
WO2015026967A1 (en) Methods of using low fetal fraction detection
Ricci et al. Pooled genome-wide analysis to identify novel risk loci for pediatric allergic asthma
WO2017107545A1 (en) Scap gene mutant and application thereof
US20140336181A1 (en) Use of polymorphisms for identifying individuals at risk of developing autism
Ikeda et al. Identification of sequence polymorphisms in two sulfation-related genes, PAPSS2 and SLC26A2, and an association analysis with knee osteoarthritis
EP2971126B1 (en) Determining fetal genomes for multiple fetus pregnancies
CN113444838A (en) Molecular marker for detecting COVID-19 susceptibility, kit and application
CN116083562B (en) SNP marker combination and primer set related to aspirin resistance auxiliary diagnosis and application thereof
CN113913437B (en) Familial thoracic aortic aneurysm mutant gene and application thereof
Niu et al. Plasma proteome variation and its genetic determinants in children and adolescents
WO2017060727A1 (en) Epilepsy biomarker
JP5895317B2 (en) Method for examining bone / joint disease based on single nucleotide polymorphism of chromosome 10 long arm 24 region
KR102585879B1 (en) Single nucleotide polymorphism markers for determining of probability of skin hydration and use thereof
WO2014121180A1 (en) Genetic variants in interstitial lung disease subjects
Morin et al. Genetic and epigenetic links to asthma
JP2019033738A (en) Method of predicting risk of food allergy based on gene polymorphism and kit for examination
EP2893036B1 (en) Association of vascular endothelial growth factor genetic variant with metabolic syndrome

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21785307; Country of ref document: EP; Kind code of ref document: A1)

NENP Non-entry into the national phase (Ref country code: DE)

122 EP: PCT application non-entry in European phase (Ref document number: 21785307; Country of ref document: EP; Kind code of ref document: A1)