CN111304303B - Method for predicting microsatellite instability and application thereof - Google Patents

Method for predicting microsatellite instability and application thereof

Info

Publication number
CN111304303B
Authority
CN
China
Prior art keywords
microsatellite
microsatellite instability
detection
sites
abundance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010098341.2A
Other languages
Chinese (zh)
Other versions
CN111304303A (en
Inventor
吴书昌
白健
王寅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujian Herui Gene Technology Co ltd
Original Assignee
Fujian Herui Gene Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujian Herui Gene Technology Co ltd filed Critical Fujian Herui Gene Technology Co ltd
Priority to CN202010098341.2A priority Critical patent/CN111304303B/en
Publication of CN111304303A publication Critical patent/CN111304303A/en
Application granted granted Critical
Publication of CN111304303B publication Critical patent/CN111304303B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844Nucleic acid amplification reactions
    • C12Q1/6858Allele-specific amplification
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30Detection of binding sites or motifs
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • Organic Chemistry (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Analytical Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioethics (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a method for detecting and predicting microsatellite instability (MSI), a related site screening method, a model construction method, and applications thereof. The site screening method comprises: i) determining a microsatellite instability prediction value for each candidate microsatellite site in a sample; and/or ii) screening sites or combinations of sites for microsatellite instability detection using one or more indicators that include the microsatellite instability prediction value of the candidate microsatellite sites. The microsatellite instability prediction value is a quantification value of the difference in abundance between the major allele type and the minor allele type at a candidate microsatellite site. The disclosed method is highly accurate, and MSI status can be detected accurately and stably from a single tumor sample alone.

Description

Method for predicting microsatellite instability and application thereof
Technical Field
The invention belongs to the technical field of gene detection, and particularly relates to a microsatellite instability detection and prediction method, a related site screening method, a model construction method, a marker combination, a kit, a system, a device, a computer-readable storage medium, equipment and a related microsatellite instability quantification method.
Background
Microsatellite sequences are simple tandem repeat regions present at millions of loci in the human genome. Microsatellite instability (MSI) refers to the lengthening or shortening of microsatellite sequences caused by deficient DNA mismatch repair (dMMR). Such somatic mutations can inactivate tumor suppressor genes or disrupt other non-coding regulatory sequences, thereby contributing to carcinogenesis. MSI has become a focus of clinical attention, mainly because of the clinical impact of immunotherapy. Cancer therapy has evolved extensively over the past few years with the recognition that cancer subtypes with unique molecular phenotypes can be treated with novel targeted therapies. MSI is reported to be one of these unique molecular phenotypes in a variety of cancers, including colorectal cancer, endometrial cancer, gastric cancer, prostate cancer, ovarian cancer, glioblastoma, and the like. In various types of cancer, particularly colorectal cancer, MSI status, and especially high-level MSI status (MSI-H), has been considered a prognostic biomarker. Subsequent studies also reported that MSI status can serve as a pan-cancer biomarker of response to immune checkpoint blockade, and in 2017 the U.S. Food and Drug Administration (FDA) approved pembrolizumab, a programmed cell death protein 1 (PD-1) inhibitor, for the treatment of all patients with advanced solid tumors that are MSI-H or dMMR and who have no other treatment options. Clinical trials indicate that immune checkpoint inhibitors can improve treatment efficacy in a variety of solid tumors, making MSI-H and dMMR the first pan-cancer predictors of therapeutic response. Tools and methods for detecting MSI status are therefore of great importance in the clinical diagnosis and prognosis of tumors.
Three methods of MSI detection have been developed to date: immunohistochemistry (IHC), polymerase chain reaction (PCR), and next-generation sequencing (NGS). The IHC method uses FFPE tumor tissue sections and determines whether cells have a mismatch repair defect by assessing the expression of four DNA mismatch repair proteins (MLH1, PMS2, MSH2, MSH6). If expression of any of these proteins is absent, the sample is judged to be dMMR; if all four proteins are expressed, it is interpreted as mismatch repair proficient (pMMR). The method is widely applicable and can determine which MMR proteins are lost in a tumor; however, IHC cannot detect some qualitative protein changes, MMR results are occasionally misreported, and interpretation varies between readers. The PCR method identifies the presence or absence of dMMR in tumor cells by comparing microsatellite sequence lengths in tumor and non-tumor tissue DNA, and is widely recognized as the gold standard diagnostic tool for MSI detection. At least 5 sites are usually tested: instability at 1 site is referred to as low-level microsatellite instability (MSI-L), instability at 2 or more sites is referred to as MSI-H, and stability at all sites is referred to as microsatellite stability (MSS). MSI-L and MSS are equivalent to pMMR, and MSI-H is equivalent to dMMR. The PCR method closes the gap left by IHC, which cannot detect MSI caused by non-truncating missense mutations, and has good reproducibility, but it is limited by the small number of sites in the gene panel, low throughput, and low sensitivity for MSI at dinucleotide repeat sequences.
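The PCR interpretation rule described above can be written as a small lookup. This is only a minimal sketch of the rule as stated in this paragraph; the function names `classify_pcr_msi` and `mmr_equivalent` are hypothetical and are not part of the patent.

```python
def classify_pcr_msi(unstable_sites: int, total_sites: int = 5) -> str:
    """Map the number of unstable PCR marker sites to an MSI status label.

    Rule taken from the text: 0 unstable sites -> MSS, 1 -> MSI-L,
    2 or more -> MSI-H, with at least 5 sites tested.
    """
    if total_sites < 5:
        raise ValueError("the rule assumes at least 5 tested sites")
    if not 0 <= unstable_sites <= total_sites:
        raise ValueError("unstable_sites must lie between 0 and total_sites")
    if unstable_sites == 0:
        return "MSS"
    if unstable_sites == 1:
        return "MSI-L"
    return "MSI-H"


def mmr_equivalent(msi_status: str) -> str:
    """MSI-H corresponds to dMMR; MSI-L and MSS correspond to pMMR."""
    return "dMMR" if msi_status == "MSI-H" else "pMMR"


for n_unstable in range(4):
    status = classify_pcr_msi(n_unstable)
    print(n_unstable, status, mmr_equivalent(status))
```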
NGS methods perform MSI detection on a sample over the whole genome or whole exome based on next-generation sequencing. Because MSI is used as a biomarker to distinguish a mutational phenotype rather than to locate specific MSI mutation positions, the NGS method overcomes the low throughput and small panels of PCR. It can share sequencing data with assays such as targeted mutation detection and tumor mutational burden (TMB), and the detection sites can be adjusted freely according to the detection purpose, so the approach saves time and labor and is flexible and convenient. In the NGS methods reported so far, the PCR result is generally used as the gold standard, and concordance between the two results is the criterion for evaluating the performance of an NGS detection method. However, NGS methods are numerous, for example the published MSI detection tools mSINGS, MSIsensor, and MANTIS, and most of them are paired methods.
With the wide clinical application of MSI, NGS offers high speed, high throughput, and broad coverage of detection sites compared with the PCR and IHC methods, and has become the trend in clinical MSI detection. Clinical use places higher demands on accuracy and feasibility, and the need for strong performance in panel-based detection is especially urgent. However, most reported NGS methods detect MSI by comparing tumor tissue with control tissue, their detection criteria are inconsistent, and their performance in panel applications is not ideal, which limits the clinical application of NGS methods. By studying the strengths of existing NGS methods and combining them with case data specific to panel analysis, a new algorithm can be used to obtain an MSI detection method that is suitable for panel analysis and based on a single tumor sample.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art and provides a method for detecting and predicting microsatellite instability (MSI), a related site screening method, and a related model construction method. The inventors found that a suitable algorithm fitted to the MSI distribution can greatly improve the MSI detection performance of existing NGS (next-generation sequencing) methods. More importantly, the method removes the requirement of existing NGS methods for a control sample, performs well in panel analysis, and achieves higher sensitivity, specificity, and accuracy. In principle, the method can be embedded into various known or yet-to-be-developed gene panels: simple repeat sequences in the microsatellite regions within the panel's detection range are quantified (repeat count), an MSI prediction value is fitted for each MSI site from these quantification results using the algorithm of the method, noise among the MSI detection sites is reduced with a random forest algorithm, and an MSI classification model is constructed (FIG. 1), so that accurate prediction can be achieved.
In order to achieve the above object, the present invention provides a screening method of a microsatellite instability (MSI) detection site or a combination of microsatellite instability detection sites, the screening method comprising:
i) A step of determining a microsatellite instability prediction value of a candidate microsatellite locus in a sample; and/or
ii) screening the sites or combinations of sites for microsatellite instability detection using one or more indicators including a microsatellite instability prediction of candidate microsatellite sites;
the microsatellite instability prediction value is a quantification value of the difference in abundance between the major allele type and the minor allele type at a candidate microsatellite site.
In particular embodiments of the invention, the abundance of an allele can be calculated by simple repeat sequencing read length number (repeat count).
In particular embodiments of the invention, the abundance of an allele can be calculated by the number of reads (reads) supported by the genotype.
Optionally, the abundance of the allele falls within the interval 0-1 by normalization; preferably, the method of allele abundance normalization may be to divide the number of read support for the allele in the candidate microsatellite locus by the sum of the number of read support for all alleles for that locus.
In the present invention, the calculation is performed on allele types whose repeat length is between 0 and 50 bp.
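The normalization and length filter described above can be sketched as follows. This is an illustrative sketch, not the patented algorithm: the function names are hypothetical, the "minor allele type" is assumed here to be the second most abundant allele, and the per-site score is assumed to be a plain subtraction of the two normalized abundances.

```python
from typing import Dict


def normalize_abundance(read_support: Dict[int, int],
                        min_len: int = 0, max_len: int = 50) -> Dict[int, float]:
    """Normalize per-allele read support at one site into the interval 0-1.

    Keys are repeat lengths in bp, values are supporting read counts.
    Alleles outside [min_len, max_len] bp are dropped before normalizing.
    """
    kept = {length: n for length, n in read_support.items()
            if min_len <= length <= max_len}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no reads support any retained allele at this site")
    return {length: n / total for length, n in kept.items()}


def major_minor_difference(read_support: Dict[int, int]) -> float:
    """Assumed per-site score: major minus minor normalized abundance."""
    freqs = sorted(normalize_abundance(read_support).values(), reverse=True)
    major = freqs[0]
    minor = freqs[1] if len(freqs) > 1 else 0.0  # a single allele has no minor type
    return major - minor


# Hypothetical read-support tables: one dominant allele vs. a spread of alleles.
stable_site = {12: 95, 11: 5}
unstable_site = {12: 40, 11: 30, 10: 20, 9: 10}
print(major_minor_difference(stable_site))
print(major_minor_difference(unstable_site))
```

Under these assumptions, a site dominated by one allele scores close to 1, while a site whose reads are spread over several repeat lengths scores close to 0.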
In a specific embodiment of the present invention, the screening method further comprises the step of calculating a microsatellite instability prediction value of the sample from the microsatellite instability prediction values of all candidate microsatellite loci in the sample.
Further, the microsatellite instability prediction value of the sample is calculated according to the following formula:
Figure BDA0002386038770000031
wherein f(X) represents the microsatellite instability prediction value of the sample, X_i1 represents the normalized major allele abundance at site i, X_i2 represents the normalized minor allele abundance at site i, i is the site number (index), and n is the number of sites.
In particular embodiments of the invention, the candidate microsatellite loci may be microsatellite loci in the whole genome, microsatellite loci in an exome, microsatellite loci contained in one or more regions in the genome, or microsatellite loci of interest.
In particular embodiments of the invention, the candidate microsatellite loci may be selected from the group consisting of single nucleotide repeats, dinucleotide repeats and/or complex repeats.
In a specific embodiment of the invention, the screening method further comprises a method of denoising the detection site by machine learning.
Further, the noise reduction method may be selected from random forest, principal component analysis (PCA), linear discriminant analysis (LDA), ridge regression, lasso regression, neural network, or algorithms derived from the above methods; random forests or algorithms derived from random forests are preferred.
Further, the algorithm derived from random forests may be selected from the Isolation Forest algorithm, the Totally Random Trees Embedding (TRTE) algorithm, or the extremely randomized trees (ExtraTrees) algorithm.
In a specific embodiment of the invention, the microsatellite instability predicted value is used as input data for machine learning.
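To illustrate how per-site prediction values could feed a tree-ensemble noise-reduction step, the sketch below ranks sites with a toy ensemble of depth-1 trees (decision stumps) built on bootstrap samples and random site subsets. This is a much-simplified stand-in for a random forest, written with the standard library only; it is not the patent's implementation, and all names and data values are hypothetical. In practice a library random forest would be the more natural choice.

```python
import random
from typing import List, Sequence


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of binary labels (1 = MSI-H, 0 = MSS)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split_gain(values: Sequence[float], labels: Sequence[int]) -> float:
    """Largest Gini decrease achievable with one threshold on one site."""
    parent, n, best = gini(labels), len(labels), 0.0
    for threshold in sorted(set(values)):
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        if not left or not right:
            continue
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def stump_forest_importance(X: Sequence[Sequence[float]], y: Sequence[int],
                            n_trees: int = 200, seed: int = 0) -> List[float]:
    """Normalized importance per site from an ensemble of decision stumps."""
    rng = random.Random(seed)
    n_samples, n_sites = len(X), len(X[0])
    k = max(1, int(n_sites ** 0.5))          # sites considered per stump
    importance = [0.0] * n_sites
    for _ in range(n_trees):
        rows = [rng.randrange(n_samples) for _ in range(n_samples)]  # bootstrap
        sites = rng.sample(range(n_sites), k)                        # random subset
        labels = [y[r] for r in rows]
        gains = {s: best_split_gain([X[r][s] for r in rows], labels) for s in sites}
        winner = max(gains, key=gains.get)
        importance[winner] += gains[winner]
    total = sum(importance)
    return [v / total for v in importance] if total else importance


# Hypothetical per-site prediction values: rows are samples, columns are sites.
X = [[0.20, 0.5, 0.5, 0.4], [0.30, 0.5, 0.5, 0.6],
     [0.25, 0.5, 0.5, 0.4], [0.10, 0.5, 0.5, 0.6],
     [0.90, 0.5, 0.5, 0.4], [0.80, 0.5, 0.5, 0.6],
     [0.85, 0.5, 0.5, 0.4], [0.95, 0.5, 0.5, 0.6]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
print(stump_forest_importance(X, y))
```

Sites whose importance stays near zero would be candidates for removal as noise; the cutoff for "near zero" is a design choice that this sketch leaves open.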
The screening method of the present invention can use only samples derived from tumor patients or tumor tissues; the screening methods of the present invention may not use samples derived from healthy individuals or normal tissues.
The invention also provides a construction method of the microsatellite instability (MSI) detection model, which comprises the following steps:
i) A step of determining a microsatellite instability prediction value of a candidate microsatellite locus in a sample; and/or
ii) constructing a microsatellite instability detection model using one or more indicators including a microsatellite instability prediction value of the candidate microsatellite locus;
The predicted value of microsatellite instability is a quantified value of the difference in abundance between the major and minor genotypes in the candidate microsatellite loci.
In a specific embodiment of the invention, the abundance of an allele is calculated by simple repeat sequencing read length number (repeat count).
In a specific embodiment of the invention, the abundance of an allele is calculated by the genotype reads (reads) support number.
Optionally, the abundance of the allele falls within the interval 0-1 by normalization; preferably, the abundance of an allele is normalized by dividing the number of reads for the allele in the candidate microsatellite locus by the sum of the number of reads for all alleles for that locus.
In a specific embodiment of the invention, the calculation can be performed on allele types whose repeat length is between 0 and 50 bp.
In a specific embodiment of the present invention, the construction method further comprises the step of calculating a microsatellite instability prediction value of the sample from the microsatellite instability prediction values of all candidate microsatellite loci in the sample.
Further, the microsatellite instability prediction value of the sample may be calculated according to the following formula:
Figure BDA0002386038770000041
wherein f(X) represents the microsatellite instability prediction value of the sample, X_i1 represents the normalized major allele abundance at site i, X_i2 represents the normalized minor allele abundance at site i, i is the site number (index), and n is the number of sites.
In particular embodiments of the invention, the candidate microsatellite loci may be microsatellite loci in the whole genome, microsatellite loci in an exome, microsatellite loci contained in one or more regions in the genome, or microsatellite loci of interest.
In particular embodiments of the invention, the candidate microsatellite loci may be selected from the group consisting of single nucleotide repeats, dinucleotide repeats and/or complex repeats.
In a specific embodiment of the present invention, the construction method may further comprise a method of denoising the detection sites used for the model by machine learning.
Further, the noise reduction method may be selected from random forest, principal component analysis (PCA), linear discriminant analysis (LDA), ridge regression, lasso regression, neural network, or algorithms derived from the above methods; random forests or algorithms derived from random forests are preferred.
Further, the algorithm derived from random forests may be selected from the Isolation Forest algorithm, the Totally Random Trees Embedding (TRTE) algorithm, or the extremely randomized trees (ExtraTrees) algorithm.
In a specific embodiment of the invention, the microsatellite instability predicted value is used as input data for machine learning.
The construction method of the present invention may use only samples derived from tumor patients or tumor tissues; the construction method of the present invention may not use samples derived from healthy individuals or normal tissues.
In the present invention, a construction method of a microsatellite instability (MSI) detection model is also provided, and the construction method uses the microsatellite instability detection sites or detection site combinations obtained by the screening method of the present invention to construct a model.
Further, the construction method may be a machine learning method.
The invention also provides a method for detecting or predicting microsatellite instability, which comprises the following steps:
i) a step of screening microsatellite instability detection sites or combinations of detection sites; and/or
ii) detecting the microsatellite instability detection sites or combinations of detection sites obtained by the screening method of the invention; and/or
iii) performing detection using the microsatellite instability detection model constructed by the construction method of the invention.
The invention also provides a microsatellite instability detection marker combination, which is a microsatellite marker combination at a detection site obtained by the screening method of the invention.
Further, the detection marker combinations of the present invention may comprise the markers in Table 2, Table 5, Table 6 or Table 7.
The invention also provides use of a reagent that specifically detects microsatellite instability detection sites or detection site combinations in the preparation of a microsatellite instability detection kit and/or a tumor companion diagnosis kit, wherein the detection sites or detection site combinations are those obtained by the screening method of the invention, those of the detection marker combination of the invention, or those in the detection model constructed by the construction method of the invention.
The invention also provides a microsatellite instability detection kit and/or a tumor companion diagnosis kit, wherein the kit comprises a reagent that specifically detects microsatellite instability detection sites or detection site combinations, and the detection sites or detection site combinations are those obtained by the screening method of the invention, those of the detection marker combination of the invention, or those in the detection model constructed by the construction method of the invention.
The present invention also provides a system or device for microsatellite instability detection and/or tumor companion diagnosis, the system or device comprising:
an acquisition module for acquiring measurement data for a microsatellite instability detection site or detection site combination of a subject, wherein the detection site or detection site combination is obtained by the screening method of the invention, belongs to the detection marker combination of the invention, or is used in the detection model constructed by the construction method of the invention; the measurement data are the microsatellite instability prediction values of the detection sites, and the microsatellite instability prediction value is a quantification value of the difference in abundance between the major allele type and the minor allele type at a candidate microsatellite site;
a data analysis module for inputting the measurement data of the microsatellite instability detection sites or detection site combinations into the detection model constructed by the construction method of the invention to obtain a detection result.
In particular embodiments of the invention, the system or apparatus may further comprise: a sequencing module for sequencing the subject.
In a specific embodiment of the invention, the system or apparatus further comprises: a diagnostic module for generating tumor-associated diagnostic results and/or treatment recommendations.
The present invention also provides a computer-readable storage medium including a stored computer program, the computer program comprising:
i) A program for performing the screening method of the microsatellite instability detection site or combination of microsatellite instability detection sites of the present invention; and/or
ii) a program for executing the construction method of the microsatellite instability detection model of the present invention; and/or
iii) A program for performing the microsatellite instability detection or prediction method of the present invention.
The present invention also provides an apparatus comprising a processor, a memory and a computer program stored in the memory, the computer program comprising:
i) A program for performing the screening method of the microsatellite instability detection site or combination of microsatellite instability detection sites of the present invention; and/or
ii) a program for executing the construction method of the microsatellite instability detection model of the present invention; and/or
iii) A program for performing the microsatellite instability detection or prediction method of the present invention.
The invention also provides a method for quantifying microsatellite instability, which uses a microsatellite instability predicted value, wherein the microsatellite instability predicted value is a quantized value of abundance difference between a major allele type and a minor allele type in a microsatellite locus.
In a specific embodiment of the invention, the abundance of an allele is calculated by simple repeat sequencing read length number (repeat count).
In a specific embodiment of the invention, the abundance of an allele is calculated by the genotype reads (reads) support number.
Optionally, the abundance of the allele falls within the interval 0-1 by normalization; preferably, the abundance of an allele is normalized by dividing the number of reads supported for the allele in the microsatellite locus by the sum of the number of reads supported for all alleles at that locus.
In a specific embodiment of the invention, the calculation can be performed on allele types whose repeat length is between 0 and 50 bp.
In a specific embodiment of the present invention, the quantification method further comprises the step of calculating a microsatellite instability prediction value of the sample from the microsatellite instability prediction values of all target microsatellite loci in the sample.
Further, the microsatellite instability prediction value of the sample is calculated according to the following formula:
Figure BDA0002386038770000071
wherein f(X) represents the microsatellite instability prediction value of the sample, X_i1 represents the normalized major allele abundance at site i, X_i2 represents the normalized minor allele abundance at site i, i is the site number (index), and n is the number of sites.
In particular embodiments of the invention, the microsatellite loci are microsatellite loci in the whole genome, microsatellite loci in an exome, microsatellite loci contained in one or more regions in the genome, or microsatellite loci of interest.
In a specific embodiment of the invention, the microsatellite loci are selected from the group consisting of single nucleotide repeats, dinucleotide repeats and/or complex repeats.
The methods of the invention may further comprise the step of sequencing to determine the nucleotide sequence of the sample.
The methods of the invention may use only samples derived from tumor patients or tumor tissue; the methods of the invention may not use samples derived from healthy individuals or normal tissue.
The technical scheme of the invention can be used in various diagnostic and non-diagnostic application scenarios for cancer. It can be applied to tumors of any stage, such as very early, intermediate, and late stage tumors; it is preferably applied to early stage or very early stage tumors.
The beneficial effects of the invention at least comprise the following aspects:
(1) High accuracy and excellent model performance. Sensitivity, specificity, and AUC can all reach 1, and the prediction results are fully consistent with the PCR method used as the gold standard. Compared with existing NGS methods, the detection rate can reach 100% in limit-of-detection (LOD) testing at 20%, 15%, and 10%.
(2) The MSI state can be accurately and stably detected only based on a single tumor sample, the limitation caused by the loss or sampling difficulty of a control sample is eliminated, the sequencing cost is saved on the technical level, the operation steps are simplified, and the detection speed is improved.
(3) The method can be flexibly embedded into the detection workflows of different gene panel products, and MSI detection sites can be screened flexibly according to panel size, giving excellent MSI detection performance and broad applicability. For a brand-new panel product, all MSI detection sites within the panel's detection range can be selected, effective detection sites can be screened from them with the method of the invention, and MSI prediction can be performed on samples from that panel using the algorithm and model of the invention. For other panel products, if the detection range is contained within an existing panel already screened by the method, the sites within that range can be extracted directly from the detection sites screened for the existing panel and used for MSI prediction; for panel products whose detection range differs from the existing panel, the method can be used to screen efficient sites from scratch and construct a prediction model.
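The reuse case in item (3), where a new panel's range lies inside an already screened panel, amounts to an interval containment filter. The sketch below shows one way to express it; the function name, the (chromosome, start, end) tuple layout, and all coordinates are hypothetical placeholders, not values from the patent.

```python
from typing import List, Tuple

# (chromosome, start, end); the coordinate convention is an assumption.
Region = Tuple[str, int, int]


def loci_within_panel(screened_loci: List[Region],
                      panel_regions: List[Region]) -> List[Region]:
    """Keep screened MSI loci that lie entirely inside some panel region."""
    kept = []
    for chrom, start, end in screened_loci:
        for p_chrom, p_start, p_end in panel_regions:
            if chrom == p_chrom and p_start <= start and end <= p_end:
                kept.append((chrom, start, end))
                break  # one containing region is enough
    return kept


# Hypothetical loci screened on an existing, larger panel.
existing_panel_loci = [("chr2", 1000, 1025), ("chr2", 5000, 5020),
                       ("chr4", 300, 330), ("chr7", 8000, 8015)]
# Hypothetical target regions of a smaller panel.
new_panel_regions = [("chr2", 900, 1100), ("chr4", 0, 1000)]

print(loci_within_panel(existing_panel_loci, new_panel_regions))
```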
Drawings
FIG. 1 is a schematic diagram of a process for predicting the instability of a microsatellite according to an embodiment of the present invention.
FIG. 2 is a flow chart of the construction and performance evaluation of a microsatellite instability classification model according to an embodiment of the present invention.
FIG. 3 shows the results of performance evaluation (training set samples) of the model of the present invention. The left graph is a model ROC curve; the right plot is a classification scatter plot.
FIG. 4 shows the results of performance evaluation of the inventive model (158 samples). The left graph is a data summary; the right plot is a classification scatter plot.
Detailed Description
Unless otherwise indicated, all terms used herein have the meaning commonly found in the art, and all reagents used are conventional commercial reagents in the art.
The term "genotype reads (or read length) support number" as used herein refers to the number of sequencing reads that match the sequence of a given genotype. For microsatellite instability, the genotypes are the different simple tandem repeat types present at the same position on the chromosome; each such repeat type is referred to as a genotype. The term "simple repeat sequencing read length number" (repeat count) refers to the number of reads supporting each simple repeat type.
The term "sensitivity" in the present invention may refer to the number of true positives divided by the sum of the number of true positives and false negatives, and may be used to characterize the ability to correctly identify a population that is truly suffering from cancer.
The term "specificity" in the present invention may refer to the number of true negatives divided by the sum of the number of true negatives and false positives, and may be used to characterize the ability to correctly identify a population that is truly free of cancer.
The term "ROC" or "ROC curve" in the present invention may refer to a receiver operating characteristic curve, which may be used to characterize the performance of a classifier. ROC curves can be generated by plotting sensitivity against 1 − specificity (the false positive rate) at various threshold settings.
The term "AUC" in the present invention may refer to the area under the ROC curve and may be used to characterize the performance of cancer screening/prediction. For a classifier that performs no worse than chance, AUC ranges from 0.5 to 1.0, with values closer to 1.0 indicating better screening/predictive performance of the method.
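As an illustration of the three definitions above, the following minimal Python sketch computes sensitivity, specificity and AUC from labelled scores. It is not part of the claimed method: the function names and the toy data are assumptions made here, with 1 standing for a positive (for example MSI-H) and 0 for a negative (for example MSS). The AUC is computed through its rank interpretation, i.e. the probability that a randomly chosen positive scores higher than a randomly chosen negative.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(truth: Sequence[int], pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity); labels are 1 = positive, 0 = negative."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(truth: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for t, s in zip(truth, scores) if t == 1]
    neg = [s for t, s in zip(truth, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): two positives and three negatives.
truth = [1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.7, 0.2, 0.1]
pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 is an arbitrary threshold
sens, spec = sensitivity_specificity(truth, pred)
area = auc(truth, scores)
```

Sweeping the threshold over all observed scores and recording (1 − specificity, sensitivity) at each value would trace the ROC curve itself.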
The technical contents of the present invention will be described in detail with reference to the accompanying drawings and specific embodiments. It will be appreciated by those skilled in the art that the following examples are illustrative of the present invention and should not be construed as limiting the scope of the invention.
Example 1
Clinical patient tissue samples, including tumor test samples and paired control (normal) samples, were collected to test the MSI detection method of the present invention. The results were validated with a PCR-based MSI detection method and compared with other MSI detection tools known in the art. The PCR result served as the gold standard, and the performance of the present method was evaluated by its consistency with that result. It should be noted that the MSI detection method proposed by the present invention does not require control samples; in the examples, control samples are used only for PCR validation and for analysis with other MSI detection software.
For the MSI sites used in testing, this example used a large gene combination (panel) containing 654 genes related to tumors such as rectal cancer, gastric cancer, ovarian cancer and pulmonary sarcoma cancer, covering 550 MSI sites.
The PCR-based MSI detection used a Promega kit; the MSI status of each sample obtained by PCR served as the gold standard. The PCR results were divided into two groups: samples detected as MSI-H (microsatellite instability-high) were scored as MSI-H (positive); samples detected as MSI-L (microsatellite instability-low) or MSS (microsatellite stable) were grouped together and scored as MSS (negative). This grouping aligns the PCR results with those of NGS-based MSI detection methods, which, as reported in the prior art, give only two types of result, MSI-H and MSS.
Example 2
Exon probe capture, library construction and sequencing were performed on the 4 gene combinations of Example 1. Sequencing was run in 150 bp paired-end mode (Read1: 151, Read2: 151; Index1: 8, Index2: 8) on a gene sequencer (NextSeq CN500) according to the instrument's standard operating procedure, yielding second-generation sequencing data in fastq format as raw data.
Quality control was performed on the sequencing data with the quality control software fastp, filtering out sequencing adapters, low-quality bases, erroneous sequencing fragments and the like to obtain high-quality data (clean data).
The clean data were aligned to the reference genome hg19 with the alignment software bwa to obtain the genomic position of each DNA fragment; the aligned data were then deduplicated and base-corrected with genecore software.
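The quality-control and alignment steps can be sketched as command lines assembled in Python. This is only an illustration under stated assumptions: the file names are hypothetical, the thread count is arbitrary, and the text does not give the exact parameters used. Only the basic paired-end options of fastp (`-i`, `-I`, `-o`, `-O`) and the `bwa mem` subcommand are shown; the deduplication and base-correction step is omitted because its invocation is not described.

```python
from typing import List


def fastp_command(r1: str, r2: str, out1: str, out2: str) -> List[str]:
    """Build a fastp call: -i/-I are the paired-end inputs, -o/-O the filtered outputs."""
    return ["fastp", "-i", r1, "-I", r2, "-o", out1, "-O", out2]


def bwa_mem_command(reference: str, r1: str, r2: str, threads: int = 4) -> List[str]:
    """Build a bwa mem call; bwa writes the SAM alignment to standard output."""
    return ["bwa", "mem", "-t", str(threads), reference, r1, r2]


# Hypothetical file names, for illustration only.
qc_cmd = fastp_command("tumor_R1.fq.gz", "tumor_R2.fq.gz",
                       "tumor_R1.clean.fq.gz", "tumor_R2.clean.fq.gz")
align_cmd = bwa_mem_command("hg19.fa", "tumor_R1.clean.fq.gz", "tumor_R2.clean.fq.gz")
```

The lists can be passed to `subprocess.run` on a machine where both tools and an indexed hg19 reference are installed; the sketch itself only constructs the arguments.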
Based on the known repeat locations in the detection region of the reference genome, the number of supporting reads (or read lengths) for each repeat type was obtained. First, a BED file of candidate genomic MSI loci was generated with RepeatFinder, the tool bundled with the published software MANTIS; loci were read from the BED file, and the offsets arising from 0-/1-based indexing were corrected against the supplied reference genome file. Microsatellite repeats with a length of 10-100, a single-base repeat unit and more than 5 repeats were selected. Second, quality control was applied to the reads matching each repeat type at each locus: the average base quality of each read is not less than 20, the average base quality of the portion of each read falling within the locus interval is not less than 25, the read length (counting only the unclipped portion, excluding soft-clips and hard-clips) is greater than 35 bp, and the minimum number of supporting reads per repeat type is greater than 1; all other filtering steps used the software's default parameters. Finally, the reads passing quality control were counted for each repeat type at every locus, for the tumor and the control input respectively, producing the simple repeat sequencing read length number (repeat count) results used in subsequent analysis. Because MANTIS analyzes paired samples, a tumor sample and a control sample must be supplied together; since the method of the invention needs no control sample, the tumor sample was supplied a second time as the control so that the software would run, and only the statistics of the tumor sample were used.
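The read-level quality control and counting described above can be illustrated with a small Python sketch. It does not reproduce the MANTIS implementation: the `LocusRead` record, its field names and the toy reads are assumptions made here, and the second threshold is interpreted as the average base quality of the part of the read lying inside the locus.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class LocusRead:
    """Hypothetical summary of one read overlapping a microsatellite locus."""
    repeat_type: int           # number of repeat units observed in this read
    mean_read_quality: float   # average base quality over the whole read
    mean_locus_quality: float  # average base quality inside the locus interval
    aligned_length: int        # read length excluding soft/hard-clipped bases


def repeat_counts(reads: Iterable[LocusRead]) -> Dict[int, int]:
    """Count supporting reads per repeat type after the quality filters."""
    counts = Counter(
        r.repeat_type
        for r in reads
        if r.mean_read_quality >= 20
        and r.mean_locus_quality >= 25
        and r.aligned_length > 35
    )
    # Keep only repeat types supported by more than one read.
    return {rt: n for rt, n in counts.items() if n > 1}


# Toy reads (hypothetical values).
reads = (
    [LocusRead(12, 30, 30, 100)] * 3
    + [LocusRead(11, 30, 30, 100)] * 2
    + [LocusRead(10, 30, 30, 100)]   # single supporting read, dropped
    + [LocusRead(12, 15, 30, 100)]   # fails read quality
    + [LocusRead(11, 30, 30, 30)]    # too short
)
counts = repeat_counts(reads)
```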
Example 3
From the repeat count results obtained in Example 2, the allele types present at each locus are obtained, and the difference between the main-peak and secondary-peak abundance values is calculated for each locus as its instability-level value. The average of (1 − this value) over all loci in the sample is used as the predicted microsatellite instability value of the sample (MSIscore). The formula of this non-parametric algorithm is:
$$\mathrm{MSIscore} = \frac{1}{n}\sum_{i=1}^{n}\left(1 - \left|X_{i1} - X_{i2}\right|\right)$$
where $X_{i1}$ is the main peak value, $X_{i2}$ is the secondary peak value, $i$ is the locus index and $n$ is the number of loci.
Specifically, the repeat count is tallied at each detection locus for repeat types with a repeat length between 0 and 50 bp and normalized to the range 0-1: the number of supporting reads for each repeat type at the locus is divided by the total number of supporting reads over all repeat types at that locus. The normalized values are sorted in descending order; the largest is the main peak value and the second largest is the secondary peak value.
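The calculation described in this example can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in the examples: the function names are chosen here, each locus is represented as a mapping from repeat length in bp to supporting-read count, and treating a locus with a single repeat type as having a secondary peak of 0 is an assumption not stated in the text.

```python
from typing import Dict, Sequence


def locus_value(repeat_counts: Dict[int, int], max_length: int = 50) -> float:
    """Return 1 - |main peak - secondary peak| for one locus."""
    # Keep repeat types whose repeat length lies between 0 and max_length bp.
    kept = [n for length, n in repeat_counts.items() if 0 <= length <= max_length]
    total = sum(kept)
    if total == 0:
        raise ValueError("locus has no supporting reads")
    # Normalize to 0-1 and sort from largest to smallest.
    abundances = sorted((n / total for n in kept), reverse=True)
    main_peak = abundances[0]
    # Assumption: a locus with one repeat type has a secondary peak of 0.
    secondary_peak = abundances[1] if len(abundances) > 1 else 0.0
    return 1.0 - abs(main_peak - secondary_peak)


def msi_score(loci: Sequence[Dict[int, int]]) -> float:
    """Average the per-locus values over all loci of a sample."""
    values = [locus_value(locus) for locus in loci]
    return sum(values) / len(values)


# Toy sample (hypothetical counts): one stable-looking and one unstable-looking locus.
stable = {20: 90, 19: 10}            # abundances 0.90 and 0.10
unstable = {20: 40, 18: 35, 17: 25}  # abundances 0.40, 0.35 and 0.25
score = msi_score([stable, unstable])
```

In the toy sample the stable-looking locus contributes about 0.2 and the unstable-looking locus about 0.95, so a sample with many unstable loci obtains a higher score.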
Example 4
The 153 tumor tissue samples among the samples collected in Example 1 were divided by stratified random sampling, at a ratio of 8:2 and stratified by the MSI status (MSI-H/MSS) measured by PCR, into a training set of 122 cases and a test set of 31 cases. The initial training set was then further divided by stratified random sampling, at a training:validation ratio of 8:2 and again stratified by MSI status, 20 times, giving 20 sampling results (FIG. 2), each with 97 training samples and 25 validation samples. None of these samples were control samples.
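A stratified 8:2 split followed by 20 repeated 8:2 re-splits of the training portion can be sketched with the Python standard library. The cohort below is hypothetical (100 samples, 20 of them MSI-H), since the class counts of the 153 samples are not given in the text; the function name, rounding rule and seeds are likewise assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def stratified_split(labels: Dict[str, str], first_fraction: float,
                     seed: int) -> Tuple[List[str], List[str]]:
    """Split sample IDs into two parts while preserving the MSI-H/MSS ratio."""
    rng = random.Random(seed)
    by_status: Dict[str, List[str]] = defaultdict(list)
    for sample_id in sorted(labels):
        by_status[labels[sample_id]].append(sample_id)
    first: List[str] = []
    second: List[str] = []
    for status in sorted(by_status):
        ids = by_status[status]
        rng.shuffle(ids)
        cut = round(len(ids) * first_fraction)
        first.extend(ids[:cut])
        second.extend(ids[cut:])
    return first, second


# Hypothetical cohort: 20 MSI-H and 80 MSS samples.
cohort = {f"S{i:03d}": ("MSI-H" if i < 20 else "MSS") for i in range(100)}
train_ids, test_ids = stratified_split(cohort, 0.8, seed=0)

# 20 repeated training/validation re-splits of the initial training set.
train_labels = {s: cohort[s] for s in train_ids}
resamplings = [stratified_split(train_labels, 0.8, seed=k) for k in range(1, 21)]
```

Each element of `resamplings` is one (training, validation) pair, which is the unit on which the site weights are computed in the following step.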
The following is performed for each sampling. For every sample, the MSIscore at each of the 550 candidate sites covered by the large panel of Example 1 is calculated with the non-parametric algorithm of Example 3, and the samples of each sampling are arranged into training-set and validation-set matrices containing the MSIscore of each sample at each detection site. For the matrix of each sampling, the weight of each candidate site is calculated with a random forest algorithm. All candidate sites are then ranked by the number of samplings, out of 20, in which their weight is non-zero, and 10 gradients are defined by requiring this number to be at least 0, 1, ..., 9 (as shown in Table 1).
Table 1. Weight gradient partitioning results
Number of samplings with non-zero weight   >=0  >=1  >=2  >=3  >=4  >=5  >=6  >=7  >=8  >=9
Number of detection sites                  550  381  265  200  156  129  108  98   83   71
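The gradient partitioning summarized in Table 1 can be illustrated as follows. The sketch starts from per-sampling site weights that are assumed to have been produced already by a random forest (the weights below are made-up numbers, and the random forest itself is not shown); the function names are chosen here for illustration.

```python
from typing import Dict, List, Sequence


def nonzero_occurrences(weight_runs: Sequence[Sequence[float]]) -> List[int]:
    """For each site, count the samplings in which its weight is non-zero."""
    n_sites = len(weight_runs[0])
    return [sum(1 for run in weight_runs if run[i] != 0) for i in range(n_sites)]


def gradient_sites(occurrences: Sequence[int], max_gradient: int = 9) -> Dict[int, List[int]]:
    """Map each gradient g to the indices of sites with at least g occurrences."""
    return {
        g: [i for i, occ in enumerate(occurrences) if occ >= g]
        for g in range(max_gradient + 1)
    }


# Hypothetical weights: 3 samplings x 4 candidate sites.
runs = [
    [0.5, 0.0, 0.1, 0.0],
    [0.4, 0.2, 0.0, 0.0],
    [0.6, 0.0, 0.0, 0.0],
]
occ = nonzero_occurrences(runs)
gradients = gradient_sites(occ, max_gradient=3)
```

With 20 samplings and 550 candidate sites, the lengths of the lists for gradients 0 to 9 would correspond to the second row of Table 1.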
For each gradient, a model was constructed from the detection sites selected by that gradient and used to classify the training set; a threshold was determined on the training set and then applied to classify the validation set, and the classification results on the validation set were used to evaluate the model. ROC curves were drawn from the results of each gradient to evaluate each model. Balancing classification performance and stability, the model containing the 200 detection sites of the gradient with a weight occurrence count >= 3 was selected, and the threshold was defined from the training-set results on the selected sites. The selected detection sites are shown in the following table:
Table 2. Detection site screening results (table provided as images in the original publication)
An ROC curve was drawn from the prediction data of the 97 training-set samples to evaluate model performance. As shown in FIG. 3, the sensitivity, specificity and AUC of the model constructed by the method are all 1, so MSI is predicted accurately.
Further, 158 samples were also used to evaluate model performance; samples with MSIscore >= threshold were called MSI-H and all others MSS. The results are shown in Table 3 and FIG. 4:
Table 3. Model predictive performance (table provided as an image in the original publication)
All MSI-H samples among the 158 samples were correctly detected by the model constructed according to the invention, giving 100% consistency with the PCR detection results.
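The thresholding rule (MSIscore >= threshold is called MSI-H, otherwise MSS) and the comparison with PCR can be sketched as below. The text does not state how the threshold is derived from the training set, so the midpoint rule used here is purely an assumption for illustration, as are the function names and the toy scores.

```python
from typing import Sequence


def midpoint_threshold(msi_h_scores: Sequence[float], mss_scores: Sequence[float]) -> float:
    """Assumed rule: midpoint between the lowest MSI-H and the highest MSS training score."""
    lowest_pos = min(msi_h_scores)
    highest_neg = max(mss_scores)
    if lowest_pos <= highest_neg:
        raise ValueError("training classes overlap; a midpoint threshold is undefined")
    return (lowest_pos + highest_neg) / 2


def call_status(score: float, threshold: float) -> str:
    """Call MSI-H when the sample score reaches the threshold, otherwise MSS."""
    return "MSI-H" if score >= threshold else "MSS"


def concordance(calls: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of samples whose call agrees with the reference (e.g. PCR) result."""
    if len(calls) != len(reference):
        raise ValueError("calls and reference must have the same length")
    return sum(c == r for c, r in zip(calls, reference)) / len(calls)


# Toy training scores (hypothetical).
threshold = midpoint_threshold([0.8, 0.9], [0.3, 0.4])
calls = [call_status(s, threshold) for s in (0.85, 0.35, 0.5)]
agreement = concordance(calls, ["MSI-H", "MSS", "MSS"])
```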
To determine the detection limit of the method of the invention, the MSI-positive (MSI-H) reference KM12 and the MSI-negative (MSS) reference NA12878 were collected, and the MSI-H and MSS cell lines were mixed in proportion so that the MSI-H cell line was diluted to 20%, 15%, 10% and 5%, respectively. The mixed cell samples were sequenced with the large panel of Example 1 as the sequencing panel, and MSI was then detected. As shown in Table 4, the MSI-H detection rate reached 100% at LODs of 20%, 15% and 10%, and still reached 83% at an LOD of 5%.
Table 4. Detection rate at different LODs:
Dilution concentration (%) 20 15 10 5
Detection rate (%) 100 100 100 83
Example 5
To evaluate the generalization performance of the model, three gene panels of medium, small and mini size were introduced, containing 457, 86 and 31 genes, respectively, related to tumors such as rectal cancer, gastric cancer, ovarian cancer and lung sarcoma cancer, and covering 146, 109 and 41 MSI sites, respectively. 173 new samples were sequenced with each of the three panels.
Of the 200 MSI sites selected in Example 4, 61, 33 and 7 sites fall within the medium, small and mini panels, respectively; they are shown in Tables 5-6 and as SEQ ID Nos. 1-7 in Table 7.
Table 5. Detection site screening results (medium panel) (table provided as images in the original publication)
Table 6. Detection site screening results (small panel) (table provided as images in the original publication)
Table 7. Detection site screening results (mini panel) (table provided as an image in the original publication)
Sequencing data from the 173 new samples were analyzed using the above sites, and the results are shown in Table 8. For the medium, small and mini panels, all 11 MSI-H samples among the 173 samples were detected accurately by directly using the 61, 33 or 7 screened sites, with sensitivity and specificity both reaching 1.
Table 8. Model predictive performance on the medium, small and mini panels (table provided as an image in the original publication)
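Extracting, for a smaller panel, those already-screened sites that fall inside its detection range amounts to an interval containment check. The sketch below assumes BED-like 0-based, half-open coordinates; the site and region coordinates are invented for illustration and do not correspond to the sites of Tables 5-7.

```python
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), 0-based half-open


def sites_within_panel(sites: Sequence[Interval], panel_regions: Sequence[Interval]) -> List[Interval]:
    """Keep the screened sites that lie entirely inside some panel region."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in panel_regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    return [
        (chrom, start, end)
        for chrom, start, end in sites
        if any(r_start <= start and end <= r_end for r_start, r_end in by_chrom.get(chrom, []))
    ]


# Hypothetical screened sites and a hypothetical smaller panel.
screened_sites = [("chr1", 100, 115), ("chr1", 500, 512), ("chr2", 40, 60)]
small_panel = [("chr1", 90, 200), ("chr2", 50, 300)]
kept_sites = sites_within_panel(screened_sites, small_panel)
```

Only the retained sites are then passed to the per-site and per-sample calculations, which is how the 61, 33 and 7 sites are reused without rescreening.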
Example 6
To test the performance improvement of the method of the present invention over prior-art methods, it was compared with existing MSI detection models. MANTIS, MSIsensor and mSINGS are the three most commonly used MSI detection methods in the prior art, of which MANTIS is considered to have the best sensitivity and specificity (Performance evaluation for rapid detection of pan-cancer microsatellite instability with MANTIS. Oncotarget, 2017, vol. 8, no. 5, pp. 7452-7463). MANTIS was therefore selected as the comparative example: MSI was predicted from the sequencing data of the aforementioned 173 samples on the 3 panels, and the MSI detection sites used by the MANTIS software were all MSI detection sites covered by the gene regions of the 3 panels. The results are shown in Table 9:
Table 9. Model predictive performance on the medium, small and mini panels (table provided as an image in the original publication)
Therefore, compared with the prior art, an MSI detection model constructed by the method of the invention achieves higher sensitivity and specificity while using fewer detection sites.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (44)

1. A screening method for a microsatellite instability detection site or a combination of microsatellite instability detection sites, the screening method comprising:
i) A step of determining a microsatellite instability prediction value of a candidate microsatellite locus in a sample; and/or
ii) screening the sites or combinations of sites for microsatellite instability detection using one or more indicators including a microsatellite instability prediction of candidate microsatellite sites;
the step of determining the microsatellite instability predicted value of the candidate microsatellite loci in the sample comprises the following steps: carrying out exon probe capturing, library building and sequencing on the gene combination covering the candidate sites, and carrying out 150bp Pair-End mode sequencing according to instrument standard operation rules by using a gene sequencer to finally obtain fastq format second generation sequencing data serving as original data raw data; performing quality control on the obtained second generation sequencing data by using quality control software fastp, filtering a sequencing joint, low-quality bases and a sequencing error fragment, and filtering to obtain high-quality data clean data; comparing the clean data with a reference genome hg19 by using comparison software bwa to obtain corresponding specific position information on each DNA fragment genome; then using genecore software to perform de-duplication and base correction treatment on the comparison data;
obtaining the read support number of each repeat type according to the known repeat positions in the detection region of the reference genome; firstly, generating a BED file of genomic MSI candidate loci using RepeatFinder, the tool bundled with the published software MANTIS, reading loci from the BED file, and using the provided reference genome file to correct the differences produced by 0-/1-based indexes; wherein microsatellite repeats with a length of 10-100, a single-base repeat unit and more than 5 repeats are selected; secondly, performing quality control on the reads matching all repeat types at each locus, wherein the average base quality of each read is not less than 20, the average base quality of the portion of each read falling within the locus interval is not less than 25, the read length is greater than 35 bp, the minimum number of supporting reads per repeat type is greater than 1, and the other filtering steps involved use software default parameters; finally, counting the number of reads passing quality control for each repeat type at all loci of the tumor patient, and generating the simple repeat sequencing read length number (repeat count) results for subsequent analysis;
the microsatellite instability predicted value of the candidate microsatellite locus is a quantized value of the abundance difference between the major allele type and the minor allele type in the candidate microsatellite locus, calculated as: microsatellite instability predicted value of locus $i$ $= 1 - |X_{i1} - X_{i2}|$; wherein $i$ is the site number or site order, $X_{i1}$ represents the normalized major allele type abundance, and $X_{i2}$ represents the normalized minor allele type abundance;
the screening method further comprises: calculating the microsatellite instability predicted value of the sample from the microsatellite instability predicted values of all candidate microsatellite loci in the sample according to the following formula:
$$f(X) = \frac{1}{n}\sum_{i=1}^{n}\left(1 - \left|X_{i1} - X_{i2}\right|\right)$$
wherein $f(X)$ represents the microsatellite instability predicted value of the sample, $X_{i1}$ represents the normalized major allele type abundance, $X_{i2}$ represents the normalized minor allele type abundance, $i$ is the site number or site order, and $n$ is the number of sites;
the screening method further comprises the steps of: constructing a model from the selected detection sites and classifying the training set, determining a threshold on the training set, and classifying the verification set.
2. The screening method of claim 1, wherein the abundance of an allele type is calculated from the simple repeat sequencing read length number (repeat count).
3. The screening method of claim 2, wherein the abundance of the allele type is calculated from the genotype reads support number.
4. The screening method of claim 2, wherein the abundance of the allele type is normalized to the interval 0-1; the normalization divides the number of reads for the allele type in the candidate microsatellite locus by the sum of the numbers of reads for all allele types at that locus.
5. The screening method according to claim 2, wherein the calculation is performed by selecting the allele type having the repeat length of between 0 and 50 bp.
6. The screening method of any one of claims 1-5, wherein the candidate microsatellite loci are microsatellite loci in a whole genome, microsatellite loci in an exome, microsatellite loci contained in one or more regions in a genome, or microsatellite loci of interest.
7. The screening method of any one of claims 1-5, wherein the candidate microsatellite loci are selected from the group consisting of single nucleotide repeats, dinucleotide repeats and/or complex repeats.
8. The screening method of any one of claims 1-5, further comprising a method of denoising the detection site by machine learning.
9. The screening method of claim 8, wherein the noise reduction method is selected from random forest, principal component analysis, linear discriminant analysis, ridge regression, lasso regression, neural network, or algorithms derived from the above methods.
10. The screening method of claim 9, wherein the noise reduction method is selected from a random forest or an algorithm derived from a random forest.
11. The screening method of claim 10, wherein the algorithm derived from random forests is selected from an isolation forest algorithm, a totally random trees embedding (TRTE) algorithm, or an extremely randomized trees (extra trees) algorithm.
12. The screening method according to claim 8, wherein the microsatellite instability prediction value is used as input data for machine learning.
13. The screening method according to any one of claims 1 to 5, wherein the screening method uses only samples derived from tumor patients or tumor tissues; or the screening method does not use samples derived from healthy individuals or normal tissue.
14. The construction method of the microsatellite instability detection model is characterized by comprising the following steps:
i) A step of determining a microsatellite instability prediction value of a candidate microsatellite locus in a sample; and/or
ii) constructing a microsatellite instability detection model using one or more indicators including a microsatellite instability prediction value of the candidate microsatellite locus;
the microsatellite instability predicted value is a quantized value of the abundance difference between the major allele type and the minor allele type in the candidate microsatellite locus; the step of determining the microsatellite instability predicted value of a candidate microsatellite locus in a sample and the quantized value are as defined in the screening method of claim 1;
the microsatellite instability detection model comprises: calculating the microsatellite instability predicted value of a sample from the microsatellite instability predicted values of all candidate microsatellite loci in the sample, wherein the microsatellite instability predicted value of the sample is calculated according to the following formula:
$$f(X) = \frac{1}{n}\sum_{i=1}^{n}\left(1 - \left|X_{i1} - X_{i2}\right|\right)$$
wherein $f(X)$ represents the microsatellite instability predicted value of the sample, $X_{i1}$ represents the normalized major allele type abundance, $X_{i2}$ represents the normalized minor allele type abundance, $i$ is the site number or site order, and $n$ is the number of sites;
the construction method further comprises the following steps: constructing a model from the selected detection sites and classifying the training set, determining a threshold on the training set, and classifying the verification set.
15. The method of construction of claim 14, wherein the abundance of the allele type is calculated from the simple repeat sequencing read length number (repeat count).
16. The method of claim 15, wherein the abundance of the allele type is calculated from the genotype reads support number.
17. The method of construction according to any one of claims 14 to 16, wherein the abundance of the allele type is normalized to the interval 0-1; the normalization divides the number of reads for the allele type in the candidate microsatellite locus by the sum of the numbers of reads for all allele types at that locus.
18. Construction method according to any one of claims 14 to 16, characterized in that the calculation is performed by selecting the allelic type with repeat length between 0 and 50 bp.
19. The method of construction according to any one of claims 14 to 16, wherein the candidate microsatellite loci are microsatellite loci in the whole genome, microsatellite loci in an exome, microsatellite loci contained in one or more regions in the genome or microsatellite loci of interest.
20. The method of any one of claims 14 to 16, wherein the candidate microsatellite loci are selected from the group consisting of single nucleotide repeats, dinucleotide repeats and/or complex repeats.
21. The method of construction according to any one of claims 14 to 16, further comprising a method of denoising the detection sites used for the model by machine learning.
22. The method of construction according to claim 21, wherein the noise reduction method is selected from random forest, principal component analysis, linear discriminant analysis, ridge regression, lasso regression, neural network or algorithms derived from the above methods.
23. Construction method according to claim 22, characterized in that the noise reduction method is selected from random forests or algorithms derived from random forests.
24. The method of claim 23, wherein the random forest derived algorithm is selected from an isolation forest algorithm, a totally random trees embedding (TRTE) algorithm, or an extremely randomized trees (extra trees) algorithm.
25. The method of claim 21, wherein the microsatellite instability prediction value is used as input data for machine learning.
26. The method of construction according to any one of claims 14 to 16, wherein the method of construction uses only samples derived from tumor patients or tumor tissue; or the construction method does not use samples derived from healthy individuals or normal tissues.
27. A method for constructing a model for detecting a microsatellite instability, characterized in that the method uses the microsatellite instability detection site or combination of detection sites obtained by screening according to any one of claims 1 to 13.
28. The building method of claim 27, wherein the building method is a machine learning method.
29. A method for detecting or predicting microsatellite instability, the method comprising:
i) A step comprising screening for a microsatellite instability detection site or combination of detection sites according to the screening method of any one of claims 1-13; and/or
ii) detecting the microsatellite instability detection sites or combinations of detection sites obtained by screening using the screening method according to any one of claims 1 to 13; and/or
iii) Detecting using the microsatellite instability detection model constructed by the construction method according to any one of claims 14 to 28;
the detection or prediction method is not directed to the diagnosis or treatment of a disease.
30. Use of a reagent for specifically detecting a microsatellite instability detection site or a combination of detection sites in the preparation of a microsatellite instability detection kit and/or a tumor companion diagnostic kit, characterized in that the microsatellite instability detection site or combination of detection sites is a microsatellite instability detection site or combination of detection sites obtained according to the screening method of any one of claims 1 to 13 or a microsatellite instability detection site or combination of detection sites in a detection model constructed according to the construction method of any one of claims 14 to 28.
31. A device for microsatellite instability detection and/or tumor-associated diagnosis, the device comprising:
an acquisition module for acquiring measurement data of a microsatellite instability detection site or a combination of detection sites of a subject, the microsatellite instability detection site or the combination of detection sites being a microsatellite instability detection site or a combination of detection sites obtained according to the screening method of any one of claims 1 to 13 or a microsatellite instability detection site or a combination of detection sites in a detection model constructed according to the construction method of any one of claims 14 to 28, the measurement data being a microsatellite instability prediction value of a detection site, the microsatellite instability prediction value being a quantification value of an abundance difference between a major allele type and a minor allele type in a candidate microsatellite site;
A data analysis module for inputting the measurement data of the microsatellite instability detection sites or the combination of detection sites into a detection model constructed according to the construction method of any one of claims 14 to 28 to obtain a detection result.
32. The apparatus of claim 31, wherein the apparatus further comprises: a sequencing module for sequencing the subject.
33. The apparatus according to claim 31 or 32, characterized in that the apparatus further comprises: a diagnostic module for generating tumor-associated diagnostic results and/or treatment recommendations.
34. A computer-readable storage medium, the computer-readable storage medium comprising a stored computer program, the computer program comprising:
i) A program for performing a screening method of a microsatellite instability detection site or a combination of microsatellite instability detection sites according to any one of claims 1-13; and/or
ii) a program for executing the construction method of the microsatellite instability detection model according to any one of claims 14 to 28; and/or
iii) Program for performing a method for detecting or predicting microsatellite instability according to claim 29.
35. An apparatus for detecting microsatellite instability comprising a processor, a memory and a computer program stored in said memory, said computer program comprising:
i) A program for performing a screening method of a microsatellite instability detection site or a combination of microsatellite instability detection sites according to any one of claims 1-13; and/or
ii) a program for executing the construction method of the microsatellite instability detection model according to any one of claims 14 to 28; and/or
iii) Program for performing a method for detecting or predicting microsatellite instability according to claim 29.
36. A method for quantifying microsatellite instability, characterized in that a microsatellite instability predicted value is used for quantifying the abundance difference between a major allele type and a minor allele type in a microsatellite locus; the quantization value is the quantization value in the screening method of claim 1.
37. The method of quantification of claim 36, wherein the abundance of the allele type is calculated from the simple repeat sequencing read length number (repeat count).
38. The method of quantification of claim 37, wherein the abundance of the allele type is calculated from the genotype reads support number.
39. The method of quantification according to any of the claims 36 to 38, wherein the abundance of the allele type is normalized to the interval 0-1; the normalization divides the number of reads for the allele type in the microsatellite locus by the sum of the numbers of reads for all allele types at that locus.
40. The method of quantification according to any of the claims 36 to 38, wherein the calculation is performed by selecting the allelic type with repeat length between 0 and 50 bp.
41. The method of any one of claims 36 to 38, further comprising the step of calculating a microsatellite instability prediction value for the sample from the microsatellite instability prediction values for all target microsatellite loci in the sample.
42. The method of claim 41, wherein the microsatellite instability predicted value of the sample is calculated according to the following formula:
$$f(X) = \frac{1}{n}\sum_{i=1}^{n}\left(1 - \left|X_{i1} - X_{i2}\right|\right)$$
wherein $f(X)$ represents the microsatellite instability predicted value of the sample, $X_{i1}$ represents the normalized major allele type abundance, $X_{i2}$ represents the normalized minor allele type abundance, $i$ is the site number or site order, and $n$ is the number of sites.
43. The method of quantification of any of claims 36 to 38, wherein the microsatellite loci are microsatellite loci in a whole genome, microsatellite loci in an exome, microsatellite loci contained in one or more regions in a genome or microsatellite loci of interest.
44. The method of quantification according to any of the claims 36 to 38, wherein the microsatellite loci are selected from the group consisting of single nucleotide repeats, dinucleotide repeats and/or complex repeats.
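
The per-locus quantities described in claims 36 to 40 can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the input format (a dictionary mapping repeat length in bp to supporting read count), the choice to apply the 0 to 50 bp filter before normalizing, and the treatment of the minor allele as the second most abundant allele (zero when only one allele is observed) are all assumptions made for illustration. The sample-level equation of claim 42 is available only as an image and is therefore not implemented here.

```python
def normalized_abundances(read_counts, max_len=50):
    """Normalize per-allele read counts at one locus to the interval 0..1.

    read_counts: hypothetical input, dict of repeat length (bp) -> read count.
    Alleles outside 0..max_len bp are dropped before normalizing (assumption).
    """
    kept = {
        length: count
        for length, count in read_counts.items()
        if 0 <= length <= max_len and count > 0
    }
    total = sum(kept.values())
    if total == 0:
        return {}
    # Claim 39: reads for one allele divided by reads for all alleles at the locus.
    return {length: count / total for length, count in kept.items()}


def major_minor_difference(read_counts, max_len=50):
    """Return major minus minor normalized abundance at one locus.

    The minor allele is taken as the second most abundant allele (assumption);
    if only one allele is observed, its abundance is treated as 0.
    Returns None when no reads survive the length filter.
    """
    abundances = sorted(
        normalized_abundances(read_counts, max_len).values(), reverse=True
    )
    if not abundances:
        return None
    major = abundances[0]
    minor = abundances[1] if len(abundances) > 1 else 0.0
    return major - minor


if __name__ == "__main__":
    locus = {10: 60, 11: 30, 12: 10}  # made-up read counts for illustration
    print(normalized_abundances(locus))
    print(major_minor_difference(locus))
```

In this sketch a locus dominated by a single repeat length gives a difference close to 1, while a locus whose reads are spread over several repeat lengths gives a smaller difference.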
CN202010098341.2A 2020-02-18 2020-02-18 Method for predicting microsatellite instability and application thereof Active CN111304303B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010098341.2A CN111304303B (en) 2020-02-18 2020-02-18 Method for predicting microsatellite instability and application thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010098341.2A CN111304303B (en) 2020-02-18 2020-02-18 Method for predicting microsatellite instability and application thereof

Publications (2)

Publication Number Publication Date
CN111304303A CN111304303A (en) 2020-06-19
CN111304303B true CN111304303B (en) 2023-05-05

Family

ID=71156619

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010098341.2A Active CN111304303B (en) 2020-02-18 2020-02-18 Method for predicting microsatellite instability and application thereof

Country Status (1)

Country Link
CN (1) CN111304303B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111785324B (en) * 2020-07-02 2021-02-02 深圳市海普洛斯生物科技有限公司 Microsatellite instability analysis method and device
CN112442540B (en) * 2021-01-27 2021-05-14 上海仁东医学检验所有限公司 Microsatellite instability detection method, marker combination, kit and application
CN113151476B (en) * 2021-05-07 2022-08-09 北京泛生子基因科技有限公司 Microsatellite unstable site combination based on second-generation sequencing data, method and application thereof
CN113921079B (en) * 2021-12-06 2022-03-18 四川省肿瘤医院 MSI prediction model construction method based on immune related gene
CN114150067B (en) * 2022-02-07 2022-05-17 元码基因科技(北京)股份有限公司 Method, system and probe set for determining combination of sites for detecting microsatellite instability state
CN114708916B (en) * 2022-03-15 2023-11-10 至本医疗科技(上海)有限公司 Method and device for detecting stability of microsatellite, computer equipment and storage medium
CN115132327B (en) * 2022-05-25 2023-03-24 中国医学科学院肿瘤医院 Microsatellite instability prediction system, construction method thereof, terminal equipment and medium
CN115954049B (en) * 2023-03-13 2023-05-09 广州迈景基因医学科技有限公司 Microsatellite unstable locus state detection method, system and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106755501A (en) * 2017-01-25 2017-05-31 广州燃石医学检验所有限公司 Second generation sequencing-based method for simultaneously detecting microsatellite locus stability and genomic changes
CN107526944A (en) * 2017-09-06 2017-12-29 南京世和基因生物技术有限公司 Sequencing data analysis method, device, and computer-readable medium for microsatellite instability
CN109830265A (en) * 2019-01-18 2019-05-31 臻悦生物科技江苏有限公司 Kit and reference database for detecting MSI, and preparation method and application thereof

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10294529B2 (en) * 2012-04-10 2019-05-21 Life Sciences Research Partners Vzw Microsatellite instability markers in detection of cancer
GB201614474D0 (en) * 2016-08-24 2016-10-05 Univ Of Newcastle Upon Tyne The Methods of identifying microsatellite instability

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106755501A (en) * 2017-01-25 2017-05-31 广州燃石医学检验所有限公司 Second generation sequencing-based method for simultaneously detecting microsatellite locus stability and genomic changes
WO2018137678A1 (en) * 2017-01-25 2018-08-02 广州燃石医学检验所有限公司 Second generation sequencing-based method for simultaneously detecting microsatellite locus stability and genomic changes
CN107526944A (en) * 2017-09-06 2017-12-29 南京世和基因生物技术有限公司 Sequencing data analysis method, device, and computer-readable medium for microsatellite instability
CN109830265A (en) * 2019-01-18 2019-05-31 臻悦生物科技江苏有限公司 Kit and reference database for detecting MSI, and preparation method and application thereof

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
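Esko A. Kautto. Performance evaluation for rapid detection of pan-cancer microsatellite instability with MANTIS. Oncotarget. 2016, Vol. 8, pp. 7452-7463. *
Laura et al. Molecular and computational methods for the detection of microsatellite instability in cancer. Frontiers in Oncology. 2018, Vol. 8, pp. 1-11. *
Sheng Jianqiu, Tian Suli, Lü Yang, Chen Xiangyu, Li Shirong. Study of microsatellite instability in hereditary nonpolyposis colorectal cancer. 胃肠病学和肝病学杂志 (Chinese Journal of Gastroenterology and Hepatology). 2004, Vol. 13 (No. 05), pp. 537-539. *
Chen Wei; Zhao Dan; Li Xiaodong; He Xiaoyu; Li Ruilin; Niu Beifang. Review of methods for detecting tumor microsatellite instability. 计算机***应用. 2018, (No. 10), pp. 43-49. *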

Also Published As

Publication number Publication date
CN111304303A (en) 2020-06-19

Similar Documents

Publication Publication Date Title
CN111304303B (en) Method for predicting microsatellite instability and application thereof
US11043283B1 (en) Systems and methods for automating RNA expression calls in a cancer prediction pipeline
CA3129831A1 (en) An integrated machine-learning framework to estimate homologous recombination deficiency
CN112805563A (en) Cell-free DNA for assessing and/or treating cancer
JP6955035B2 (en) Systems and methods for determining microsatellite instability
CN106676178B (en) Method and system for evaluating tumor heterogeneity
CN111968701B (en) Method and device for detecting somatic copy number variation of designated genome region
CN110846411B (en) Method for distinguishing gene mutation types of single tumor sample based on next generation sequencing
CN110910957A (en) Single-tumor-sample-based high-throughput sequencing microsatellite instability detection site screening method
CN113674803A (en) Detection method of copy number variation and application thereof
US20200109457A1 (en) Chromosomal assessment to diagnose urogenital malignancy in dogs
CN116580768B (en) Tumor tiny residual focus detection method based on customized strategy
CN113096728A (en) Method, device, storage medium and equipment for detecting tiny residual focus
Siegmund et al. Deriving tumor purity from cancer next generation sequencing data: applications for quantitative ERBB2 (HER2) copy number analysis and germline inference of BRCA1 and BRCA2 mutations
CN109461473B (en) Method and device for acquiring concentration of free DNA of fetus
CN113789371A (en) Method for detecting copy number variation based on batch correction
CN112837748A (en) System and method for distinguishing tumors of different anatomical origins
JPWO2019132010A1 (en) Methods, devices and programs for estimating base species in a base sequence
US20210310050A1 (en) Identification of global sequence features in whole genome sequence data from circulating nucleic acid
CN114420214A (en) Quality evaluation method and screening method of nucleic acid sequencing data
CN114220484A (en) Identification method of individual differential expression protein
CN115798584B (en) Method for simultaneously detecting forward and reverse mutation of EGFR gene T790M and C797S
CN115472294B (en) Model for predicting transformation speed of small cell transformation lung adenocarcinoma patient and construction method thereof
US20240011105A1 (en) Analysis of microbial fragments in plasma
WO2023246808A1 (en) Use of cancer-associated short exons to assist cancer diagnosis and prognosis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210521

Address after: No.28, Fazhan Road, Wenwusha Town, Changle District, Changle City, Fuzhou City, Fujian Province, 350200

Applicant after: Fujian Herui Jingchuang Gene Technology Co.,Ltd.

Address before: 350000 R & D building 7, 33 Donghu Road, digital Fujian Industrial Park, Changle City, Fuzhou City, Fujian Province

Applicant before: Fujian Herui Gene Technology Co.,Ltd.

TA01 Transfer of patent application right

Effective date of registration: 20220402

Address after: 350000 R & D building 7, 33 Donghu Road, digital Fujian Industrial Park, Changle City, Fuzhou City, Fujian Province

Applicant after: Fujian Herui Gene Technology Co.,Ltd.

Address before: No.28, Fazhan Road, Wenwusha Town, Changle District, Changle City, Fuzhou City, Fujian Province, 350200

Applicant before: Fujian Herui Jingchuang Gene Technology Co.,Ltd.

GR01 Patent grant