WO2023078283A1 - Methylation biomarker for breast cancer diagnosis and use thereof - Google Patents

Methylation biomarker for breast cancer diagnosis and use thereof

Info

Publication number
WO2023078283A1
Authority
WO
WIPO (PCT)
Prior art keywords
dmr
marker combination
breast cancer
differentially methylated
methylation
Prior art date
Application number
PCT/CN2022/129181
Other languages
French (fr)
Chinese (zh)
Inventor
叶竹佳
曾柳红
王军
杨婷
张显玉
庞达
陈志伟
范建兵
Original Assignee
广州市基准医疗有限责任公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州市基准医疗有限责任公司
Publication of WO2023078283A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/154: Methylation markers

Definitions

  • the disclosure belongs to the field of biotechnology, and in particular relates to a methylation biomarker for breast cancer diagnosis and its application.
  • The pathogenesis of breast cancer is mainly attributed to genetic susceptibility, endocrine disorders, virus particles transmitted through breastfeeding, and similar factors. About 5% of breast cancers are caused by gene mutations. Many studies in Europe and the United States have investigated breast cancer susceptibility genes; those currently known include BRCA-1, BRCA-2, p53 and PTEN. Breast cancers related to mutations in these genes are called hereditary breast cancers and account for 5% to 10% of all breast cancers.
  • “Breast cancer 21 gene detection” (Oncotype DX) is one of the most commonly used prognostic tools for breast cancer in clinical practice. The gene expression levels are converted into a recurrence score (RS), and the score is used to judge whether a breast cancer patient needs adjuvant chemotherapy, which supports individualized therapy.
  • “Breast cancer 21 gene detection” is mainly aimed at patients with early breast cancer who are estrogen receptor (ER) positive, human epidermal growth factor receptor 2 (Her2) negative, and lymph node negative. In addition to predicting recurrence risk, it can serve as the only multigene test to predict the benefit of chemotherapy and endocrine therapy in patients with estrogen receptor-positive invasive breast cancer.
  • “Breast cancer 21 gene detection” covers 16 cancer-related genes from the proliferation group, invasion group, Her2 group, estrogen group and other groups, and 5 internal reference genes from the reference group. By detecting the 21 genes and observing the interactions between them to characterize the tumor, the recurrence index of breast cancer and the benefit ratio of chemotherapy can be predicted.
  • “Breast Cancer 70 Gene Test” (MammaPrint) also plays an important role in early breast cancer diagnosis.
  • “Breast cancer 70 gene detection” scans tumor biopsy tissue sections to find 70 genes related to whether cancer cells may proliferate or invade healthy tissues. Her2-negative patients.
  • the latest research results have found that 72 new common gene mutations will increase the risk of breast cancer in women, which brings the number of gene mutations currently known to be associated with breast cancer to 177. Many people may carry common mutated genes, and the risk of cancer caused by a single gene mutation is relatively small, but the more related mutated genes in a woman, the greater the risk of breast cancer.
  • the current breast cancer genetic detection method is a non-invasive detection, but it needs to be based on the tissue removed during the original operation (such as lumpectomy, mastectomy or core needle biopsy).
  • 1. Mammography (MG): The onset age of breast cancer in Chinese women is younger (45-55 years old), and their breast glands are denser than those of Western women, so MG easily misses lesions; the sensitivity of MG for diagnosing lesions in dense breasts is only 30%.
  • 2. B-ultrasound: The positive predictive value of MG alone was 55.5%, and the positive predictive value of MG combined with B-ultrasound was 43.3%.
  • 3. Nuclear magnetic resonance imaging (MRI): MRI is not sensitive to tiny calcifications and has clear contraindications. Therefore, early screening, early diagnosis and early treatment of tumors are of great significance for improving the five-year survival rate.
  • the key to early screening and early diagnosis of tumors is to establish an effective detection model and find molecular markers that can be used as screening and diagnosis.
  • cfDNA: circulating free DNA (circulating DNA)
  • ctDNA: circulating tumor DNA
  • ctDNA is a sensitive and specific biomarker with wide applicability, which can be used in clinical practice and in research on many different types of cancer.
  • plasma ctDNA can be used as a biomarker in the early diagnosis and screening of tumors, prediction, treatment response, monitoring tumor size and recurrence, etc.
  • the international research direction is to integrate multi-omics/multiple molecular markers, multi-gene/multi-locus to improve the sensitivity and specificity of detection technology, so as to meet the clinical demand for detection products.
  • the diagnosis of breast cancer still has defects such as low accuracy rate.
  • the present disclosure provides a biomarker with high specificity, sensitivity and accuracy for the diagnosis of breast cancer, so as to meet the clinical demand for breast cancer detection products.
  • through in-depth research and repeated experiments, the inventors constructed a breast cancer-specific methylation database and used paired breast cancer tissue, paracancerous tissue and white blood cell samples to screen out breast cancer-specific methylation biomarkers derived from breast cancer tissue. Based on breast tissue and plasma samples, they then constructed a methylation model for distinguishing benign from malignant breast nodules and for molecular typing of breast cancer, thus completing the present disclosure.
  • the present disclosure provides a methylated biomarker for the diagnosis of breast cancer, wherein the methylated biomarker includes any one of the differentially methylated regions DMR-1 to DMR-521 provided in Table 1, or any combination thereof.
  • the methylated biomarkers include at least 3, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 150, 170, 200 or more of the differentially methylated regions DMR-1 to DMR-521.
  • the methylation biomarkers for breast cancer diagnosis include any one of the following (a)-(p) or any combination thereof: (a) differentially methylated regions in marker combination 1; (b) differentially methylated regions in marker combination 2; (c) differentially methylated regions in marker combination 3; (d) differentially methylated regions in marker combination 4; (e) differentially methylated regions in marker combination 5; (f) differentially methylated regions in marker combination 6; (g) differentially methylated regions in marker combination 10; (h) differentially methylated regions in marker combination 7; (i) differentially methylated regions in marker combination 11; (j) differentially methylated regions in marker combination 12; (k) differentially methylated regions in marker combination 13; (l) differentially methylated regions in marker combination 14; (m) differentially methylated regions in marker combination 15; (n) differentially methylated regions in marker combination 16; (o) differentially methylated regions in marker combination 9
  • the breast cancer is breast cancer in a subject, and the subject is a mammal; preferably, the mammal is a human.
  • the breast cancer is selected from ER/PR/Her2+ type, ER+/PR/Her2- type and ER-/PR-/Her2- type breast cancer; optionally, the breast cancer is selected from stage 0, stage I, stage II, stage III and stage IV breast cancer.
  • the diagnosis is to distinguish benign or malignant breast nodules.
  • when distinguishing breast nodules as benign or malignant, the methylation biomarkers include any one of the following or any combination thereof: the differentially methylated regions in marker combination 1, marker combination 2, marker combination 3, marker combination 4, marker combination 5, marker combination 6, marker combination 10, marker combination 7, marker combination 11, marker combination 12, marker combination 13, marker combination 14, marker combination 15, and marker combination 16.
  • the methylated biomarkers include the differentially methylated regions in marker combination 5; and/or the differentially methylated regions in marker combination 6; and/or the differentially methylated regions in marker combination 10; and/or the differentially methylated regions in marker combination 7; and/or the differentially methylated regions in marker combination 13; and/or the differentially methylated regions in marker combination 11; and/or the differentially methylated regions in marker combination 12.
  • the methylated biomarkers are the differentially methylated regions in marker combination 1, the differentially methylated regions in marker combination 2, the differentially methylated regions in marker combination 3, the differentially methylated regions in marker combination 4, the differentially methylated regions in marker combination 5, the differentially methylated regions in marker combination 6, the differentially methylated regions in marker combination 10
  • the methylated biomarker is a differentially methylated region in marker combination 5, a differentially methylated region in marker combination 6, a differentially methylated region in marker combination 10, Differentially methylated regions in marker combination 7, differentially methylated regions in marker combination 13, differentially methylated regions in marker combination 11, or differentially methylated regions in marker combination 12.
  • the methylated biomarker is a differentially methylated region in marker combination 6.
  • the methylated biomarker is a differentially methylated region in marker combination 10.
  • the methylated biomarker is a differentially methylated region in marker combination 7.
  • the methylated biomarker is a differentially methylated region in marker combination 13.
  • the diagnosis is to identify the molecular subtype of breast cancer; preferably, the molecular subtype of breast cancer includes ER/PR/Her2+ type, ER+/PR/Her2- type and ER-/PR-/Her2- type breast cancer.
  • when identifying molecular subtypes of breast cancer, the methylated biomarkers include the differentially methylated regions in marker combination 9, the differentially methylated regions in marker combination 8, or a combination thereof.
  • when the sample to be tested is a tissue, the methylated biomarker used to identify the molecular subtype of breast cancer can be the differentially methylated regions in marker combination 8 or the differentially methylated regions in marker combination 9.
  • when the sample to be tested is plasma, serum or blood, the methylated biomarker used to identify the molecular subtype of breast cancer can be the differentially methylated regions in marker combination 9.
  • the second aspect of the present disclosure provides a diagnostic kit for breast cancer, which includes a reagent for detecting the methylation status of the methylated biomarker described in the first aspect of the present disclosure in a sample to be tested.
  • the sample to be tested is plasma, serum, blood, tissue, or any combination thereof; preferably, the tissue is breast tissue, more preferably breast nodule tissue.
  • the diagnosis is to distinguish benign or malignant breast nodules.
  • the diagnosis is to identify the molecular subtype of breast cancer; preferably, the molecular subtype of breast cancer includes ER/PR/Her2+ type, ER+/PR/Her2- type and ER-/PR-/Her2- type.
  • the reagent is a reagent used in a method for detecting methylation status selected from: pyrosequencing, bisulfite conversion sequencing, methylation chip, qPCR, digital PCR, second-generation sequencing, third-generation sequencing, genome-wide methylation sequencing, DNA enrichment detection, simplified bisulfite sequencing, HPLC, MassArray, methylation-specific PCR, or any combination thereof.
  • the present disclosure provides the use of the methylation biomarker according to the first aspect of the present disclosure in preparing a kit for diagnosing whether a subject suffers from breast cancer.
  • the diagnosis is to distinguish benign or malignant breast nodules; in other specific embodiments, the diagnosis is to identify molecular subtypes of breast cancer; preferably, the molecular subtypes of breast cancer include ER/PR/Her2+ type, ER+/PR/Her2- type and ER-/PR-/Her2- type.
  • in a fourth aspect, the present disclosure provides a method for diagnosing breast cancer, the diagnosing method comprising:
  • the subject to be tested is identified as having breast cancer, and/or the molecular subtype of breast cancer in the subject to be tested is identified; preferably, the molecular subtype of breast cancer is selected from ER/PR/Her2+ type, ER+/PR/Her2- type and ER-/PR-/Her2- type.
  • the sample to be tested is plasma, serum, blood, tissue or any combination thereof; preferably, the tissue is breast tissue.
  • the method for detecting the methylation status of a methylated biomarker is selected from the group consisting of: pyrosequencing, bisulfite conversion sequencing, methylation chip, qPCR, digital PCR , second-generation sequencing, third-generation sequencing, genome-wide methylation sequencing, DNA enrichment detection, simplified bisulfite sequencing, HPLC, MassArray, methylation-specific PCR, or any combination thereof.
  • the present disclosure provides methylation biomarkers for early diagnosis of breast cancer; specifically, the research identified genomic fragments with clearly abnormal methylation modification patterns in breast cancer tissue and plasma cfDNA, namely DMR-1 to DMR-521.
  • marker combination 1, marker combination 2, marker combination 3, marker combination 4, marker combination 5, marker combination 6, marker combination 7, marker combination 10, marker combination 11, marker combination 12, marker combination 13, marker combination 14, marker combination 15, and marker combination 16 can all distinguish benign from malignant breast nodules, whether based on breast tissue samples or plasma samples.
  • marker combination 6, marker combination 10, marker combination 7, and marker combination 13 have better effects; while marker combination 6 uses relatively fewer markers and has achieved better diagnostic results.
  • Figure 1 is a heat map of the methylation rate of markers in malignant breast nodules in 521 DMR regions.
  • Fig. 2 is a ROC chart for the detection of benign and malignant breast nodules in breast tissue samples with different molecular marker combinations.
  • Fig. 3 is the ROC chart of different molecular marker combinations for the detection of benign and malignant breast nodules in plasma samples.
  • Figure 4 shows the subtype classification of malignant breast nodules performed with marker combination 8 on breast tissue samples.
  • the "plurality” mentioned in the present disclosure means two or more.
  • “And/or” describes the association relationship of associated objects, indicating that there may be three types of relationships, for example, A and/or B may indicate: A exists alone, A and B exist simultaneously, and B exists independently.
  • the character “/” generally indicates that the contextual objects are an "or” relationship.
  • breast cancer is used in the broadest sense and refers to all cancers that start in the breast. It includes the following subtypes: ductal carcinoma in situ, invasive ductal carcinoma (including ductal, medullary, mucinous, papillary, and cribriform breast carcinomas), invasive lobular carcinoma, inflammatory breast carcinoma , lobular carcinoma in situ, male breast cancer, Paget's disease of the nipple, and phyllodes neoplasms of the breast.
  • Stage 0 (Tis, N0, M0), Stage IA (T1, N0, M0), Stage IB (T0 or T1, N1mi, M0), Stage IIA (T0 or T1, N1 (but not N1mi), M0; or T2, N0, M0), stage IIB (T2, N1, M0; or T3, N0, M0), stage IIIA (T0 to T2, N2, M0; or T3, N1 or N2, M0), stage IIIB (T4, N0 to N2, M0), stage IIIC (any T, N3, M0) and stage IV (any T, any N, M1).
  • the term "molecular typing” of breast cancer refers to a breast cancer classification method based on the gene expression profile of breast cancer tumor tissue.
  • the molecular typing system of breast cancer includes but is not limited to PAM50 (Prosigna) (see, for example, Parker, J.S. et al., Supervised risk predictor of breast cancer based on intrinsic subtypes. J. Clin. Oncol. 2009, 27:1160-1167) and breast cancer 72-gene molecular typing (see, for example, Yang B. et al., An assessment of prognostic immunity markers in breast cancer. NPJ Breast Cancer, 2018, 4:35).
  • PAM50 divides breast cancer into four subtypes: Luminal A, Luminal B, Basal-like and Her2-enriched.
  • breast cancer 72-gene molecular typing divides breast cancer into luminal A, luminal B, basal, Her2-enriched, and immune-enhanced types.
  • breast cancer is divided into ER/PR/Her2+ type, ER+/PR/Her2- type and ER-/PR-/Her2- type breast cancer.
  • ER/PR/Her2+ type breast cancer includes Luminal B Her2-positive and other Her2-overexpressing subtypes;
  • ER+/PR/Her2- type breast cancer includes Luminal A Her2-negative and Luminal B Her2-negative;
  • ER-/PR-/Her2- type breast cancer is triple-negative breast cancer (TNBC).
  • breast nodule is a symptom commonly seen in breast hyperplasia (which can form breast cysts) and breast neoplastic diseases, including benign breast tumors (such as breast fibroids, phyllodes tumors, etc.) and malignant breast tumors (breast cancer).
  • subject refers to an organism, or a part or component thereof, to which the provided compositions, methods, kits, devices and systems may be administered or applied.
  • the subject can be a mammal or a cell, tissue, organ or part of the mammal.
  • mammal refers to any kind of mammal, preferably a human (including a human, human subject or human patient).
  • Subjects and mammals include, but are not limited to, farm animals, sport animals, pets, primates, horses, dogs, cats, and rodents such as mice and rats.
  • sample refers to any substance, including biological samples, that may contain target molecules that need to be analyzed.
  • biological sample refers to any sample obtained from a living or viral (or prion) source or other source of macromolecules and biomolecules, and includes any cell type or tissue of a subject from which nucleic acids, proteins, and/or other macromolecules can be obtained.
  • a biological sample may be a sample obtained directly from a biological source or a processed sample. For example, isolated nucleic acid that is amplified constitutes a biological sample.
  • Biological samples include, but are not limited to, body fluids (e.g., blood, plasma, serum, cerebrospinal fluid, synovial fluid, urine, sweat, semen, feces, sputum, tears, mucus, amniotic fluid, etc.), exudates, bone marrow samples, ascites, pelvic flushing fluid, pleural fluid, spinal fluid, lymphatic fluid, eye fluid, extracts from nasal, throat, or genital swabs, cell suspensions of digested tissues, or extracts of fecal matter, as well as tissue and organ samples from humans, animals (for example, non-human mammals) and plants, and processed samples derived therefrom.
  • amplification generally refers to the process of producing multiple copies of a desired sequence.
  • Multiple copies means at least two copies.
  • Copy does not necessarily imply perfect sequence complementarity or identity to the template sequence.
  • copies may include nucleotide analogs such as deoxyinosine, deliberate sequence changes (such as those introduced by primers containing sequences that are hybridizable but not complementary to the template), and/or sequence errors.
  • Sequence determination and the like include determination of information on the nucleotide base sequence of a nucleic acid. Such information may include the identification or determination of partial or full sequence information for a nucleic acid. Sequence information can be determined with varying degrees of statistical reliability or confidence. In one aspect, the term includes determining the identity and order of a plurality of contiguous nucleotides in a nucleic acid.
  • sequencing includes sequence determination using methods that determine many (typically thousands to billions) of nucleic acid sequences in an essentially parallel fashion. That is, DNA templates are not prepared for sequencing one at a time but in a batch process, and many sequences are preferably read out in parallel or by an ultra-high-throughput serial process.
  • Such methods include, but are not limited to, pyrosequencing (e.g., as commercialized by 454 Life Sciences, Inc., Branford, CT); sequencing by synthesis using modified nucleotides (e.g., TruSeq™ and HiSeq™ technologies commercialized by Illumina, Inc., San Diego, CA; HeliScope™ commercialized by Helicos Biosciences Corporation, Cambridge, MA; and PacBio RS commercialized by Pacific Biosciences of California, Inc., Menlo Park, CA); sequencing by ion detection technology (e.g., Ion Torrent™ technology, Life Technologies, Carlsbad, CA); DNA nanoball sequencing (Complete Genomics, Inc., Mountain View, CA); and other highly parallel sequencing methods such as nanopore-based sequencing technology (for example, as developed by Oxford Nanopore Technologies, LTD, Oxford, UK).
  • methylation refers to methylation at the C5 or N4 position of cytosine, at the N6 position of adenine, or to other types of nucleic acid methylation.
  • In vitro amplified DNA is usually unmethylated because typical in vitro DNA amplification methods do not preserve the methylation pattern of the amplified template.
  • unmethylated DNA or “methylated DNA” may also refer to amplified DNA whose original template is unmethylated or methylated, respectively.
  • “methylation state”, “methylation profile” and “methylation status” of a nucleic acid molecule refer to the presence or absence of one or more methylated nucleotide bases in a nucleic acid molecule.
  • a nucleic acid molecule comprising methylated cytosines is considered methylated (eg, the methylation state of the nucleic acid molecule is methylated).
  • a nucleic acid molecule that does not contain any methylated nucleotides is considered unmethylated.
  • the methylated region used for statistics herein refers to a region containing at least 3 consecutive CpG sites within a 200 bp window.
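The region criterion above can be illustrated with a short Python sketch. It is only one possible reading of the definition: it assumes that "at least 3 consecutive CpG sites within a 200 bp window" means that three neighbouring CpG dinucleotides, from the first C to the last G, span no more than 200 bp. The function names `cpg_positions` and `qualifying_windows` are hypothetical and do not come from the disclosure.

```python
def cpg_positions(seq):
    """Return the 0-based start position of every CpG dinucleotide in seq."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def qualifying_windows(positions, min_cpgs=3, window=200):
    """Return (start, end) spans that hold min_cpgs consecutive CpG sites.

    positions must be sorted CpG start coordinates. A span runs from the C
    of the first CpG to just past the G of the last CpG (end is exclusive)
    and is kept only if its length does not exceed `window` bp.
    """
    spans = []
    for i in range(len(positions) - min_cpgs + 1):
        start = positions[i]
        end = positions[i + min_cpgs - 1] + 2  # +2 covers the final "CG"
        if end - start <= window:
            spans.append((start, end))
    return spans


# Toy sequence with CpG sites at positions 1, 5 and 9.
toy = "ACGTTCGAACGT"
print(qualifying_windows(cpg_positions(toy)))
```

Widely spaced CpG sites, for example at coordinates 0, 150 and 300, yield no qualifying span under this reading because the three sites cover more than 200 bp.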
  • the statistical methylation level value, i.e., the beta (β) value, refers to the ratio of methylated reads to all reads in a given methylated region of the sample and takes a value between 0 and 1. A read in which at least 3 consecutive CpGs are methylated is counted as a methylated read; otherwise it is counted as an unmethylated read. All reads are the sum of the methylated reads and the unmethylated reads.
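The β value defined above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each read is represented as a list of per-CpG calls (1 for methylated, 0 for unmethylated) in genomic order, which is an input format chosen here for convenience, and the names `is_methylated_read` and `beta_value` are hypothetical.

```python
def is_methylated_read(cpg_calls, min_run=3):
    """Return True if the read has at least min_run consecutive methylated CpGs."""
    run = 0
    for call in cpg_calls:
        run = run + 1 if call else 0  # extend the run or reset it
        if run >= min_run:
            return True
    return False


def beta_value(reads, min_run=3):
    """Ratio of methylated reads to all reads covering one region (0 to 1)."""
    if not reads:
        raise ValueError("no reads cover this region")
    methylated = sum(1 for read in reads if is_methylated_read(read, min_run))
    return methylated / len(reads)


# Four hypothetical reads over one region; two of them contain a run of 3.
reads = [
    [1, 1, 1, 0],  # methylated read
    [1, 0, 1, 1],  # three methylated CpGs, but not consecutive
    [0, 0, 0, 0],  # unmethylated read
    [0, 1, 1, 1],  # methylated read
]
print(beta_value(reads))
```

Note that the second read counts as unmethylated even though three of its CpGs are methylated, because the definition requires the methylated CpGs to be consecutive.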
  • models are constructed with random forest and other algorithms on the β values of the methylated regions of the samples to generate a probability value for each class.
  • The binary classification threshold (cutoff) for benign versus malignant nodules is defined as the malignant probability value at which the Youden index reaches its maximum on the test set. If the malignant probability value of a sample is greater than or equal to the threshold, the sample is predicted to be a malignant nodule; otherwise it is predicted to be a benign nodule.
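The cutoff selection described above can be sketched as follows, using the standard definition of the Youden index, J = sensitivity + specificity - 1. The sketch assumes the model's malignant probabilities and the true labels (1 for malignant, 0 for benign) are already available as plain lists; the function names and the example values are hypothetical and are not data from the disclosure.

```python
def youden_cutoff(probs, labels):
    """Return (cutoff, J) for the threshold that maximizes the Youden index.

    probs:  malignant probability for each sample
    labels: 1 for malignant, 0 for benign
    A sample is called malignant when its probability is >= cutoff.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both malignant and benign samples")
    best_cut, best_j = None, -1.0
    for cut in sorted(set(probs)):  # every observed score is a candidate
        tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= cut)
        tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p < cut)
        j = tp / pos + tn / neg - 1.0  # sensitivity + specificity - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


def classify(prob, cutoff):
    """Apply the rule: >= cutoff is malignant, otherwise benign."""
    return "malignant" if prob >= cutoff else "benign"


# Hypothetical probabilities for three benign and three malignant nodules.
probs = [0.1, 0.2, 0.35, 0.6, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 1]
cutoff, j = youden_cutoff(probs, labels)
print(cutoff, j, classify(0.7, cutoff))
```

When several thresholds tie for the maximum, this sketch keeps the lowest one; the disclosure does not state how ties are resolved.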
  • the multi-classification of malignant subtypes calculates the probability value of each classification subtype for the sample, and the classification subtype corresponding to the maximum probability value of the sample is the prediction result of the malignant subtype classification of the sample.
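The multi-class rule above amounts to taking the subtype with the highest predicted probability. A minimal sketch, assuming the per-subtype probabilities arrive as a dictionary; the function name `predict_subtype` and the probability values are hypothetical.

```python
def predict_subtype(probabilities):
    """Return the subtype whose predicted probability is largest."""
    if not probabilities:
        raise ValueError("no subtype probabilities supplied")
    return max(probabilities, key=probabilities.get)


# Hypothetical model output for one malignant sample.
sample = {
    "ER/PR/Her2+": 0.20,
    "ER+/PR/Her2-": 0.70,
    "ER-/PR-/Her2-": 0.10,
}
print(predict_subtype(sample))
```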
  • the methylation status of a particular nucleic acid sequence can indicate the methylation status of each base in the sequence, or can indicate the methylation status of a subset of bases (for example, one or more cytosines) within the sequence, or can indicate information about the methylation density of a region within the sequence without necessarily providing precise information on where methylation occurs within the sequence.
  • Methylation status can optionally be represented or indicated by a "methylation value” (eg, representing a methylation frequency, fraction, ratio, percentage, etc.).
  • Methylation values can be generated, for example, by quantifying the amount of intact nucleic acid present after restriction digestion with a methylation-dependent restriction enzyme, or by comparing amplification profiles after bisulfite treatment, or by comparing the sequences of bisulfite-treated and untreated nucleic acids.
  • the value (i.e., the methylation value) represents the methylation status and thus can be used as a quantitative indicator of the methylation status in multiple copies of the locus. This is particularly useful when it is desired to compare the methylation status of sequences in a sample to a threshold or reference value.
  • methylation frequency or “percent methylation (%)” refers to the number of instances of a molecule or locus that is methylated relative to the number of instances of the molecule or locus that is unmethylated.
  • a methylation state describes the state of methylation of a nucleic acid (eg, a genomic sequence).
  • methylation status refers to the property of a nucleic acid fragment associated with methylation at a particular genomic locus. Such properties include, but are not limited to: whether any cytosine (C) residues within the DNA sequence are methylated, the location of the methylated C residues, the frequency or percentage of methylated C throughout any particular region of the nucleic acid, and allelic differences in methylation due to, for example, differences in allelic origin.
  • the term “methylation status” also refers to the relative or absolute concentration, or the pattern, of methylated C or unmethylated C throughout any particular region of nucleic acid in a biological sample.
  • cytosine (C) residues within a nucleic acid sequence may be referred to as “hypermethylated” or as having “increased methylation” if they are methylated, whereas cytosine (C) residues within a DNA sequence may be referred to as “hypomethylated” or as having “reduced methylation” if they are not methylated.
  • a nucleic acid sequence is considered hypermethylated, or to have increased methylation, compared to another nucleic acid sequence (e.g., from a different region or from a different individual, etc.) if cytosine (C) residues within the nucleic acid sequence are methylated compared to that other sequence.
  • a DNA sequence is considered hypomethylated, or to have reduced methylation, compared to another nucleic acid sequence (e.g., from a different region or from a different individual, etc.) if the cytosine (C) residues within the DNA sequence are not methylated compared to that other sequence.
  • methylation pattern refers to the collective sites of methylated and unmethylated nucleotides over a region of a nucleic acid. Two nucleic acids may have the same or similar methylation frequency or methylation percentage but different methylation patterns when the number of methylated and unmethylated nucleotides is the same or similar across the region but the positions of the methylated and unmethylated nucleotides differ. Sequences are said to be “differentially methylated”, to have “methylation differences” or to have “different methylation status”.
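The distinction above between methylation frequency and methylation pattern can be shown with a small Python sketch. It assumes each molecule is represented as a list of per-site calls over the same region (1 for methylated, 0 for unmethylated); the representation and the function names are hypothetical.

```python
def methylation_fraction(pattern):
    """Fraction of sites in the region that are methylated."""
    if not pattern:
        raise ValueError("empty pattern")
    return sum(pattern) / len(pattern)


def same_pattern(a, b):
    """True only if both molecules are methylated at exactly the same sites."""
    return list(a) == list(b)


# Two hypothetical molecules over the same four sites.
mol_a = [1, 1, 0, 0]
mol_b = [0, 1, 0, 1]

# Both are 50% methylated, yet the methylated positions differ.
print(methylation_fraction(mol_a), methylation_fraction(mol_b), same_pattern(mol_a, mol_b))
```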
  • differential methylation refers to a difference in the level or pattern of nucleic acid methylation in a cancer-positive sample compared to the level or pattern of nucleic acid methylation in a cancer-negative sample. It can also refer to a difference in the level or pattern between patients whose cancer recurred after surgery and those whose cancer did not. Differential methylation, as well as specific levels or patterns of DNA methylation, are diagnostic and predictive biomarkers, e.g., once the correct cut-off values or predictive properties are defined.
  • Methylation state frequencies can be used to describe a population of individuals or a sample from a single individual. For example, a nucleotide locus with a methylation state frequency of 50% is methylated in 50% of the instances and unmethylated in 50% of the instances. Such frequencies can be used, for example, to describe the degree of methylation of a nucleotide locus or nucleic acid region in a population of individuals or a collection of nucleic acids. Thus, when methylation in a first population or cluster of nucleic acid molecules differs from methylation in a second population or cluster of nucleic acid molecules, the methylation state frequency of the first population or cluster will differ from that of the second population or cluster.
  • Such frequencies can also be used, for example, to describe the degree of methylation of a nucleotide locus or nucleic acid region in a single individual.
  • such frequencies can be used to describe the degree of methylation or unmethylation at a nucleotide locus or region of a nucleic acid in a group of cells obtained from a tissue sample.
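  • The distinction between methylation state frequency and methylation pattern can be illustrated with a minimal Python sketch. The function name and the toy 0/1 calls (1 = methylated, 0 = unmethylated) are assumptions made for illustration only; they are not data from this disclosure.

```python
from typing import List


def methylation_frequency(calls: List[int]) -> float:
    """Fraction of methylated observations (1 = methylated, 0 = unmethylated)."""
    if not calls:
        raise ValueError("at least one observation is required")
    return sum(calls) / len(calls)


# Hypothetical calls for one locus observed in 8 nucleic acid molecules:
# methylated in half of the instances, i.e. a methylation state frequency of 50%.
locus_calls = [1, 0, 1, 1, 0, 0, 1, 0]
locus_frequency = methylation_frequency(locus_calls)

# Hypothetical calls along two regions of 4 CpG sites each: the frequency is
# identical, but the methylated positions differ, so the patterns differ.
region_a = [1, 1, 0, 0]
region_b = [0, 1, 0, 1]
same_frequency = methylation_frequency(region_a) == methylation_frequency(region_b)
same_pattern = region_a == region_b
```

  • Here `locus_frequency` is 0.5, while `region_a` and `region_b` share a frequency but not a pattern.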
  • sensitivity of a given marker refers to the percentage of samples that report a DNA methylation value above the threshold for distinguishing malignant from benign nodule samples.
  • a positive is defined as a histologically confirmed malignant nodule with a reported DNA methylation value above a threshold (e.g., a range associated with disease), and a false negative is defined as a histologically confirmed malignant nodule with a reported DNA methylation value below a threshold (e.g., a range associated with the absence of disease).
  • the value of sensitivity thus reflects the probability that a DNA methylation measurement for a given marker from a known diseased sample will be within the range of disease-associated measurements.
  • the clinical relevance of a calculated sensitivity value reflects an estimate of the probability that a given marker, when applied to a subject with a clinical condition, will detect the presence of that condition.
  • specificity of a given marker refers to the percentage of benign nodule samples that report a DNA methylation value below the threshold for distinguishing malignant from benign nodule samples.
  • negatives are defined as histologically confirmed benign nodule samples with reported DNA methylation values below a threshold (e.g., a range associated with the absence of disease), and false positives are defined as histologically confirmed benign nodule samples with reported DNA methylation values above a threshold (e.g., a range associated with disease).
  • the value of specificity thus reflects the probability that a DNA methylation measurement for a given marker from a sample of a known benign nodule will be within the range of measurements associated with the absence of disease.
  • the clinical relevance of a calculated specificity value reflects an estimate of the probability that a given marker, when applied to a patient without a clinical condition, will detect the absence of that condition.
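  • The definitions of sensitivity and specificity above can be sketched in Python as follows. The function name, the threshold of 0.5 and the sample values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from typing import List, Tuple


def sensitivity_specificity(
    values: List[float], is_malignant: List[bool], threshold: float
) -> Tuple[float, float]:
    """Sensitivity and specificity of one marker at a given threshold.

    A malignant sample above the threshold is a (true) positive, a malignant
    sample at or below it is a false negative; a benign sample at or below the
    threshold is a (true) negative, a benign sample above it is a false positive.
    """
    tp = fn = tn = fp = 0
    for value, malignant in zip(values, is_malignant):
        called_positive = value > threshold
        if malignant and called_positive:
            tp += 1
        elif malignant:
            fn += 1
        elif called_positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both malignant and benign samples are required")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical methylation values: four malignant, then four benign nodules.
values = [0.9, 0.7, 0.2, 0.6, 0.1, 0.3, 0.05, 0.8]
labels = [True, True, True, True, False, False, False, False]
se, sp = sensitivity_specificity(values, labels, threshold=0.5)
```

  • With these toy values, three of four malignant samples lie above the threshold and three of four benign samples lie below it, so both metrics equal 0.75.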
  • AUC is an abbreviation for “Area Under the Curve”. Specifically, it refers to the area under the receiver operating characteristic (ROC) curve.
  • a ROC curve is a plot of the true positive rate against the false positive rate for the different possible cut points of a diagnostic test. It shows the tradeoff between sensitivity and specificity depending on the cut point chosen (any increase in sensitivity will be accompanied by a decrease in specificity).
  • the area under the ROC curve (AUC) is a measure of the accuracy of a diagnostic test (a larger area is better; 1 is optimal; a random test has a ROC curve lying on the diagonal, with an area of 0.5; see: J.P. Egan (1975) Signal Detection Theory and ROC Analysis, Academic Press, New York).
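  • As a hedged illustration of the AUC, the sketch below uses the standard rank-based equivalence: the AUC equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative sample, with ties counted as one half. The function name and the scores are hypothetical and are not taken from this disclosure.

```python
from typing import List


def roc_auc(scores: List[float], labels: List[bool]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("both positive and negative samples are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive correctly ranked above negative
            elif p == n:
                wins += 0.5      # a tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical scores: three positive samples followed by three negative samples.
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
labels = [True, True, True, False, False, False]
auc = roc_auc(scores, labels)                                   # 8 of 9 pairs ordered correctly
perfect = roc_auc([0.9, 0.8, 0.2, 0.1], [True, True, False, False])
uninformative = roc_auc([0.5, 0.5, 0.5, 0.5], [True, True, False, False])
```

  • The pairwise form is quadratic in the number of samples, which is acceptable for a sketch; a sort-based implementation would be preferable for large cohorts.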
  • applications of diagnostic tests include detecting or identifying a disease state or condition in a subject, determining the likelihood that a subject will develop a given disease or condition, determining the likelihood that a subject with a disease or condition will respond to treatment, determining the prognosis (or possible progression or regression) of a subject with a disease or condition, and determining the effect of treatment on a subject with a disease or condition.
  • a diagnosis can be used to detect the presence or likelihood of a malignant nodule in a subject, or the likelihood that such a subject will respond favorably to a compound (e.g., a drug) or other treatment.
  • diagnosis also means determining the type of disease, such as but not limited to molecular typing of breast cancer.
  • the term “marker,” “biomarker,” or “molecular marker” refers to a substance (e.g., a nucleic acid or a region of a nucleic acid) capable of diagnosing cancer by distinguishing cancer cells from normal cells (e.g., based on its methylation status).
  • DMRs: differentially methylated regions.
  • Step 1: Using the TruSeq Methyl Capture EPIC library construction kit, 338 breast tissue samples (55 benign samples and 283 malignant samples) were pooled into 71 breast tissue sample pools, comprising 11 benign sample pools, 24 ER+/PR/Her2- type malignant sample pools (1 failed QC), 25 ER/PR/Her2+ type malignant sample pools and 10 ER-/PR-/Her2- type malignant sample pools, and the AnchorDx breast cancer-specific methylation database was constructed from them;
  • Step 2: Preliminarily screen breast cancer-specific differentially methylated regions from the AnchorDx breast cancer-specific methylation database constructed in step 1. The database contains 13,676 larger-range methylated regions, which together contain 129,794 markers, each a differentially methylated region of 3 consecutive CpGs (3-CpG). Then synthesize a breast cancer-specific methylation detection panel containing the 129,794 3-CpG differentially methylated region markers;
  • Step 3: Using 112 pairs of matched breast cancer tissue and plasma samples and the breast cancer-specific methylation detection panel of step 2, screen further under the filter conditions delta value > 0.05 and FDR < 0.01. Specific methylation signals originating from leukocytes were filtered out using the paired leukocyte samples of 40 breast cancer samples, finally yielding 35,814 breast cancer-specific 3-CpG differentially methylated region markers.
  • Step 4: Based on breast tissue and plasma samples respectively, and on the 35,814 breast cancer-specific 3-CpG differentially methylated region markers obtained in step 3, use Random Forest to screen for methylation biomarkers and build models that can distinguish benign from malignant breast nodules, finally selecting models involving the differentially methylated regions of marker combinations 1-7 and marker combinations 10-12;
  • Step 5: Based on breast tissue and plasma samples respectively, and on the 35,814 breast cancer-specific 3-CpG differentially methylated region markers obtained in step 3, use Random Forest to screen for methylation biomarkers and build models that can determine the molecular type of malignant breast nodules, finally selecting models involving the differentially methylated regions of marker combinations 8-9.
  • the present disclosure provides a methylation biomarker for the diagnosis of breast cancer, wherein the methylation biomarker includes any one of the differentially methylated regions DMR-1 to DMR-521 listed in Table 1, or any combination thereof.
  • the methylation biomarkers include at least 3, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220 or more differentially methylated regions.
  • the methylated biomarkers include any of the following (a)-(p):
  • the entries in the model column of Table 1, such as marker combination 1, marker combination 2, marker combination 3, etc., indicate the molecular marker combinations to which the DMR of that row belongs; for example, DMR-1 in Table 1 belongs to marker combination 5, marker combination 6, marker combination 7, marker combination 10 and marker combination 13 at the same time.
  • the combinations for detecting benign and malignant breast nodules based on breast tissue samples and/or plasma samples are marker combination 1, marker combination 2, marker combination 3, marker combination 4, marker combination 5, marker combination 6, marker combination 7, marker combination 10, marker combination 11, marker combination 12, marker combination 13, marker combination 14, marker combination 15 and marker combination 16; the combination for identifying subtypes of malignant breast nodules based on breast tissue is marker combination 8; and the combination for identifying subtypes of malignant breast nodules based on breast tissue and/or plasma is marker combination 9.
  • the differentially methylated regions in the marker combination 1 include: DMR-27, DMR-67 and DMR-72.
  • the differentially methylated regions in the marker combination 2 include: DMR-25, DMR-27, DMR-67, DMR-72 and DMR-79.
  • the differentially methylated regions in the marker combination 3 include: DMR-13, DMR-15, DMR-25, DMR-27, DMR-33, DMR-66, DMR-67, DMR-72, DMR-73 and DMR-79.
  • the differentially methylated regions in the marker combination 4 include: DMR-2, DMR-7, DMR-11, DMR-13, DMR-15, DMR-17, DMR-18, DMR-19, DMR-20, DMR-22, DMR-25, DMR-26, DMR-27, DMR-33, DMR-44, DMR-66, DMR-67, DMR-72, DMR-73 and DMR-79.
  • the differentially methylated regions in the marker combination 5 include: DMR-1, DMR-2, DMR-4, DMR-6, DMR-7, DMR-9, DMR-10, DMR-11, DMR-12, DMR-13, DMR-14, DMR-15, DMR-17, DMR-18, DMR-19, DMR-20, DMR-22, DMR-25, DMR-26, DMR-27, DMR-28, DMR-30, DMR-31, DMR-32, DMR-33, DMR-36, DMR-37, DMR-41, DMR-43, DMR-44, DMR-45, DMR-46, DMR-47, DMR-48, DMR-49, DMR-50, DMR-51, DMR-52, DMR-53, DMR-54, DMR-57, DMR-64, DMR-65, DMR-66, DMR-67, DMR-72, DMR-73, DMR-77, DMR-79 and DMR-80.
  • the differentially methylated regions in the marker combination 6 include: DMR-1 to DMR-80.
  • the differentially methylated regions in the marker combination 7 include: DMR-1 to DMR-14, DMR-16 to DMR-21, DMR-23 to DMR-27, DMR-29 to DMR-34, DMR-36 to DMR-38, DMR-40 to DMR-48, DMR-50 to DMR-54, DMR-57 to DMR-68, DMR-70, DMR-72 to DMR-78, DMR-80 to DMR-103.
  • the differentially methylated regions in the marker combination 8 include: DMR-104 to DMR-189.
  • the differentially methylated regions in the marker combination 9 include: DMR-18, DMR-190 to DMR-249.
  • the differentially methylated regions in the marker combination 10 include: DMR-1 to DMR-103.
  • the differentially methylated regions in the marker combination 11 include: DMR-3, DMR-4, DMR-9, DMR-10, DMR-12, DMR-14, DMR-15, DMR-16, DMR-21, DMR-22, DMR-24, DMR-26, DMR-27, DMR-29, DMR-30, DMR-31, DMR-34, DMR-38, DMR-39, DMR-40, DMR-50, DMR-52, DMR-53, DMR-54, DMR-55, DMR-58, DMR-59, DMR-65, DMR-66, DMR-67, DMR-70, DMR-72, DMR-73, DMR-74, DMR-76, DMR-77, DMR-79, DMR-81, DMR-82, DMR-85, DMR-94, DMR-97, DMR-99, DMR-101, DMR-102.
  • the differentially methylated regions in the marker combination 12 include: DMR-3, DMR-4, DMR-9, DMR-10, DMR-12, DMR-14, DMR-15, DMR-16, DMR-21, DMR-22, DMR-24, DMR-26, DMR-27, DMR-28, DMR-29, DMR-30, DMR-31, DMR-34, DMR-35, DMR-38, DMR-39, DMR-40, DMR-50, DMR-52, DMR-53, DMR-54, DMR-55, DMR-56, DMR-58, DMR-59, DMR-62, DMR-65, DMR-66, DMR-67, DMR-70, DMR-71, DMR-72, DMR-73, DMR-74, DMR-76, DMR-77, DMR-79, DMR-81, DMR-82, DMR-85, DMR-88, DMR-90, DMR-94, DMR-97, DMR-99,
  • the differentially methylated regions in the marker combination 13 include: DMR-1 to DMR-103, DMR-250 to DMR-366;
  • the differentially methylated regions in the marker combination 14 include: DMR-23, DMR-45, DMR-58, DMR-60, DMR-77, DMR-90, DMR-96, DMR-266, DMR-283, DMR-284, DMR-321, DMR-337, DMR-367 to DMR-414;
  • the differentially methylated regions in the marker combination 15 include: DMR-15, DMR-16, DMR-39, DMR-56, DMR-62, DMR-66, DMR-250, DMR-258, DMR-259, DMR-260, DMR-265, DMR-266, DMR-268, DMR-273, DMR-279, DMR-283, DMR-285, DMR-288, DMR-302, DMR-321, DMR-324, DMR-327, DMR-335, DMR-337, DMR-343, DMR-349, DMR-355, DMR-357, DMR-362, DMR-380, DMR-381, DMR-384, DMR-390, DMR-400, DMR-402, DMR-406, DMR-408, DMR-409, DMR-414 to DMR-499;
  • the differentially methylated regions in the marker combination 16 include: DMR-15, DMR-55, DMR-67, DMR-101, DMR-250, DMR-265, DMR-273, DMR-279, DMR-288, DMR-324, DMR-327, DMR-341, DMR-343, DMR-349, DMR-355, DMR-357, DMR-362, DMR-408, DMR-420, DMR-421, DMR-426, DMR-428, DMR-429, DMR-430, DMR-432, DMR-433, DMR-435, DMR-438, DMR-443, DMR-446, DMR-447, DMR-448, DMR-449, DMR-452, DMR-453, DMR-454, DMR-455, DMR-458, DMR-466, DMR-467, DMR-469, DMR-480, DMR-483, DMR-491, DMR-495 to DMR-498, DMR-500 to DMR-5
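  • The relationship between marker combinations and individual DMRs (the model column of Table 1) can be sketched as a small lookup structure. The helper name `expand_dmrs` and the dictionary layout are assumptions for illustration; only marker combinations 1, 2 and 6, whose member lists are given in full above, are used.

```python
import re
from collections import defaultdict
from typing import Dict, Set


def expand_dmrs(spec: str) -> Set[int]:
    """Expand a list such as 'DMR-25, DMR-27 and DMR-1 to DMR-3' into DMR numbers."""
    ids: Set[int] = set()
    # Treat 'and' as a list separator so that only true ranges hold two IDs per item.
    for item in spec.replace(" and ", ", ").split(","):
        numbers = [int(n) for n in re.findall(r"DMR-(\d+)", item)]
        if len(numbers) == 1:
            ids.add(numbers[0])
        elif len(numbers) == 2:
            ids.update(range(numbers[0], numbers[1] + 1))   # inclusive range
        elif numbers:
            raise ValueError(f"cannot parse item: {item!r}")
    return ids


# Member lists copied from the marker combination definitions above.
combinations: Dict[int, str] = {
    1: "DMR-27, DMR-67 and DMR-72",
    2: "DMR-25, DMR-27, DMR-67, DMR-72 and DMR-79",
    6: "DMR-1 to DMR-80",
}

# Invert the mapping: for each DMR, the marker combinations it belongs to.
membership: Dict[int, Set[int]] = defaultdict(set)
for combination, spec in combinations.items():
    for dmr in expand_dmrs(spec):
        membership[dmr].add(combination)
```

  • With these three combinations loaded, `membership[27]` is `{1, 2, 6}` and `membership[79]` is `{2, 6}`, mirroring how one row of Table 1 can list several marker combinations.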
  • Example: experimental methods
  • the plasma cfDNA extraction steps were carried out according to the instruction manual of the MagMax™ Cell-Free DNA Isolation Kit of Life Company.
  • the extraction steps of tissue gDNA were carried out according to the operating instructions of DNeasy Blood & Tissue Kit of QIAGEN Company.
  • the extracted cfDNA (10ng) or tissue gDNA (50ng) was subjected to bisulfite conversion to deaminate the unmethylated cytosine in the DNA and convert it to uracil, while the methylated cytosine remained unchanged to obtain
  • the specific operation of the conversion is carried out according to the instructions of the EZ DNA Methylation-Lightning Kit of Zymo Research.
  • the converted sample was added to the following (Table 2) reagents for reaction;
  • the specific purification steps are as follows: 1) Take the reaction product of the previous step, centrifuge it, add 166 μl of 1:6 diluted Agencourt AMPure Beads to each sample, and mix by pipetting; 2) Incubate at room temperature for 5 min; 3) Centrifuge and place on a magnetic stand for 5 min; 4) Remove the supernatant; 5) Add 200 μl of 80% ethanol (EtOH), let stand for 30 s, and remove the ethanol; 6) Repeat step 5 once; 7) Centrifuge, place the PCR tube on a magnetic stand, and remove the remaining ethanol; 8) Open the cap and dry the magnetic beads for 2-3 min; 9) Add 21 μl of EB for elution, mix well by pipetting, and let stand at room temperature for 3 min; 10) Centrifuge, place the PCR tube on a magnetic stand, and let stand for 3 min; 11) Pipette 20 μl of supernatant into a new PCR tube
  • the hybridization capture kit is xGen Lockdown Reagents from IDT Company, and it is operated according to the instructions.
  • the sequencer of Illumina Company was used to sequence the samples captured by hybridization to obtain the sequencing results.
  • Test Example 1 Correlation between differentially methylated regions and breast cancer
  • the extracted DNA was bisulfite-converted and used for library construction; the specific operations were as described in the examples;
  • an Illumina sequencer was used to sequence the hybridization-captured samples to obtain the sequencing results; the specific operation was as described in the examples;
  • single-region methylation levels were measured for the differentially methylated regions DMR-1 to DMR-521 in 112 breast nodule tissue samples (56 benign samples and 56 malignant samples).
  • Calculate the methylation level β value of each region of at least 3 consecutive CpGs in each sample (the ratio of methylated reads to total reads in that region for a sample, a value between 0 and 1; a read in which all of the at least 3 consecutive CpGs are methylated is counted as a methylated read, otherwise as an unmethylated read, and total reads are the sum of methylated and unmethylated reads). The P values calculated with the Wilcoxon test were adjusted by the BH method to give FDR (False Discovery Rate) values.
  • Applying the filter conditions methylation difference delta value (the difference between the methylation β values of the two groups) > 0.05 and FDR < 0.01, a total of 35,814 markers were obtained.
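  • The β value, delta value and FDR filter described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the per-region P values are taken as given inputs (for example from a Wilcoxon test computed elsewhere), the delta value is taken as the absolute difference between group mean β values, and the FDR condition is read as strictly below 0.01. The numbers are invented for illustration.

```python
from typing import List


def beta_value(methylated_reads: int, unmethylated_reads: int) -> float:
    """Ratio of methylated reads to total reads for one region in one sample."""
    total = methylated_reads + unmethylated_reads
    if total == 0:
        raise ValueError("region has no reads")
    return methylated_reads / total


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """BH-adjusted P values (FDR), returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_markers(
    mean_beta_malignant: List[float],
    mean_beta_benign: List[float],
    p_values: List[float],
    min_delta: float = 0.05,
    max_fdr: float = 0.01,
) -> List[int]:
    """Indices of regions with |delta| > min_delta and FDR < max_fdr (assumed reading)."""
    fdr = benjamini_hochberg(p_values)
    return [
        i
        for i, (a, b) in enumerate(zip(mean_beta_malignant, mean_beta_benign))
        if abs(a - b) > min_delta and fdr[i] < max_fdr
    ]


# Hypothetical data for four regions.
example_beta = beta_value(30, 70)
fdr_values = benjamini_hochberg([0.001, 0.004, 0.03, 0.5])
selected = select_markers(
    [0.60, 0.12, 0.40, 0.30], [0.20, 0.10, 0.10, 0.29], [0.001, 0.004, 0.03, 0.5]
)
```

  • In this toy input only the first region passes both conditions: the second fails the delta condition and the third fails the FDR condition.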
  • Test example 2 The performance of different molecular marker combinations in the detection of benign and malignant breast nodules in breast tissue samples
  • This test example selects a combination of different specific molecular markers.
  • the specific molecular marker combinations are marker combination 1, marker combination 2, marker combination 3, marker combination 4, marker combination 5, marker combination 6, marker combination 7, marker combination 10, marker combination 11, marker combination 12, marker combination 13, marker combination 14, marker combination 15 and marker combination 16 (as shown in Table 1); the Random Forest algorithm was used to build benign/malignant prediction models.
  • the AUC, specificity (SP), sensitivity (SE), accuracy (ACC), positive predictive value (PPV) and negative predictive value (NPV) of these marker combinations are shown in Table 12 and Figure 2, respectively.
  • the AUC of every marker combination is at least 0.992, and the AUC of marker combination 6 and marker combination 7 reaches 1.
  • the aforementioned marker combinations all showed very high specificity (SP ≥ 0.964) and positive predictive value (PPV ≥ 0.99), and marker combination 6 reached 1 on every reported metric. It can be seen that the selected molecular marker combinations have very good detection ability for the diagnosis of benign and malignant breast nodules.
  • Table 12 The performance of different molecular marker combinations in the detection of benign and malignant breast nodules in breast tissue samples
  • Model | AUC | Youden_SE | Youden_SP | PPV | NPV
    marker combination 1 | 0.996 | 0.981 | 1 | 1 | 0.985
    marker combination 2 | 0.998 | 0.981 | 1 | 1 | 0.985
    marker combination 3 | 0.998 | 0.981 | 1 | 1 | 0.985
    marker combination 4 | 0.999 | 0.99 | 1 | 1 | 0.996
    marker combination 5 | 0.999 | 0.99 | 1 | 1 | 0.996
    marker combination 6 | 1 | 1 | 1 | 1 | 1
    marker combination 7 | 1 | 0.99 | 1 | 1 | 0.996
    marker combination 10 | 0.999 | 0.99 | 1 | 1 | 0.996
    marker combination 11 | 0.999 | 0.99 | 1 | 1 | 0.996
    marker combination 12 | 0.999 | 0.99 | 1 | 1 | 0.996
    marker combination 13 | 0.999 | 0.99 | 1 | 1 | 0.966
    marker combination 14 | 0.995 | 0.99 | 1 | 1 | 0.966
    marker combination 15 | 0.997 | 0.952 | 1 | 1 | 1 | 0.848
    marker combination 16 | 0.992 | 0.981 | 0.964 | 0.99 | 0.931
  • Test Example 3 The performance of different molecular marker combinations in the detection of benign and malignant breast nodules in blood samples
  • This test example selects combinations containing different specific molecular markers according to the MeanDecreaseGini parameter of the random forest model built on the training set (the Gini index is used to calculate the effect of each variable on the heterogeneity of the observations at each node of the classification trees, so that the importance of the variables can be compared).
  • the specific molecular marker combinations are marker combination 1, marker combination 2, marker combination 3, marker combination 4, marker combination 5, marker combination 6, marker combination 7, marker combination 10, marker combination 11, marker combination 12, marker combination 13, marker combination 14, marker combination 15 and marker combination 16, as described in Table 1.
  • The samples were split into a training set and a test set at a ratio of 7:3, and the Random Forest algorithm was used to build a benign/malignant prediction model that outputs a malignant nodule probability (between 0 and 1) for each sample. Performance was evaluated from these probabilities by ROC curve and AUC; the higher the AUC, the better the discrimination.
  • the cutoff is defined at the maximum Youden index: a test sample whose malignant nodule probability is greater than or equal to the cutoff is predicted to be a malignant nodule; otherwise it is predicted to be a benign nodule.
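  • The cutoff rule above can be sketched in Python. The Youden index is J = sensitivity + specificity - 1, and the cutoff is the candidate value that maximizes J, with probabilities greater than or equal to the cutoff called malignant. The function name and the probabilities are hypothetical; taking the observed probabilities as candidate cutoffs is an assumption of this sketch.

```python
from typing import List, Tuple


def youden_cutoff(
    probabilities: List[float], is_malignant: List[bool]
) -> Tuple[float, float]:
    """Return (cutoff, J) maximizing the Youden index J = SE + SP - 1.

    A sample is called malignant when its probability is >= the cutoff.
    Candidate cutoffs are the observed probabilities (an assumption of this sketch).
    """
    n_pos = sum(1 for y in is_malignant if y)
    n_neg = len(is_malignant) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both malignant and benign samples are required")
    best_cutoff, best_j = 0.0, -1.0
    for cutoff in sorted(set(probabilities)):
        tp = sum(1 for p, y in zip(probabilities, is_malignant) if y and p >= cutoff)
        tn = sum(1 for p, y in zip(probabilities, is_malignant) if not y and p < cutoff)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:               # keep the lowest cutoff in case of ties
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical malignant nodule probabilities: three malignant, then four benign samples.
probabilities = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1, 0.2]
labels = [True, True, True, False, False, False, False]
cutoff, j_max = youden_cutoff(probabilities, labels)
predictions = [p >= cutoff for p in probabilities]
```

  • For these toy values the maximum is J = 0.75 at a cutoff of 0.4, where all three malignant samples and three of the four benign samples are classified correctly.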
  • the AUC (Figure 3), specificity (SP), sensitivity (SE), accuracy (ACC), positive predictive value (PPV) and negative predictive value (NPV) of marker combination 1, marker combination 2, marker combination 3, marker combination 4, marker combination 5, marker combination 6, marker combination 7, marker combination 10, marker combination 11, marker combination 12, marker combination 13, marker combination 14, marker combination 15 and marker combination 16 are shown in Table 13.
  • the AUCs of marker combination 5, marker combination 6, marker combination 7, marker combination 10 and marker combination 13 were all higher than 0.81, and their specificity/sensitivity values were 0.978/0.553, 0.957/0.617, 0.87/0.702, 0.804/0.766 and 0.848/0.660, respectively.
  • marker combination 6 performs well: its AUC differs by only 0.001 from that of marker combination 7, which has more markers, matches that of marker combination 10, which also has more markers, and is higher than those of the combinations with fewer markers; its PPV also reaches 0.935, indicating that samples predicted positive by this marker combination are more often truly positive.
  • Table 13 The performance of different molecular marker combinations in the detection of benign and malignant breast nodules in blood samples
  • Model | AUC (95% interval as given) | SE | SP | ACC | PPV | NPV
    marker combination 1 | 0.656 (0.544-0.769) | 0.8298 | 0.5 | 0.667 | 0.629 | 0.742
    marker combination 2 | 0.721 (0.617-0.825) | 0.766 | 0.609 | 0.688 | 0.667 | 0.718
    marker combination 3 | 0.729 (0.627-0.832) | 0.553 | 0.848 | 0.699 | 0.788 | 0.65
    marker combination 4 | 0.792 (0.702-0.882) | 0.851 | 0.63 | 0.742 | 0.702 | 0.806
    marker combination 5 | 0.823 (0.739-0.907) | 0.553 | 0.978 | 0.763 | 0.963 | 0.682
    marker combination 6 | 0.858 (0.782-0.934) | 0.617 | 0.957 | 0.785 | 0.935 | 0.71
    marker combination 7 | 0.859 (0.786-0.931) | 0.702 | 0.87 | 0.785 | 0.846 | 0.741
    marker combination 10 | 0.858 (0.784-0.933) | 0.766 | 0.804 | 0.785 | 0.8 | 0.771
    marker combination 11 | 0.743 (0.642-0.844) | 0.77 | 0.63 | 0.7 | 0.68 | 0.73
    marker
  • Test Example 4 The performance of the molecular marker combination marker combination 8 in classifying breast malignant nodules subtypes on breast nodule tissue
  • This test example uses the experimental method of the examples to detect and analyze 265 breast nodule tissue samples (56 benign and 209 malignant; the malignant samples comprised 89 ER/PR/Her2+ type, 88 ER+/PR/Her2- type and 32 ER-/PR-/Her2- type cases); the specific test methods and the data judgment and processing are as described in the examples.
  • the random forest (Random Forest) algorithm was used to establish a benign and malignant prediction model, and the marker combination 8 was selected to classify breast malignant nodule subtypes.
  • the combination of molecular markers can clearly classify breast nodules into benign, ER/PR/Her2+, ER+/PR/Her2- and ER-/PR-/Her2- types.
  • As shown in Table 14, the sensitivities of this molecular marker combination for ER/PR/Her2+ type, ER+/PR/Her2- type and ER-/PR-/Her2- type breast nodules in breast tissue are 0.913, 0.911 and 0.9, respectively, all reaching 0.9 or above, and the positive predictive values are 0.925, 0.872 and 0.939, respectively.
  • This combination has very high detection sensitivity for malignant nodules of different molecular types, and the classification accuracy is very high.
  • Test Example 5 The performance of molecular marker combination marker combination 9 in breast malignant nodule subtype classification on breast tissue and blood samples
  • the mammary gland tissue sample described in Test Example 2 and the plasma sample described in Test Example 3 were respectively detected and analyzed by the experimental method of the embodiment.
  • the specific detection test method and data judgment processing are as described in the embodiment.
  • the random forest (Random Forest) algorithm was used to establish a benign and malignant prediction model, and the marker combination 9 was selected to classify the subtypes of breast nodule tissue and plasma samples respectively.
  • the sensitivities of this combination of molecular markers for ER/PR/Her2+, ER+/PR/Her2- and ER-/PR-/Her2- breast nodules in breast tissue were 0.7968, 0.8376 and 0.833, respectively, and the positive predictive values were 0.8242, 0.7802 and 0.9382, respectively.
  • It discriminates well between different subtypes of malignant nodules: sensitivity is higher for ER+/PR/Her2- type and ER-/PR-/Her2- type nodules, and the positive predictive value is higher for ER-/PR-/Her2- type nodules, as shown in Table 15.

Abstract

The present invention provides a methylation biomarker for breast cancer diagnosis and a use thereof. Specifically, the present invention provides a methylation biomarker for breast cancer diagnosis, and the methylation biomarker comprises any one among the differentially methylated regions DMR-1 to DMR-521 provided in Table 1, or any combination thereof. The present invention provides a biomarker for diagnosing breast cancer, thus meeting the clinical demand for breast cancer detection products.

Description

Methylation biomarkers for breast cancer diagnosis and their applications
Priority and related applications
This disclosure claims priority to Chinese patent application 202111300404.9, filed on November 4, 2021 and entitled "Methylation biomarkers for breast cancer diagnosis and their applications", and to Chinese patent application 202111464539.9, filed on December 3, 2021 and entitled "Methylation biomarkers for breast cancer diagnosis and their applications", the entire contents of which, including appendices, are incorporated into this disclosure by reference.
Technical field
The present disclosure belongs to the field of biotechnology, and in particular relates to methylation biomarkers for breast cancer diagnosis and their applications.
Background
In China, the health burden of cancer is increasing year by year. As in most other countries, breast cancer has become the most common cancer among Chinese women. According to the China Cancer Registry Annual Report, the incidence of female breast cancer is low in the 0-24 age group, rises gradually after the age of 25, peaks in the 50-54 age group, and declines gradually after the age of 55.
The pathogenesis of breast cancer is mainly attributed to genetic susceptibility, endocrine disorders, transmission of viral particles through breastfeeding, and so on; about 5% of breast cancers are caused by gene mutations. Europe and the United States have carried out extensive research on breast cancer susceptibility genes; those now known include BRCA-1 and BRCA-2, as well as p53 and PTEN. Breast cancers associated with mutations in these genes are called hereditary breast cancers and account for 5% to 10% of all breast cancers.
The 21-gene breast cancer assay (Oncotype DX) is one of the most commonly used breast cancer prognostic tools in clinical practice. It quantitatively measures the mRNA expression of 21 breast cancer-related genes, applies a specific algorithm to convert the expression levels into a recurrence score (RS), and uses the score to judge whether a breast cancer patient needs adjuvant chemotherapy. Its results can inform the prediction of prognosis, recurrence and tumor metastasis, and even guide treatment, supporting individualized therapy for patients. The 21-gene assay is mainly aimed at early breast cancer patients who are estrogen receptor (ER) positive, human epidermal growth factor receptor 2 (Her2) negative and lymph node negative. The test not only provides recurrence risk prediction for 1-5 years and beyond 5 years, but can also serve as the only multigene test for predicting the benefit of chemotherapy and endocrine therapy in patients with estrogen receptor-positive invasive breast cancer. The assay consists of 16 cancer-related genes from the proliferation group, invasion group, Her2 group, estrogen group and other groups, and 5 internal reference genes from the reference group. By testing the 21 genes and observing their interactions to judge tumor characteristics, the breast cancer recurrence index and the benefit ratio of chemotherapy can be predicted. The 70-gene breast cancer assay (MammaPrint) also plays an important role in early breast cancer diagnosis. It scans tumor biopsy tissue sections for 70 genes related to whether cancer cells are likely to proliferate or invade healthy tissue; the test targets ER-positive and Her2-negative patients who are lymph node negative or have 1-3 positive lymph nodes.
最新的研究成果发现72个新的常见基因变异将导致女性患乳腺癌的风险上升,这使目前已知与乳腺癌相关的基因突变数量增至177个。许多人都可能携带常见的变异基因,单个基因突变致癌的风险相对较小,但女性体内的相关变异基因越多,患乳腺癌的风险越大。目前乳腺癌基因检测手段属于一种非侵入性检测,但需要基于原来的手术(例如***肿瘤切除手术、***切除手术或核心穿刺活组织检查)过程中取出的组织的基础上进行检测。The latest research results have found that 72 new common gene mutations will increase the risk of breast cancer in women, which brings the number of gene mutations currently known to be associated with breast cancer to 177. Many people may carry common mutated genes, and the risk of cancer caused by a single gene mutation is relatively small, but the more related mutated genes in a woman, the greater the risk of breast cancer. The current breast cancer genetic detection method is a non-invasive detection, but it needs to be based on the tissue removed during the original operation (such as lumpectomy, mastectomy or core needle biopsy).
Early screening and early diagnosis make it more likely that cancer is found at an early stage, thereby improving five-year survival and reducing mortality. Current status of breast cancer screening in China: the 5-year survival rate of breast cancer is 100% for carcinoma in situ, 84-100% for stage I, 76-87% for stage II, and 38-77% for stage III. A multi-center study in China showed that, at first diagnosis, 15.7% of breast cancers were stage I, 44.9% stage II, 18.7% stage III, and 2.4% stage IV. Imaging-based breast cancer screening methods in the prior art include: 1. Mammography (MG): Chinese women develop breast cancer at a relatively young age (45-55 years) and have denser breast tissue than Western women, so MG is prone to missed findings; the diagnostic sensitivity of MG for lesions in dense breasts is only 30%. 2. B-mode ultrasound: the positive predictive value of MG alone is 55.5%, and that of MG combined with B-mode ultrasound is 43.3%. 3. Magnetic resonance imaging (MRI): MRI is insensitive to microcalcifications and has clear contraindications. Early screening, early diagnosis, and early treatment are therefore of great significance for improving five-year survival, and the key to early screening and diagnosis is to establish an effective detection model and to find molecular markers suitable for screening and diagnosis.
cfDNA (circulating free DNA) consists of small fragments of free DNA in peripheral blood. It originates from the metabolism and apoptosis of normal cells or tumor cells and carries information such as somatic mutations and DNA methylation. The technique of tracking the onset and progression of disease by detecting disease-specific cfDNA fragments is called liquid biopsy; compared with traditional tissue biopsy, it has many advantages, including speed, convenience, and minimal invasiveness. Classified by the analyte tested, liquid biopsy currently has two main directions: circulating tumor cells (CTCs) and circulating tumor DNA (ctDNA). In 2014, a study of 640 patients with various tumors from the team of Bert Vogelstein and Kenneth Kinzler found that ctDNA was detectable in more than 75% of patients with advanced pancreatic cancer, ovarian cancer, colorectal cancer, bladder cancer, gastroesophageal cancer, melanoma, hepatocellular carcinoma, and head and neck cancer. ctDNA is therefore a broadly applicable, sensitive, and specific biomarker that can be used in clinical practice and research across many different cancer types. In 2015, Professor Lu Yuming demonstrated through genome-wide methylation sequencing of cfDNA that replacing tissue biopsy with liquid biopsy is technically and theoretically feasible; in 2017, Professor Zhang Kun's team used ctDNA methylation to quantify tumor burden and to profile tumor-derived ctDNA. Turner's team used high-throughput sequencing to identify breast cancer-specific somatic mutation sites and monitor their dynamic changes, showing that ctDNA testing can detect tumor recurrence and metastasis earlier than CT. Recent research has found that detecting ctDNA mutations in blood in combination with other analytes enables earlier and better diagnosis of eight common, surgically resectable cancers, including ovarian, liver, gastric, breast, prostate, esophageal, and colorectal cancer. A growing body of clinical research shows that plasma ctDNA can serve as a biomarker for early tumor diagnosis and screening, prediction, treatment response assessment, and monitoring of tumor size and recurrence. The current international research direction is to integrate multi-omics data, multiple molecular markers, and multiple genes and loci to improve the sensitivity and specificity of detection, so as to meet clinical demand for detection products. At present, breast cancer diagnosis still suffers from shortcomings such as low accuracy.
Summary of the Invention
Problem to Be Solved by the Invention
In view of the problems in the prior art, the present disclosure provides a biomarker with high specificity, sensitivity, and accuracy for the diagnosis of breast cancer, so as to meet clinical demand for breast cancer detection products.
Solution to the Problem
In view of the above problems in the prior art, the inventors carried out in-depth research and repeated experiments. By constructing a breast cancer-specific methylation database, they used paired samples of breast cancer tissue, adjacent non-cancerous tissue, and white blood cells to screen out breast cancer-specific methylation biomarkers derived from breast cancer tissue, and, based on breast tissue and plasma samples, built methylation models for distinguishing benign from malignant breast nodules and for molecular typing of breast cancer, thereby completing the present disclosure.
The technical solutions of the present disclosure are as follows:
In a first aspect, the present disclosure provides a methylation biomarker for breast cancer diagnosis, wherein the methylation biomarker comprises any one of the differentially methylated regions DMR-1 to DMR-521 provided in Table 1, or any combination thereof.
In some specific embodiments, the methylation biomarker comprises any one of the following: at least 3, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 150, 170, 200, or more of the differentially methylated regions DMR-1 to DMR-521.
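The "at least N of DMR-1 to DMR-521" condition above can be sketched as a simple membership check. This is only an illustration: the function name `is_valid_panel`, the `DMR-<number>` string format, and the default minimum of 3 are assumptions chosen for the example and are not defined by the disclosure.

```python
import re

DMR_COUNT = 521  # Table 1 lists DMR-1 to DMR-521
_DMR_ID = re.compile(r"DMR-(\d+)")  # assumed identifier format, e.g. "DMR-17"


def is_valid_panel(dmr_ids, minimum=3):
    """Return True if dmr_ids names at least `minimum` distinct DMRs,
    all of which fall in the range DMR-1 to DMR-521."""
    indices = set()
    for dmr_id in dmr_ids:
        match = _DMR_ID.fullmatch(dmr_id)
        if match is None:
            return False  # not of the form DMR-<number>
        index = int(match.group(1))
        if not 1 <= index <= DMR_COUNT:
            return False  # outside the range listed in Table 1
        indices.add(index)  # duplicates are counted once
    return len(indices) >= minimum
```

Counting distinct regions rather than list entries keeps a repeated identifier from satisfying the "at least N" threshold.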
In some more preferred embodiments, the methylation biomarker for breast cancer diagnosis comprises any one of the following (a)-(p), or any combination thereof: (a) the differentially methylated regions in marker combination 1; (b) the differentially methylated regions in marker combination 2; (c) the differentially methylated regions in marker combination 3; (d) the differentially methylated regions in marker combination 4; (e) the differentially methylated regions in marker combination 5; (f) the differentially methylated regions in marker combination 6; (g) the differentially methylated regions in marker combination 10; (h) the differentially methylated regions in marker combination 7; (i) the differentially methylated regions in marker combination 11; (j) the differentially methylated regions in marker combination 12; (k) the differentially methylated regions in marker combination 13; (l) the differentially methylated regions in marker combination 14; (m) the differentially methylated regions in marker combination 15; (n) the differentially methylated regions in marker combination 16; (o) the differentially methylated regions in marker combination 9; (p) the differentially methylated regions in marker combination 8.
In some specific embodiments, the breast cancer is breast cancer in a subject, and the subject is a mammal; preferably, the mammal is a human.
In some specific embodiments, the breast cancer is selected from ER/PR/Her2+ type, ER+/PR/Her2- type, and ER-/PR-/Her2- type breast cancer; optionally, the breast cancer is selected from stage 0, stage I, stage II, stage III, and stage IV breast cancer.
In some specific embodiments, the diagnosis is distinguishing whether a breast nodule is benign or malignant.
In some preferred embodiments, when distinguishing benign from malignant breast nodules, the methylation biomarker comprises the differentially methylated regions in any one of marker combinations 1, 2, 3, 4, 5, 6, 10, 7, 11, 12, 13, 14, 15, and 16, or any combination thereof.
In some more preferred embodiments, the methylation biomarker comprises the differentially methylated regions in marker combination 5; and/or the differentially methylated regions in marker combination 6; and/or the differentially methylated regions in marker combination 10; and/or the differentially methylated regions in marker combination 7; and/or the differentially methylated regions in marker combination 13; and/or the differentially methylated regions in marker combination 11; and/or the differentially methylated regions in marker combination 12.
In other preferred embodiments, the methylation biomarker is the differentially methylated regions in marker combination 1, 2, 3, 4, 5, 6, 10, 7, 11, 12, 13, 14, 15, or 16.
More preferably, the methylation biomarker is the differentially methylated regions in marker combination 5, 6, 10, 7, 13, 11, or 12.
In a preferred embodiment, the methylation biomarker is the differentially methylated regions in marker combination 6.
In a preferred embodiment, the methylation biomarker is the differentially methylated regions in marker combination 10.
In a preferred embodiment, the methylation biomarker is the differentially methylated regions in marker combination 7.
In a preferred embodiment, the methylation biomarker is the differentially methylated regions in marker combination 13.
In other specific embodiments, the diagnosis is identifying the molecular subtype of breast cancer; preferably, the molecular subtypes of breast cancer include ER/PR/Her2+ type, ER+/PR/Her2- type, and ER-/PR-/Her2- type breast cancer.
In some preferred embodiments, when identifying the molecular subtype of breast cancer, the methylation biomarker comprises the differentially methylated regions in marker combination 9, the differentially methylated regions in marker combination 8, or a combination thereof. In some embodiments, when the sample to be tested is tissue, the methylation biomarker used to identify the molecular subtype may be the differentially methylated regions in marker combination 8 or those in marker combination 9. In other embodiments, when the sample to be tested is plasma, serum, or blood, the methylation biomarker used to identify the molecular subtype may be the differentially methylated regions in marker combination 9.
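The sample-type rule in the preceding paragraph (tissue may use marker combination 8 or 9; plasma, serum, or blood may use marker combination 9) can be written as a small lookup. The names `SUBTYPING_COMBINATIONS` and `subtyping_combinations` are hypothetical, and the sketch only restates the optional embodiments above rather than any required workflow.

```python
# Marker combinations that the text above offers for molecular subtyping,
# keyed by the kind of sample being tested.
SUBTYPING_COMBINATIONS = {
    "tissue": (8, 9),
    "plasma": (9,),
    "serum": (9,),
    "blood": (9,),
}


def subtyping_combinations(sample_type):
    """Return the marker combination numbers usable for subtyping this sample type."""
    key = sample_type.strip().lower()
    if key not in SUBTYPING_COMBINATIONS:
        raise ValueError(f"sample type not covered by the text: {sample_type!r}")
    return SUBTYPING_COMBINATIONS[key]
```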
In a second aspect, the present disclosure provides a breast cancer diagnostic kit comprising reagents for detecting, in a sample to be tested, the methylation status of the methylation biomarker according to the first aspect of the present disclosure.
In some specific embodiments, the sample to be tested is plasma, serum, blood, tissue, or any combination thereof; preferably, the tissue is breast tissue, more preferably breast nodule tissue.
In some specific embodiments, the diagnosis is distinguishing whether a breast nodule is benign or malignant.
In other specific embodiments, the diagnosis is identifying the molecular subtype of breast cancer; preferably, the molecular subtypes of breast cancer include ER/PR/Her2+ type, ER+/PR/Her2- type, and ER-/PR-/Her2- type.
In some specific embodiments, the reagents are those used in a method for detecting methylation status selected from: pyrosequencing, bisulfite conversion sequencing, methylation microarray, qPCR, digital PCR, second-generation sequencing, third-generation sequencing, whole-genome methylation sequencing, DNA enrichment detection, reduced representation bisulfite sequencing, HPLC, MassArray, methylation-specific PCR, or any combination thereof.
In a third aspect, the present disclosure provides use of the methylation biomarker according to the first aspect of the present disclosure in the preparation of a kit for diagnosing whether a subject has breast cancer.
In some specific embodiments, the diagnosis is distinguishing whether a breast nodule is benign or malignant; in other specific embodiments, the diagnosis is identifying the molecular subtype of breast cancer; preferably, the molecular subtypes of breast cancer include ER/PR/Her2+ type, ER+/PR/Her2- type, and ER-/PR-/Her2- type.
In a fourth aspect, the present disclosure provides a method for diagnosing breast cancer, the method comprising:
detecting, in a sample to be tested from a test subject, the methylation status of the methylation biomarker according to the first aspect of the present disclosure; and
when the methylation status of the methylation biomarker differs from the methylation status of the methylation biomarker determined in subjects with benign breast nodules, identifying the test subject as having breast cancer and/or identifying the molecular subtype of the test subject's breast cancer; preferably, the molecular subtype of breast cancer is selected from ER/PR/Her2+ type, ER+/PR/Her2- type, and ER-/PR-/Her2- type.
In some specific embodiments, the sample to be tested is plasma, serum, blood, tissue, or any combination thereof; preferably, the tissue is breast tissue.
In some specific embodiments, the method for detecting the methylation status of the methylation biomarker is selected from: pyrosequencing, bisulfite conversion sequencing, methylation microarray, qPCR, digital PCR, second-generation sequencing, third-generation sequencing, whole-genome methylation sequencing, DNA enrichment detection, reduced representation bisulfite sequencing, HPLC, MassArray, methylation-specific PCR, or any combination thereof.
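The comparison step of the fourth aspect (a test sample's methylation status versus that seen in benign breast nodule subjects) could be sketched as below. This section does not specify how the comparison is computed, so everything here is an assumption made for illustration: methylation is represented as a fraction between 0 and 1 per DMR, the benign reference is a single value per DMR, and `tolerance` is a hypothetical cutoff. It is not the disclosure's methylation model.

```python
def differing_dmrs(sample, benign_reference, tolerance):
    """List the DMRs whose methylation fraction in `sample` differs from the
    benign reference by more than `tolerance`.

    sample, benign_reference: dicts mapping a DMR identifier to a methylation
    fraction in [0, 1] (an assumed representation). Only DMRs present in both
    dicts are compared; results follow the sorted order of the identifiers.
    """
    shared = sorted(set(sample) & set(benign_reference))
    return [
        dmr for dmr in shared
        if abs(sample[dmr] - benign_reference[dmr]) > tolerance
    ]


# Hypothetical values, not measured data.
example_sample = {"DMR-1": 0.80, "DMR-2": 0.10}
example_benign = {"DMR-1": 0.20, "DMR-2": 0.12}
flagged = differing_dmrs(example_sample, example_benign, tolerance=0.25)
```

How many differing regions, or what weighted score, would justify a call is left open here because the text above does not state it.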
Effects of the Invention
(1) The present disclosure provides methylation biomarkers for the early diagnosis of breast cancer. Specifically, the research identified genomic fragments that show clearly abnormal methylation patterns in breast cancer tissue and plasma cfDNA, namely DMR-1 to DMR-521.
(2) By studying how the methylation of these genomic fragments, identified from breast tissue and plasma cfDNA methylation patterns, differs between breast cancer patients at each stage and people with benign breast nodule disease, it was found that all of the above combinations of plasma genomic fragments can serve as diagnostic markers for distinguishing benign from malignant breast nodules. Moreover, the inventors' research found that marker combinations 1, 2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14, 15, and 16 in particular are all able to distinguish benign from malignant breast nodules, whether based on breast tissue samples or plasma samples. Of these, marker combinations 6, 10, 7, and 13 perform better, and marker combination 6 achieves a better diagnostic result with relatively few markers.
(3) In the present disclosure, the co-methylation features of multiple methylated cytosines on at least one genomic fragment are analyzed jointly as a biomarker for judging the onset of breast cancer. This is far more accurate than testing other plasma biomarkers and reduces the occurrence of false positives and false negatives.
Brief Description of the Drawings
Figure 1 is a heat map of the methylation rates of the markers for the 521 DMRs in malignant breast nodules.
Figure 2 shows ROC curves for detecting benign versus malignant breast nodules in breast tissue samples using different molecular marker combinations.
Figure 3 shows ROC curves for detecting benign versus malignant breast nodules in plasma samples using different molecular marker combinations.
Figure 4 shows the subtype classification of malignant breast nodules in breast tissue samples using marker combination 8.
Detailed Description
Definitions
Unless otherwise defined, all technical and scientific terms used in the present disclosure have the same meaning as commonly understood by those skilled in the technical field to which the present disclosure belongs. The terms used in the description of the present disclosure are for the purpose of describing specific embodiments only and are not intended to limit the present disclosure.
The terms "comprising" and "having" and any variants thereof in the present disclosure are intended to cover non-exclusive inclusion. For example, a process, method, apparatus, product, or device comprising a series of steps is not limited to the listed steps or modules, but optionally also includes steps that are not listed, or optionally also includes other steps inherent to such processes, methods, products, or devices.
In the present disclosure, "a plurality of" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
The term "breast cancer" is used in the broadest sense and refers to all cancers that originate in the breast. It includes the following subtypes: ductal carcinoma in situ, invasive ductal carcinoma (including tubular, medullary, mucinous, papillary, and cribriform carcinoma of the breast), invasive lobular carcinoma, inflammatory breast cancer, lobular carcinoma in situ, male breast cancer, Paget's disease of the nipple, and phyllodes tumor of the breast. It also includes the following stages (as defined by the corresponding TNM classification in parentheses): stage 0 (Tis, N0, M0), stage IA (T1, N0, M0), stage IB (T0 or T1, N1mi, M0), stage IIA (T0 or T1, N1 (but not N1mi), M0; or T2, N0, M0), stage IIB (T2, N1, M0; or T3, N0, M0), stage IIIA (T0 to T2, N2, M0; or T3, N1 or N2, M0), stage IIIB (T4, N0 to N2, M0), stage IIIC (any T, N3, M0), and stage IV (any T, any N, M1).
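The TNM-to-stage grouping listed above can be expressed as a lookup function. This is a sketch of the list as written in this paragraph only, not a clinical staging tool. One point is an assumption: where the text says "N1" without mentioning N1mi (stages IIB, IIIA, and IIIB), the sketch treats N1mi as part of N1; combinations the text does not list return `None`.

```python
def breast_cancer_stage(t, n, m):
    """Map a TNM triple to the stage grouping given in the text.

    t: "Tis", "T0", "T1", "T2", "T3" or "T4"
    n: "N0", "N1mi", "N1", "N2" or "N3"
    m: "M0" or "M1"
    Returns the stage label, or None if the text lists no stage for the triple.
    """
    if m == "M1":
        return "IV"  # any T, any N, M1
    if m != "M0":
        return None
    if n == "N3":
        return "IIIC"  # any T, N3, M0
    # Assumption: N1mi counts as N1 except where the text separates them (IB vs IIA).
    n1_family = n in ("N1", "N1mi")
    if t == "Tis":
        return "0" if n == "N0" else None
    if t == "T4":
        return "IIIB" if (n in ("N0", "N2") or n1_family) else None
    if n == "N2":
        return "IIIA" if t in ("T0", "T1", "T2", "T3") else None
    if t == "T3":
        if n == "N0":
            return "IIB"
        return "IIIA" if n1_family else None
    if t == "T2":
        if n == "N0":
            return "IIA"
        return "IIB" if n1_family else None
    if t in ("T0", "T1"):
        if n == "N1mi":
            return "IB"
        if n == "N1":
            return "IIA"
        if n == "N0" and t == "T1":
            return "IA"
    return None
```

Checking M first and N3 second mirrors the "any T" wording of stages IV and IIIC, so the remaining branches only need to handle M0 with N0 to N2.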
In the present disclosure, the term "molecular typing" of breast cancer refers to a method of classifying breast cancer based on the gene expression profile of breast cancer tumor tissue. Molecular typing systems for breast cancer that can be used include, but are not limited to, PAM50 (Prosigna) (see, for example, Parker, J.S. et al., Supervised risk predictor of breast cancer based on intrinsic subtypes. J. Clin. Oncol. 2009, 27:1160-1167) and the 72-gene molecular typing of breast cancer (see, for example, Yang B. et al., An assessment of prognostic immunity markers in breast cancer. NPJ Breast Cancer, 2018, 4:35). As an example, PAM50 divides breast cancer into four subtypes: Luminal A, Luminal B, Basal-like, and Her2-enriched. As another example, the 72-gene molecular typing divides breast cancer into Luminal A, Luminal B, Basal-like, Her2-enriched, and immune-enhanced types.
In this specification, breast cancer is divided into ER/PR/Her2+ type, ER+/PR/Her2- type, and ER-/PR-/Her2- type based on the expression of estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (Her2). ER/PR/Her2+ breast cancer includes Luminal B that is Her2-positive, as well as other Her2-overexpressing subtypes; ER+/PR/Her2- breast cancer includes Luminal A that is Her2-negative and Luminal B that is Her2-negative; ER-/PR-/Her2- breast cancer is triple-negative breast cancer (TNBC).
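Read literally, the three receptor-based types above depend on Her2 status first, then ER, then PR. A minimal sketch of that reading follows; the function name and the boolean inputs are hypothetical, and it is an interpretation of the notation (an unsigned "ER" or "PR" is taken to mean "any status") rather than the disclosure's classifier. The text assigns no type to ER-negative, PR-positive, Her2-negative tumors, so the sketch returns `None` for that case.

```python
def receptor_subtype(er_positive, pr_positive, her2_positive):
    """Assign one of the three receptor-based types named in the text.

    Inputs are booleans for ER, PR and Her2 positivity.
    Returns the type label, or None when the text does not cover the case.
    """
    if her2_positive:
        return "ER/PR/Her2+"  # Her2-positive, ER and PR of any status
    if er_positive:
        return "ER+/PR/Her2-"  # ER-positive, Her2-negative, PR of any status
    if not pr_positive:
        return "ER-/PR-/Her2-"  # triple-negative breast cancer (TNBC)
    return None  # ER-negative, PR-positive, Her2-negative: not assigned in the text
```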
As used herein, the term "breast nodule" refers to a symptom commonly seen in breast hyperplasia (which can form breast cysts) and in neoplastic breast diseases, including benign breast tumors (such as breast fibroma and phyllodes tumor) and malignant breast tumors (breast cancer).
As used herein, "subject" refers to an organism, or a part or component of an organism, to which the provided compositions, methods, kits, devices, and systems can be administered or applied. For example, the subject can be a mammal or a cell, tissue, organ, or part of the mammal. As used herein, "mammal" refers to any kind of mammal, preferably a human (including a human, a human subject, or a human patient). Subjects and mammals include, but are not limited to, farm animals, sport animals, pets, primates, horses, dogs, cats, and rodents such as mice and rats.
As used herein, the term "sample" refers to any substance that may contain a target molecule to be analyzed, including biological samples. As used herein, "biological sample" refers to any sample obtained from a living or viral (or prion) source or from another source of macromolecules and biomolecules, and includes any cell type or tissue of a subject from which nucleic acids, proteins, and/or other macromolecules can be obtained. A biological sample may be a sample obtained directly from a biological source or a sample that has been processed. For example, isolated nucleic acid that has been amplified constitutes a biological sample. Biological samples include, but are not limited to, body fluids (for example, blood, plasma, serum, cerebrospinal fluid, synovial fluid, urine, sweat, semen, feces, sputum, tears, mucus, and amniotic fluid), exudates, bone marrow samples, ascites, pelvic wash fluid, pleural fluid, spinal fluid, lymph, ocular fluid, extracts of nasal, throat, or genital swabs, cell suspensions of digested tissue, extracts of fecal material, tissue and organ samples from humans, animals (for example, non-human mammals), and plants, and processed samples derived from them.
As used herein, "amplification" generally refers to the process of producing multiple copies of a desired sequence. "Multiple copies" means at least two copies. A "copy" does not necessarily have perfect sequence complementarity or identity to the template sequence. For example, copies may include nucleotide analogs such as deoxyinosine, intentional sequence alterations (such as those introduced through a primer comprising a sequence that is hybridizable but not complementary to the template), and/or sequence errors that occur during amplification.
"Sequence determination" and similar terms include determining information relating to the nucleotide base sequence of a nucleic acid. Such information may include the identification or determination of partial or complete sequence information of the nucleic acid. Sequence information can be determined with varying degrees of statistical reliability or confidence. In one aspect, the term includes determining the identity and order of a plurality of contiguous nucleotides in a nucleic acid.
The terms "sequencing", "high-throughput sequencing", and "next-generation sequencing" include sequence determination using methods that determine many (typically thousands to billions of) nucleic acid sequences in an intrinsically parallel manner; that is, DNA templates are prepared for sequencing not one at a time but in a bulk process, and many sequences are preferably read out in parallel or, alternatively, by an ultra-high-throughput serial process that can itself be parallelized. Such methods include, but are not limited to, pyrosequencing (e.g., as commercialized by 454 Life Sciences, Inc., Branford, CT); sequencing by ligation (e.g., as commercialized in the SOLiD™ technology, Life Technologies, Inc., Carlsbad, CA); sequencing by synthesis using modified nucleotides (e.g., as commercialized in the TruSeq™ and HiSeq™ technologies by Illumina, Inc., San Diego, CA; HeliScope™ by Helicos Biosciences Corporation, Cambridge, MA; and PacBio RS by Pacific Biosciences of California, Inc., Menlo Park, CA); sequencing by ion detection technologies (e.g., Ion Torrent™ technology, Life Technologies, Carlsbad, CA); DNA nanoball sequencing (Complete Genomics, Inc., Mountain View, CA); nanopore-based sequencing technologies (e.g., as developed by Oxford Nanopore Technologies, LTD, Oxford, UK); and similar highly parallelized sequencing methods.
As used herein, "methylation" refers to cytosine methylation at the C5 or N4 position of cytosine, methylation at the N6 position of adenine, or other types of nucleic acid methylation. In vitro amplified DNA is usually unmethylated because typical in vitro DNA amplification methods do not retain the methylation pattern of the amplification template. However, "unmethylated DNA" or "methylated DNA" may also refer to amplified DNA whose original template was unmethylated or methylated, respectively. As used herein, the "methylation state", "methylation profile", and "methylation status" of a nucleic acid molecule refer to the presence or absence of one or more methylated nucleotide bases in the nucleic acid molecule. For example, a nucleic acid molecule containing a methylated cytosine is considered methylated (e.g., the methylation state of the nucleic acid molecule is methylated). A nucleic acid molecule that does not contain any methylated nucleotides is considered unmethylated.
Herein, a methylation region used for statistics is a region containing at least 3 consecutive CpGs within a 200 bp window. The methylation level (i.e., the beta (β) value) is the ratio of methylated reads to all reads of a sample within a given methylation region, and takes values between 0 and 1. A read in which at least 3 consecutive CpGs are all methylated is counted as a methylated read; any other read is counted as an unmethylated read; all reads are the sum of the methylated and unmethylated reads. Models are built on the β values of the methylation regions of the samples using algorithms such as random forest, yielding a probability value for each class. The binary classification threshold (cutoff) for benign versus malignant nodules is set at the value that maximizes the Youden index over the malignancy probability values of the samples in the test set; a sample whose malignancy probability is greater than or equal to the threshold is predicted to be a malignant nodule, and otherwise a benign nodule. For multi-class classification of malignant subtypes, the probability of each subtype is computed for the sample, and the subtype with the highest probability is the predicted malignant subtype of that sample.
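The definitions above can be illustrated with a minimal Python sketch. It assumes that each read is represented as a list of per-CpG calls (1 for methylated, 0 for unmethylated) and that class probabilities are already available from some model; the function names `is_methylated_read`, `beta_value`, `youden_cutoff`, and `predict_subtype` are hypothetical and are not taken from the disclosure.

```python
from typing import Dict, Sequence


def is_methylated_read(cpg_calls: Sequence[int], run: int = 3) -> bool:
    """Return True if the read has at least `run` consecutive methylated CpGs."""
    streak = 0
    for call in cpg_calls:
        streak = streak + 1 if call == 1 else 0
        if streak >= run:
            return True
    return False


def beta_value(reads: Sequence[Sequence[int]]) -> float:
    """Beta value of a region: methylated reads divided by all reads."""
    if not reads:
        raise ValueError("no reads cover this region")
    methylated = sum(1 for read in reads if is_methylated_read(read))
    return methylated / len(reads)


def youden_cutoff(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Threshold maximizing the Youden index (sensitivity + specificity - 1).

    A sample is called malignant when its probability is >= the threshold.
    labels: 1 = malignant, 0 = benign.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes are required")
    best_j, best_t = float("-inf"), 0.0
    for t in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= t)
        tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p < t)
        j = tp / positives + tn / negatives - 1
        if j > best_j:  # ties keep the lowest threshold (an arbitrary choice)
            best_j, best_t = j, t
    return best_t


def predict_subtype(class_probs: Dict[str, float]) -> str:
    """Multi-class call: the subtype with the highest probability."""
    return max(class_probs, key=class_probs.get)
```

Only observed probability values are tried as candidate thresholds, which is sufficient because the Youden index can change only at those values.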
The methylation state of a particular nucleic acid sequence (e.g., a methylation biomarker as described herein, or differentially methylated regions DMR-1~DMR-521) may indicate the methylation state of every base in the sequence, or of a subset of the bases within the sequence (e.g., one or more cytosines), or may indicate information about regional methylation density within the sequence, with or without providing precise information about the positions within the sequence at which methylation occurs.
Methylation status may optionally be represented or indicated by a "methylation value" (e.g., representing a methylation frequency, fraction, ratio, percentage, etc.). A methylation value can be generated, for example, by quantifying the amount of intact nucleic acid present following restriction digestion with a methylation-dependent restriction enzyme, by comparing amplification profiles after bisulfite treatment, or by comparing the sequences of bisulfite-treated and untreated nucleic acids. Accordingly, a value (i.e., a methylation value) represents the methylation status and can therefore be used as a quantitative indicator of methylation status across multiple copies of a locus. This is particularly useful when the methylation status of a sequence in a sample is to be compared with a threshold or reference value.
As used herein, "methylation frequency" or "methylation percentage (%)" refers to the number of instances in which a molecule or locus is methylated relative to the number of instances in which the molecule or locus is unmethylated.
Accordingly, the methylation state describes the state of methylation of a nucleic acid (e.g., a genomic sequence). In addition, the methylation state refers to the characteristics of a nucleic acid segment at a particular genomic locus that are relevant to methylation. Such characteristics include, but are not limited to, whether any of the cytosine (C) residues within the DNA sequence are methylated, the location of the methylated C residue(s), the frequency or percentage of methylated C throughout any particular region of the nucleic acid, and allelic differences in methylation due to, e.g., differences in the origin of the alleles. The terms "methylation state", "methylation profile", and "methylation status" also refer to the relative concentration, absolute concentration, or pattern of methylated C or unmethylated C throughout any particular region of a nucleic acid in a biological sample. For example, if the cytosine (C) residue(s) within a nucleic acid sequence are methylated, the sequence may be referred to as "hypermethylated" or as having "increased methylation", whereas if the cytosine (C) residue(s) within a DNA sequence are not methylated, the sequence may be referred to as "hypomethylated" or as having "decreased methylation". Likewise, if the cytosine (C) residue(s) within a nucleic acid sequence are methylated as compared with another nucleic acid sequence (e.g., from a different region or from a different individual), that sequence is considered hypermethylated or as having increased methylation compared with the other nucleic acid sequence. Alternatively, if the cytosine (C) residue(s) within a DNA sequence are not methylated as compared with another nucleic acid sequence (e.g., from a different region or from a different individual), that sequence is considered hypomethylated or as having decreased methylation compared with the other nucleic acid sequence. Additionally, the term "methylation pattern" as used herein refers to the collective sites of methylated and unmethylated nucleotides over a region of a nucleic acid. Two nucleic acids may have the same or similar methylation frequency or methylation percentage but have different methylation patterns when the numbers of methylated and unmethylated nucleotides are the same or similar throughout the region but the locations of the methylated and unmethylated nucleotides differ. Sequences are said to be "differentially methylated", or to have a "difference in methylation" or "different methylation states", when they differ in the extent (e.g., one has increased or decreased methylation relative to the other), frequency, or pattern of methylation. The term "differential methylation" refers to a difference in the level or pattern of nucleic acid methylation in a cancer-positive sample as compared with the level or pattern of nucleic acid methylation in a cancer-negative sample. It may also refer to a difference in level or pattern between patients whose cancer recurs after surgery and patients whose cancer does not recur. Differential methylation and specific levels or patterns of DNA methylation are diagnostic and predictive biomarkers, e.g., once the correct cutoff or predictive characteristics have been defined.
Methylation state frequency can be used to describe a population of individuals or a sample from a single individual. For example, a nucleotide locus having a methylation state frequency of 50% is methylated in 50% of instances and unmethylated in 50% of instances. Such a frequency can be used, for example, to describe the degree to which a nucleotide locus or nucleic acid region is methylated in a population of individuals or a collection of nucleic acids. Thus, when methylation in a first population or pool of nucleic acid molecules differs from methylation in a second population or pool of nucleic acid molecules, the methylation state frequency of the first population or pool will differ from that of the second population or pool. Such a frequency can also be used, for example, to describe the degree to which a nucleotide locus or nucleic acid region is methylated in a single individual. For example, such a frequency can be used to describe the degree to which a group of cells from a tissue sample is methylated or unmethylated at a nucleotide locus or nucleic acid region.
As used herein, the "sensitivity" of a given marker refers to the percentage of malignant nodule samples that report a DNA methylation value above the threshold that distinguishes malignant nodule samples from benign nodule samples. In some embodiments, a positive is defined as a histologically confirmed malignant nodule that reports a DNA methylation value above the threshold (e.g., the range associated with disease), and a false negative is defined as a histologically confirmed malignant nodule that reports a DNA methylation value below the threshold (e.g., the range associated with no disease). The value of sensitivity therefore reflects the probability that a DNA methylation measurement for a given marker obtained from a known diseased sample will fall within the range of disease-associated measurements. As defined here, the clinical relevance of the calculated sensitivity value is an estimate of the probability that a given marker will detect the presence of a clinical condition when applied to a subject with that condition.
As used herein, the "specificity" of a given marker refers to the percentage of benign nodule samples that report a DNA methylation value below the threshold that distinguishes malignant nodule samples from benign nodule samples. In some embodiments, a negative is defined as a histologically confirmed benign nodule sample that reports a DNA methylation value below the threshold (e.g., the range associated with no disease), and a false positive is defined as a histologically confirmed benign nodule sample that reports a DNA methylation value above the threshold (e.g., the range associated with disease). The value of specificity therefore reflects the probability that a DNA methylation measurement for a given marker obtained from a known benign nodule sample will fall within the range of measurements associated with no disease. As defined here, the clinical relevance of the calculated specificity value is an estimate of the probability that a given marker will detect the absence of a clinical condition when applied to a patient without that condition.
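As a worked illustration of the two definitions above, the following Python sketch counts true positives, false negatives, true negatives, and false positives for one marker at a given threshold. The function name and the data layout are hypothetical. Treating a value exactly equal to the threshold as positive is an assumption made here for consistency with the "greater than or equal to" rule used for the classification cutoff.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    values: Sequence[float],
    is_malignant: Sequence[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for one marker at one threshold.

    values: reported DNA methylation value per sample.
    is_malignant: histologically confirmed status per sample.
    Assumption: a value equal to the threshold counts as positive.
    """
    tp = fn = tn = fp = 0
    for value, malignant in zip(values, is_malignant):
        called_positive = value >= threshold
        if malignant and called_positive:
            tp += 1  # malignant sample at or above the threshold
        elif malignant:
            fn += 1  # malignant sample below the threshold
        elif called_positive:
            fp += 1  # benign sample at or above the threshold
        else:
            tn += 1  # benign sample below the threshold
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both malignant and benign samples are required")
    return tp / (tp + fn), tn / (tn + fp)
```

With three malignant samples valued 0.9, 0.8, and 0.3, three benign samples valued 0.2, 0.6, and 0.1, and a threshold of 0.5, two of three malignant samples are called positive and two of three benign samples are called negative.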
The term "AUC" as used herein is an abbreviation for "area under the curve". Specifically, it refers to the area under the receiver operating characteristic (ROC) curve. The ROC curve is a plot of the true positive rate against the false positive rate for the different possible cut points of a diagnostic test. It shows the trade-off between sensitivity and specificity depending on the selected cut point (any increase in sensitivity will be accompanied by a decrease in specificity). The area under the ROC curve (AUC) is a measure of the accuracy of a diagnostic test (the larger the area, the better; the optimum is 1; a random test would have a ROC curve lying on the diagonal with an area of 0.5; see J. P. Egan (1975) Signal Detection Theory and ROC Analysis, Academic Press, New York).
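For illustration, the AUC of an empirical ROC curve can be computed without drawing the curve, because it equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative sample, with ties counted as one half. The sketch below uses this pairwise formulation; the function name `auc` and the score lists are hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
from typing import Sequence


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Area under the empirical ROC curve via pairwise comparison.

    pos_scores: scores of the truly positive (e.g., malignant) samples.
    neg_scores: scores of the truly negative (e.g., benign) samples.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one sample in each group")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # a tie counts as half
    return wins / (len(pos_scores) * len(neg_scores))
```

A perfectly separating score gives 1, and a score that is identical for every sample gives 0.5, matching the two reference points named in the definition.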
As used herein, "diagnostic" test applications include the detection or identification of a disease state or condition of a subject, determining the likelihood that a subject will contract a given disease or condition, determining the likelihood that a subject with a disease or condition will respond to therapy, determining the prognosis of a subject with a disease or condition (or its likely progression or regression), and determining the effect of a treatment on a subject with a disease or condition. For example, a diagnostic can be used to detect the presence or likelihood of a subject having a malignant nodule, or the likelihood that such a subject will respond favorably to a compound (e.g., a pharmaceutical, e.g., a drug) or other treatment.
Herein, "diagnosis" also means determining the type of a disease, such as, but not limited to, the molecular typing of breast cancer.
As used herein, the terms "marker", "biomarker", and "molecular marker" refer to a substance (e.g., a nucleic acid or a region of a nucleic acid) that is able to diagnose cancer by distinguishing cancer cells from normal cells (e.g., based on its methylation state).
Examples and Test Examples
Experimental methods for which no specific conditions are indicated in the following examples and test examples of the present disclosure are generally carried out under conventional conditions, for example those described in Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989), or under the conditions recommended by the manufacturer. The common chemical reagents used in the examples are all commercially available products.
Briefly, the present disclosure screens differentially methylated regions (DMRs) for predicting whether breast nodules are benign or malignant and for molecular typing, specifically through the following steps:
Step 1: Using the TruSeq Methyl Capture EPIC library preparation kit, 338 breast tissue samples (55 benign and 283 malignant) were pooled into 71 breast tissue sample pools, comprising 11 benign sample pools, 24 ER+/PR/Her2- malignant sample pools (1 failed QC), 25 ER/PR/Her2+ malignant sample pools, and 10 ER-/PR-/Her2- malignant sample pools, and an AnchorDx breast cancer-specific methylation database was constructed;
Step 2: Breast cancer-specific differentially methylated regions were preliminarily screened from the AnchorDx breast cancer-specific methylation database constructed in step 1. These comprise 13,676 broader methylation regions, which together contain 129,794 differentially methylated region markers each containing 3 consecutive CpGs (3-CpG); a breast cancer-specific methylation detection panel containing these 129,794 3-CpG differentially methylated region markers was then synthesized;
Step 3: Using 112 pairs of matched breast cancer tissue and plasma samples and the breast cancer-specific methylation detection panel of step 2, tissue-derived breast cancer differentially methylated regions were further screened under the filtering conditions delta value > 0.05 and FDR < 0.01; leukocyte-specific methylation signals were then filtered out using leukocyte samples matched to 40 of the breast cancer samples, finally yielding 35,814 breast cancer-specific 3-CpG differentially methylated region markers;
Step 4: Based on breast tissue samples and on plasma samples, respectively, and on the 35,814 breast cancer-specific 3-CpG differentially methylated region markers obtained in step 3, Random Forest was used to screen and construct methylation biomarkers and models capable of distinguishing benign from malignant breast nodules, finally yielding models involving the differentially methylated regions of marker combinations 1-7 and marker combinations 10-12;
Step 5: Based on breast tissue samples and on plasma samples, respectively, and on the 35,814 breast cancer-specific 3-CpG differentially methylated region markers obtained in step 3, Random Forest was used to screen and construct methylation biomarkers and models capable of molecular typing of malignant breast nodules, finally yielding models involving the differentially methylated regions of marker combinations 8-9.
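The filtering condition in step 3 (delta value > 0.05 and FDR < 0.01) can be sketched as follows. The disclosure does not state how the delta value, the p-values, or the FDR were computed, so this sketch rests on several labeled assumptions: delta is taken as the absolute difference between the mean β values of the two groups, the p-values are given as inputs, and the FDR is obtained with the Benjamini-Hochberg procedure. All function names and the tuple layout are hypothetical.

```python
from typing import List, Sequence, Tuple


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def select_dmrs(
    regions: Sequence[Tuple[str, float, float, float]],
    delta_min: float = 0.05,
    fdr_max: float = 0.01,
) -> List[str]:
    """Keep regions with delta > delta_min and FDR < fdr_max.

    regions: (name, mean_beta_cancer, mean_beta_control, p_value) tuples.
    Assumption: delta is the absolute difference of the two mean beta values.
    """
    fdr = benjamini_hochberg([r[3] for r in regions])
    return [
        name
        for (name, beta_cancer, beta_control, _), q in zip(regions, fdr)
        if abs(beta_cancer - beta_control) > delta_min and q < fdr_max
    ]
```

In a toy input of three regions, a region with a large β difference and a small p-value is kept, whereas a region with a β difference of 0.02 or an adjusted p-value of 0.2 is dropped.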
Specifically, in some aspects, the present disclosure provides a methylation biomarker for breast cancer diagnosis, wherein the methylation biomarker comprises any one of the differentially methylated regions DMR-1~DMR-521 listed in Table 1, or any combination thereof.
In some embodiments, the methylation biomarker comprises any one of the following: at least 3, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, or more differentially methylated regions among DMR-1~DMR-521.
In some preferred embodiments, the methylation biomarker comprises any one of the following (a)-(p):
(a) the differentially methylated regions in marker combination 1; (b) the differentially methylated regions in marker combination 2; (c) the differentially methylated regions in marker combination 3; (d) the differentially methylated regions in marker combination 4; (e) the differentially methylated regions in marker combination 5; (f) the differentially methylated regions in marker combination 6; (g) the differentially methylated regions in marker combination 10; (h) the differentially methylated regions in marker combination 7; (i) the differentially methylated regions in marker combination 11; (j) the differentially methylated regions in marker combination 12; (k) the differentially methylated regions in marker combination 13; (l) the differentially methylated regions in marker combination 14; (m) the differentially methylated regions in marker combination 15; (n) the differentially methylated regions in marker combination 16; (o) the differentially methylated regions in marker combination 9; (p) the differentially methylated regions in marker combination 8.
In Table 1 below, bases are numbered according to the February 2009 human genome assembly GRCh37/hg19 (see, e.g., Rosenbloom et al. (2012) "ENCODE whole-genome data in the UCSC Genome Browser: update 2012", Nucleic Acids Research 40: D912-D917).
Note that the labels in the model column of Table 1, such as marker combination 1, marker combination 2, and marker combination 3, indicate the molecular marker combination or combinations to which the DMR in that row belongs; for example, DMR-1 in Table 1 belongs to marker combinations 5, 6, 7, 10, and 13.
Table 1. Position information of the breast cancer-specific differentially methylated regions DMR-1~DMR-521
[Table 1 is reproduced in the original publication as images: Figure PCTCN2022129181-appb-000001 through Figure PCTCN2022129181-appb-000030.]
In Table 1, the combinations for distinguishing benign from malignant breast nodules based on breast tissue samples and/or plasma samples are marker combinations 1, 2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14, 15, and 16; the combination for identifying subtypes of malignant breast nodules based on breast tissue is marker combination 8; and the combination for identifying subtypes of malignant breast nodules based on breast tissue and/or plasma is marker combination 9.
Specifically, referring to Table 1:
The differentially methylated regions in marker combination 1 include: DMR-27, DMR-67, and DMR-72.
The differentially methylated regions in marker combination 2 include: DMR-25, DMR-27, DMR-67, DMR-72, and DMR-79.
The differentially methylated regions in marker combination 3 include: DMR-13, DMR-15, DMR-25, DMR-27, DMR-33, DMR-66, DMR-67, DMR-72, DMR-73, and DMR-79.
The differentially methylated regions in marker combination 4 include: DMR-2, DMR-7, DMR-11, DMR-13, DMR-15, DMR-17, DMR-18, DMR-19, DMR-20, DMR-22, DMR-25, DMR-26, DMR-27, DMR-33, DMR-44, DMR-66, DMR-67, DMR-72, DMR-73, and DMR-79.
The differentially methylated regions in marker combination 5 include: DMR-1, DMR-2, DMR-4, DMR-6, DMR-7, DMR-9, DMR-10, DMR-11, DMR-12, DMR-13, DMR-14, DMR-15, DMR-17, DMR-18, DMR-19, DMR-20, DMR-22, DMR-25, DMR-26, DMR-27, DMR-28, DMR-30, DMR-31, DMR-32, DMR-33, DMR-36, DMR-37, DMR-41, DMR-43, DMR-44, DMR-45, DMR-46, DMR-47, DMR-48, DMR-49, DMR-50, DMR-51, DMR-52, DMR-53, DMR-54, DMR-57, DMR-64, DMR-65, DMR-66, DMR-67, DMR-72, DMR-73, DMR-77, DMR-79, and DMR-80.
The differentially methylated regions in marker combination 6 include: DMR-1~DMR-80.
The differentially methylated regions in marker combination 7 include: DMR-1~DMR-14, DMR-16~DMR-21, DMR-23~DMR-27, DMR-29~DMR-34, DMR-36~DMR-38, DMR-40~DMR-48, DMR-50~DMR-54, DMR-57~DMR-68, DMR-70, DMR-72~DMR-78, and DMR-80~DMR-103.
The differentially methylated regions in marker combination 8 include: DMR-104~DMR-189.
The differentially methylated regions in marker combination 9 include: DMR-18 and DMR-190~DMR-249.
The differentially methylated regions in marker combination 10 include: DMR-1~DMR-103.
The differentially methylated regions in marker combination 11 include: DMR-3, DMR-4, DMR-9, DMR-10, DMR-12, DMR-14, DMR-15, DMR-16, DMR-21, DMR-22, DMR-24, DMR-26, DMR-27, DMR-29, DMR-30, DMR-31, DMR-34, DMR-38, DMR-39, DMR-40, DMR-50, DMR-52, DMR-53, DMR-54, DMR-55, DMR-58, DMR-59, DMR-65, DMR-66, DMR-67, DMR-70, DMR-72, DMR-73, DMR-74, DMR-76, DMR-77, DMR-79, DMR-81, DMR-82, DMR-85, DMR-94, DMR-97, DMR-99, DMR-101, and DMR-102.
The differentially methylated regions in marker combination 12 include: DMR-3, DMR-4, DMR-9, DMR-10, DMR-12, DMR-14, DMR-15, DMR-16, DMR-21, DMR-22, DMR-24, DMR-26, DMR-27, DMR-28, DMR-29, DMR-30, DMR-31, DMR-34, DMR-35, DMR-38, DMR-39, DMR-40, DMR-50, DMR-52, DMR-53, DMR-54, DMR-55, DMR-56, DMR-58, DMR-59, DMR-62, DMR-65, DMR-66, DMR-67, DMR-70, DMR-71, DMR-72, DMR-73, DMR-74, DMR-76, DMR-77, DMR-79, DMR-81, DMR-82, DMR-85, DMR-88, DMR-90, DMR-94, DMR-97, DMR-99, DMR-101, and DMR-102.
The differentially methylated regions in marker combination 13 include: DMR-1 to DMR-103 and DMR-250 to DMR-366.
The differentially methylated regions in marker combination 14 include: DMR-23, DMR-45, DMR-58, DMR-60, DMR-77, DMR-90, DMR-96, DMR-266, DMR-283, DMR-284, DMR-321, DMR-337 and DMR-367 to DMR-414.
The differentially methylated regions in marker combination 15 include: DMR-15, DMR-16, DMR-39, DMR-56, DMR-62, DMR-66, DMR-250, DMR-258, DMR-259, DMR-260, DMR-265, DMR-266, DMR-268, DMR-273, DMR-279, DMR-283, DMR-285, DMR-288, DMR-302, DMR-321, DMR-324, DMR-327, DMR-335, DMR-337, DMR-343, DMR-349, DMR-355, DMR-357, DMR-362, DMR-380, DMR-381, DMR-384, DMR-390, DMR-400, DMR-402, DMR-406, DMR-408, DMR-409 and DMR-414 to DMR-499.
The differentially methylated regions in marker combination 16 include: DMR-15, DMR-55, DMR-67, DMR-101, DMR-250, DMR-265, DMR-273, DMR-279, DMR-288, DMR-324, DMR-327, DMR-341, DMR-343, DMR-349, DMR-355, DMR-357, DMR-362, DMR-408, DMR-420, DMR-421, DMR-426, DMR-428, DMR-429, DMR-430, DMR-432, DMR-433, DMR-435, DMR-438, DMR-443, DMR-446, DMR-447, DMR-448, DMR-449, DMR-452, DMR-453, DMR-454, DMR-455, DMR-458, DMR-466, DMR-467, DMR-469, DMR-480, DMR-483, DMR-491, DMR-495 to DMR-498 and DMR-500 to DMR-521.
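The marker combinations above mix single DMR identifiers with inclusive ranges. As an illustration only, the following Python sketch expands such a listing into explicit DMR numbers; the function name `expand_dmr_spec` and the string format it accepts are assumptions made for this example and are not part of the disclosure.

```python
import re
from typing import List, Set


def expand_dmr_spec(spec: str) -> List[int]:
    """Expand a listing such as 'DMR-18, DMR-190~DMR-249' into sorted DMR numbers.

    Hypothetical helper: an item containing two DMR identifiers is treated as
    an inclusive range, an item containing one identifier as a single region.
    """
    numbers: Set[int] = set()
    for item in re.split(r"[,、]", spec):
        item = item.strip()
        if not item:
            continue
        ids = [int(n) for n in re.findall(r"DMR-(\d+)", item)]
        if len(ids) == 1:
            numbers.add(ids[0])
        elif len(ids) == 2:
            low, high = ids
            numbers.update(range(low, high + 1))
        else:
            raise ValueError(f"unrecognised item: {item!r}")
    return sorted(numbers)


combo_8 = expand_dmr_spec("DMR-104~DMR-189")
combo_9 = expand_dmr_spec("DMR-18, DMR-190~DMR-249")
print(len(combo_8), len(combo_9))
```

Under these assumptions marker combination 8 expands to 86 regions and marker combination 9 to 61 regions.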
Example: Experimental methods
1. Extraction of plasma cfDNA and tissue gDNA
Plasma cfDNA was extracted according to the instruction manual of the MagMax™ Cell-Free DNA Isolation Kit (Life). Tissue gDNA was extracted according to the instructions of the DNeasy Blood & Tissue Kit (QIAGEN).
2. Bisulfite conversion of the extracted DNA
The extracted cfDNA (10 ng) or tissue gDNA (50 ng) was subjected to bisulfite conversion, which deaminates unmethylated cytosines in the DNA to uracil while leaving methylated cytosines unchanged, yielding bisulfite-converted DNA. The conversion was carried out according to the instructions of the EZ DNA Methylation-Lightning Kit (Zymo Research).
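The effect of bisulfite conversion on a sequence can be sketched in a few lines of Python. This is an idealised illustration: it assumes complete conversion and a single strand, and the function name `bisulfite_convert` and the toy sequence are hypothetical.

```python
from typing import Set


def bisulfite_convert(seq: str, methylated: Set[int]) -> str:
    """Idealised bisulfite conversion of one DNA strand.

    Unmethylated C becomes U; a C whose 0-based position is listed in
    `methylated` is left unchanged. Assumes 100% conversion efficiency.
    """
    converted = []
    for position, base in enumerate(seq.upper()):
        if base == "C" and position not in methylated:
            converted.append("U")
        else:
            converted.append(base)
    return "".join(converted)


# Toy sequence: the C at position 1 is methylated, those at 4 and 5 are not.
converted = bisulfite_convert("ACGTCCGA", methylated={1})
# After PCR amplification, uracil is read as thymine.
as_sequenced = converted.replace("U", "T")
print(converted, as_sequenced)
```

Comparing the sequenced read with the reference then reveals which cytosines were protected by methylation.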
3. End repair
The following reagents (Table 2) were added to the converted sample for the reaction.
Table 2. End repair reaction system
Component | Volume (μl)
Converted sample | 17
MEB1 buffer | 2
MEE2 enzyme | 1
Total volume | 20
The above reaction system was placed in a PCR instrument and run according to the following program (Table 3):
Table 3. End repair reaction program
Step 1, 37°C | 30 min
Step 2, 95°C | 5 min
Heated lid | 105°C
As soon as the second step of the PCR program (95°C) reaches 5 min, remove the sample from the PCR instrument immediately, place it directly into ice, and leave it there for at least 2 min before proceeding to the next step.
4. Ligation I
Prepare the reaction solution as follows (Table 4):
Table 4. Ligation I reaction system
Component | Amount per reaction (μl)
Product of the previous step | 20
H₂O | 4
MLB1 buffer | 8
MLR1 reagent | 2
MLR5 reagent | 2
MLE1 enzyme | 2
MLE5 enzyme | 2
Reaction mixture (Mix) volume | 40
The above reaction system was placed in a PCR instrument and run according to the following program (Table 5):
Table 5. Ligation I reaction program
37°C | 30 min
95°C | 5 min
Heated lid | 105°C
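Because Table 4 gives amounts per single reaction, processing several samples in parallel usually means preparing a master mix. The Python sketch below scales the Ligation I reagents for a batch; the 10% pipetting overage and the function name `master_mix` are assumptions for illustration and are not specified in the protocol.

```python
from typing import Dict

# Per-reaction reagent volumes (μl) from Table 4, excluding the 20 μl of
# product carried over from the previous step.
LIGATION_I_PER_REACTION_UL: Dict[str, float] = {
    "H2O": 4,
    "MLB1 buffer": 8,
    "MLR1 reagent": 2,
    "MLR5 reagent": 2,
    "MLE1 enzyme": 2,
    "MLE5 enzyme": 2,
}
PREVIOUS_PRODUCT_UL = 20


def master_mix(per_reaction: Dict[str, float], n_samples: int,
               overage: float = 0.10) -> Dict[str, float]:
    """Scale per-reaction volumes to n_samples plus a pipetting overage.

    The 10% default overage is an assumed value, not taken from the protocol.
    """
    factor = n_samples * (1 + overage)
    return {name: round(volume * factor, 2) for name, volume in per_reaction.items()}


mix = master_mix(LIGATION_I_PER_REACTION_UL, n_samples=8)
mix_per_tube = sum(LIGATION_I_PER_REACTION_UL.values())
total_per_tube = mix_per_tube + PREVIOUS_PRODUCT_UL
print(mix, mix_per_tube, total_per_tube)
```

Each tube then receives 20 μl of master mix on top of the 20 μl of product from the previous step, matching the 40 μl total in Table 4.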
5. Amplification I
Prepare the reaction solution as follows (Table 6):
Table 6. Amplification I reaction system
Component | Amount per reaction (μl)
Product of the previous step | 40
H₂O | 35
MAB2 buffer | 20
MAR1 reagent | 2
MAR2 reagent | 2
MAE3 enzyme | 1
Reaction mixture volume | 40
The above reaction system was placed in a PCR instrument and run according to the following program (Table 7):
Table 7. Amplification I reaction program
Figure PCTCN2022129181-appb-000031
6. Purification I
The product of the Amplification I reaction was purified with 166 μl of 1:6 diluted Agencourt AMPure Beads (equilibrated at room temperature for half an hour beforehand) and eluted with 21 μl EB. The purification steps are as follows:
1) Take the reaction product of the previous step and centrifuge it; add 166 μl of 1:6 diluted Agencourt AMPure Beads to each sample and mix by pipetting;
2) Incubate at room temperature for 5 min;
3) Centrifuge, then place on a magnetic stand for 5 min;
4) Remove the supernatant;
5) Add 200 μl of 80% ethanol (EtOH), let stand for 30 s, and remove the ethanol;
6) Repeat step 5 once;
7) Centrifuge, place the PCR tube on the magnetic stand, and remove the residual ethanol;
8) Open the lid and dry the magnetic beads for 2-3 min;
9) Add 21 μl EB for elution, mix thoroughly by pipetting, and let stand at room temperature for 3 min;
10) Centrifuge, place the PCR tube on the magnetic stand, and let stand for 3 min;
11) Transfer 20 μl of the supernatant to a new PCR tube.
7. Ligation II
Prepare the reaction solution as follows (Table 8):
Table 8. Ligation II reaction system
Component | Volume (μl)
Reaction volume from the previous step | 20
H₂O | 4
MSB1 buffer | 8
MSR1 reagent | 2
MSR5 reagent | 2
MSE1 enzyme | 2
MSE5 enzyme | 2
Total volume | 40
The above reaction system was placed in a PCR instrument and run according to the following program (Table 9):
Table 9. Ligation II reaction program
Temperature | Time | Number of cycles
37°C | 30 min | 1
95°C | 5 min | 1
10°C | Hold | 1
8. Indexing PCR
Prepare the reaction solution as follows (Table 10):
Table 10. Indexing PCR reaction system
Figure PCTCN2022129181-appb-000032
The above reaction system was placed in a PCR instrument and run according to the following program (Table 11):
Table 11. Indexing PCR reaction program
Figure PCTCN2022129181-appb-000033
9. Purification II
The product of the Indexing PCR reaction was purified with Agencourt AMPure Beads (equilibrated at room temperature for half an hour beforehand) and eluted with 41 μl EB. The purification steps are as follows:
1) Take the reaction product of the previous step and centrifuge it; add 71 μl of undiluted Agencourt AMPure Beads to each sample and mix by pipetting;
2) Incubate at room temperature for 5 min;
3) Place on a magnetic stand for 5 min;
4) Remove the supernatant;
5) Add 200 μl of 80% EtOH, let stand for 30 s, and remove the ethanol;
6) Repeat step 5 once;
7) Centrifuge, place the PCR tube on the magnetic stand, and remove the residual ethanol;
8) Open the lid and dry the magnetic beads for 2-3 min, taking care not to over-dry them;
9) Add 41 μl EB for elution, mix thoroughly by pipetting, and let stand at room temperature for 3 min;
10) Centrifuge, place the PCR tube on the magnetic stand, and let stand for 3 min;
11) Transfer 20 μl of the supernatant to a new PCR tube;
12) Qubit quantification: take 1 μl and quantify the library with the Qubit dsDNA HS Assay Kit.
10. Hybridization capture
The constructed libraries were enriched by oligonucleotide probe capture to obtain the final sequencing-ready library covering the specific regions. The hybridization capture kit was xGen Lockdown Reagents (IDT), used according to the manufacturer's instructions.
11. Sequencing
The samples obtained after hybridization capture were sequenced on an Illumina sequencer to obtain the sequencing results.
12. Data analysis
The raw data from the sequencer were processed by routine bioinformatics analysis. Low-quality reads (low QC, short length, too many N bases, etc.) were first filtered out with fastp. Adapters, consensus sequences and polyA/T were then removed from both ends of the reads to obtain the desired insert sequences (target interval). The reads were aligned to hg19 with bismark and then deduplicated according to their UMIs, giving the real read data captured by the probes for each sample (BAM file). Statistics and analysis of the BAM files yielded the methylation data used for subsequent re-analysis.
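The UMI-based deduplication step can be illustrated with a minimal Python sketch. The disclosure does not name the deduplication tool or its rules, so everything here is an assumption: reads are grouped by chromosome, start position, strand and exact UMI, and the read with the highest mapping quality in each group is kept. The `AlignedRead` type and `dedup_by_umi` function are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class AlignedRead(NamedTuple):
    """Hypothetical minimal view of an aligned read."""
    chrom: str
    start: int
    strand: str
    umi: str
    mapq: int


def dedup_by_umi(reads: Iterable[AlignedRead]) -> List[AlignedRead]:
    """Keep one read per (chrom, start, strand, UMI) group.

    Assumed rule: exact UMI match, highest mapping quality wins; production
    tools may also tolerate UMI sequencing errors or use fragment ends.
    """
    best: Dict[Tuple[str, int, str, str], AlignedRead] = {}
    for read in reads:
        key = (read.chrom, read.start, read.strand, read.umi)
        kept = best.get(key)
        if kept is None or read.mapq > kept.mapq:
            best[key] = read
    return list(best.values())


reads = [
    AlignedRead("chr1", 100, "+", "ACGT", 30),
    AlignedRead("chr1", 100, "+", "ACGT", 40),  # PCR duplicate of the first read
    AlignedRead("chr1", 100, "+", "TTGA", 35),  # same position, different molecule
    AlignedRead("chr2", 500, "-", "ACGT", 20),
]
unique_reads = dedup_by_umi(reads)
print(len(unique_reads))
```

Reads that share a position but carry different UMIs are retained, since they are taken to originate from different molecules.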
Test Example 1: Correlation between the differentially methylated regions and breast cancer
This test example examines the correlation between the molecular markers provided in this disclosure and breast cancer. The method is as follows:
DNA was extracted from paraffin sections of breast tissue, as described in the Example;
The extracted DNA was bisulfite-converted and used for library construction, as described in the Example;
The samples obtained after hybridization capture were sequenced on an Illumina sequencer to obtain the sequencing results, as described in the Example;
Data filtering and analysis were performed as described in the Example.
In this test example, single-region methylation levels of the differentially methylated regions DMR-1 to DMR-521 were measured in 112 breast nodule tissue samples (56 benign and 56 malignant). For each sample, the methylation level β was calculated over regions of at least 3 consecutive CpGs. β is the ratio of methylated reads to total reads within the region for a given sample and lies between 0 and 1; a read in which at least 3 consecutive CpGs are methylated is counted as a methylated read, any other read is counted as an unmethylated read, and total reads are the sum of the two. P values were calculated with the Wilcoxon test and converted to FDR (False Discovery Rate) values by the BH method. Applying the filter criteria of a methylation difference (delta, the difference between the β values of the two groups) > 0.05 and FDR < 0.01 yielded 35,814 markers. Further screening with Random Forest and construction of a discrimination model yielded markers in 521 DMR regions. As shown in Figure 1, these differentially methylated regions, whose methylation levels differ significantly between benign and malignant tissue, discriminate clearly between benign and malignant breast nodules.
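The β value, the BH adjustment and the filter described above can be sketched in plain Python. This is an illustration under stated assumptions: each read is encoded as a string with one character per CpG ('1' methylated, '0' unmethylated), the Wilcoxon P values are taken as given inputs, and the filter uses the absolute value of delta, which the text does not specify. All function names are hypothetical.

```python
from typing import List, Sequence


def is_methylated_read(cpg_states: str, min_run: int = 3) -> bool:
    """True if the read has at least `min_run` consecutive methylated CpGs."""
    return "1" * min_run in cpg_states


def beta_value(reads: Sequence[str]) -> float:
    """Fraction of methylated reads among all reads covering the region."""
    if not reads:
        raise ValueError("no reads cover this region")
    methylated = sum(1 for read in reads if is_methylated_read(read))
    return methylated / len(reads)


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """BH-adjusted P values (FDR), returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def passes_filter(delta_beta: float, fdr: float) -> bool:
    """Filter from the text; taking abs(delta) is an assumption."""
    return abs(delta_beta) > 0.05 and fdr < 0.01


example_beta = beta_value(["1110", "0110", "1011", "0111"])
example_fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(example_beta, example_fdr)
```

In the toy input, two of the four reads contain a run of three methylated CpGs, so β is 0.5.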
Test Example 2: Performance of different molecular marker combinations in distinguishing benign from malignant breast nodules in breast tissue samples
In this test example, 265 breast nodule tissue samples (56 benign and 209 malignant; the malignant samples comprise 2 stage 0, 52 stage I, 103 stage II and 52 stage III cases) were tested and analyzed by the experimental methods of the Example. The specific test methods and data evaluation are as described in the Example.
This test example uses combinations of different specific molecular markers, namely marker combinations 1, 2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14, 15 and 16 (as shown in Table 1), and builds benign/malignant prediction models with the Random Forest algorithm.
The AUC, specificity (SP), sensitivity (SE), accuracy (ACC), positive predictive value (PPV) and negative predictive value (NPV) of the different molecular marker combinations (marker combinations 1, 2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14, 15 and 16) for distinguishing benign from malignant breast nodules are shown in Table 12 and Figure 2. The AUC of each marker combination is higher than 0.996, with marker combinations 6 and 7 reaching an AUC of 1. The aforementioned marker combinations all show very high specificity (SP>0.98) and positive predictive value (PPV>0.992), and the accuracy also reaches 100%. The selected molecular marker combinations therefore have very good ability to distinguish benign from malignant breast nodules.
Table 12. Performance of different molecular marker combinations in distinguishing benign from malignant breast nodules in breast tissue samples
Model | AUC | Youden_SE | Youden_SP | PPV | NPV
marker combination 1 | 0.996 | 0.981 | 1 | 1 | 0.985
marker combination 2 | 0.998 | 0.981 | 1 | 1 | 0.985
marker combination 3 | 0.998 | 0.981 | 1 | 1 | 0.985
marker combination 4 | 0.999 | 0.99 | 1 | 1 | 0.996
marker combination 5 | 0.999 | 0.99 | 1 | 1 | 0.996
marker combination 6 | 1 | 1 | 1 | 1 | 1
marker combination 7 | 1 | 0.99 | 1 | 1 | 0.996
marker combination 10 | 0.999 | 0.99 | 1 | 1 | 0.996
marker combination 11 | 0.999 | 0.99 | 1 | 1 | 0.996
marker combination 12 | 0.999 | 0.99 | 1 | 1 | 0.996
marker combination 13 | 0.999 | 0.99 | 1 | 1 | 0.966
marker combination 14 | 0.995 | 0.99 | 1 | 1 | 0.966
marker combination 15 | 0.997 | 0.952 | 1 | 1 | 0.848
marker combination 16 | 0.992 | 0.981 | 0.964 | 0.99 | 0.931
Test Example 3: Performance of different molecular marker combinations in distinguishing benign from malignant breast nodules in blood samples
In this test example, plasma samples from 307 patients with breast nodules (153 benign and 154 malignant; the malignant samples comprise 8 stage 0, 52 stage I, 67 stage II, 25 stage III and 2 stage IV cases) were tested and analyzed by the experimental methods of the Example. The specific test methods and data evaluation are as described in the Example. For each sample, the methylation level β within regions of 3 consecutive CpGs was calculated.
This test example uses combinations of different specific molecular markers. The markers were sorted in descending order of MeanDecreaseGini, a parameter obtained when building the random forest model on the training set (it uses the Gini index to measure how much each variable affects the heterogeneity of the observations at each node of the classification trees, and thereby compares variable importance; a larger value indicates a more important variable). The top 3, top 5, top 10, top 20, top 50, top 80, top 92 and top 103 markers were selected to form marker combinations 1, 2, 3, 4, 5, 6, 7 and 10, respectively; marker combinations 11, 12, 13, 14, 15 and 16 were also used, as described in Table 1. The samples were split into a training set and a test set at a ratio of 7:3, and benign/malignant prediction models were built with the Random Forest algorithm to obtain a malignant nodule probability (between 0 and 1) for each sample. Performance was evaluated by the ROC curve and AUC of the malignant nodule probability; the higher the AUC, the better the discrimination. The cutoff was set at the maximum of the Youden index: a test sample whose malignant nodule probability is greater than or equal to the cutoff is predicted to be a malignant nodule, and otherwise a benign nodule.
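Choosing the cutoff at the maximum Youden index (J = SE + SP - 1) can be sketched as follows. The function name `youden_cutoff` and the toy labels and probabilities are hypothetical; the sketch scans every observed probability as a candidate cutoff and keeps the first one that maximizes J, which is one of several reasonable tie-breaking choices.

```python
from typing import Sequence, Tuple


def youden_cutoff(labels: Sequence[int],
                  probs: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (cutoff, J, SE, SP) maximizing the Youden index.

    labels: 1 for malignant, 0 for benign.
    A sample is called malignant when its probability is >= cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes are required")
    best = None
    for cutoff in sorted(set(probs)):
        tp = sum(1 for y, p in zip(labels, probs) if y == 1 and p >= cutoff)
        tn = sum(1 for y, p in zip(labels, probs) if y == 0 and p < cutoff)
        se = tp / positives
        sp = tn / negatives
        j = se + sp - 1
        if best is None or j > best[1]:
            best = (cutoff, j, se, sp)
    return best


# Toy data, not taken from the study.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
probs = [0.10, 0.30, 0.35, 0.60, 0.40, 0.70, 0.80, 0.90]
cutoff, j, se, sp = youden_cutoff(labels, probs)
print(cutoff, j, se, sp)
```

With the toy data the selected cutoff is 0.40, giving a sensitivity of 1.0, a specificity of 0.75 and J = 0.75.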
The AUC (Figure 3), specificity (SP), sensitivity (SE), accuracy (ACC), positive predictive value (PPV) and negative predictive value (NPV) of the different molecular marker combinations (marker combinations 1, 2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14, 15 and 16) for distinguishing benign from malignant breast nodules are shown in Table 13. Marker combinations 5, 6, 7, 10 and 13 all have an AUC above 0.81, with specificity/sensitivity of 0.978/0.553, 0.957/0.617, 0.87/0.702, 0.804/0.766 and 0.848/0.660, respectively. Compared with the other marker combinations, marker combination 6 performs better: its AUC is within 0.001 of that of marker combination 7, which contains more markers, equals that of marker combination 10, which also contains more markers, and is higher than that of the combinations with fewer markers. Its PPV also reaches 0.935, indicating that samples predicted positive by this marker combination are more likely to be truly positive.
Marker combinations 5, 6, 7, 10 and 13 thus all have high detection specificity (SP>0.80), which helps reduce the false positive rate when distinguishing benign from malignant breast nodules. Marker combinations 6, 7 and 10 also have relatively high accuracy (ACC=0.785), so they are better suited to distinguishing benign from malignant breast nodules on the basis of plasma samples.
Table 13. Performance of different molecular marker combinations in distinguishing benign from malignant breast nodules in blood samples
Model | AUC | SE | SP | ACC | PPV | NPV
marker combination 1 | 0.656 (0.544-0.769) | 0.8298 | 0.5 | 0.667 | 0.629 | 0.742
marker combination 2 | 0.721 (0.617-0.825) | 0.766 | 0.609 | 0.688 | 0.667 | 0.718
marker combination 3 | 0.729 (0.627-0.832) | 0.553 | 0.848 | 0.699 | 0.788 | 0.65
marker combination 4 | 0.792 (0.702-0.882) | 0.851 | 0.63 | 0.742 | 0.702 | 0.806
marker combination 5 | 0.823 (0.739-0.907) | 0.553 | 0.978 | 0.763 | 0.963 | 0.682
marker combination 6 | 0.858 (0.782-0.934) | 0.617 | 0.957 | 0.785 | 0.935 | 0.71
marker combination 7 | 0.859 (0.786-0.931) | 0.702 | 0.87 | 0.785 | 0.846 | 0.741
marker combination 10 | 0.858 (0.784-0.933) | 0.766 | 0.804 | 0.785 | 0.8 | 0.771
marker combination 11 | 0.743 (0.642-0.844) | 0.77 | 0.63 | 0.7 | 0.68 | 0.73
marker combination 12 | 0.754 (0.655-0.853) | 0.7 | 0.74 | 0.72 | 0.73 | 0.71
marker combination 13 | 0.811 (0.723-0.899) | 0.66 | 0.848 | 0.753 | 0.816 | 0.709
marker combination 14 | 0.692 (0.582-0.801) | 0.681 | 0.696 | 0.688 | 0.696 | 0.681
marker combination 15 | 0.661 (0.549-0.773) | 0.745 | 0.543 | 0.645 | 0.625 | 0.676
marker combination 16 | 0.621 (0.506-0.736) | 0.83 | 0.413 | 0.624 | 0.591 | 0.704
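The columns of Table 13 are related through the confusion matrix. The Python sketch below computes SE, SP, ACC, PPV and NPV from confusion counts. The counts used in the example (29 true positives, 18 false negatives, 44 true negatives, 2 false positives) are hypothetical: the disclosure does not report them, and they were chosen here only because they are arithmetically consistent with the marker combination 6 row for a test set of about 30% of the 307 samples.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Standard diagnostic metrics from confusion-matrix counts."""
    return {
        "SE": tp / (tp + fn),                     # sensitivity
        "SP": tn / (tn + fp),                     # specificity
        "ACC": (tp + tn) / (tp + fn + tn + fp),   # accuracy
        "PPV": tp / (tp + fp),                    # positive predictive value
        "NPV": tn / (tn + fn),                    # negative predictive value
    }


# Hypothetical counts, not reported in the source.
metrics = diagnostic_metrics(tp=29, fn=18, tn=44, fp=2)
rounded = {name: round(value, 3) for name, value in metrics.items()}
print(rounded)
```

Rounded to three decimals, these counts give SE 0.617, SP 0.957, ACC 0.785, PPV 0.935 and NPV 0.71.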
Test Example 4: Performance of marker combination 8 in classifying malignant breast nodule subtypes in breast nodule tissue
In this test example, 265 breast nodule tissue samples (56 benign and 209 malignant; the malignant samples comprise 89 ER/PR/Her2+ cases, 88 ER+/PR/Her2- cases and 32 ER-/PR-/Her2- cases) were tested and analyzed by the experimental methods of the Example. The specific test methods and data evaluation are as described in the Example.
This test example builds a benign/malignant prediction model with the Random Forest algorithm and uses marker combination 8 to classify malignant breast nodule subtypes in breast nodule tissue.
Table 14. Classification of malignant breast nodule subtypes in breast nodule tissue by marker combination 8
Subtype | PPV | SE
ER/PR/Her2+ | 0.925 | 0.913
ER+/PR/Her2- | 0.872 | 0.911
ER-/PR-/Her2- | 0.939 | 0.9
As shown in Figure 4, this molecular marker combination clearly separates breast nodules into benign, ER/PR/Her2+, ER+/PR/Her2- and ER-/PR-/Her2- types. As shown in Table 14, in breast tissue its sensitivities for ER/PR/Her2+, ER+/PR/Her2- and ER-/PR-/Her2- breast nodules are 0.913, 0.911 and 0.9, respectively, all at or above 0.9, and the positive predictive values are 0.925, 0.872 and 0.939, respectively. The combination therefore detects malignant nodules of different molecular subtypes with very high sensitivity and classifies them with high accuracy.
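For a multi-class classifier such as this subtype model, PPV and SE are computed per class from the confusion matrix. The Python sketch below shows the calculation; the class labels and counts are made up for illustration and do not come from the study, and the function name `per_class_ppv_se` is hypothetical.

```python
from typing import Dict, List


def per_class_ppv_se(confusion: Dict[str, Dict[str, int]],
                     classes: List[str]) -> Dict[str, Dict[str, float]]:
    """Per-class PPV and SE, where confusion[true_class][predicted_class] = count."""
    result = {}
    for cls in classes:
        tp = confusion[cls][cls]
        actual = sum(confusion[cls][pred] for pred in classes)      # row total
        predicted = sum(confusion[true][cls] for true in classes)   # column total
        result[cls] = {
            "PPV": tp / predicted if predicted else float("nan"),
            "SE": tp / actual if actual else float("nan"),
        }
    return result


# Made-up counts for three generic classes.
classes = ["benign", "subtype_A", "subtype_B"]
confusion = {
    "benign":    {"benign": 9, "subtype_A": 1, "subtype_B": 0},
    "subtype_A": {"benign": 1, "subtype_A": 8, "subtype_B": 1},
    "subtype_B": {"benign": 0, "subtype_A": 1, "subtype_B": 9},
}
scores = per_class_ppv_se(confusion, classes)
print(scores)
```

SE is the fraction of a true class that is recovered (row-wise), whereas PPV is the fraction of calls for a class that are correct (column-wise), which is why the two can diverge for the same subtype.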
Test Example 5: Performance of marker combination 9 in classifying malignant breast nodule subtypes in breast tissue and blood samples
In this test example, the breast tissue samples described in Test Example 2 and the plasma samples described in Test Example 3 were each tested and analyzed by the experimental methods of the Example. The specific test methods and data evaluation are as described in the Example.
This test example builds a benign/malignant prediction model with the Random Forest algorithm and uses marker combination 9 to classify malignant breast nodule subtypes in breast nodule tissue and in plasma samples separately.
Table 15. Classification of malignant breast nodule subtypes in breast nodule tissue by marker combination 9
Subtype | PPV | SE
ER/PR/Her2+ | 0.8242 | 0.7968
ER+/PR/Her2- | 0.7802 | 0.8376
ER-/PR-/Her2- | 0.9382 | 0.833
Table 16. Classification of malignant breast nodule subtypes in plasma samples by marker combination 9
Subtype | PPV | SE
ER/PR/Her2+ | 0.7048 | 0.4148
ER+/PR/Her2- | 0.7474 | 0.9614
ER-/PR-/Her2- | 0.8334 | 0.284
In breast tissue, the sensitivities of this molecular marker combination for ER/PR/Her2+, ER+/PR/Her2- and ER-/PR-/Her2- breast nodules are 0.7968, 0.8376 and 0.833, respectively, and the positive predictive values are 0.8242, 0.7802 and 0.9382, respectively. The combination discriminates well between the malignant nodule subtypes: sensitivity is higher for ER+/PR/Her2- and ER-/PR-/Her2- nodules, and the positive predictive value is higher for ER-/PR-/Her2- nodules, as shown in Table 15. In plasma samples, the sensitivities for ER/PR/Her2+, ER+/PR/Her2- and ER-/PR-/Her2- breast nodules are 0.4148, 0.9614 and 0.284, respectively, with the highest sensitivity for ER+/PR/Her2- nodules; the positive predictive values are 0.7048, 0.7474 and 0.8334, respectively, with the highest positive predictive value for ER-/PR-/Her2- nodules, as shown in Table 16. In summary, this molecular marker combination is better suited to classifying ER+/PR/Her2- malignant breast nodules.

Claims (14)

  1. A methylation biomarker for breast cancer diagnosis, wherein the methylation biomarker comprises any one of the differentially methylated regions DMR-1 to DMR-521 provided in Table 1, or any combination thereof.
  2. The methylation biomarker for breast cancer diagnosis according to claim 1, wherein the methylation biomarker comprises any one of the following: at least 3, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 150, 170, 200 or more of the differentially methylated regions DMR-1 to DMR-521.
  3. The methylation biomarker for breast cancer diagnosis according to claim 1 or 2, wherein the methylation biomarker comprises any one of (a) to (p), or any combination thereof:
    (a) the differentially methylated regions in marker combination 1;
    (b) the differentially methylated regions in marker combination 2;
    (c) the differentially methylated regions in marker combination 3;
    (d) the differentially methylated regions in marker combination 4;
    (e) the differentially methylated regions in marker combination 5;
    (f) the differentially methylated regions in marker combination 6;
    (g) the differentially methylated regions in marker combination 10;
    (h) the differentially methylated regions in marker combination 7;
    (i) the differentially methylated regions in marker combination 11;
    (j) the differentially methylated regions in marker combination 12;
    (k) the differentially methylated regions in marker combination 13;
    (l) the differentially methylated regions in marker combination 14;
    (m) the differentially methylated regions in marker combination 15;
    (n) the differentially methylated regions in marker combination 16;
    (o) the differentially methylated regions in marker combination 9;
    (p) the differentially methylated regions in marker combination 8.
  4. The methylation biomarker for breast cancer diagnosis according to claim 3, wherein
    the methylation biomarker comprises the differentially methylated regions in marker combination 5;
    and/or the methylation biomarker comprises the differentially methylated regions in marker combination 6;
    and/or the methylation biomarker comprises the differentially methylated regions in marker combination 10;
    and/or the methylation biomarker comprises the differentially methylated regions in marker combination 7;
    and/or the methylation biomarker comprises the differentially methylated regions in marker combination 13;
    and/or the methylation biomarker comprises the differentially methylated regions in marker combination 11;
    and/or the methylation biomarker comprises the differentially methylated regions in marker combination 12.
  5. The methylation biomarker for breast cancer diagnosis according to any one of claims 1 to 4, wherein the breast cancer is selected from ER/PR/Her2+ type breast cancer, ER+/PR/Her2- type breast cancer and ER-/PR-/Her2- type breast cancer;
    optionally, the breast cancer is selected from stage 0, stage I, stage II, stage III and stage IV breast cancer.
  6. The methylation biomarker for breast cancer diagnosis according to any one of claims 1 to 4, wherein the diagnosis is distinguishing whether a breast nodule is benign or malignant.
  7. The methylation biomarker for breast cancer diagnosis according to any one of claims 1 to 4, wherein the diagnosis is identifying the molecular subtype of breast cancer;
    preferably, the molecular subtypes of breast cancer include ER/PR/Her2+ type, ER+/PR/Her2- type and ER-/PR-/Her2- type breast cancer.
  8. A breast cancer diagnostic kit, comprising a reagent for detecting, in a sample to be tested, the methylation status of the methylation biomarker according to any one of claims 1 to 7.
  9. The breast cancer diagnostic kit according to claim 8, wherein the sample to be tested is plasma, serum, blood, tissue or any combination thereof; preferably, the tissue is breast tissue; more preferably, the tissue is breast nodule tissue.
  10. The breast cancer diagnostic kit according to claim 8 or 9, wherein the diagnosis is distinguishing whether a breast nodule is benign or malignant.
  11. The breast cancer diagnostic kit according to claim 8 or 9, wherein the diagnosis is identifying the molecular subtype of breast cancer; preferably, the molecular subtypes of breast cancer include ER/PR/Her2+ type, ER+/PR/Her2- type and ER-/PR-/Her2- type breast cancer.
  12. The breast cancer diagnostic kit according to any one of claims 8 to 11, wherein the reagent is a reagent used in a method for detecting methylation status selected from the following: pyrosequencing, bisulfite conversion sequencing, methylation microarray, qPCR, digital PCR, next-generation sequencing, third-generation sequencing, whole-genome methylation sequencing, DNA enrichment detection, reduced representation bisulfite sequencing, HPLC, MassArray, methylation-specific PCR, or any combination thereof.
  13. Use of the methylation biomarker according to any one of claims 1 to 7 in the preparation of a kit for diagnosing whether a subject has breast cancer.
  14. The use according to claim 13, wherein the diagnosis is distinguishing whether a breast nodule is benign or malignant; or
    the diagnosis is identifying the molecular subtype of breast cancer.
PCT/CN2022/129181 2021-11-04 2022-11-02 Methylation biomarker for breast cancer diagnosis and use thereof WO2023078283A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202111300404.9 2021-11-04
CN202111300404 2021-11-04
CN202111464539.9 2021-12-03
CN202111464539 2021-12-03

Publications (1)

Publication Number Publication Date
WO2023078283A1 true WO2023078283A1 (en) 2023-05-11

Family

ID=86168959

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/129181 WO2023078283A1 (en) 2021-11-04 2022-11-02 Methylation biomarker for breast cancer diagnosis and use thereof

Country Status (2)

Country Link
CN (1) CN116064809A (en)
WO (1) WO2023078283A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102666876A (en) * 2009-09-22 2012-09-12 皇家飞利浦电子股份有限公司 Method and compositions for assisting in diagnosing and/or monitoring breast cancer progression
CN106636430A (en) * 2017-01-26 2017-05-10 湖南圣维基因科技有限公司 Biomarker combination and kit for predicting breast cancer and use method
CN109680060A (en) * 2017-10-17 2019-04-26 华东师范大学 Methylate marker and its application in diagnosing tumor, classification
CN111378754A (en) * 2020-04-23 2020-07-07 嘉兴市第一医院 TCGA database-based breast cancer methylation biomarker and screening method thereof
CN111863250A (en) * 2020-08-14 2020-10-30 中国科学院大学温州研究院(温州生物材料与工程研究所) Combined diagnosis model and system for early breast cancer
CN111910004A (en) * 2020-08-14 2020-11-10 中国科学院大学温州研究院(温州生物材料与工程研究所) Application of cfDNA in noninvasive diagnosis of early breast cancer

Also Published As

Publication number Publication date
CN116064809A (en) 2023-05-05

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22889305

Country of ref document: EP

Kind code of ref document: A1