CN111440869A - DNA methylation marker for predicting primary breast cancer occurrence risk and screening method and application thereof - Google Patents

Info

Publication number
CN111440869A
Authority
CN
China
Prior art keywords
methylation
breast cancer
primary breast
risk
predicting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010181207.9A
Other languages
Chinese (zh)
Inventor
张红雨
全源
佟馨宇
李姜
黄清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Baiyao Lianke Science And Technology Co ltd
Huazhong Agricultural University
Original Assignee
Wuhan Baiyao Lianke Science And Technology Co ltd
Huazhong Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-03-16
Filing date
2020-03-16
Publication date
2020-07-24
Application filed by Wuhan Baiyao Lianke Science And Technology Co ltd, Huazhong Agricultural University

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20: Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20: Supervised data analysis
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/154: Methylation markers

Abstract

The invention discloses a DNA methylation marker for predicting the risk of primary breast cancer, together with a screening method and an application thereof. The method comprises the following steps: (1) performing methylation analysis on the sample data to obtain methylation sites relevant to predicting the risk of primary breast cancer; (2) obtaining the methylation Beta values of the methylation sites; (3) constructing a primary breast cancer risk prediction model based on the methylation Beta values of the methylation sites, and verifying the feasibility of the model by calculating the odds ratio; (4) constructing a primary breast cancer risk prediction model based on the methylation sites using a machine learning method, calculating the odds ratio, AUC, recall, precision and F1 value, and performing mutual validation with the prediction model of step (3); (5) taking the methylation sites corresponding to the methylation probes in the prediction models as the DNA methylation markers. The invention can improve the detection rate of primary breast cancer and is suitable for large-scale popularization and application.

Description

DNA methylation marker for predicting primary breast cancer occurrence risk and screening method and application thereof
Technical Field
The invention relates to the technical field of biology, in particular to a DNA methylation marker for predicting primary breast cancer occurrence risk and a screening method and application thereof.
Background
Breast cancer ranks first among female malignancies worldwide and is one of the most common malignancies in women. According to 2012 data, it caused more than 500,000 deaths. Approximately 268,000 women are diagnosed with breast cancer each year, of whom approximately 70,000 die of the disease; breast cancer accounts for approximately 15% of all newly diagnosed malignancies in women and 7% of cancer-related deaths in women (Torre L A, Bray F, Siegel R L, et al. Global cancer statistics, 2012 [J]. CA: A Cancer Journal for Clinicians, 2015, 65(2): 87-108; Merino Bonilla J A, et al. Breast cancer in the 21st century [J]. Radiología).
Tumor biomarkers can serve as indicators of a tumor or of the response to tumor treatment. With existing treatment schemes, tumor biomarkers can help diagnose tumors, guide the selection of a treatment scheme and predict treatment efficacy (Califf R M. Biomarker definitions and their applications [J]. Experimental Biology and Medicine, 2018, 243(3): 213-221; Aronson J K, Ferner R E. Biomarkers - a general review [J]. Current Protocols in Pharmacology, 2017, 76(1): 9.23.1-9.23.17). However, existing tumor immunotherapy methods have some defects, such as long duration and high cost. The role of tumor biomarkers spans the entire process from tumor risk prediction and diagnosis to treatment selection and efficacy prediction (Kim S H, Hoffmann U, Borggrefe M, et al. Advantages and limitations of current biomarker research [J]. Current Pharmaceutical Biotechnology, 2017, 18(6): 445-; Critical Reviews in Oncogenesis, 2017, 22(5-6)). Therefore, finding accurate biomarkers is a crucial step in tumor immunotherapy and tumor risk prediction.
DNA methylation refers to the methylation of cytosine within specific sequences of DNA (a cytosine immediately followed by a guanine, i.e. a CpG site). Changes in the degree of methylation can alter gene expression by changing chromatin structure and stability, altering transcription factor binding activity, and so on (Bouyer D, Kramdi A, Kassam M, et al. DNA methylation dynamics during early plant life [J]. Genome Biology, 2017, 18(1): 179). In general, hypomethylation favors gene expression, whereas hypermethylation inhibits it. Abnormal DNA methylation is one of the hallmark events in tumorigenesis and tumor development, and it leads to the dysfunction of some genes (Pan Y, Liu G, Zhou F, et al.).
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a DNA methylation marker for predicting the risk of primary breast cancer, a screening method and application thereof.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
A screening method for DNA methylation markers for predicting the risk of primary breast cancer, comprising the following steps:
(1) performing methylation analysis on the sample data to obtain methylation sites relevant to predicting the risk of primary breast cancer;
(2) obtaining the methylation Beta values of the methylation sites;
(3) constructing a primary breast cancer risk prediction model based on the methylation Beta values of the methylation sites, and verifying the feasibility of the model by calculating the odds ratio;
(4) constructing a primary breast cancer risk prediction model based on the methylation sites using a machine learning method, calculating the odds ratio, AUC, recall, precision and F1 value, and performing mutual validation with the prediction model of step (3);
(5) taking the methylation sites corresponding to the methylation probes in the prediction models of step (3) and step (4) as the DNA methylation markers.
The inventors downloaded and integrated DNA methylation data from the GEO database for blood samples of 235 primary breast cancer patients and 2484 individuals without primary breast cancer. The data were analyzed and normalized to obtain methylation sites related to the onset of breast cancer and the methylation Beta values at those sites, which yielded DNA methylation biomarkers for predicting the risk of primary breast cancer. Two primary breast cancer risk prediction models were constructed from the methylation sites and validated using machine learning methods (random forest, cross-validation and confusion matrix).
Further, methylation sites and their corresponding methylation probes associated with predicting risk of primary breast cancer are shown in the following table:
Figure BDA0002412614750000041
The invention constructs a primary breast cancer risk prediction model using these methylation sites.
Further, in step (4), a random forest method is used. Data analysis is performed with the R language platform and the RStudio tool, combined with cross-validation and with a confusion matrix as a visualization tool. A prediction model is constructed from the sample methylation data and is mutually validated against the primary breast cancer risk prediction model constructed from the methylation Beta values of the methylation sites.
The invention also provides a DNA methylation marker for predicting the risk of primary breast cancer, selected from at least one of the following CpG sites (given as chromosome:position, based on the Chromosome_36/Coordinate_36 coordinates of each site in the genome): 10:12277929, 10:12277911, 10:12256556, 10:12278542, 10:12278166, 10:12277918, 10:12276839, 10:12277559, 10:12249674, 10:12251484, 10:12278347, 10:12277764, 10:12268313, 10:12275299, 10:12277961, 10:12277807, 10:12278371, 10:12278305 and 10:12261333. Each methylation marker can be used to detect the methylation degree of DNA related to primary breast cancer. The 19 markers can also be combined to detect the methylation degree at multiple primary breast cancer related DNA sites, which can significantly improve the detection rate of primary breast cancer and is suitable for large-scale popularization and application.
The invention also provides application of the DNA methylation marker in preparation of a kit for predicting or assisting in predicting the risk of primary breast cancer.
With these CpG sites as biomarkers, a kit capable of detecting the 19 sites can be developed to detect and evaluate them, so that the risk of primary breast cancer can be predicted or its prediction assisted. The kit can be applied to predict whether a subject will develop primary breast cancer and has broad application prospects.
Compared with the prior art, the invention has the beneficial effects that:
The methylation markers of primary breast cancer related genes obtained by screening can be used to determine the methylation degree of multiple primary breast cancer related DNA sites and to screen individuals who are more susceptible to primary breast cancer. This improves the detection rate of primary breast cancer and saves cost, so the markers are suitable for popularization and have good application prospects.
Detailed Description
To better illustrate the objects, aspects and advantages of the present invention, the present invention will be further described with reference to specific examples. It will be understood by those skilled in the art that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The inventors collected a total of 2719 samples, comprising 235 breast cancer patients and 2484 normal blood samples, from DNA methylation data in the NCBI GEO DataSets database (https://www.ncbi.nlm.nih.gov/gds/; datasets GSE51032, GSE111629, GSE85210, GSE53045, GSE50660 and GSE115278). The standard Illumina Human 450k DNA methylation chip output files were analyzed with the R language platform (v3.6.0) and the RStudio tool (v3.6.0), using the minfi package (v1.32.0). DNA methylation levels were determined by methylation analysis of the sample data, which yielded 19 methylation sites with significant P values (P-value < 0.05) that are likely to be associated with the onset of breast cancer (see Table 1). The methylation Beta value of each site was calculated as: Beta = intensity of the methylated bead type / (intensity of the methylated bead type + intensity of the unmethylated bead type + 100). These sites are the DNA methylation markers for predicting the risk of primary breast cancer. The future risk of primary breast cancer for a sample is predicted by detecting the methylation Beta values of these sites in the sample: the higher the Beta value, the higher the risk of primary breast cancer.
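The Beta value calculation described above can be sketched in a few lines. This is a minimal illustration rather than the inventors' code: the function name `beta_value` and the example intensities are hypothetical, and only the formula and the offset of 100 come from the text.

```python
def beta_value(methylated: float, unmethylated: float, offset: float = 100.0) -> float:
    """Methylation Beta value: M / (M + U + offset).

    `methylated` and `unmethylated` are the bead-type intensities for one
    probe in one sample; the offset of 100 is the constant given in the text.
    """
    if methylated < 0 or unmethylated < 0:
        raise ValueError("intensities must be non-negative")
    return methylated / (methylated + unmethylated + offset)


# Hypothetical intensities for a single probe (not taken from the patent data).
print(round(beta_value(3000.0, 1000.0), 4))
```

The constant offset keeps the ratio stable when both intensities are close to zero, and the resulting value always lies between 0 and 1.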
Table 1 sites significantly associated with prediction of primary breast cancer risk
Figure BDA0002412614750000051
Figure BDA0002412614750000061
A prediction model is constructed using the methylation Beta values of the methylation sites, and the feasibility of the model is verified by calculating the odds ratio (OR), which is given by:
$$\mathrm{OR} = \frac{a \times d}{b \times c}$$
where a is the number of samples that are predicted to be primary breast cancer patients and are actually primary breast cancer patients, b is the number of samples that are predicted to be primary breast cancer patients but are actually normal, c is the number of samples that are predicted to be normal but are actually primary breast cancer patients, and d is the number of samples that are predicted to be normal and are actually normal.
In parallel, the prediction models are mutually validated using machine learning methods (random forest, cross-validation and confusion matrix):
In machine learning, a random forest is a classifier comprising multiple decision trees. Each decision tree produces a prediction for the input data, and the algorithm uses a voting mechanism to select the most frequent class (the mode) as the prediction result. The method is suitable for data sets with a large number of features, can evaluate the importance of individual features for the classification, and is relatively resistant to overfitting. Using bootstrap resampling, k sample sets are repeatedly drawn at random, with replacement, from the original training data set N to generate new training sets, and k classification trees are built from them. If there are n features, mtry features are randomly drawn at each node of each tree, and the feature with the greatest classification ability, determined by calculating the information content of each feature, is selected to split the node. Each tree is grown to its maximum extent without pruning, and the generated trees together form the random forest. The samples that are not drawn each time constitute the k sets of out-of-bag (OOB) data. The class of new data is determined by the mode of the classification results of the decision trees. In essence, the method is an improvement on the decision tree algorithm that combines multiple decision trees, each built on an independently drawn sample. The classification ability of a single tree may be weak, but once a large number of decision trees have been randomly generated, the final class of a test sample is determined by a vote over the classification results of all the trees.
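The bootstrap, random feature selection and majority vote described above can be illustrated with a deliberately small sketch. It is not the randomForest R package used in the document: each "tree" here is a single-split stump chosen by training error rather than information content, and the function names and the toy Beta-like values are hypothetical.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature_ids):
    """Fit a one-split tree: choose the (feature, threshold, direction)
    with the fewest errors on the given rows."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in rows}):
            # Errors of the rule "predict 1 when value > threshold".
            errors = sum((row[f] > t) != bool(y) for row, y in zip(rows, labels))
            # The flipped rule makes exactly the complementary errors.
            for flipped, e in ((False, errors), (True, len(rows) - errors)):
                if best is None or e < best[0]:
                    best = (e, f, t, flipped)
    return best[1:]


def predict_stump(stump, row):
    f, t, flipped = stump
    return int((row[f] > t) != flipped)


def train_forest(rows, labels, k=25, mtry=1, seed=0):
    """Build k stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(k):
        idx = [rng.randrange(n) for _ in range(n)]    # draw n samples with replacement
        oob = sorted(set(range(n)) - set(idx))        # out-of-bag indices for this tree
        feats = rng.sample(range(n_features), mtry)   # mtry randomly chosen features
        stump = train_stump([rows[i] for i in idx], [labels[i] for i in idx], feats)
        forest.append((stump, oob))
    return forest


def predict_forest(forest, row):
    """Majority vote (mode) over the predictions of all trees."""
    votes = Counter(predict_stump(stump, row) for stump, _ in forest)
    return votes.most_common(1)[0][0]


# Toy data: two hypothetical Beta-like features, label 1 = patient.
rows = [[0.20, 0.10], [0.25, 0.15], [0.30, 0.20],
        [0.70, 0.80], [0.75, 0.85], [0.80, 0.90]]
labels = [0, 0, 0, 1, 1, 1]
forest = train_forest(rows, labels)
print(predict_forest(forest, [0.90, 0.95]), predict_forest(forest, [0.10, 0.05]))
```

The out-of-bag indices stored with each tree are the samples that tree never saw, so they could be used to estimate the error without a separate test set.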
The method uses a random forest. Data analysis is performed with the R language platform (v3.6.0) and the RStudio tool (v3.6.0), using the randomForest package (v4.6-14), combined with cross-validation and with a confusion matrix as a visualization tool. A prediction model is constructed from the sample methylation data and is mutually validated against the prediction model constructed from the methylation Beta values of the methylation sites.
If an instance is positive and is predicted to be positive, it is a true positive (TP);
if an instance is negative and is predicted to be negative, it is a true negative (TN);
if an instance is negative but is predicted to be positive, it is a false positive (FP);
if an instance is positive but is predicted to be negative, it is a false negative (FN).
From the prediction results of the prediction model, the following confusion matrix is obtained:
                      Actual positive    Actual negative
Predicted positive    TP                 FP
Predicted negative    FN                 TN
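The four cells of the confusion matrix can be counted directly from paired lists of actual and predicted labels. The sketch below is illustrative only; the function name `confusion_counts` and the example labels are hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, FN and TN for a binary classification result."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted positive, actually positive
        elif p == positive:
            fp += 1          # predicted positive, actually negative
        elif a == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1          # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical labels: 1 = primary breast cancer patient, 0 = normal sample.
actual = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]
print(confusion_counts(actual, predicted))
# {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}
```

In the notation used for the odds ratio, TP, FP, FN and TN correspond to a, b, c and d respectively.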
Based on the confusion matrix values, the following evaluation indices commonly used in the art can be introduced: odds ratio, AUC, recall, precision and F1 value.
The odds ratio (OR) is calculated as:
$$\mathrm{OR} = \frac{a \times d}{b \times c}$$
where a is the number of samples that are predicted to be primary breast cancer patients and are actually primary breast cancer patients, b is the number of samples that are predicted to be primary breast cancer patients but are actually normal, c is the number of samples that are predicted to be normal but are actually primary breast cancer patients, and d is the number of samples that are predicted to be normal and are actually normal.
Recall is the proportion of actually positive samples that are correctly predicted as positive, and is calculated as:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
Precision is the proportion of samples predicted as positive that are actually positive, and is calculated as:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
The F1 value (H-mean value), the harmonic mean of the model's precision and recall, is defined by:
$$\frac{2}{F_1} = \frac{1}{\mathrm{Precision}} + \frac{1}{\mathrm{Recall}}$$
which rearranges to:
$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} = \frac{2\,TP}{2\,TP + FP + FN}$$
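As a quick check of how recall, precision and F1 relate, the sketch below implements the standard definitions and applies the F1 formula to the Precision and Recall values reported for the random forest model in Example 6. The function names and the TP/FP/FN counts used for the cross-check are hypothetical; only the two reported values come from the document.

```python
def recall(tp: int, fn: int) -> float:
    """Share of actually positive samples that were predicted positive."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Share of predicted-positive samples that are actually positive."""
    return tp / (tp + fp)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r)


# Precision and Recall as reported in Example 6.
f1 = f1_score(0.6857, 0.1657)
print(round(f1, 3))  # about 0.267, in line with the reported F1 of 0.2668

# Hypothetical counts, to show that both forms of the F1 formula agree.
tp, fp, fn = 8, 2, 4
assert abs(f1_score(precision(tp, fp), recall(tp, fn)) - 2 * tp / (2 * tp + fp + fn)) < 1e-12
```

Because the harmonic mean is dominated by the smaller of the two inputs, a model with high precision but low recall still receives a low F1 value.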
ROC stands for receiver operating characteristic. Its main analysis tool is the ROC curve, which is drawn on a two-dimensional plane whose abscissa is the false positive rate (FPR) and whose ordinate is the true positive rate (TPR). For a given classifier, a pair of TPR and FPR values is obtained from its performance on the test samples, so the classifier maps to a point on the ROC plane. By adjusting the threshold used by the classifier, a curve passing through (0,0) and (1,1) is obtained, which is the ROC curve of the classifier. In general, this curve should lie above the line connecting (0,0) and (1,1), because that line represents a random classifier. AUC (area under the ROC curve) is used to measure the performance (generalization ability) of a machine learning algorithm on a two-class problem. The AUC value is usually between 0.5 and 1.0, and the larger the AUC, the better the classification model.
The AUC is calculated as:
Figure BDA0002412614750000085
Wherein M is the number of samples predicted as positive class, and N is the number of samples predicted as negative class.
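The formula image is not reproduced in the text, so the following is only a sketch of one common way to compute AUC: the rank-sum (Mann-Whitney) form, AUC = (sum of the ranks of the positive samples - M(M+1)/2) / (M × N). It assumes that M and N are the numbers of positive-class and negative-class samples and that samples are ranked by predicted score in ascending order; the function name and the example scores are hypothetical.

```python
def auc_from_scores(labels, scores):
    """AUC via the rank-sum identity; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of samples tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1      # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    m = sum(1 for y in labels if y == 1)    # positive-class samples
    n = len(labels) - m                     # negative-class samples
    if m == 0 or n == 0:
        raise ValueError("both classes must be present")
    positive_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (positive_rank_sum - m * (m + 1) / 2) / (m * n)


# Hypothetical predicted scores for four samples (1 = patient, 0 = normal).
print(auc_from_scores([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

This form equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative sample, which is why 0.5 corresponds to a random classifier.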
Example 1
Taking DNA methylation marker 10:12277929 as an example, a prediction model was constructed from the Beta values of the samples. The model correctly predicted 32 primary breast cancer patients and, according to the data set information, predicted 14 samples without primary breast cancer as patients, giving the following results:
Figure BDA0002412614750000086
Figure BDA0002412614750000091
The OR value of this model was 27.8114 and the P value was 1.02E-24, which is significant, with a 95% confidence interval of [14.6042, 52.9625].
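The reported odds ratio for Example 1 can be reproduced from the counts in the text. The sketch below makes two assumptions that the document does not state: that all 235 patient samples and all 2484 non-patient samples have a value at this site (so c = 235 - 32 and d = 2484 - 14), and that the 95% confidence interval is the usual log-based (Woolf) interval. The function name is hypothetical.

```python
import math


def odds_ratio_with_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio (a*d)/(b*c) with a log-based (Woolf) confidence interval."""
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR) under the Woolf approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


a, b = 32, 14          # counts stated in Example 1
c = 235 - a            # patients predicted as normal (assumes 235 patients at this site)
d = 2484 - b           # normal samples predicted as normal (assumes 2484 at this site)
odds_ratio, lower, upper = odds_ratio_with_ci(a, b, c, d)
print(round(odds_ratio, 4), round(lower, 2), round(upper, 2))
```

Under these assumptions the computed odds ratio agrees with the reported 27.8114, and the interval is close to the reported [14.6042, 52.9625].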
Example 2
Taking DNA methylation marker 10:12277911 as an example, a prediction model was constructed from the Beta values of the samples. The model correctly predicted 30 primary breast cancer patients and, according to the data set information, predicted 20 non-diseased samples as patients, giving the following results:
Figure BDA0002412614750000092
The OR value of this model was 18.0293 and the P value was 2.16E-20, which is significant, with a 95% confidence interval of [9.6865, 34.0422].
Example 3
Taking DNA methylation marker 10:12256556 as an example, a prediction model was constructed from the Beta values of the samples. The model correctly predicted 26 primary breast cancer patients and, according to the data set information, predicted 34 samples without primary breast cancer as patients, giving the following results:
Figure BDA0002412614750000093
The OR value of this model was 8.9491 and the P value was 2.98E-13, which is significant, with a 95% confidence interval of [5.0519, 15.6948].
Example 4
Taking DNA methylation marker 10:12278542 as an example, a prediction model was constructed from the Beta values of the samples. The model correctly predicted 22 primary breast cancer patients and, according to the data set information, predicted 21 non-diseased samples as patients, giving the following results:
Figure BDA0002412614750000094
Figure BDA0002412614750000101
The OR value of this model was 12.0889 and the P value was 3.21E-13, which is significant, with a 95% confidence interval of [6.2288, 23.5211].
Example 5
Taking DNA methylation marker 10:12278166 as an example, a prediction model was constructed from the Beta values of the samples. The model correctly predicted 21 primary breast cancer patients and, according to the data set information, predicted 15 non-diseased samples as patients, giving the following results:
Figure BDA0002412614750000102
The OR value of this model was 16.1141 and the P value was 3.43E-14, which is significant, with a 95% confidence interval of [7.7875, 34.1605].
Example 6
A random forest was used. Data analysis was performed with the R language platform (v3.6.0) and the RStudio tool (v3.6.0), using the randomForest package (v4.6-14), combined with cross-validation and with a confusion matrix as a visualization tool. A prediction model was constructed from the sample methylation data, giving the following results:
Figure BDA0002412614750000103
The OR value of this model was 26.6127 and the P value was 3.96E-30, which is significant, with a 95% confidence interval of [15.1222, 46.8341]. The AUC value was 0.9112, the Recall value was 0.1657, the Precision value was 0.6857, and the F1 value (the harmonic mean of the model's Precision and Recall) was 0.2668.
The two models validate each other: by detecting the selected DNA methylation markers 10:12277929, 10:12277911, 10:12256556, 10:12278542, 10:12278166, 10:12277918, 10:12276839, 10:12277559, 10:12249674, 10:12251484, 10:12278347, 10:12277764, 10:12268313, 10:12275299, 10:12277961, 10:12277807, 10:12278371, 10:12278305 and 10:12261333 (based on the Chromosome_36/Coordinate_36 coordinates of each site in the genome), the risk of primary breast cancer can be well predicted.
The DNA methylation marker for predicting primary breast cancer comprises at least one of the 19 methylation markers of primary breast cancer related genes. The 19 markers can also be combined to detect the methylation degree at multiple primary breast cancer related DNA sites, which can significantly improve the detection rate of primary breast cancer and saves cost, so the markers are suitable for popularization and have good application prospects.
Finally, it should be noted that the above embodiments are only used for illustrating the technical solutions of the present invention and not for limiting the protection scope of the present invention, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions can be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (5)

1. A method of screening for a DNA methylation marker for predicting the risk of developing primary breast cancer, comprising the steps of:
(1) carrying out methylation analysis on the sample data to obtain methylation sites relevant to predicting the risk of primary breast cancer;
(2) obtaining a methylation Beta value for the methylation site;
(3) constructing a primary breast cancer risk prediction model based on the methylation Beta values of the methylation sites, and verifying the feasibility of the model by calculating the odds ratio;
(4) constructing a primary breast cancer risk prediction model based on the methylation sites using a machine learning method, calculating the odds ratio, AUC, recall, precision and F1 value, and performing mutual validation with the prediction model of step (3);
(5) taking the methylation sites corresponding to the methylation probes in the prediction models of step (3) and step (4) as the DNA methylation markers.
2. The method for screening DNA methylation markers according to claim 1, wherein the methylation sites and corresponding methylation probes related to the prediction of the risk of primary breast cancer are shown in the following table:
Figure FDA0002412614740000011
Figure FDA0002412614740000021
3. The method for screening DNA methylation markers according to claim 1, wherein in step (4) a random forest method is used, data analysis is performed with the R language platform and the RStudio tool, combined with cross-validation and with a confusion matrix as a visualization tool, and a prediction model is constructed from the sample methylation data and is mutually validated against the primary breast cancer risk prediction model constructed from the methylation Beta values of the methylation sites.
4. A DNA methylation marker for predicting the risk of primary breast cancer development, characterized by being selected from at least one of the following CpG sites: 10:12277929, 10:12277911, 10:12256556, 10:12278542, 10:12278166, 10:12277918, 10:12276839, 10:12277559, 10:12249674, 10:12251484, 10:12278347, 10:12277764, 10:12268313, 10:12275299, 10:12277961, 10:12277807, 10:12278371, 10:12278305 and 10:12261333.
5. Use of the DNA methylation marker of claim 4 for the preparation of a kit for predicting or aiding in the prediction of the risk of developing primary breast cancer.
CN202010181207.9A 2020-03-16 2020-03-16 DNA methylation marker for predicting primary breast cancer occurrence risk and screening method and application thereof Pending CN111440869A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010181207.9A CN111440869A (en) 2020-03-16 2020-03-16 DNA methylation marker for predicting primary breast cancer occurrence risk and screening method and application thereof

Publications (1)

Publication Number Publication Date
CN111440869A true CN111440869A (en) 2020-07-24

Family

ID=71653993

Country Status (1)

Country Link
CN (1) CN111440869A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112037854A (en) * 2020-10-15 2020-12-04 深圳市龙岗中心医院 Method and system for acquiring tumor methylation marker based on methylation chip data
CN112877419A (en) * 2021-01-20 2021-06-01 武汉大学 DNA methylation marker for predicting schizophrenia occurrence risk, screening method and application
CN115620812A (en) * 2022-12-21 2023-01-17 珠海圣美生物诊断技术有限公司 Resampling-based feature selection method and device, electronic equipment and storage medium
CN116758989A (en) * 2023-06-09 2023-09-15 哈尔滨星云生物信息技术开发有限公司 Breast cancer marker screening method and related device
WO2024007205A1 (en) * 2022-07-06 2024-01-11 何肇基 Method and system for establishing indicator for assessing degree of malignancy of tissue microenvironment, and method and system for using indicator for assessing degree of malignancy of tissue microenvironment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107574243A (en) * 2016-06-30 2018-01-12 博奥生物集团有限公司 The construction method of molecular marker, reference gene and its application, detection kit and detection model
CN107604061A (en) * 2017-08-31 2018-01-19 中国科学院北京基因组研究所 The screening technique in mitochondrial core DNA methylation joint site and application
CN108676879A (en) * 2018-05-24 2018-10-19 中国科学院北京基因组研究所 Special application of the methylation sites as breast cancer molecular classification diagnosis marker

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BERNARDO P. DE ALMEIDA: "Roadmap of DNA methylation in breast cancer identifies novel prognostic biomarkers" *
HUA Yanshan: "Advances in clinical research on DNA methylation in breast cancer" *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112037854A (en) * 2020-10-15 2020-12-04 深圳市龙岗中心医院 Method and system for acquiring tumor methylation marker based on methylation chip data
CN112037854B (en) * 2020-10-15 2024-04-09 深圳市龙岗中心医院 Method and system for obtaining tumor methylation marker based on methylation chip data
CN112877419A (en) * 2021-01-20 2021-06-01 武汉大学 DNA methylation marker for predicting schizophrenia occurrence risk, screening method and application
WO2024007205A1 (en) * 2022-07-06 2024-01-11 何肇基 Method and system for establishing indicator for assessing degree of malignancy of tissue microenvironment, and method and system for using indicator for assessing degree of malignancy of tissue microenvironment
CN115620812A (en) * 2022-12-21 2023-01-17 珠海圣美生物诊断技术有限公司 Resampling-based feature selection method and device, electronic equipment and storage medium
CN116758989A (en) * 2023-06-09 2023-09-15 哈尔滨星云生物信息技术开发有限公司 Breast cancer marker screening method and related device
CN116758989B (en) * 2023-06-09 2024-04-30 哈尔滨星云生物信息技术开发有限公司 Breast cancer marker screening method and related device

Similar Documents

Publication Publication Date Title
CN111440869A (en) DNA methylation marker for predicting primary breast cancer occurrence risk and screening method and application thereof
JP6063446B2 (en) Analysis of biomarker expression in cells by product rate
CN109411015A (en) Tumor mutations load detection device and storage medium based on Circulating tumor DNA
Wang et al. A novel approach combined transfer learning and deep learning to predict TMB from histology image
CN111564214A (en) Establishment and verification method of breast cancer prognosis evaluation model based on 7 special genes
CN112735592B (en) Construction method and application method of lung cancer prognosis model and electronic equipment
CN115375640A (en) Tumor heterogeneity identification method and device, electronic equipment and storage medium
KR101765999B1 (en) Device and Method for evaluating performace of cancer biomarker
CN109979532B (en) Thyroid papillary carcinoma distant metastasis molecular mutation prediction model, method and system
Chen et al. Integrative network analysis to identify aberrant pathway networks in ovarian cancer
CN111833963A (en) cfDNA classification method, device and application
CN112382341A (en) Method for identifying biomarkers related to esophageal squamous carcinoma prognosis
KR20190000169A (en) System and method of biomarker identification for cancer recurrence prediction
CN110942808A (en) Prognosis prediction method and prediction system based on gene big data
CN116403701A (en) Method and device for predicting TMB level of non-small cell lung cancer patient
CN114974432A (en) Screening method of biomarker and related application thereof
US20140107936A1 (en) Cross-modal application of combination signatures indicative of a phenotype
CA3128379A1 (en) Stratification of risk of virus associated cancers
Wang et al. Classification of Muscle Invasive Bladder Cancer to Predict Prognosis of Patients Treated with Immunotherapy
CN117438097B (en) Method and system for predicting recurrence risk after early liver cancer operation
Van Kleunen et al. The spatial structure of the tumor immune microenvironment can explain and predict patient response in high-grade serous carcinoma
KR20190126606A (en) IDENTIFYING METHOD FOR TUMOR PATIENT BASED ON miRNA IN EXOSOME AND APPARATUS FOR THE SAME
Yang et al. SMART: reference-free deconvolution for spatial transcriptomics using marker-gene-assisted topic models
Wang et al. Application of Machine Learning for Tracing the Origin of Metastatic Lung Cancer Tissues
Phuong et al. Computational modeling approaches for circulating cell-free DNA in oncology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20200724