CN116631510B - Device for differential diagnosis of Crohn's disease and ulcerative colitis - Google Patents
- Publication number
- CN116631510B (application CN202310559017.XA)
- Authority
- CN
- China
- Prior art keywords
- sample
- ulcerative colitis
- gene
- mmps
- expression
- Prior art date
- Legal status
- Active
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B25/10—Gene or protein expression profiling; Expression-ratio estimation or normalisation
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B5/00—ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/158—Expression markers
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Chemical & Material Sciences (AREA)
- Genetics & Genomics (AREA)
- Medical Informatics (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- Biotechnology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Molecular Biology (AREA)
- Organic Chemistry (AREA)
- Theoretical Computer Science (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Evolutionary Biology (AREA)
- Analytical Chemistry (AREA)
- Wood Science & Technology (AREA)
- Zoology (AREA)
- Immunology (AREA)
- Bioethics (AREA)
- General Engineering & Computer Science (AREA)
- Microbiology (AREA)
- Biochemistry (AREA)
- Physiology (AREA)
- Artificial Intelligence (AREA)
- Pathology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Epidemiology (AREA)
- Evolutionary Computation (AREA)
- Public Health (AREA)
- Software Systems (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention discloses a device for assisting in distinguishing Crohn's disease from ulcerative colitis, which comprises parameter acquisition equipment and a readable carrier. The parameter acquisition equipment comprises means for acquiring the parameters used by the formulas recorded on the readable carrier. The readable carrier records formula (1), $P_{UC} = \exp(\text{MMPs Score}) / (1 + \exp(\text{MMPs Score}))$, where $P_{UC}$ is the probability that the sample under test is predicted to be ulcerative colitis; when $P_{UC} < 0.5$, the sample under test is classified as Crohn's disease. Instead of the specific expression values of the MMPs-related gene set, the model in the device uses binary variables converted from those values, which better overcomes batch differences between chip detection platforms of different origins and gives the model greater clinical value.
Description
Technical Field
The invention relates to a device for the differential diagnosis of Crohn's disease and ulcerative colitis using a model built on binary variables derived from gene expression in the patient's intestinal mucosa, and belongs to the biomedical field.
Background
Inflammatory bowel disease (IBD) causes chronic intestinal inflammation, arises from the interplay of genetic and environmental factors that affect immune responses, and is associated with significant morbidity. Crohn's disease (CD) and ulcerative colitis (UC) are the two major forms of IBD. Although CD and UC share some pathological and clinical features, they also differ, indicating that they are two distinct diseases. CD is characterized by fissuring ulcers, granulomatous inflammation and submucosal fibrosis, whereas the characteristic histological findings of UC are rectal crypt distortion, lymphocyte infiltration and chronic inflammation, often limited to the lamina propria. Clinically, the differential diagnosis of IBD is usually established by a comprehensive assessment of clinical manifestations together with endoscopic, histopathological, radiological and laboratory findings.
Distinguishing CD from UC in IBD patients with colitis is critical for tailoring treatment, because the two diseases differ in treatment and in response mechanisms after diagnosis. However, this distinction remains a significant clinical challenge, as there is currently no single diagnostic gold standard for UC or CD. According to published reports, about 5% to 15% of patients do not meet strict criteria for either UC or CD, and up to 14% of patients have their diagnosis changed between UC and CD at least once. The diagnosis of IBD therefore remains difficult with current methods, particularly in patients whose inflammatory lesions are confined to the colon.
Disclosure of Invention
The invention aims to provide a device and a method for assisting in the determination of Crohn's disease and/or ulcerative colitis.
The invention provides a kit for assisting in the determination of Crohn's disease and/or ulcerative colitis, which comprises parameter acquisition equipment and a readable carrier;
the parameter acquisition equipment comprises means for acquiring the parameters used by the formulas recorded on the readable carrier;
the readable carrier has recorded thereon the following formulas (1)-(3):
$$P_{UC} = \frac{\exp(\text{MMPs Score})}{1 + \exp(\text{MMPs Score})} \tag{1}$$
$$\text{MMPs Score} = -1.3813 + 0.6358 \times \text{ANXA1} + 0.1000 \times \text{CXCL13} + 0.2507 \times \text{MMP1} + 0.4478 \times \text{CXCL1} \tag{2}$$
$$P_{UC} + P_{CD} = 1 \tag{3}$$
wherein $P_{UC}$ is the probability that the sample under test is predicted to be ulcerative colitis; $P_{CD}$ is the probability that the case under test is predicted to be Crohn's disease; ANXA1, CXCL13, MMP1 and CXCL1 are the binary variables of the ANXA1, CXCL13, MMP1 and CXCL1 genes, respectively: if the expression value of a gene in the sample under test is greater than the median expression value of that gene in ulcerative colitis samples, the binary variable of the gene is assigned 1; otherwise it is assigned 0;
when $P_{UC} > 0.5$, the sample under test is classified as ulcerative colitis; when $P_{UC} < 0.5$, it is classified as Crohn's disease.
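Formulas (1)-(3) and the 0.5 cut-off can be sketched as a short Python function. This is an illustrative sketch, not the patent's implementation: the function names and the dictionary input format are assumptions. Because the text does not say how to classify a sample with $P_{UC}$ exactly 0.5, the sketch returns "undetermined" in that case; with 0/1 inputs the score never equals 0, so this branch is not reached in practice.

```python
import math

# Coefficients of formula (2), copied from the text.
INTERCEPT = -1.3813
COEFFICIENTS = {"ANXA1": 0.6358, "CXCL13": 0.1000, "MMP1": 0.2507, "CXCL1": 0.4478}


def mmps_score(binary: dict[str, int]) -> float:
    """Formula (2): linear predictor built from the four 0/1 gene indicators."""
    for gene in COEFFICIENTS:
        if binary[gene] not in (0, 1):
            raise ValueError(f"{gene} must be 0 or 1, got {binary[gene]!r}")
    return INTERCEPT + sum(coef * binary[gene] for gene, coef in COEFFICIENTS.items())


def classify(binary: dict[str, int]) -> tuple[float, float, str]:
    """Return (P_UC, P_CD, label) using formulas (1) and (3) and the 0.5 cut-off."""
    score = mmps_score(binary)
    p_uc = math.exp(score) / (1.0 + math.exp(score))  # formula (1): logistic function
    p_cd = 1.0 - p_uc  # formula (3)
    if p_uc > 0.5:
        label = "UC"
    elif p_uc < 0.5:
        label = "CD"
    else:
        label = "undetermined"  # the text does not define the P_UC == 0.5 case
    return p_uc, p_cd, label
```

With all four indicators at 0, the score is -1.3813 and $P_{UC} \approx 0.20$ (Crohn's disease); with all four at 1, the score is 0.0530 and $P_{UC} \approx 0.51$ (ulcerative colitis).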
The parameter acquisition equipment is a device for detecting the expression levels of the ANXA1, CXCL13, MMP1 and CXCL1 genes in the sample under test.
The kit further comprises a recording means and/or a calculating means; the recording means comprises a pen and/or a computer, and the calculating means comprises a calculator and/or the computer.
The readable carrier may be the kit instructions, with the content of the formulas printed on a card.
Alternatively, the readable carrier is a computer-readable medium.
The median expression value of a gene in ulcerative colitis samples is obtained by measuring the expression of that gene in at least 10 ulcerative colitis samples with the same detection device; the average of these expression values is taken as the median expression value in ulcerative colitis samples.
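A minimal sketch of preparing the binary inputs, assuming expression values have already been measured on the same platform for the sample under test and for a reference set of at least 10 UC samples. The helper names are hypothetical. The text describes the reference value both as a median and as an average of the UC samples; the sketch uses `statistics.median`, and `statistics.mean` could be substituted if the average is intended.

```python
import statistics

MIN_UC_SAMPLES = 10  # "at least 10 ulcerative colitis samples", per the text


def uc_reference_values(uc_expression: dict[str, list[float]]) -> dict[str, float]:
    """Per-gene reference value computed from UC samples measured on the same device."""
    refs = {}
    for gene, values in uc_expression.items():
        if len(values) < MIN_UC_SAMPLES:
            raise ValueError(f"{gene}: need >= {MIN_UC_SAMPLES} UC samples, got {len(values)}")
        refs[gene] = statistics.median(values)  # assumption: median, see lead-in
    return refs


def to_binary(sample: dict[str, float], refs: dict[str, float]) -> dict[str, int]:
    """Assign 1 when the sample's value is strictly greater than the reference, else 0."""
    return {gene: int(sample[gene] > ref) for gene, ref in refs.items()}
```

The strict `>` means a value exactly equal to the reference is coded 0, matching "larger than the median" in the text.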
The invention also provides a kit for assisting in judging the Crohn's disease and/or ulcerative colitis, which comprises a device for detecting the expression level of ANXA1, a device for detecting the expression level of CXCL13, a device for detecting the expression level of MMP1, a device for detecting the expression level of CXCL1 and a computing device provided with a parameter operation module; the parameter operation module can perform operations of the following formulas (1) - (3):
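$$P_{UC} = \frac{\exp(\text{MMPs Score})}{1 + \exp(\text{MMPs Score})} \tag{1}$$
$$\text{MMPs Score} = -1.3813 + 0.6358 \times \text{ANXA1} + 0.1000 \times \text{CXCL13} + 0.2507 \times \text{MMP1} + 0.4478 \times \text{CXCL1} \tag{2}$$
$$P_{UC} + P_{CD} = 1 \tag{3}$$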
wherein $P_{UC}$ is the probability that the sample under test is predicted to be ulcerative colitis; $P_{CD}$ is the probability that the case under test is predicted to be Crohn's disease; ANXA1, CXCL13, MMP1 and CXCL1 are the binary variables of the ANXA1, CXCL13, MMP1 and CXCL1 genes, respectively: if the expression value of a gene in the sample under test is greater than the median expression value of that gene in the samples, the binary variable of the gene is assigned 1; otherwise it is assigned 0;
when $P_{UC} > 0.5$, the sample under test is classified as ulcerative colitis; when $P_{UC} < 0.5$, it is classified as Crohn's disease.
The use of a system for detecting the expression levels of the ANXA1, CXCL13, MMP1 and CXCL1 genes in the preparation of products for the determination of Crohn's disease and ulcerative colitis also falls within the scope of the present invention.
The system for detecting the expression levels of the ANXA1, CXCL13, MMP1 and CXCL1 genes is the Affymetrix Human Gene 1.0 ST Array, the Affymetrix Human Genome U133 Plus 2.0 Array, or the Agilent-014850 Whole Human Genome Microarray 4x44K G4112F.
The ANXA1 gene is annexin A1 (NM_000700.3); the CXCL13 gene is C-X-C motif chemokine ligand 13 (NM_001371558.1); the MMP1 gene is matrix metallopeptidase 1 (NM_002421); and the CXCL1 gene is C-X-C motif chemokine ligand 1 (NM_001511).
The invention provides a method for building a model for the differential diagnosis of IBD from matrix metalloproteinase family-related genes (MMPs-associated genes), together with its validation in several multi-center data cohorts. Matrix metalloproteinases (MMPs) are a group of zinc-dependent neutral peptidases that can degrade all components of the extracellular matrix (ECM). They are associated with extensive mucosal degradation and tissue remodeling and ultimately contribute to the formation of ulcers, fistulae and stenoses; MMPs are therefore an important gene family involved in, and regulating, the progression of inflammatory bowel disease. There is substantial evidence that IBD-associated mucosal inflammation is accompanied by increased induction of various MMPs, and at least 3 clinical trials of matrix metalloproteinase inhibitors for IBD treatment have been publicly reported. Our study shows that the MMPs-related gene set is also the main differential gene set between CD and UC. To overcome differences between the detection platforms used by cohorts from different sources, the expression levels of the MMPs-related gene set were converted into binary variables, and a differential diagnosis model distinguishing CD from UC was built on these binary variables by least absolute shrinkage and selection operator (LASSO) logistic regression. Finally, the model was validated in currently published IBD cohorts that met the requirements, where it performed well. Our diagnostic model therefore provides a promising diagnostic tool that may rapidly improve clinical practice.
Advantages of this method include: 1) the model was built and validated by integrating most of the CD and UC microarray data reported to date, which is critical for a large-sample, multi-center study of a disease as heterogeneous as IBD; moreover, no gene expression model for the differential diagnosis of UC and CD has been reported in the prior art; 2) the multi-center IBD cohorts were integrated by two different technical routes, which effectively reduces the bias introduced by relying on a single dataset-integration method; 3) model evaluation strictly followed the current clinical prediction-model guideline TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis), assessing discrimination, calibration and clinical utility in different centers and cohorts, which corresponds to the highest level of evidence in the guideline's quality assessment; 4) instead of the specific expression values of the MMPs-related gene set, the model uses binary variables converted from them, which better overcomes batch differences between chip detection platforms of different origins and gives the model greater clinical value.
Drawings
FIG. 1 shows the protein interaction network constructed from the differentially expressed genes (DEGs) screened by the RRA method, and the key gene module identified by MCODE.
FIG. 2 shows the protein interaction network constructed from the DEGs found by data integration, and the key gene module identified by MCODE.
FIG. 3 illustrates how the genes finally included in the model were determined by LASSO regression and cross-validation. The left dashed line marks the penalty coefficient log(λ) at the optimal AUC determined by cross-validation; the right dashed line marks the penalty coefficient log(λ) at the optimal AUC plus one standard error.
FIG. 4 is a nomogram drawn from the constructed model.
FIG. 5 shows the diagnostic performance of the constructed model in the training cohort, including the ROC curve, calibration curve and decision curve analysis (DCA).
FIG. 6 shows the diagnostic performance of the constructed model in the validation cohort GSE75214, including the ROC curve, calibration curve and decision curve.
FIG. 7 shows the diagnostic performance of the constructed model in the validation cohort GSE179285, including the ROC curve, calibration curve and decision curve.
Detailed Description
The following detailed description of the invention is provided in connection with the accompanying drawings that are presented to illustrate the invention and not to limit the scope thereof. The examples provided below are intended as guidelines for further modifications by one of ordinary skill in the art and are not to be construed as limiting the invention in any way.
The experimental methods in the following examples, unless otherwise specified, are conventional methods, and are carried out according to techniques or conditions described in the literature in the field or according to the product specifications. Materials, reagents and the like used in the examples described below are commercially available unless otherwise specified.
Example 1
1. Identifying and including the datasets to be analyzed
The Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/) was searched with the query ("Inflammatory Bowel Diseases"[MeSH Terms] OR Inflammatory Bowel Diseases[All Fields]) AND "Homo sapiens"[porgn] AND ("Expression profiling by array"[Filter] AND ("2008/01/01"[PDAT]), which retrieved 139 datasets in total. These were screened manually against the following inclusion criteria: (1) sample size greater than 15; (2) the dataset contains both CD and UC samples; (3) samples come from the intestinal mucosa of the ileum or colon, excluding blood and other sources; (4) gene annotation information is available. Five datasets from different centers were finally included, among them GSE75214 (n=59/74; sample sizes are given as CD/UC here and below), GSE10616 (n=32/10), GSE36807 (n=13/15) and GSE9686 (n=11/5).
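The four manual inclusion criteria can be expressed as a simple filter over dataset metadata. This is a hypothetical sketch: the `SeriesRecord` fields are assumptions about how the metadata might be tabulated (GEO does not return them in this form), criterion (1) is read as the dataset's total sample size, and criterion (3) is read as requiring every sample to come from ileal or colonic mucosa.

```python
from dataclasses import dataclass

MUCOSAL_SITES = frozenset({"ileum", "colon"})


@dataclass(frozen=True)
class SeriesRecord:
    """Hypothetical per-dataset summary used for manual screening."""
    accession: str
    n_samples: int  # total samples in the dataset (assumed reading of criterion 1)
    n_cd: int
    n_uc: int
    tissue_sites: frozenset[str]  # e.g. frozenset({"colon"})
    has_gene_annotation: bool


def passes_inclusion(rec: SeriesRecord) -> bool:
    """Apply inclusion criteria (1)-(4) from the text."""
    return (
        rec.n_samples > 15  # (1) sample size greater than 15
        and rec.n_cd > 0
        and rec.n_uc > 0  # (2) both CD and UC present
        and bool(rec.tissue_sites)
        and rec.tissue_sites <= MUCOSAL_SITES  # (3) ileal/colonic mucosa only
        and rec.has_gene_annotation  # (4) annotation available
    )
```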
TABLE 1
2. Integrated analysis of different data sets based on Robust Rank Aggregation (RRA) analysis method
Using the RRA method, we integrated 4 datasets of different origin (GSE75214, GSE10616, GSE36807 and GSE9686) and identified differentially expressed genes (DEGs) with logFC > 0.7 and adjP < 0.05 as thresholds, giving 141 DEGs in total (Table 2). A protein interaction network was constructed with the STRING website (https://cn.string-db.org/) and Cytoscape (v3.7.2), and key functional modules were identified with the MCODE (Molecular Complex Detection) plug-in; their main members all belong to the MMP family (FIG. 1: in FIG. 1A, genes upregulated in UC are shown in orange and genes upregulated in CD in blue; FIG. 1B shows the most important gene module identified by the software, with the seed gene in yellow). The module comprises MMP1, MMP12, PLAU, MMP9, CXCL1, MMP10, PTGS2, TIMP1 and MMP7, with MMP3 as the seed gene.
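For readers unfamiliar with RRA, its core statistic (the rho score, as used for example in the R package RobustRankAggreg) can be sketched in pure Python. For one gene, each dataset contributes a normalized rank (rank divided by list length); the k-th smallest normalized rank is compared with the k-th order statistic of uniform random values, whose CDF is a Beta(k, n-k+1) distribution, and the smallest of these probabilities, Bonferroni-corrected by the number of lists, is the gene's score. This is a simplified illustration; the text does not say which RRA implementation or which ranking criterion was used, and the function names are hypothetical.

```python
from math import comb


def order_stat_cdf(x: float, k: int, n: int) -> float:
    """P(k-th smallest of n iid Uniform(0,1) draws <= x), i.e. the Beta(k, n-k+1) CDF at x."""
    # At least k of the n uniform draws must fall at or below x.
    return sum(comb(n, j) * x**j * (1.0 - x) ** (n - j) for j in range(k, n + 1))


def rra_rho(normalized_ranks: list[float]) -> float:
    """RRA rho score for one gene; smaller means consistently ranked near the top."""
    if not normalized_ranks or any(not (0.0 < r <= 1.0) for r in normalized_ranks):
        raise ValueError("normalized ranks must lie in (0, 1]")
    r = sorted(normalized_ranks)
    n = len(r)
    # Probability of each observed order statistic under random ranking.
    p_min = min(order_stat_cdf(r[k - 1], k, n) for k in range(1, n + 1))
    return min(1.0, p_min * n)  # Bonferroni correction over the n comparisons
```

A gene ranked at the top 1% in all four lists scores about 4e-8, whereas a gene ranked last everywhere scores 1.0.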
TABLE 2
3. Integrated analysis of different datasets based on batch correction and merging
To reduce the bias of the RRA method, a second approach was used to integrate the datasets. First, because the GSE10616, GSE36807 and GSE9686 datasets come from the same chip platform (GPL570), the 3 cohorts were batch-corrected and merged with the sva package in R, and the resulting dataset was named Combined Datasets. Differential analysis was then performed separately on Combined Datasets and GSE75214, DEGs were identified with logFC > 0.6 and adjP < 0.1 as thresholds, and the intersection of the DEGs identified in the 2 datasets gave 65 DEGs in total (Table 3). A PPI network was constructed again as described above and the most important gene module was identified by MCODE; the module still consists mainly of MMP family genes, including MMP12, MMP10, MMP3, MMP9, TIMP1, CXCL1, PLAU, S100A9, CXCL13, S100A8, ANXA1 and S100A12, with MMP7 as the seed gene (FIG. 2: in FIG. 2A, genes upregulated in UC are shown in orange, genes upregulated in CD in blue, and the most important gene module identified by the software in yellow; FIG. 2B shows this module in detail, with the seed gene in yellow).
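The thresholding and intersection step can be sketched as follows, assuming each differential analysis has already produced a logFC and an adjusted p-value per gene. The function names are hypothetical; the threshold is applied to |logFC| on the assumption that genes upregulated in either UC or CD are kept, and, since the text does not say so, the sketch does not require the fold-change direction to agree between datasets. The sva batch-correction step itself is not reproduced.

```python
GeneStats = dict[str, tuple[float, float]]  # gene -> (logFC, adjusted p-value)


def select_degs(stats: GeneStats, min_abs_logfc: float, max_adj_p: float) -> set[str]:
    """Genes with |logFC| above and adjusted p below the given thresholds."""
    return {
        gene
        for gene, (logfc, adj_p) in stats.items()
        if abs(logfc) > min_abs_logfc and adj_p < max_adj_p
    }


def shared_degs(combined: GeneStats, gse75214: GeneStats,
                min_abs_logfc: float = 0.6, max_adj_p: float = 0.1) -> set[str]:
    """DEGs called in both Combined Datasets and GSE75214 (default thresholds from the text)."""
    return (select_degs(combined, min_abs_logfc, max_adj_p)
            & select_degs(gse75214, min_abs_logfc, max_adj_p))
```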
TABLE 3
4. Construction of Lasso logistic regression model
Both technical routes indicate that MMP-related genes form the most important set of genes differing between UC and CD. Combining the gene sets identified by the two methods and removing duplicates gives 15 genes: MMP3, MMP1, MMP12, PLAU, MMP9, CXCL1, MMP10, PTGS2, TIMP1, MMP7, CXCL13, S100A12, S100A8, S100A9 and ANXA1.
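Merging the two candidate modules is a set union. The sketch below uses the module members exactly as listed for the two routes (including the seed genes MMP3 and MMP7) and checks that the union contains the 15 genes named above; the variable names are illustrative.

```python
# Module members as listed for the two routes (RRA route and batch-corrected route).
rra_module = ["MMP1", "MMP12", "PLAU", "MMP9", "CXCL1", "MMP10",
              "PTGS2", "TIMP1", "MMP7", "MMP3"]
merged_module = ["MMP12", "MMP10", "MMP3", "MMP9", "TIMP1", "CXCL1", "PLAU",
                 "S100A9", "CXCL13", "S100A8", "ANXA1", "S100A12", "MMP7"]

# Set union removes the genes that both routes found; the intersection shows them.
candidates = sorted(set(rra_module) | set(merged_module))
overlap = sorted(set(rra_module) & set(merged_module))
print(len(candidates), candidates)
print(len(overlap), overlap)
```

Eight genes are shared by the two modules, which is why 10 + 13 members reduce to 15 candidates.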
To avoid problems in applying the model caused by batch differences between microarray platforms, the 15 candidate genes were converted to binary variables. For a gene whose expression is increased in UC, the binary variable of the MMP-related gene is assigned 1 if the gene's expression value is greater than the median of its expression values across all samples, and 0 otherwise. For a gene whose expression is increased in CD, the binary variable is assigned 1 if the expression value is less than that median, and 0 otherwise. In this way the expression values of the 15 genes were converted from continuous to binary variables, and a value of 1 always points toward UC. For example, for one patient in the Combined Datasets, ANXA1, MMP10, CXCL13, TIMP1, MMP3, MMP7, MMP9, S100A12, PLAU, MMP12, S100A9, PTGS2, CXCL1 and S100A8 are all genes up-regulated in UC. The patient's 15 expression values are 1.9734573, 1.9701188, 1.1136878, 2.8159726, 2.7689527, 4.7186331, 2.0414428, 2.1097156, 1.7163029, 2.1842115, 2.4673306, 2.9328217, 1.6551834, 5.2526517 and 2.4706825; the corresponding medians are 3.4117391, 3.2046994, 3.44135835, 5.10064625, 4.923122, 5.00327205, 3.33740685, 4.17297635, 2.2498484, 3.638494, 5.400392, 3.835166, 2.6820964, 5.1378286 and 4.3677868; and after conversion the 15 binary variables are 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0.
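The binarisation rule can be written as a small function. This sketch encodes 1 as the UC-like state for both gene directions, treats a value equal to the median as 0 (the "otherwise" case), and reproduces the worked example with the expression values and medians given in the text; the helper names are hypothetical.

```python
from statistics import median


def binarize(value: float, reference_median: float, up_in_uc: bool = True) -> int:
    """1 = UC-like: above the median for UC-up genes, below it for CD-up genes; ties -> 0."""
    if up_in_uc:
        return int(value > reference_median)
    return int(value < reference_median)


def reference_median_from(cohort_values: list[float]) -> float:
    """Median of one gene's expression across the reference samples."""
    return median(cohort_values)


# Example patient from the Combined Datasets (values and medians as given in the text).
values = [1.9734573, 1.9701188, 1.1136878, 2.8159726, 2.7689527, 4.7186331,
          2.0414428, 2.1097156, 1.7163029, 2.1842115, 2.4673306, 2.9328217,
          1.6551834, 5.2526517, 2.4706825]
medians = [3.4117391, 3.2046994, 3.44135835, 5.10064625, 4.923122, 5.00327205,
           3.33740685, 4.17297635, 2.2498484, 3.638494, 5.400392, 3.835166,
           2.6820964, 5.1378286, 4.3677868]
binary = [binarize(v, m) for v, m in zip(values, medians)]
print(binary)
```

Only the 14th value (5.2526517 against a median of 5.1378286) lies above its median, which gives the binary vector reported in the text.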
The Combined Datasets were then used as the training set and GSE75214 as the validation set to verify the model. To determine the optimal penalty coefficient, 8-fold cross-validation was performed with the area under the receiver operating characteristic (ROC) curve (AUC) as the performance metric, and the final model used as its penalty coefficient the largest lambda whose AUC lies within one standard error of the maximum AUC (the lambda at the optimal AUC plus one standard error). The cross-validation plot for model construction is shown in FIG. 3 (the left dashed line marks the lambda giving the maximum AUC; the right dashed line marks the lambda corresponding to the maximum AUC plus one standard error, i.e. the penalty coefficient selected in this procedure).
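The penalty selection described above is the one-standard-error rule. The sketch below applies it to a hypothetical cross-validation summary; because higher AUC is better, it takes the largest lambda whose mean AUC is at least the best mean AUC minus the standard error at that best point. The text does not name the software used; in R, `cv.glmnet` with `type.measure = "auc"` and `nfolds = 8` reports the analogous value as `lambda.1se`, but this sketch is not that implementation.

```python
def select_lambda_1se(lambdas: list[float], mean_auc: list[float],
                      se_auc: list[float]) -> tuple[float, float]:
    """Return (lambda_min, lambda_1se) from cross-validated AUC.

    lambda_min: lambda with the highest mean CV AUC (largest lambda if tied).
    lambda_1se: largest lambda whose mean AUC >= best mean AUC - its standard error.
    """
    best_auc = max(mean_auc)
    lambda_min = max(lam for lam, auc in zip(lambdas, mean_auc) if auc == best_auc)
    se_at_best = se_auc[lambdas.index(lambda_min)]
    threshold = best_auc - se_at_best
    lambda_1se = max(lam for lam, auc in zip(lambdas, mean_auc) if auc >= threshold)
    return lambda_min, lambda_1se


# Hypothetical 8-fold CV summary: penalty values, mean AUC and its standard error.
lambdas = [0.001, 0.01, 0.02, 0.05, 0.1]
mean_auc = [0.78, 0.80, 0.79, 0.77, 0.70]
se_auc = [0.02, 0.02, 0.02, 0.02, 0.02]
result = select_lambda_1se(lambdas, mean_auc, se_auc)
print(result)
```

With these made-up numbers, lambda_min is 0.01 and the one-standard-error rule moves to the sparser model at 0.02, trading a small loss in AUC for fewer non-zero coefficients.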
The final differential diagnosis model is constructed as follows:
$$P_{UC}=\frac{\exp(\text{MMPs Scores})}{1+\exp(\text{MMPs Scores})} \tag{1}$$
$$\text{MMPs Scores}=-1.3813+0.6358\times\text{ANXA1}+0.1000\times\text{CXCL13}+0.2507\times\text{MMP1}+0.4478\times\text{CXCL1} \tag{2}$$
$$P_{UC}+P_{CD}=1 \tag{3}$$
Note: $P_{UC}$ is the probability, calculated from the model, that the case is predicted to be UC. Because the model discriminates between UC and CD, $P_{UC}+P_{CD}=1$, so the probability that the case is predicted to be CD is obtained indirectly as $P_{CD}=1-P_{UC}$.
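Formulas (1) to (3) can be evaluated directly from the four binary variables. Below is a minimal Python sketch with the intercept and coefficients copied from formula (2); the function names are illustrative, and the input check simply enforces that each variable is 0 or 1.

```python
import math

# Intercept and coefficients of formula (2); inputs are the binary variables (0 or 1).
INTERCEPT = -1.3813
COEFFICIENTS = {"ANXA1": 0.6358, "CXCL13": 0.1000, "MMP1": 0.2507, "CXCL1": 0.4478}


def mmps_score(binary: dict[str, int]) -> float:
    """Formula (2): linear predictor of the Lasso logistic model."""
    for gene in COEFFICIENTS:
        if binary[gene] not in (0, 1):
            raise ValueError(f"{gene} must be 0 or 1, got {binary[gene]!r}")
    return INTERCEPT + sum(coef * binary[gene] for gene, coef in COEFFICIENTS.items())


def predict(binary: dict[str, int]) -> tuple[float, float]:
    """Formulas (1) and (3): return (P_UC, P_CD)."""
    score = mmps_score(binary)
    p_uc = math.exp(score) / (1 + math.exp(score))
    return p_uc, 1 - p_uc


p_uc, p_cd = predict({"ANXA1": 0, "CXCL13": 0, "MMP1": 1, "CXCL1": 1})
print(f"P_UC = {p_uc:.3f}, P_CD = {p_cd:.3f}")
```

For the profile CXCL13 = 0, MMP1 = 1, ANXA1 = 0, CXCL1 = 1 the score is -1.3813 + 0.2507 + 0.4478 = -0.6828, and the sketch prints `P_UC = 0.336, P_CD = 0.664`.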
To make the discrimination model easier to apply, it was constructed as a nomogram, shown in FIG. 4, in which the red dots illustrate an application. For example, for a patient with CXCL13 = 0, MMP1 = 1, ANXA1 = 0 and CXCL1 = 1, the predicted probability of UC is 0.336 and that of CD is 0.664. With a cutoff of 0.5, the model constructed by this method classifies the patient as CD.
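Because all four inputs are binary, the nomogram can take only 2^4 = 16 distinct values, so it can also be written out as a lookup table. The sketch below enumerates every profile with the coefficients of formula (2) and applies the 0.5 cutoff. The text does not state how a probability of exactly 0.5 is classified; this sketch assigns it to CD as an assumption.

```python
import itertools
import math

INTERCEPT = -1.3813
COEFFICIENTS = {"ANXA1": 0.6358, "CXCL13": 0.1000, "MMP1": 0.2507, "CXCL1": 0.4478}


def p_uc(profile: dict[str, int]) -> float:
    """Formulas (1)-(2) for one binary profile."""
    score = INTERCEPT + sum(c * profile[g] for g, c in COEFFICIENTS.items())
    return 1 / (1 + math.exp(-score))  # same value as exp(s) / (1 + exp(s))


def tabulate(cutoff: float = 0.5) -> list[tuple[dict[str, int], float, str]]:
    """All 2**4 binary profiles with P_UC and the call at the cutoff, lowest P_UC first."""
    rows = []
    for bits in itertools.product((0, 1), repeat=len(COEFFICIENTS)):
        profile = dict(zip(COEFFICIENTS, bits))
        p = p_uc(profile)
        # P_UC == cutoff is not covered by the text; assigned to CD here (assumption).
        rows.append((profile, p, "UC" if p > cutoff else "CD"))
    return sorted(rows, key=lambda row: row[1])


table = tabulate()
for profile, p, call in table:
    print(profile, f"P_UC={p:.3f}", call)
```

With the published coefficients, only the profile in which all four variables equal 1 exceeds the cutoff (P_UC ≈ 0.513); the red-dot profile appears in the table with P_UC ≈ 0.336 and is called CD.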
5. Model evaluation
The model constructed as described above was applied to the training set (the Combined Datasets: GSE10616, GSE36807 and GSE9686), validation set 1 (GSE75214) and validation set 2 (GSE179285), and its discrimination (ROC curve), calibration (calibration curve) and clinical utility (decision curve analysis, DCA curve) were evaluated, with the following results:
1. Training set results: the area under the ROC curve for the Combined Datasets is 0.801, the calibration curve shows good calibration (Sp > 0.05, Brier score < 0.25), and the DCA curve shows good clinical utility (FIG. 5).
2. Validation set 1 results: the area under the ROC curve for GSE75214 is 0.811, the calibration curve shows good calibration (Sp > 0.05, Brier score < 0.25), and the DCA curve shows good clinical utility (FIG. 6). Because the training data come from the GPL570 platform and these validation data from GPL6244, this suggests that the model performs well across platforms.
3. Validation set 2 results: because all of the above datasets had been used for gene screening, the more recently released dataset GSE179285 was selected for additional validation. The area under the ROC curve for GSE179285 is 0.751, the calibration curve shows good calibration (Sp > 0.05, Brier score < 0.25), and the DCA curve shows good clinical utility (FIG. 7). Because the training data come from GPL570 and these validation data from GPL6480, this further suggests that the model performs well across platforms.
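The three evaluation criteria above rest on standard quantities: the ROC AUC (the probability that a randomly chosen UC case receives a higher P_UC than a randomly chosen CD case), the Brier score (mean squared difference between P_UC and the observed outcome), and the net benefit plotted in a decision curve, NB(p_t) = TP/n - (FP/n) * p_t / (1 - p_t). The sketch below computes them for a hypothetical set of predictions, coding UC as 1. The text does not state which software produced the figures, and the calibration test behind "Sp" is not reproduced.

```python
def roc_auc(y: list[int], p: list[float]) -> float:
    """AUC as P(score of a random UC case > score of a random CD case); ties count 1/2."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def brier(y: list[int], p: list[float]) -> float:
    """Mean squared difference between predicted P_UC and the outcome (UC = 1)."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def net_benefit(y: list[int], p: list[float], threshold: float) -> float:
    """Decision-curve net benefit of calling UC when P_UC >= threshold (0 < threshold < 1)."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Hypothetical outcomes (1 = UC, 0 = CD) and predicted P_UC values.
y_true = [1, 1, 1, 0, 0, 0]
p_pred = [0.8, 0.6, 0.4, 0.5, 0.3, 0.2]
print(f"AUC = {roc_auc(y_true, p_pred):.3f}")
print(f"Brier = {brier(y_true, p_pred):.3f}")
print(f"Net benefit at 0.5 = {net_benefit(y_true, p_pred, 0.5):.3f}")
```

For these six cases the sketch gives an AUC of 8/9, a Brier score of about 0.157 and a net benefit of 1/6 at a threshold of 0.5. A DCA curve is obtained by repeating `net_benefit` over a grid of thresholds and comparing it with the treat-all and treat-none strategies.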
The present invention is described in detail above. It will be apparent to those skilled in the art that the present invention can be practiced in a wide range of equivalent parameters, concentrations, and conditions without departing from the spirit and scope of the invention and without undue experimentation. While the invention has been described with respect to specific embodiments, it will be appreciated that the invention may be further modified. In general, this application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. The application of some of the basic features may be done in accordance with the scope of the claims that follow.
Claims (7)
1. An auxiliary device for distinguishing Crohn's disease from ulcerative colitis, comprising a parameter acquisition device and a readable carrier;
the parameter acquisition device comprises a device for acquiring the parameters used in the formulas recorded on the readable carrier;
the readable carrier has the following formulas (1) to (3) recorded thereon:
$$P_{UC}=\frac{\exp(\text{MMPs Scores})}{1+\exp(\text{MMPs Scores})} \tag{1}$$
$$\text{MMPs Scores}=-1.3813+0.6358\times\text{ANXA1}+0.1000\times\text{CXCL13}+0.2507\times\text{MMP1}+0.4478\times\text{CXCL1} \tag{2}$$
$$P_{UC}+P_{CD}=1; \tag{3}$$
wherein $P_{UC}$ is the probability that the sample to be tested is predicted to be ulcerative colitis; $P_{CD}$ is the probability that the case to be tested is predicted to be Crohn's disease; ANXA1, CXCL13, MMP1 and CXCL1 are the binary variables of the ANXA1, CXCL13, MMP1 and CXCL1 genes, respectively; if the expression value of a gene in the sample to be tested is greater than the median expression value of that gene in ulcerative colitis samples, the binary variable of the gene is assigned 1; otherwise, the binary variable of the gene is assigned 0;
when $P_{UC} > 0.5$, the sample to be tested is ulcerative colitis; when $P_{UC} < 0.5$, the sample to be tested is Crohn's disease.
2. The apparatus according to claim 1, wherein: the parameter acquisition device is a device for detecting the expression levels of the ANXA1, CXCL13, MMP1 and CXCL1 genes in the sample to be tested.
3. The apparatus according to claim 1 or 2, characterized in that: the apparatus further comprises recording means and/or computing means; the recording means comprises a pen and/or a computer; the computing means comprises a calculator and/or the computer.
4. The apparatus according to claim 1 or 2, characterized in that: the readable carrier is a kit instruction; the content of formula I is printed on a card.
5. The apparatus according to claim 1 or 2, characterized in that: the readable carrier is a computer readable carrier.
6. The apparatus according to claim 1 or 2, characterized in that: the median expression value of each gene in ulcerative colitis samples is obtained by measuring the expression values of the gene in at least 10 ulcerative colitis samples with the same detection device and taking the average of these expression values as the median expression value in ulcerative colitis samples.
7. A kit for assisting in distinguishing Crohn's disease from ulcerative colitis, characterized by comprising a device for detecting the expression level of ANXA1, a device for detecting the expression level of CXCL13, a device for detecting the expression level of MMP1, a device for detecting the expression level of CXCL1, and a computing device provided with a parameter operation module; the parameter operation module performs the operations of the following formulas (1) to (3):
$$P_{UC}=\frac{\exp(\text{MMPs Scores})}{1+\exp(\text{MMPs Scores})} \tag{1}$$
$$\text{MMPs Scores}=-1.3813+0.6358\times\text{ANXA1}+0.1000\times\text{CXCL13}+0.2507\times\text{MMP1}+0.4478\times\text{CXCL1} \tag{2}$$
$$P_{UC}+P_{CD}=1; \tag{3}$$
wherein $P_{UC}$ is the probability that the sample to be tested is predicted to be ulcerative colitis; $P_{CD}$ is the probability that the case to be tested is predicted to be Crohn's disease; ANXA1, CXCL13, MMP1 and CXCL1 are the binary variables of the ANXA1, CXCL13, MMP1 and CXCL1 genes, respectively; if the expression value of a gene in the sample to be tested is greater than the median expression value of that gene in the samples, the binary variable of the gene is assigned 1; otherwise, the binary variable of the gene is assigned 0;
when $P_{UC} > 0.5$, the sample to be tested is ulcerative colitis; when $P_{UC} < 0.5$, the sample to be tested is Crohn's disease.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211340533 | 2022-10-28 | ||
CN2022113405335 | 2022-10-28 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116631510A CN116631510A (en) | 2023-08-22 |
CN116631510B true CN116631510B (en) | 2024-01-12 |
Family ID: 87601934
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310559017.XA Active CN116631510B (en) | 2022-10-28 | 2023-05-17 | Device for differential diagnosis of Crohn's disease and ulcerative colitis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116631510B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009192383A (en) * | 2008-02-14 | 2009-08-27 | Kanazawa Univ | Diagnosis and medical treatment of crohn's disease using ergothioneine |
CN105219844A (en) * | 2015-06-08 | 2016-01-06 | 刘宗正 | Gene marker combination, test kit and disease-risk prediction model for screening 11 diseases |
CN108403711A (en) * | 2017-02-10 | 2018-08-17 | 中国科学院上海生命科学研究院 | A kind of microRNA for detecting and treating inflammatory bowel disease |
CN109994214A (en) * | 2019-04-13 | 2019-07-09 | 中国医学科学院北京协和医院 | Identification model and model construction method for Crohn's disease and intestinal Behçet's disease |
CN109998488A (en) * | 2019-04-13 | 2019-07-12 | 中国医学科学院北京协和医院 | Identification model and construction method for Crohn's disease and intestinal ulcerative lymphoma |
CN113744802A (en) * | 2021-08-25 | 2021-12-03 | 聂凯 | Screening method and application of gene marker for predicting Crohn's disease treatment response |
CN114732899A (en) * | 2015-01-09 | 2022-07-12 | 辉瑞公司 | Dosing regimens for MAdCAM antagonists |
CN114974595A (en) * | 2022-05-13 | 2022-08-30 | 江苏省人民医院(南京医科大学第一附属医院) | Crohn's disease patient mucosa healing prediction model and method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
MXPA04005219A (en) * | 2001-11-29 | 2005-06-20 | Greystone Medical Group Inc | Treatment of wounds and compositions employed. |
Also Published As
Publication number | Publication date |
---|---|
CN116631510A (en) | 2023-08-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||