CN116631510B

CN116631510B - Device for differential diagnosis of Crohn's disease and ulcerative colitis

Info

Publication number: CN116631510B
Application number: CN202310559017.XA
Authority: CN
Inventors: 邓江; 张艳宇; 赵宁; 吕丽萍; 马平; 张阳阳
Original assignee: Academy of Military Medical Sciences AMMS of PLA
Current assignee: Academy of Military Medical Sciences AMMS of PLA
Priority date: 2022-10-28
Filing date: 2023-05-17
Publication date: 2024-01-12
Anticipated expiration: 2043-05-17
Also published as: CN116631510A

Abstract

The invention discloses a device for assisting in distinguishing Crohn's disease from ulcerative colitis, comprising parameter acquisition equipment and a readable carrier. The parameter acquisition equipment comprises a device for acquiring the parameters used by the readable carrier. The readable carrier records formula (1): P_UC = exp(MMPs Scores)/(1 + exp(MMPs Scores)), wherein P_UC is the probability that the sample to be tested is predicted to be ulcerative colitis; when P_UC is less than 0.5, the sample to be tested is judged to be Crohn's disease. The model constructed in the device does not use the specific expression values of the MMPs-related gene set but is based on binary variables converted from those values, which better overcomes batch differences between chip detection platforms and gives the model higher clinical value.

Description

Device for differential diagnosis of Crohn's disease and ulcerative colitis

Technical Field

The invention relates to a device for the differential diagnosis of Crohn's disease and ulcerative colitis, based on a model constructed from binary variables derived from gene expression in the patient's intestinal mucosa, and belongs to the field of biomedicine.

Background

Inflammatory bowel disease (IBD) causes chronic intestinal inflammation and is associated with significant morbidity; it results from the interaction of genetic and environmental factors that affect immune responses. Crohn's disease (CD) and ulcerative colitis (UC) are the two major inflammatory bowel diseases. Although CD and UC share some pathological and clinical characteristics, they also differ, indicating that they are two distinct disease types. CD is characterized by fissuring ulcers, granulomatous inflammation and submucosal fibrosis. The characteristic histological findings of UC, by contrast, are rectal crypt distortion, lymphocyte infiltration and chronic inflammation, often limited to the lamina propria. Clinically, the differential diagnosis of IBD is usually made by comprehensive assessment of clinical manifestations together with endoscopic, histopathological, radiological and laboratory findings.

Currently, differential diagnosis between CD and UC in patients with IBD colitis is critical for a tailored treatment plan, since the two diseases call for different treatments and respond through different mechanisms once diagnosed. However, differential diagnosis of these subtypes remains a significant clinical challenge, because there is currently no single diagnostic gold standard for UC and CD. According to published reports, about 5% to 15% of patients do not meet the strict criteria for either UC or CD, and up to 14% of patients experience at least one change of diagnosis between UC and CD. Thus, diagnosis of IBD is still difficult with current methods, particularly in patients whose inflammatory lesions are limited to the colon.

Disclosure of Invention

The invention aims to provide a device and a method for assisting in judging Crohn disease and/or ulcerative colitis.

The invention provides a kit for assisting in judging Crohn's disease and/or ulcerative colitis, which comprises parameter acquisition equipment and a readable carrier;

the parameter acquisition device comprises a device for acquiring various parameters involved in the readable carrier;

the readable carrier has recorded thereon the following formulas (1) - (3),

P_UC = exp(MMPs Scores)/(1 + exp(MMPs Scores))  (1)

MMPs Scores = -1.3813 + [ANXA1 × (0.6358)] + [CXCL13 × (0.1000)] + [MMP1 × (0.2507)] + [CXCL1 × (0.4478)]  (2)

P_UC + P_CD = 1  (3);

wherein P_UC is the probability that the sample to be tested is predicted to be ulcerative colitis; P_CD is the probability that the sample to be tested is predicted to be Crohn's disease; ANXA1, CXCL13, MMP1 and CXCL1 are the binary variables of the ANXA1, CXCL13, MMP1 and CXCL1 genes, respectively; if the expression value of a gene in the sample to be tested is larger than the median value of the expression value of that gene in ulcerative colitis samples, the binary variable of the gene is assigned a value of 1; otherwise, the binary variable of the gene is assigned a value of 0;

when P_UC is greater than 0.5, the sample to be tested is judged to be ulcerative colitis; when P_UC is less than 0.5, the sample to be tested is judged to be Crohn's disease.
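As a minimal sketch of how formulas (1)-(3) and the 0.5 cutoff could be evaluated in software, the Python below takes the four binary variables and returns P_UC, P_CD and the resulting call. The names `mmps_score`, `p_uc` and `classify` are illustrative assumptions, not part of the invention as filed. The text does not say how a P_UC of exactly 0.5 is handled, so the sketch reports that case as undetermined.

```python
import math

# Intercept and coefficients copied from formula (2).
INTERCEPT = -1.3813
COEFFICIENTS = {"ANXA1": 0.6358, "CXCL13": 0.1000, "MMP1": 0.2507, "CXCL1": 0.4478}


def mmps_score(binary):
    """Formula (2): linear score from the four 0/1 gene variables."""
    for gene in COEFFICIENTS:
        if binary[gene] not in (0, 1):
            raise ValueError(f"{gene} must be 0 or 1")
    return INTERCEPT + sum(COEFFICIENTS[g] * binary[g] for g in COEFFICIENTS)


def p_uc(binary):
    """Formula (1): logistic transform of the MMPs score."""
    score = mmps_score(binary)
    return math.exp(score) / (1.0 + math.exp(score))


def classify(binary):
    """Return (P_UC, P_CD, call) using formula (3) and the 0.5 cutoff."""
    prob_uc = p_uc(binary)
    prob_cd = 1.0 - prob_uc  # formula (3)
    if prob_uc > 0.5:
        call = "UC"
    elif prob_uc < 0.5:
        call = "CD"
    else:
        call = "undetermined"  # assumption: the text does not cover P_UC == 0.5
    return prob_uc, prob_cd, call


if __name__ == "__main__":
    # Arbitrary illustrative input, not a patient from the study.
    sample = {"ANXA1": 1, "CXCL13": 0, "MMP1": 1, "CXCL1": 0}
    print(classify(sample))
```

The dictionary keys are the gene symbols used in formula (2); each value must already have been converted to 0 or 1 by comparison with the reference median.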

The parameter acquisition equipment is a device for detecting the expression quantity of ANXA1, CXCL13, MMP1 and CXCL1 genes in a sample to be detected.

Wherein the kit further comprises recording means and/or calculating means; the recording means comprises a pen and/or a computer; the computing means comprises a calculator and/or the computer.

Wherein the readable carrier is a kit instruction; the content of formula I is printed on a card.

Wherein the readable carrier is a computer readable carrier.

The median value of the expression value of each gene in ulcerative colitis samples is obtained by measuring the expression level of the gene in at least 10 ulcerative colitis samples with the same detection device; the average of the expression levels across these ulcerative colitis samples is taken as the median value of the expression value in ulcerative colitis samples.

The invention also provides a kit for assisting in judging the Crohn's disease and/or ulcerative colitis, which comprises a device for detecting the expression level of ANXA1, a device for detecting the expression level of CXCL13, a device for detecting the expression level of MMP1, a device for detecting the expression level of CXCL1 and a computing device provided with a parameter operation module; the parameter operation module can perform operations of the following formulas (1) - (3):

P_UC = exp(MMPs Scores)/(1 + exp(MMPs Scores))  (1);

MMPs Scores = -1.3813 + [ANXA1 × (0.6358)] + [CXCL13 × (0.1000)] + [MMP1 × (0.2507)] + [CXCL1 × (0.4478)]  (2);

P_UC + P_CD = 1  (3);

wherein P_UC is the probability that the sample to be tested is predicted to be ulcerative colitis; P_CD is the probability that the sample to be tested is predicted to be Crohn's disease; ANXA1, CXCL13, MMP1 and CXCL1 are the binary variables of the ANXA1, CXCL13, MMP1 and CXCL1 genes, respectively; if the expression value of a gene in the sample to be tested is larger than the median value of the expression value of that gene in the samples, the binary variable of the gene is assigned a value of 1; otherwise, the binary variable of the gene is assigned a value of 0.

The use of a system for detecting the expression levels of the ANXA1, CXCL13, MMP1 and CXCL1 genes in the preparation of products for distinguishing Crohn's disease from ulcerative colitis also falls within the scope of the present invention.

Wherein the system for detecting the expression levels of the ANXA1, CXCL13, MMP1 and CXCL1 genes is the Affymetrix Human Gene 1.0 ST Array, the Affymetrix Human Genome U133 Plus 2.0 Array or the Agilent-014850 Whole Human Genome Microarray 4x44K G4112F.

The ANXA1 gene is annexin A1 (NM_000700.3); the CXCL13 gene is C-X-C motif chemokine ligand 13 (NM_001371558.1); the MMP1 gene is matrix metallopeptidase 1 (NM_002421); the CXCL1 gene is C-X-C motif chemokine ligand 1 (NM_001511).

The invention provides a method for establishing a model for the differential diagnosis of IBD using metalloproteinase family-related genes (MMPs-associated genes), together with the results of validating the model in multiple multi-center data cohorts. Matrix metalloproteinases (MMPs) are a group of zinc-dependent neutral peptidases that can degrade all components of the extracellular matrix (ECM). They are associated with extensive mucosal degradation and tissue remodeling and ultimately contribute to the development of ulcers, fistulae and stenosis, so MMPs are an important gene family that participates in and regulates the course of inflammatory bowel disease. To date, there is sufficient evidence that IBD-associated mucosal inflammation is associated with enhanced induction of various MMPs, and at least 3 clinical trials of matrix metalloproteinase inhibitors in IBD treatment have been publicly reported. Our study showed that the MMPs-related gene set is also the main differential gene set between CD and UC. To overcome differences between the detection platforms of data cohorts from different sources, the expression levels of the MMPs-related gene set are converted into binary variables, and on this basis a differential diagnosis model distinguishing CD from UC is established by least absolute shrinkage and selection operator (LASSO) logistic regression. Finally, the model was also validated in currently published IBD cohorts that meet the inclusion requirements, with good results. The diagnostic model therefore provides a promising diagnostic tool with the potential to improve clinical practice quickly.

Advantages of this method include: 1) The method was established and validated by integrating most of the CD and UC chip data reported in the prior art; combining large-sample, multi-center studies is critical for a highly heterogeneous disease such as IBD, and no gene expression model for the differential diagnosis of UC and CD has been reported in the prior art; 2) Different technical routes are used to perform the integrated analysis of the multi-center IBD cohorts, which effectively reduces the bias introduced by relying on a single data set integration method; 3) The model evaluation steps strictly follow the current clinical prediction model guideline TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis); discrimination, calibration and clinical applicability are evaluated separately in different centers and different cohorts, which corresponds to the highest level of evidence in the guideline's quality assessment; 4) The constructed model does not use the specific expression values of the MMPs-related gene set but is based on binary variables converted from them, which better overcomes batch differences between chip detection platforms and gives the model higher clinical value.

Drawings

FIG. 1 shows the protein interaction network constructed from the differentially expressed genes (DEGs) screened by the RRA method, and the important gene module identified by MCODE.

FIG. 2 shows the protein interaction network constructed from the DEGs found by data integration, and the important gene module identified by MCODE.

FIG. 3 is a schematic diagram of the process of determining the genes finally included in the model based on LASSO regression and cross-validation. The left dashed line is the penalty coefficient log(λ) corresponding to the optimal AUC determined by cross-validation; the right dashed line is the penalty coefficient log(λ) corresponding to the optimal AUC plus 1 standard error.

FIG. 4 is a nomogram drawn from the constructed model.

FIG. 5 shows the diagnostic performance of the constructed model in the training cohort, including the ROC curve, calibration curve and decision curve analysis (DCA).

FIG. 6 shows the diagnostic performance of the constructed model in a validation cohort (GSE75214), including the ROC curve, calibration curve and decision curve.

FIG. 7 shows the diagnostic performance of the constructed model in a validation cohort (GSE179285), including the ROC curve, calibration curve and decision curve.

Detailed Description

The following detailed description of the invention is provided in connection with the accompanying drawings that are presented to illustrate the invention and not to limit the scope thereof. The examples provided below are intended as guidelines for further modifications by one of ordinary skill in the art and are not to be construed as limiting the invention in any way.

The experimental methods in the following examples, unless otherwise specified, are conventional methods, and are carried out according to techniques or conditions described in the literature in the field or according to the product specifications. Materials, reagents and the like used in the examples described below are commercially available unless otherwise specified.

Example 1

1. Determining and incorporating data sets to be analyzed

The Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/) was searched with the following keywords: ("Inflammatory Bowel Diseases"[MeSH Terms] OR Inflammatory Bowel Diseases[All Fields]) AND "Homo sapiens"[porgn] AND ("Expression profiling by array"[Filter] AND ("2008/01/01"[PDAT]). A total of 139 data sets were retrieved and manually screened according to the following inclusion criteria: (1) a sample size greater than 15; (2) samples of both CD and UC in the same data set; (3) samples taken from the intestinal mucosa of the ileum or colon, excluding blood and other sources; (4) available gene annotation information. Finally, 5 data sets from different centers were included, including GSE75214 (n=59/74, sample size = CD/UC, likewise below), GSE10616 (n=32/10), GSE36807 (n=13/15) and GSE9686 (n=11/5).
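Criteria (1) and (2) are simple enough to check mechanically. The sketch below applies them to the CD/UC sample counts quoted above, assuming that "sample size" means the total number of CD plus UC samples. The helper name `passes_size_and_coverage` and the dictionary layout are hypothetical. Criteria (3) and (4), tissue source and gene annotation, still require manual review of each GEO record.

```python
# CD/UC sample counts as quoted in the text (n = CD/UC).
DATASETS = {
    "GSE75214": {"CD": 59, "UC": 74},
    "GSE10616": {"CD": 32, "UC": 10},
    "GSE36807": {"CD": 13, "UC": 15},
    "GSE9686": {"CD": 11, "UC": 5},
}

MIN_TOTAL = 15  # criterion (1): sample size greater than 15


def passes_size_and_coverage(counts):
    """Criteria (1) and (2): enough samples, and both CD and UC present."""
    total = counts["CD"] + counts["UC"]
    return total > MIN_TOTAL and counts["CD"] > 0 and counts["UC"] > 0


if __name__ == "__main__":
    for accession, counts in DATASETS.items():
        total = counts["CD"] + counts["UC"]
        print(accession, total, passes_size_and_coverage(counts))
```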

TABLE 1

2. Integrated analysis of different data sets based on Robust Rank Aggregation (RRA) analysis method

Using the RRA method, we integrated 4 data sets from different sources (GSE75214, GSE10616, GSE36807 and GSE9686) and identified differentially expressed genes (DEGs) with logFC > 0.7 and adjP < 0.05 as the criteria; 141 DEGs were identified in total. Details are shown in Table 2. A protein interaction network was then constructed using the STRING website (https://cn.string-db.org/) and Cytoscape software (v3.7.2), and the important functional module was identified with the MCODE (molecular complex detection) plug-in. The major members of this module belong to the MMP family; the module includes MMP1, MMP12, PLAU, MMP9, CXCL1, MMP10, PTGS2, TIMP1 and MMP7, with MMP3 as the seed gene; see FIG. 1 (in FIG. 1A, genes up-regulated in UC are shown in orange, genes up-regulated in CD are shown in blue, and the most important gene module identified by the software is shown in yellow; the module is shown further in FIG. 1B, where yellow indicates the seed gene).

TABLE 2

3. Integrated analysis of different data sets based on batch correction and merging

To reduce the bias of the RRA method, a second approach was used to integrate the data sets. First, because the GSE10616, GSE36807 and GSE9686 data sets come from the same chip platform (GPL570), the 3 cohorts were batch-corrected and merged using the sva package in R, and the newly generated data set was named Combined Datasets. Differential analysis was then performed on Combined Datasets and on GSE75214 separately, with DEGs identified using logFC > 0.6 and adjP < 0.1 as the criteria. Finally, the intersection of the DEGs identified in the 2 data sets was taken, giving 65 DEGs in total; see Table 3. A PPI network was again constructed as described above and the most important gene module was identified by MCODE. The genes in this module still consist mainly of MMP family genes and include MMP12, MMP10, MMP3, MMP9, TIMP1, CXCL1, PLAU, S100A9, CXCL13, S100A8, ANXA1 and S100A12, with MMP7 as the seed gene; see FIG. 2 (in FIG. 2A, genes up-regulated in UC are shown in orange, genes up-regulated in CD are shown in blue, and the most important gene module identified by the software is shown in yellow; the module is shown further in FIG. 2B, where yellow indicates the seed gene).
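The thresholding and intersection step can be sketched as follows. The gene names and numbers in the example are hypothetical placeholders, not values from the study, and `select_degs` is an illustrative helper. The sketch also assumes that the logFC threshold applies to the absolute value, since genes up-regulated in either UC or CD are reported.

```python
def select_degs(results, min_abs_logfc, max_adj_p):
    """Return gene symbols passing the fold-change and adjusted-p thresholds.

    `results` maps gene symbol -> (logFC, adjusted p-value).
    Assumption: the logFC threshold is applied to the absolute value, so that
    genes up-regulated in either UC or CD can be retained.
    """
    return {
        gene
        for gene, (logfc, adj_p) in results.items()
        if abs(logfc) > min_abs_logfc and adj_p < max_adj_p
    }


if __name__ == "__main__":
    # Hypothetical differential-analysis output, for illustration only.
    combined = {"GENE_A": (1.2, 0.01), "GENE_B": (-0.9, 0.04), "GENE_C": (0.3, 0.001)}
    gse75214 = {"GENE_A": (0.8, 0.03), "GENE_B": (-0.5, 0.02), "GENE_C": (0.7, 0.2)}

    degs_combined = select_degs(combined, min_abs_logfc=0.6, max_adj_p=0.1)
    degs_gse75214 = select_degs(gse75214, min_abs_logfc=0.6, max_adj_p=0.1)

    # Only genes identified in both data sets are kept.
    shared = degs_combined & degs_gse75214
    print(sorted(shared))
```

Taking the intersection is the conservative choice here: a gene must pass the thresholds on both platforms to be carried forward.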

TABLE 3

4. Construction of Lasso logistic regression model

Based on the two different technical routes above, the MMPs-related genes were considered to be the most important differential gene set between UC and CD. The gene sets identified by the 2 methods were combined, and 15 genes were obtained after duplicates were removed: MMP3, MMP1, MMP12, PLAU, MMP9, CXCL1, MMP10, PTGS2, TIMP1, MMP7, CXCL13, S100A12, S100A8, S100A9 and ANXA1.
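The combination step is a plain set union. As a small check, the sketch below merges the two MCODE modules listed in sections 2 and 3 (seed genes included) and removes repeats; the helper name `combine_modules` is an illustrative assumption.

```python
# Gene modules identified by MCODE under each integration route (from the text).
RRA_MODULE = [
    "MMP1", "MMP12", "PLAU", "MMP9", "CXCL1",
    "MMP10", "PTGS2", "TIMP1", "MMP7", "MMP3",
]
MERGED_MODULE = [
    "MMP12", "MMP10", "MMP3", "MMP9", "TIMP1", "CXCL1", "PLAU",
    "S100A9", "CXCL13", "S100A8", "ANXA1", "S100A12", "MMP7",
]


def combine_modules(first, second):
    """Union of two gene lists, preserving first-seen order and dropping repeats."""
    seen = set()
    combined = []
    for gene in list(first) + list(second):
        if gene not in seen:
            seen.add(gene)
            combined.append(gene)
    return combined


CANDIDATE_GENES = combine_modules(RRA_MODULE, MERGED_MODULE)

if __name__ == "__main__":
    print(len(CANDIDATE_GENES), CANDIDATE_GENES)
```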

To overcome the problems in applying the model caused by batch differences between chip platforms, we converted the 15 candidate genes to binary variables. For a gene whose expression is increased in UC, if the expression value of the gene is greater than the median of the expression values of the gene in all samples, the binary variable of that MMP-related gene is assigned a value of 1; otherwise it is assigned a value of 0. For a gene whose expression is increased in CD, if the expression value of the gene is less than the median of the expression values of the gene in all samples, the binary variable of that MMP-related gene is assigned a value of 1; otherwise it is assigned a value of 0. In this way the expression values of the 15 genes are converted from continuous variables to binary variables. For example, for one patient in Combined Datasets, ANXA1, MMP10, CXCL13, TIMP1, MMP3, MMP7, MMP9, S100A12, PLAU, MMP12, S100A9, PTGS2, CXCL1 and S100A8 are all genes whose expression is up-regulated in UC. The expression values are 1.9734573, 1.9701188, 1.1136878, 2.8159726, 2.7689527, 4.7186331, 2.0414428, 2.1097156, 1.7163029, 2.1842115, 2.4673306, 2.9328217, 1.6551834, 5.2526517 and 2.4706825, respectively, and the corresponding medians are 3.4117391, 3.2046994, 3.44135835, 5.10064625, 4.923122, 5.00327205, 3.33740685, 4.17297635, 2.2498484, 3.638494, 5.400392, 3.835166, 2.6820964, 5.1378286 and 4.3677868, respectively, so the binary variables of the 15 genes after conversion are 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0.
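The conversion rule can be written in a few lines. The sketch below reproduces the example patient using the 15 expression values and 15 medians exactly as listed above, kept in positional order; the function name `binarize` and its `up_in` parameter are illustrative assumptions.

```python
def binarize(value, median, up_in="UC"):
    """Convert one expression value to the 0/1 variable described above.

    Genes up-regulated in UC score 1 when the value is above the median;
    genes up-regulated in CD score 1 when the value is below the median.
    """
    if up_in == "UC":
        return 1 if value > median else 0
    if up_in == "CD":
        return 1 if value < median else 0
    raise ValueError("up_in must be 'UC' or 'CD'")


# Expression values and medians for the example patient, in the order given in the text.
VALUES = [
    1.9734573, 1.9701188, 1.1136878, 2.8159726, 2.7689527,
    4.7186331, 2.0414428, 2.1097156, 1.7163029, 2.1842115,
    2.4673306, 2.9328217, 1.6551834, 5.2526517, 2.4706825,
]
MEDIANS = [
    3.4117391, 3.2046994, 3.44135835, 5.10064625, 4.923122,
    5.00327205, 3.33740685, 4.17297635, 2.2498484, 3.638494,
    5.400392, 3.835166, 2.6820964, 5.1378286, 4.3677868,
]

# All genes in the example are treated as up-regulated in UC, as stated in the text.
BINARY = [binarize(v, m, up_in="UC") for v, m in zip(VALUES, MEDIANS)]

if __name__ == "__main__":
    print(BINARY)
```

Because only the comparison with a median is kept, any monotone rescaling of a platform's expression values leaves the binary variables unchanged, provided the median is taken on the same scale.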

Combined Datasets was then used as the training set and GSE75214 as the validation set to verify the effect of the model. To determine the optimal penalty coefficient, we performed 8-fold cross-validation with the area under the receiver operating characteristic (ROC) curve (AUC) as the performance metric, and the final model uses as its penalty coefficient the larger λ, that is, the λ corresponding to the optimal AUC plus one standard error. The cross-validation plot for model construction is shown in FIG. 3 (the left dashed line is the λ corresponding to the maximum AUC; the right dashed line is the λ corresponding to the maximum AUC plus one standard error, which is the penalty coefficient selected here).
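The selection of the penalty coefficient can be illustrated with a short sketch. It assumes the rule follows the common "one standard error" convention for a metric that is maximised: among all penalties whose mean cross-validated AUC lies within one standard error of the best AUC, the largest is chosen. The function name `select_lambdas` and the numbers in the example are hypothetical and are not the cross-validation results of the study.

```python
def select_lambdas(lambdas, mean_auc, se_auc):
    """Return (lambda_best, lambda_1se) for a metric that is maximised.

    lambda_best: the penalty with the highest mean cross-validated AUC.
    lambda_1se: the largest penalty whose mean AUC is still within one
                standard error of that best AUC.
    """
    best = max(range(len(lambdas)), key=lambda i: mean_auc[i])
    threshold = mean_auc[best] - se_auc[best]
    eligible = [lam for lam, auc in zip(lambdas, mean_auc) if auc >= threshold]
    return lambdas[best], max(eligible)


if __name__ == "__main__":
    # Hypothetical cross-validation summary, for illustration only.
    lambdas = [0.005, 0.01, 0.02, 0.04, 0.08]
    mean_auc = [0.78, 0.80, 0.79, 0.775, 0.70]
    se_auc = [0.03, 0.03, 0.03, 0.03, 0.04]
    print(select_lambdas(lambdas, mean_auc, se_auc))
```

A larger penalty shrinks more coefficients to zero, so this rule favours a sparser model at a small cost in cross-validated AUC.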

The final differential diagnosis model is constructed as follows:

P_UC = exp(MMPs Scores)/(1 + exp(MMPs Scores))  (1)

MMPs Scores = -1.3813 + [ANXA1 × (0.6358)] + [CXCL13 × (0.1000)] + [MMP1 × (0.2507)] + [CXCL1 × (0.4478)]  (2)

P_UC + P_CD = 1  (3)

Note: P_UC is the probability, calculated from the model, that the case is predicted to be UC. Because the model discriminates between UC and CD, P_UC + P_CD = 1, so the probability P_CD that the case is predicted to be CD can be obtained indirectly from P_UC.

To make the discrimination model more convenient to apply, it is presented as a nomogram in FIG. 4, in which the red dots mark an application example. For a patient with CXCL13 = 0, MMP1 = 1, ANXA1 = 0 and CXCL1 = 1, the predicted probability of a UC diagnosis is 0.336 and the predicted probability of a CD diagnosis is 0.664. With a cutoff value of 0.5, the patient is judged to have CD according to the model constructed by the present method.
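The nomogram reading for this patient can be checked directly against the model formulas. The worked arithmetic below substitutes the four binary values into the score and the logistic transform; intermediate values are rounded to four decimal places, and the `align*` environment assumes the amsmath package.

```latex
\begin{align*}
\text{MMPs Scores} &= -1.3813 + 0.6358\,(0) + 0.1000\,(0) + 0.2507\,(1) + 0.4478\,(1) = -0.6828 \\
P_{UC} &= \frac{e^{-0.6828}}{1 + e^{-0.6828}} \approx \frac{0.5052}{1.5052} \approx 0.336 \\
P_{CD} &= 1 - P_{UC} \approx 0.664
\end{align*}
```

Since 0.336 is below the 0.5 cutoff, the case is assigned to CD, in agreement with the nomogram.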

5. Model evaluation

The model constructed above was applied to the training set (data sets GSE10616, GSE36807 and GSE9686), validation set 1 (data set GSE75214) and validation set 2 (data set GSE179285), and its discrimination (ROC curve), calibration (calibration curve) and clinical applicability (DCA curve) were examined, with the following results:

1. Training set results: the area under the ROC curve for Combined Datasets is 0.801, the calibration curve shows good calibration (Sp > 0.05, Brier score < 0.25), and the DCA curve shows good clinical applicability (as shown in FIG. 5).

2. Validation set 1 results: the area under the ROC curve for GSE75214 is 0.811, the calibration curve shows good calibration (Sp > 0.05, Brier score < 0.25), and the DCA curve shows good clinical applicability (as shown in FIG. 6). The training set data come from the chip platform GPL570 and the validation set data come from the chip platform GPL6244, which shows that the model performs well across platforms.

3. Validation set 2 results: because the data sets above were all used for gene screening, a newly published cohort, GSE179285, was selected for further validation of the model. The area under the ROC curve for GSE179285 is 0.751, the calibration curve shows good calibration (Sp > 0.05, Brier score < 0.25), and the DCA curve shows good clinical applicability (as shown in FIG. 7). The training set data come from the chip platform GPL570 and the validation set data come from the chip platform GPL6480, which shows that the model performs well across platforms.
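For readers unfamiliar with the two summary statistics quoted above, the sketch below shows one standard way to compute them from predicted probabilities and true labels. The function names and the example data are hypothetical; they are not the evaluation code or results of the study. A useful reference point is that a model predicting 0.5 for every case has a Brier score of exactly 0.25, which is why values below 0.25 are read as informative.

```python
def brier_score(probabilities, labels):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probabilities, labels)) / len(labels)


def roc_auc(probabilities, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative count as half a correct pair.
    """
    positives = [p for p, y in zip(probabilities, labels) if y == 1]
    negatives = [p for p, y in zip(probabilities, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical predictions (P_UC) and true labels (1 = UC, 0 = CD).
    predicted = [0.513, 0.336, 0.488, 0.201]
    truth = [1, 0, 1, 0]
    print(brier_score(predicted, truth), roc_auc(predicted, truth))
```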

The present invention is described in detail above. It will be apparent to those skilled in the art that the present invention can be practiced in a wide range of equivalent parameters, concentrations, and conditions without departing from the spirit and scope of the invention and without undue experimentation. While the invention has been described with respect to specific embodiments, it will be appreciated that the invention may be further modified. In general, this application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. The application of some of the basic features may be done in accordance with the scope of the claims that follow.

Claims

1. An auxiliary device for judging Crohn's disease and ulcerative colitis comprises parameter acquisition equipment and a readable carrier;

the readable carrier has recorded thereon the following formulas (1) - (3),

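P_UC = exp(MMPs Scores)/(1 + exp(MMPs Scores))  (1)

MMPs Scores = -1.3813 + [ANXA1 × (0.6358)] + [CXCL13 × (0.1000)] + [MMP1 × (0.2507)] + [CXCL1 × (0.4478)]  (2)

P_UC + P_CD = 1  (3);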

2. The apparatus according to claim 1, wherein: the parameter acquisition equipment is a device for detecting the expression quantity of ANXA1, CXCL13, MMP1 and CXCL1 genes in a sample to be detected.

3. The apparatus according to claim 1 or 2, characterized in that: the apparatus further comprises recording means and/or computing means; the recording means comprises a pen and/or a computer; the computing means comprises a calculator and/or the computer.

4. The apparatus according to claim 1 or 2, characterized in that: the readable carrier is a kit instruction; the content of formula I is printed on a card.

5. The apparatus according to claim 1 or 2, characterized in that: the readable carrier is a computer readable carrier.

6. The apparatus according to claim 1 or 2, characterized in that: the median value of the expression values of the genes in the ulcerative colitis samples is obtained by detecting the gene expression values of at least 10 ulcerative colitis samples by using the same detection device, and the average value of the expression values of the ulcerative colitis samples is obtained as the median value of the expression values in the ulcerative colitis samples.

7. The kit for assisting in judging the Crohn disease and the ulcerative colitis is characterized by comprising a device for detecting the expression level of ANXA1, a device for detecting the expression level of CXCL13, a device for detecting the expression level of MMP1, a device for detecting the expression level of CXCL1 and a computing device provided with a parameter operation module; the parameter operation module can perform operations of the following formulas (1) - (3):

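P_UC = exp(MMPs Scores)/(1 + exp(MMPs Scores))  (1)

MMPs Scores = -1.3813 + [ANXA1 × (0.6358)] + [CXCL13 × (0.1000)] + [MMP1 × (0.2507)] + [CXCL1 × (0.4478)]  (2)

P_UC + P_CD = 1  (3);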