CN115954107A

CN115954107A - Method and device for analyzing clinical examination data of primary biliary cholangitis

Info

Publication number: CN115954107A
Application number: CN202211642178.7A
Authority: CN
Inventors: 赵丹彤; 赵艳
Original assignee: Beijing Youan Hospital
Current assignee: Beijing Youan Hospital
Priority date: 2022-12-20
Filing date: 2022-12-20
Publication date: 2023-04-11
Anticipated expiration: 2042-12-20
Also published as: CN115954107B

Abstract

The embodiment of the invention discloses a method and a device for analyzing clinical examination data of primary biliary cholangitis. The method comprises the following steps: acquiring clinical data of primary biliary cholangitis patients over the course of disease progression from a preset database, and screening the clinical data based on preset key index item data to obtain sample clinical data; clustering the test data in the sample clinical data in stages using a preset hierarchical clustering algorithm to obtain a plurality of clustering results; and performing statistical analysis on the sample clinical data corresponding to the test data in each clustering result to obtain a target analysis result. The technical scheme of the embodiment of the invention addresses the limited application and insufficient depth of clinical data analysis for patients with primary biliary cholangitis (PBC), enables full mining and analysis of the clinical data of PBC patients, and provides data support for classifying the clinical manifestations and judging the prognosis of PBC patients.

Description

Method and device for analyzing clinical examination data of primary biliary cholangitis

Technical Field

The embodiment of the invention relates to the technical field of clinical data processing, in particular to a method and a device for analyzing clinical examination data of primary biliary cholangitis.

Background

Primary biliary cholangitis (PBC) is an autoimmune liver disease of unknown cause. PBC mainly affects middle-aged and elderly women and is not limited to particular regions or ethnic groups. Clinical tests of PBC patients show abnormal values for relevant indices, for example elevated serum alkaline phosphatase (ALP), elevated aspartate aminotransferase (AST) and alanine aminotransferase (ALT), elevated serum immunoglobulins, mainly immunoglobulin M (IgM), and positive serum anti-mitochondrial antibodies (AMAs). AMAs are the marker antibodies for the serological diagnosis of PBC, and antinuclear antibodies (ANA) can be detected in serum by indirect immunofluorescence; these clinical data may have diagnostic and prognostic value. If the diversity of disease condition and prognosis among PBC patients could be determined, accurate prediction or assessment of prognosis would be of great significance for guiding subsequent clinical treatment. However, apart from PBC clinical staging and pathological staging, no simple and effective method is available for distinguishing the clinical characteristics of PBC patients and accurately judging their prognosis.

Disclosure of Invention

The invention provides a method, a device, equipment and a medium for analyzing clinical examination data of primary biliary cholangitis, which can fully mine and analyze the clinical data of PBC patients and provide data support for classifying the clinical manifestations and judging the prognosis of PBC patients.

According to an aspect of the present invention, there is provided a method for analyzing clinical test data of primary biliary cholangitis, the method comprising:

acquiring clinical data of primary biliary cholangitis patients over the course of disease progression from a preset database, and screening the clinical data based on preset key index item data to obtain sample clinical data;

clustering the test data in the sample clinical data by stages by adopting a preset hierarchical clustering algorithm to obtain a plurality of clustering results;

and carrying out statistical analysis on the sample clinical data corresponding to the test data in each clustering result to obtain a target analysis result.

According to another aspect of the present invention, there is provided a primary biliary cholangitis clinical examination data analysis apparatus, including:

a sample data acquisition module, configured to acquire clinical data of primary biliary cholangitis patients over the course of disease progression from a preset database, and to screen the clinical data based on preset key index item data to obtain sample clinical data;

a sample data clustering module, configured to cluster the test data in the sample clinical data in stages using a preset hierarchical clustering algorithm to obtain a plurality of clustering results;

and a sample data analysis module, configured to perform statistical analysis on the sample clinical data corresponding to the test data in each clustering result to obtain a target analysis result.

According to another aspect of the present invention, there is provided an electronic apparatus including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform the primary biliary cholangitis clinical examination data analysis method according to any one of the embodiments of the present invention.

According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to implement the primary biliary cholangitis clinical examination data analysis method according to any one of the embodiments of the present invention when executed.

According to the technical scheme of the embodiment of the invention, clinical data of the PBC patient in the disease progress process are obtained from the preset database, and the clinical data are screened based on the preset key index data to obtain sample clinical data; clustering the test data in the sample clinical data by stages by adopting a preset hierarchical clustering algorithm to obtain a plurality of clustering results; and carrying out statistical analysis on the sample clinical data corresponding to the test data in each clustering result to obtain a target analysis result. The technical scheme of the embodiment of the invention solves the problems of less application and insufficient depth of the PBC patient clinical data analysis, can realize the full mining and analysis of the PBC patient clinical data, and provides data support for the PBC patient clinical performance classification and prognosis judgment.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present invention, nor do they necessarily limit the scope of the invention. Other features of the present invention will become apparent from the following description.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flow chart of a method for analyzing clinical examination data of primary biliary cholangitis according to an embodiment of the present invention;

FIG. 2 is a flow chart of another method for analyzing clinical test data of primary biliary cholangitis, provided by an embodiment of the present invention;

FIG. 3 is a flow chart of another method for analyzing clinical examination data of primary biliary cholangitis, according to an embodiment of the present invention;

FIG. 4 is a flow chart of a specific method for analyzing clinical examination data of primary biliary cholangitis according to an embodiment of the present invention;

FIG. 5 is a diagram of a specific statistical analysis method for primary biliary cholangitis clinical data according to an embodiment of the present invention;

FIG. 6 is a block diagram of a data analysis apparatus for clinical examination of primary biliary cholangitis according to an embodiment of the present invention;

FIG. 7 is a block diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, shall fall within the protection scope of the present invention.

It should be noted that the terms "comprises" and "comprising," and any variations thereof, in the description and claims of this invention and the above-described drawings are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements explicitly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

In the technical scheme of the embodiment of the invention, the acquisition, collection, updating, analysis, processing, use, transmission and storage of the personal information of the users involved all comply with the relevant laws and regulations; the personal information is used for lawful purposes and does not violate public order and good morals. Necessary measures are taken to protect the personal information of users, to prevent illegal access to it, and to safeguard the personal information security, network security and national security of users.

Fig. 1 is a flowchart of a method for analyzing primary biliary cholangitis clinical test data according to an embodiment of the present invention, which is applicable to a PBC clinical test data analysis scenario, and is more applicable to a case where PBC clinical test data analysis is implemented based on clinical data and disease progression. The method can be executed by a primary biliary cholangitis clinical examination data analysis device, and the primary biliary cholangitis clinical examination data analysis device can be realized in a hardware and/or software mode and can also be configured in electronic equipment.

As shown in fig. 1, the method for analyzing clinical examination data of primary biliary cholangitis includes the following steps:

S110, acquiring clinical data of primary biliary cholangitis patients over the course of disease progression from a preset database, and screening the clinical data based on preset key index item data to obtain sample clinical data.

The preset database is used for storing data generated at each stage of medical activity. The preset database may be associated with a Hospital Information System (HIS) and/or a Laboratory Information System (LIS), so as to collect, store, process, extract, transmit and summarize data generated during each stage of medical activity, thereby providing comprehensive automated management and various services for the overall operation of the hospital. The clinical data includes clinical symptom descriptive data, patient characteristic attribute data, and test data.

The test data includes the clinical test results of PBC patients at each stage of the disease course, for example the test results of autoantibodies. Autoantibodies are antibodies against self tissues, organs, cells and cellular components. PBC-related autoantibodies include ANA, AMA and/or AMA-M2, ACA and/or anti-CENP-B antibodies, anti-gp210 antibodies, anti-sp100 antibodies, anti-Ro52 antibodies, anti-SSA antibodies, anti-SSB antibodies and the like. The detection results of autoantibodies include qualitative results (positive or negative) and semi-quantitative or quantitative values converted from antibody titers or concentrations.

The preset key index items comprise a plurality of preset categories of antinuclear antibodies relevant to PBC diagnosis and prognosis. Specifically, the preset key index items are the following 19 autoantibodies: anti-nuclear antibody (ANA), anti-mitochondrial antibody (AMA) and/or anti-AMA-M2 antibody, anti-centromere antibody (ACA) and/or anti-centromere protein B (CENP-B) antibody, anti-Ro52 antibody, anti-SSA antibody, anti-SSB antibody, anti-Smith (Sm) antibody, anti-nuclear ribonucleoprotein (nRNP) antibody, anti-double-stranded DNA (dsDNA) antibody, anti-ribosomal P protein (Rib) antibody, anti-histone (His) antibody, anti-nk antibody, anti-Scl-70 antibody, anti-Jo-1 antibody, anti-gp210 antibody, anti-sp100 antibody, anti-soluble liver antigen (SLA) antibody, anti-liver-kidney microsomal type 1 (LKM-1) antibody, and anti-liver cytosol type 1 (LC-1) antibody. Antinuclear antibodies are a generic term for autoantibodies targeting various components of eukaryotic cells; total ANA is detected by indirect immunofluorescence. ANA has multiple target antigens, each corresponding to a different autoantibody, which together form the antinuclear antibody profile. The antinuclear antibody profile belongs to the total antinuclear antibodies, and a positive ANA result does not mean that an antibody in the existing antinuclear antibody profile must be positive. ANA, AMA and ACA were detected by indirect immunofluorescence, and the remaining antibodies were detected by immunoblotting and enzyme-linked immunosorbent assay (ELISA).

Further, screening the clinical data based on the preset key index item data to obtain sample clinical data includes: selecting, from the clinical data, the data that contains all the preset key index items and in which the data of every preset key index item is valid, as the sample clinical data.

Specifically, test results such as blood routine, biochemical indexes, virological markers and autoantibodies over the course of each PBC patient's disease progression are first obtained from the HIS and/or LIS; then, based on the preset key index item data, the data that contains all the preset key index items and in which the data of every preset key index item is valid is selected from the clinical data as the sample clinical data.
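By way of illustration only, the screening step can be sketched as a filter over patient records. The record layout, the shortened item list `KEY_ITEMS` and the validity rule in `is_valid` are assumptions made for this sketch and are not specified in the source.

```python
from typing import Dict, List, Optional

# Hypothetical subset of the key index items; the source lists 19 autoantibodies.
KEY_ITEMS = ["ANA", "AMA", "ACA", "anti-gp210", "anti-sp100"]

Record = Dict[str, Optional[str]]


def is_valid(value: Optional[str]) -> bool:
    """Assumed validity rule: the result is present and is a qualitative value."""
    return value in ("positive", "negative")


def screen(records: List[Record], key_items: List[str] = KEY_ITEMS) -> List[Record]:
    """Keep only the records that contain every key item with a valid result."""
    return [r for r in records if all(is_valid(r.get(k)) for k in key_items)]


records = [
    {"id": "p1", "ANA": "positive", "AMA": "positive", "ACA": "negative",
     "anti-gp210": "negative", "anti-sp100": "negative"},
    {"id": "p2", "ANA": "positive", "AMA": None, "ACA": "negative",
     "anti-gp210": "negative", "anti-sp100": "negative"},  # AMA result missing
    {"id": "p3", "ANA": "negative", "AMA": "positive"},     # key items absent
]
sample = screen(records)
print([r["id"] for r in sample])
```

Only the record in which every key item is present and valid is retained; records with a missing item or an empty result are dropped.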

S120, clustering the test data in the sample clinical data in stages using a preset hierarchical clustering algorithm to obtain a plurality of clustering results.

The preset hierarchical clustering algorithm is the BIRCH two-step clustering method. The BIRCH two-step clustering method is an improvement on the BIRCH algorithm: it adds a mechanism for automatically determining the number of clusters and can be used to cluster data sets containing multiple attribute types. A hierarchical nested cluster tree is created by calculating the similarity between data points of different classes; the original data points of the different classes form the lowest layer of the tree, and the top layer of the tree is the root node of the clusters.

Specifically, the BIRCH two-step clustering method is divided into two stages:

1. A pre-clustering stage.

Specifically, the data points corresponding to the test data in the sample clinical data are read one by one to generate a cluster feature tree (CF tree), and at the same time the data points in dense areas are clustered in advance to form a plurality of sub-clusters.

2. A clustering stage.

According to the result of the pre-clustering stage, the sub-clusters are combined by using an aggregation method until the target cluster number is reached.

Specifically, first, the sample clinical data is staged, for example, the sample clinical data may be staged according to the distance, density, connectivity, or the like between the sample clinical data; and then, inputting the test data in the staged sample clinical data into a preset hierarchical clustering algorithm for clustering to obtain a clustering result of the test data in the corresponding stage.
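The second stage, in which sub-clusters are merged until the target number of clusters is reached, can be sketched as below. This is a minimal sketch: the source only says that an aggregation method is used, so the choice of squared Euclidean distance between centroids as the merge criterion, and the names `merge_subclusters` and `target`, are assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def centroid(points: List[Point]) -> Point:
    """Mean of the points in each dimension."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def merge_subclusters(subclusters: List[List[Point]], target: int) -> List[List[Point]]:
    """Repeatedly merge the two closest sub-clusters until `target` clusters remain."""
    clusters = [list(c) for c in subclusters]
    while len(clusters) > target:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sq_dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


subclusters = [[(0.0, 0.0)], [(0.0, 1.0)], [(5.0, 5.0)], [(5.0, 6.0)], [(9.0, 0.0)]]
merged = merge_subclusters(subclusters, target=3)
print(merged)
```

In the two-step method described above the target number would be determined automatically; here it is passed in explicitly to keep the sketch short.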

S130, carrying out statistical analysis on the sample clinical data corresponding to the test data in each clustering result to obtain a target analysis result.

The clinical data may be clinical diagnostic results related to the sample test data in the clustering results, for example blood routine, biochemical indexes and virological markers. The biochemical indexes include liver function indexes such as total protein, albumin, globulin, albumin-to-globulin ratio, total bilirubin, transaminase, direct bilirubin and indirect bilirubin; blood lipid indexes such as total cholesterol, triglyceride, high-density lipoprotein and apolipoprotein; and fasting blood glucose, kidney function, uric acid, lactate dehydrogenase and creatine kinase. The virological markers include indexes such as hepatitis A antibody, hepatitis B antibody, hepatitis C antibody and hepatitis E antibody.

The statistical analysis comprises establishing a mathematical model using mathematical methods and performing mathematical statistics and analysis on the data. The statistical analysis methods include frequency analysis, data exploration, cross-table analysis, chi-square test, t-test, analysis of variance, regression analysis, and factor analysis.

Specifically, one or more statistical analysis methods are selected according to the characteristics of the sample clinical data corresponding to the test data in the clustering result, a mathematical model is established using mathematical methods, and mathematical statistics and analysis are performed on the data to obtain a target analysis result. In this way the clinical data and the disease characteristics can be combined, and the relationship between the test data and the clinical data is better reflected.
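As one example of the listed methods, a chi-square test on a cross table of cluster membership against a binary clinical item can be sketched as follows. The counts in `table` are hypothetical and do not come from the source; the p-value shortcut shown applies only to one degree of freedom.

```python
import math
from typing import List, Tuple


def chi_square(table: List[List[int]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    total = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows are two clusters, columns are (item present, item absent).
table = [[30, 70], [10, 90]]
stat, dof = chi_square(table)

# For one degree of freedom the upper-tail probability equals erfc(sqrt(stat / 2)).
p_value = math.erfc(math.sqrt(stat / 2)) if dof == 1 else None
print(stat, dof, p_value)
```

For tables with more than one degree of freedom, the p-value would need a general chi-square distribution function, which the standard library does not provide.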

According to the technical scheme of the embodiment of the invention, clinical data of the PBC patient in the disease progress process are obtained from the preset database, and the clinical data are screened based on the preset key index data to obtain sample clinical data; clustering the test data in the sample clinical data by stages by adopting a preset hierarchical clustering algorithm to obtain a plurality of clustering results; and carrying out statistical analysis on the sample clinical data corresponding to the test data in each clustering result to obtain a target analysis result. The technical scheme of the embodiment of the invention fully excavates and analyzes the PBC patient clinical examination data, and is helpful for people to know PBC disease characteristics and heterogeneity, thereby being helpful for clinicians to rapidly identify the disease characteristics of patients on the basis of comprehensively analyzing the clinical examination results, and providing decision basis for diagnosis, typing, treatment and prognosis judgment of diseases.

Fig. 2 is a flowchart of another method for analyzing primary biliary cholangitis clinical test data according to an embodiment of the present invention, which belongs to the same inventive concept as the method for analyzing primary biliary cholangitis clinical test data according to the foregoing embodiment, and further describes a process of clustering test data in sample clinical data in stages by using a preset hierarchical clustering algorithm to obtain a plurality of clustering results based on the foregoing embodiment. The method can be executed by a data analysis device for clinical examination of primary biliary cholangitis, and the device can be realized by software and/or hardware and is integrated in electronic equipment with an application development function.

As shown in fig. 2, the method for analyzing clinical examination data of primary biliary cholangitis includes the following steps:

S210, acquiring clinical data of PBC patients over the course of disease progression from a preset database, and screening the clinical data based on preset key index item data to obtain sample clinical data.

S220, pre-grouping the sample clinical data using the log-likelihood distance between the test data to obtain a corresponding pre-grouping result.

Assume that the sample clinical data is divided into a plurality of clusters, one of which comprises two classes of test data. The log-likelihood distance between the two classes of test data is calculated as follows: first, the log-likelihood estimate before merging the two classes and the log-likelihood estimate after merging them are calculated respectively; then, the difference between the log-likelihood estimates before and after merging is calculated, which is the log-likelihood distance between the two classes of test data.

Specifically, the log-likelihood distance between every two items of test data in the sample clinical data is calculated, and the sample clinical data is pre-grouped based on the log-likelihood distance to obtain a pre-grouping result.
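The log-likelihood distance can be sketched for purely categorical test data, such as positive or negative antibody results. The multinomial model used in `log_likelihood` is an assumption made for this sketch; the source does not state which likelihood model is applied, and continuous indices would need an additional term.

```python
import math
from collections import Counter
from typing import List, Sequence

Cluster = List[Sequence[str]]


def log_likelihood(cluster: Cluster) -> float:
    """Maximised multinomial log-likelihood of a cluster of categorical records."""
    n = len(cluster)
    total = 0.0
    for k in range(len(cluster[0])):  # one term per attribute
        counts = Counter(rec[k] for rec in cluster)
        total += sum(c * math.log(c / n) for c in counts.values())
    return total


def log_likelihood_distance(a: Cluster, b: Cluster) -> float:
    """Log-likelihood before merging minus log-likelihood after merging."""
    return log_likelihood(a) + log_likelihood(b) - log_likelihood(a + b)


# Hypothetical records: (AMA result, ACA result).
a = [("pos", "neg"), ("pos", "neg")]
b = [("neg", "pos"), ("neg", "pos")]
c = [("pos", "neg")]

d_ab = log_likelihood_distance(a, b)  # dissimilar groups: large distance
d_ac = log_likelihood_distance(a, c)  # identical profiles: zero distance
print(d_ab, d_ac)
```

Merging two groups with identical profiles loses no likelihood, so their distance is zero, while merging groups with opposite profiles gives a large distance; the groups at the smallest distance are the natural candidates for pre-grouping.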

S230, performing balanced iterative clustering based on the pre-grouping result to obtain a plurality of clustering results.

Balanced iterative clustering, that is, Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH), can perform clustering by scanning the data set to be clustered only once. After the data set is scanned, a CF tree is built in memory, which can be regarded as a multi-layer compression of the data. The CF tree stores only CF nodes and the corresponding pointers, while all samples remain on disk, which saves memory. Each node of the clustering feature tree, including the leaf nodes, consists of clustering features, and each clustering feature is a triple that can be represented by (N, LS, SS), where N is the number of sample data items owned by the clustering feature, LS is the sum of the feature attribute values of the sample data items owned by the clustering feature, and SS is the sum of squares of those sample data items over all feature dimensions. The clustering feature tree is established by the following steps:

1. Define the parameters of the CF tree.

Specifically, define the maximum number of CFs B in each internal node, the maximum number of CFs L in each leaf node, and the maximum sample radius threshold T of each CF in a leaf node.

2. Establish the CF tree.

Specifically, the first sample clinical data item is read from the pre-grouping result and placed into a new CF triple A with N = 1, and this new CF is placed in the root node. The second sample clinical data item is then read; it is found to lie within the hypersphere of radius T around the first item, that is, the two belong to the same CF, so the second point is added to CF triple A and the triple is updated, giving N = 2. The third item is then read; it cannot be absorbed into the hypersphere formed by the previous items, so a new CF triple B is needed to hold it, and the root node now contains two CF triples, A and B.

3. Traverse the sample clinical data corresponding to the pre-grouping result to complete the CF tree.
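The clustering feature triple and the threshold test of step 2 can be sketched as below. This is a simplified sketch that keeps all CFs in a single flat root node and omits node splitting under the limits B and L; the class name `CF`, the function `insert` and the sample points are assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CF:
    n: int            # N: number of samples in the clustering feature
    ls: List[float]   # LS: per-dimension linear sum of the samples
    ss: float         # SS: sum of squares over all dimensions

    @classmethod
    def from_point(cls, p: Sequence[float]) -> "CF":
        return cls(1, list(p), sum(x * x for x in p))

    def merged(self, other: "CF") -> "CF":
        """CF triples are additive, so merging is a component-wise sum."""
        return CF(self.n + other.n,
                  [a + b for a, b in zip(self.ls, other.ls)],
                  self.ss + other.ss)

    def centroid(self) -> List[float]:
        return [x / self.n for x in self.ls]

    def radius(self) -> float:
        """Root mean squared distance of the samples from the centroid."""
        c2 = sum(x * x for x in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))


def insert(cfs: List[CF], point: Sequence[float], threshold: float) -> None:
    """Absorb the point into the nearest CF if its radius stays within T, else open a new CF."""
    new = CF.from_point(point)
    if cfs:
        nearest = min(range(len(cfs)),
                      key=lambda i: math.dist(cfs[i].centroid(), point))
        candidate = cfs[nearest].merged(new)
        if candidate.radius() <= threshold:
            cfs[nearest] = candidate
            return
    cfs.append(new)


root: List[CF] = []
for p in [(1.0, 1.0), (1.2, 0.8), (6.0, 6.0)]:
    insert(root, p, threshold=0.5)
print([(cf.n, cf.centroid()) for cf in root])
```

With these points the first two samples share one CF (N = 2) and the third opens a second CF, mirroring triples A and B in step 2. Because only (N, LS, SS) is stored, the centroid and radius can be updated without revisiting the raw samples.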

Flow of the BIRCH algorithm:

1. Read all the sample clinical data of the pre-grouping result in sequence and build a CF tree in memory.

2. Preprocess the CF tree.

Specifically, a sample number threshold is set, and tree nodes whose number of sample clinical data items is smaller than the sample number threshold are removed; a sample merging threshold is set, and tuples whose hyperspheres are closer together than the sample merging threshold are merged.

3. Cluster all the CF tuples using a clustering algorithm. This eliminates unreasonable tree structures caused by the order in which the sample clinical data was read in, as well as tree splits caused by the limit on the number of CFs per node.

4. Using the centroids of all CF nodes of the CF tree produced in step 3 as initial centroids, cluster all sample points by distance to obtain the clustering result.
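Step 4 can be sketched as a nearest-centroid assignment. The centroids and points below are hypothetical, and Euclidean distance is an assumption; the source only says that the sample points are clustered according to distance.

```python
import math
from typing import List, Sequence


def assign_to_centroids(points: Sequence[Sequence[float]],
                        centroids: Sequence[Sequence[float]]) -> List[int]:
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points]


# Hypothetical centroids taken from the CF nodes, and sample points to assign.
centroids = [(1.1, 0.9), (6.0, 6.0)]
points = [(1.0, 1.0), (1.2, 0.8), (5.5, 6.2), (0.5, 1.5)]
labels = assign_to_centroids(points, centroids)
print(labels)
```

Because every sample is reassigned from scratch, the final labels no longer depend on the order in which the samples were read while the tree was built.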

The method further comprises the following steps:

S240, evaluating the plurality of clustering results using a preset clustering result evaluation algorithm to obtain a clustering evaluation result.

The clustering result evaluation algorithm is used to evaluate the clustering results and determine their quality. Clustering result evaluation algorithms can be divided into internal evaluation algorithms and external evaluation algorithms. An external evaluation algorithm evaluates a clustering result against known true labels (ground truth), for example by purity, Rand index (RI), F-score, or adjusted Rand index (ARI). An internal evaluation algorithm is used for completely unlabeled data and evaluates the clustering result alone, for example by the silhouette coefficient or the Calinski-Harabasz index.

Optionally, evaluating the clustering results based on the silhouette coefficient includes calculating the silhouette coefficient value S(i) of each sample point. Specifically, assuming that the clustering result contains N clusters, the silhouette coefficient value S(i) of the i-th sample is calculated by formula (1):

S(i) = (b(i) - a(i)) / max{a(i), b(i)}    (1)

where a(i) is the average distance from the i-th sample point to the other sample points in its own cluster, and b(i) is the average distance from the i-th sample point to the other (N-1) clusters, the distance from a sample point to a cluster being the average distance from that point to all sample points in the cluster. The silhouette coefficient value S(i) of the sample point is taken as the clustering evaluation result.
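A minimal sketch of the silhouette coefficient computation follows. It uses the standard definition, in which b(i) is the smallest of the average distances from sample i to each of the other clusters, and it scores samples in singleton clusters as 0 by convention; both choices, and the use of Euclidean distance, are assumptions where the source is not explicit.

```python
import math
from typing import List, Sequence


def silhouette(points: Sequence[Sequence[float]], labels: Sequence[int]) -> List[float]:
    """Silhouette value S(i) = (b - a) / max(a, b) for every sample point."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        # Mean distance from point i to each cluster, never counting the point itself.
        mean_to = {}
        for c in clusters:
            ds = [math.dist(p, q) for j, q in enumerate(points)
                  if labels[j] == c and j != i]
            mean_to[c] = sum(ds) / len(ds) if ds else None
        a = mean_to[labels[i]]
        others = [d for c, d in mean_to.items() if c != labels[i] and d is not None]
        if a is None or not others:
            scores.append(0.0)  # singleton cluster, or only one cluster overall
            continue
        b = min(others)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
labels = [0, 0, 1, 1]
scores = silhouette(points, labels)
print(scores)
```

Values close to 1 indicate that a sample sits well inside its own cluster, values near 0 indicate a sample on the boundary between clusters, and negative values suggest that the sample may have been assigned to the wrong cluster.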

S250, correcting the plurality of clustering results according to the clustering evaluation result to obtain a final clustering result.

Optionally, an evaluation threshold is set and used to correct the clustering results. If S(i) is greater than the evaluation threshold, the cluster is considered to exist in the clustering result; if S(i) is not greater than the evaluation threshold, the cluster is considered not to exist. The clustering results whose S(i) is greater than the evaluation threshold are taken as the final clustering result.

S260, performing statistical analysis on the sample clinical data corresponding to the test data in each clustering result to obtain a target analysis result.
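The threshold-based correction can be sketched as below. The source does not say how per-sample values S(i) are aggregated per cluster, so averaging them within each cluster is an assumption, as are the threshold value 0.5 and the example scores.

```python
from collections import defaultdict
from typing import Dict, List


def retained_clusters(scores: List[float], labels: List[int], threshold: float) -> List[int]:
    """Return the clusters whose mean silhouette value exceeds the threshold."""
    by_cluster: Dict[int, List[float]] = defaultdict(list)
    for s, label in zip(scores, labels):
        by_cluster[label].append(s)
    return sorted(c for c, vals in by_cluster.items()
                  if sum(vals) / len(vals) > threshold)


# Hypothetical per-sample silhouette values and their cluster labels.
scores = [0.8, 0.7, 0.6, 0.1, -0.2]
labels = [0, 0, 1, 2, 2]
kept = retained_clusters(scores, labels, threshold=0.5)
print(kept)
```

Clusters that fail the threshold could then be dropped or re-clustered; which of the two is intended is not stated in the source.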

According to the technical scheme of the embodiment of the invention, clinical data of the PBC patient in the disease progress process are obtained from the preset database, and the clinical data are screened based on the preset key index data to obtain sample clinical data; pre-grouping the sample clinical data by using the log-likelihood distance among the test data to obtain a corresponding pre-grouping result; based on the pre-grouping result, carrying out balanced iterative clustering to obtain a plurality of clustering results; evaluating the plurality of clustering results by adopting a preset clustering result evaluation algorithm to obtain a clustering evaluation result; correcting the plurality of clustering results according to the clustering evaluation result to obtain a final clustering result; and carrying out statistical analysis on the sample clinical data corresponding to the test data in each clustering result to obtain a target analysis result. According to the technical scheme of the embodiment of the invention, the clustering result is evaluated, and the clustering result is corrected according to the clustering evaluation result, so that the problems of less application and insufficient depth of the analysis of the clinical data of the PBC patient are solved, the clinical data of the PBC patient can be sufficiently mined and analyzed, and data support is provided for the clinical performance classification and prognosis judgment of the PBC patient.

Fig. 3 is a flowchart of another method for analyzing primary biliary cholangitis clinical test data according to an embodiment of the present invention, which belongs to the same inventive concept as the method for analyzing PBC clinical test data according to the foregoing embodiment, and further describes a process of performing statistical analysis on sample clinical data corresponding to test data in each clustering result based on the foregoing embodiment. The method can be executed by a primary biliary cholangitis clinical examination data analysis device, and the device can be realized by software and/or hardware and is integrated in electronic equipment with an application development function.

As shown in fig. 3, the method for analyzing clinical examination data of primary biliary cholangitis includes the following steps:

S310, acquiring clinical data of PBC patients over the course of disease progression from a preset database, and screening the clinical data based on preset key index item data to obtain sample clinical data.

S320, clustering the test data in the sample clinical data in stages using a preset hierarchical clustering algorithm to obtain a plurality of clustering results.

S330, carrying out statistical analysis on at least one statistical item of gender distribution, age distribution, clinical characteristics, complications, positive antibody categories and clinical endpoint events aiming at the sample clinical data corresponding to the test data in each clustering result to obtain a target statistical result.

Complications are one or more diseases or clinical states that coexist with, and are independent of, the primary disease. PBC may be accompanied by ascites, portal hypertension and hepatic failure; complications related to liver cirrhosis may occur in the late stage, and liver cancer may also develop. In addition, PBC may be accompanied by one or more autoimmune diseases, such as Sjogren's syndrome, thyroiditis, rheumatoid arthritis, systemic sclerosis, and systemic lupus erythematosus.

Clinical endpoint events mainly refer to liver-disease-related death, decompensation of liver cirrhosis (ascites, upper gastrointestinal hemorrhage and/or hepatic encephalopathy), liver cancer, liver transplantation, and the like.

It can be understood that, because the amount of PBC-related clinical data is large, statistical analysis is performed on the sample clinical data in order to make inferences about the overall population and to show the influence of each statistical item on PBC more intuitively.
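A per-cluster summary of a few of the statistical items can be sketched as follows. The patient rows, field names and values are entirely hypothetical and serve only to show the shape of the computation.

```python
from collections import Counter, defaultdict
from statistics import mean
from typing import Any, Dict, List

# Hypothetical rows: cluster label, sex, age, and whether a clinical endpoint event occurred.
patients = [
    {"cluster": 0, "sex": "F", "age": 52, "endpoint": False},
    {"cluster": 0, "sex": "F", "age": 60, "endpoint": True},
    {"cluster": 1, "sex": "M", "age": 47, "endpoint": False},
    {"cluster": 1, "sex": "F", "age": 71, "endpoint": True},
    {"cluster": 1, "sex": "F", "age": 65, "endpoint": True},
]


def summarise(rows: List[Dict[str, Any]]) -> Dict[int, Dict[str, Any]]:
    """Group rows by cluster and compute sex counts, mean age and endpoint rate."""
    groups: Dict[int, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        groups[row["cluster"]].append(row)
    summary = {}
    for cluster, members in sorted(groups.items()):
        summary[cluster] = {
            "n": len(members),
            "sex": dict(Counter(m["sex"] for m in members)),
            "mean_age": mean(m["age"] for m in members),
            "endpoint_rate": sum(m["endpoint"] for m in members) / len(members),
        }
    return summary


summary = summarise(patients)
for cluster, stats in summary.items():
    print(cluster, stats)
```

The same grouping could feed the tests listed earlier, for example a chi-square test on the endpoint counts per cluster.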

Further, the method comprises the following steps:

S340, acquiring PBC clinical data to be analyzed.

S350, determining classification results and prognosis features corresponding to the PBC clinical data to be analyzed according to the target analysis results corresponding to the clustering results.

Specifically, firstly, PBC clinical data to be analyzed are obtained; then, determining a classification result corresponding to the PBC clinical data to be analyzed according to the corresponding relation between each clustering result and the target analysis result; meanwhile, determining the prognosis characteristics of the PBC clinical data to be analyzed according to the prognosis characteristics corresponding to each clustering result in the target statistical result.
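The text does not state the rule by which a new record is matched to a clustering result. One possible sketch, under the assumption that each cluster is summarized by a binary prototype profile and that the new record is assigned to the prototype at the smallest Hamming distance, is shown here. The names `prototypes` and `prognosis_by_cluster`, the four-item profiles, and the placeholder prognosis strings are all hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length binary profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(profile, prototypes, prognosis_by_cluster):
    """Return (cluster label, prognosis feature) for one binary antibody profile."""
    label = min(prototypes, key=lambda k: hamming(profile, prototypes[k]))
    return label, prognosis_by_cluster[label]


# Hypothetical prototypes over four antibody items (1 = positive, 0 = negative).
prototypes = {1: (1, 0, 0, 1), 2: (0, 1, 1, 0)}
# Placeholder text standing in for the prognosis features of the target analysis result.
prognosis_by_cluster = {1: "prognosis feature of cluster 1", 2: "prognosis feature of cluster 2"}

label, prognosis = classify((1, 0, 1, 1), prototypes, prognosis_by_cluster)
```

In this sketch the prognosis feature is a plain lookup keyed by the assigned cluster, which mirrors the correspondence between each clustering result and its target analysis result.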

In a specific embodiment, fig. 4 is a flowchart of a specific method for analyzing primary biliary cholangitis clinical test data according to an embodiment of the present invention, and as shown in fig. 4, the method for analyzing primary biliary cholangitis clinical test data includes the following steps:

S410, acquiring clinical data of the PBC patient.

Specifically, clinical data of PBC patients are extracted from the hospital medical information systems (the HIS system and the LIS system), a retrospective study cohort is established, and baseline and follow-up data are recorded, including discharge diagnosis, demographic data, medical history, physical examination results, complications, and laboratory test results (including routine blood tests, biochemical indicators, virological markers, and autoantibody results).

S420, determining sample clinical data based on the autoantibody type of the clinical data.

Clinical data of PBC patients that are complete and include detection results for all 19 autoantibodies are used as the sample clinical data, giving sample clinical data with a sample size of 537.
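The screening in S420 can be sketched as a filter that keeps only records in which every required item is present and valid. This is a minimal sketch: the three item names in `REQUIRED_ITEMS` are a shortened stand-in for the 19 autoantibody items, and treating only "positive" and "negative" as valid values is an assumption.

```python
# Shortened, hypothetical stand-in for the 19 preset key index items.
REQUIRED_ITEMS = ("ANA", "AMA", "gp210")
# Assumption: a result counts as valid only if it is one of these two values.
VALID_RESULTS = {"positive", "negative"}


def screen(records, required=REQUIRED_ITEMS):
    """Keep records that contain a valid result for every required item."""
    return [
        rec for rec in records
        if all(rec.get(item) in VALID_RESULTS for item in required)
    ]


# Made-up records for illustration.
records = [
    {"id": "p1", "ANA": "positive", "AMA": "positive", "gp210": "negative"},
    {"id": "p2", "ANA": "positive", "AMA": "negative"},                       # gp210 missing
    {"id": "p3", "ANA": "negative", "AMA": "positive", "gp210": "not done"},  # invalid value
]
sample = screen(records)
```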

S430, pre-grouping the sample clinical data by using the log-likelihood distance to obtain a pre-grouping result.

To reduce the number of distances to be computed between all possible clusters, the 19 autoantibodies are pre-grouped. Specifically, the silhouette coefficient S(i) of each sample point is calculated, and the clustering result with S(i) > 0.5 is retained as the pre-grouping result.
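The silhouette coefficient of sample point i is commonly defined as S(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from i to the other points of its own cluster and b(i) is the smallest mean distance from i to the points of any other cluster. The sketch here computes it for binary profiles. Using the Hamming distance and averaging S(i) over all points before applying the 0.5 threshold are assumptions, since the text does not specify either.

```python
def hamming(a, b):
    """Number of positions at which two equal-length binary profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def silhouette(points, labels):
    """Return S(i) for every point; requires at least two clusters."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Convention: a point alone in its cluster gets S(i) = 0.
            scores.append(0.0)
            continue
        a = sum(hamming(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(hamming(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return scores


# Made-up binary profiles split into two clusters.
points = [(1, 1, 0, 0), (1, 1, 0, 1), (0, 0, 1, 1), (0, 0, 1, 0)]
labels = [0, 0, 1, 1]
scores = silhouette(points, labels)
keep_result = sum(scores) / len(scores) > 0.5
```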

S440, based on the pre-grouping result, performing clustering analysis on the sample clinical data through a BIRCH algorithm to obtain a plurality of clustering results.

A BIRCH two-step clustering algorithm is used to perform cluster analysis on the sample clinical data with a sample size of 537. Each autoantibody result is first converted into a binary variable (negative or positive), and cluster analysis then yields five clustering results (refer to fig. 5). The sample size of cluster 1 is 107, that of cluster 2 is 120, that of cluster 3 is 125, that of cluster 4 is 101, and that of cluster 5 is 84. Meanwhile, a corresponding number is generated according to the clustering result; the number represents the cluster type of the sample clinical data and the patient to whom the sample clinical data belong.
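The binarization step and the reported cluster sizes can be checked with a short sketch. The cluster sizes are the ones stated above; the `binarize` helper and the string encoding of results are assumptions made for illustration.

```python
def binarize(result):
    """Map an autoantibody result string to a binary variable (1 = positive, 0 = negative)."""
    mapping = {"positive": 1, "negative": 0}
    return mapping[result.strip().lower()]


# Hypothetical raw results for one patient, in a fixed item order.
raw_profile = ["Positive", "negative", "negative ", "positive"]
binary_profile = [binarize(r) for r in raw_profile]

# Cluster sizes as reported for the five clustering results.
cluster_sizes = {1: 107, 2: 120, 3: 125, 4: 101, 5: 84}
total = sum(cluster_sizes.values())
```

The five cluster sizes add up to the full sample of 537, so every sample record is assigned to exactly one cluster.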

S450, carrying out statistical analysis on sample clinical data corresponding to the test data in each clustering result to obtain a target analysis result.

The baseline demographics, clinical signs, complications, laboratory test indicators, follow-up visits, and clinical endpoint events of the sample clinical data corresponding to the test data in the five clusters are compared, survival analysis comparisons are performed for different clusters, the clinical characteristics of different autoantibody clusters are described, and target analysis results corresponding to each cluster are obtained (refer to fig. 5).
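The text does not name the survival analysis method. A common choice for comparing endpoint-free survival between clusters is the Kaplan-Meier estimator, sketched here as an assumption. The follow-up durations and event flags are made up, and the time unit is arbitrary.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each time with at least one event.

    durations: follow-up time of each patient.
    events: True if the clinical endpoint occurred at that time, False if censored.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_removed
    return curve


# Made-up follow-up data for one cluster.
curve = kaplan_meier([5, 8, 12, 12, 20], [True, False, True, True, False])
```

Running the estimator once per cluster gives one curve per clustering result, which can then be compared across the five clusters.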

S460, determining classification results and prognosis characteristics corresponding to the PBC clinical data to be analyzed according to the target analysis results corresponding to the clustering results.

Specifically, the classification result and the prognosis feature corresponding to the PBC clinical data to be analyzed are determined according to the target analysis results corresponding to the five clusters (refer to fig. 5).

According to the technical scheme of the embodiment of the invention, clinical data of the PBC patient in the disease progress process are obtained from the preset database, and the clinical data are screened based on the preset key index data to obtain sample clinical data; clustering the test data in the sample clinical data by stages by adopting a preset hierarchical clustering algorithm to obtain a plurality of clustering results; performing statistical analysis on at least one statistical item of gender distribution, age distribution, clinical characteristics, complications, positive antibody categories and clinical endpoint events aiming at sample clinical data corresponding to the test data in each clustering result to obtain a target statistical result; acquiring PBC clinical data to be analyzed; and determining the classification result and the prognosis characteristic corresponding to the PBC clinical data to be analyzed according to the target analysis result corresponding to each clustering result. According to the technical scheme of the embodiment of the invention, the sample clinical data and the clinical data are subjected to statistical analysis based on the clustering result, the classification result and the prognosis characteristic of PBC clinical test data to be analyzed can be directly obtained, the problems that the PBC patient clinical data analysis is less in application and not deep enough are solved, the PBC patient clinical data can be fully mined and analyzed, and data support is provided for the PBC patient clinical performance classification and prognosis judgment.

Fig. 6 is a block diagram of a primary biliary cholangitis clinical examination data analysis device according to an embodiment of the present invention, which is applicable to a PBC clinical examination data analysis scenario, and is more applicable to a case where the PBC clinical examination data analysis is implemented based on clinical data and disease progression. The device can be realized in the form of hardware and/or software and is integrated in computer equipment with application development function.

As shown in fig. 6, the apparatus for analyzing clinical examination data of primary biliary cholangitis includes: a sample data obtaining module 601, a sample data clustering module 602, and a sample data analyzing module 603.

The sample data acquisition module 601 is used for acquiring clinical data of the PBC patient in the disease progress process from a preset database, and screening the clinical data based on preset key index item data to obtain sample clinical data; the sample data clustering module 602 is configured to cluster the test data in the sample clinical data by stages by using a preset hierarchical clustering algorithm to obtain a plurality of clustering results; the sample data analysis module 603 is configured to perform statistical analysis on sample clinical data corresponding to the test data in each clustering result to obtain a target analysis result.

Optionally, the sample data obtaining module 601 is configured to: select, from the clinical data, data that contains all preset key index items and in which the data of each preset key index item is valid, as the sample clinical data.

Optionally, the sample data clustering module 602 is configured to:

pre-grouping the sample clinical data by using the log-likelihood distance among the sample test data to obtain a corresponding pre-grouping result;

and carrying out balanced iterative clustering to obtain a plurality of clustering results based on the pre-grouping result.
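The text does not give a formula for the log-likelihood distance. In two-step clustering with categorical variables it is commonly defined as d(j, s) = ξ_j + ξ_s - ξ_<j,s>, where ξ_v is the sum over variables and categories of N_vkl · log(N_vkl / N_v), and <j,s> denotes the cluster formed by merging j and s. The sketch here follows that common definition as an assumption and is limited to categorical (here binary) data.

```python
from math import log


def xi(cluster):
    """Log-likelihood term of a cluster given as a list of categorical rows."""
    n = len(cluster)
    total = 0.0
    for column in zip(*cluster):
        for value in set(column):
            count = column.count(value)
            total += count * log(count / n)
    return total


def log_likelihood_distance(a, b):
    """Decrease in log-likelihood caused by merging clusters a and b."""
    return xi(a) + xi(b) - xi(a + b)


# Made-up pre-groups of binary antibody profiles.
group_a = [(1, 0), (1, 0)]
group_b = [(0, 1), (0, 1)]
distance = log_likelihood_distance(group_a, group_b)
```

Two groups with identical profiles have a distance of zero under this definition, while groups that disagree on every item, as in the example, are far apart.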

Optionally, the sample data clustering module 602 is configured to:

evaluating the plurality of clustering results by adopting a preset clustering result evaluation algorithm to obtain a clustering evaluation result;

and correcting the plurality of clustering results according to the clustering evaluation result to obtain a final clustering result.

Optionally, the sample data analysis module 603 is further configured to: and performing statistical analysis on at least one statistical item of gender distribution, age distribution, clinical characteristics, complications, positive antibody categories and clinical endpoint events aiming at the sample clinical data corresponding to the test data in each clustering result.

Optionally, the apparatus further comprises a clinical test data analysis module for:

acquiring PBC clinical data to be analyzed;

and determining the classification result and the prognosis characteristic corresponding to the PBC clinical data to be analyzed according to the target analysis result corresponding to each clustering result.

The PBC clinical examination data analysis device provided by the embodiment of the invention can execute the PBC clinical examination data analysis method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.

Fig. 7 is a block diagram of an electronic device according to an embodiment of the present invention. The electronic device 10 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, or other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), or other similar computing devices. The components shown herein, their connections, and their functions are exemplary only, and are not intended to limit implementations of the inventions described and/or claimed herein.

As shown in fig. 7, the electronic device 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, and the like, wherein the memory stores a computer program executable by the at least one processor, and the processor 11 can perform various suitable actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from a storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data necessary for the operation of the electronic apparatus 10 may also be stored. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.

A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard or a mouse; an output unit 17 such as various types of displays or speakers; a storage unit 18 such as a magnetic disk or an optical disk; and a communication unit 19 such as a network card, modem, or wireless communication transceiver. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.

The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), any suitable processor, controller or microcontroller, and so forth. Processor 11 performs the various methods and processes described above, such as the PBC clinical test data analysis method.

In some embodiments, the PBC clinical test data analysis method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When loaded into RAM 13 and executed by processor 11, the computer program may perform one or more of the steps of the PBC clinical test data analysis method described above. Alternatively, in other embodiments, processor 11 may be configured to perform the PBC clinical test data analysis method by any other suitable means (e.g., by way of firmware).

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

A computer program for implementing the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program may execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.

In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (Cathode Ray Tube) or LCD (Liquid Crystal Display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the Internet.

The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical host and VPS service are overcome.

It should be understood that the various forms of flows shown above may be used with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solution of the present invention can be achieved.

The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method for analyzing clinical examination data of primary biliary cholangitis is characterized by comprising the following steps:

2. The method of claim 1, wherein the screening the clinical data based on the preset key indicator data to obtain sample clinical data comprises:

and selecting, from the clinical data, data that contains all preset key index items and in which the data of each preset key index item is valid, as the sample clinical data.

3. The method according to claim 2, wherein the preset key index items comprise a plurality of preset categories of antinuclear antibodies relevant to diagnosis and prognosis of primary biliary cholangitis.

4. The method according to claim 1, wherein the clustering test data in the sample clinical data in stages by using a preset hierarchical clustering algorithm to obtain a plurality of clustering results comprises:

pre-grouping the sample clinical data by using the log-likelihood distance between the test data to obtain a corresponding pre-grouping result;

and carrying out balanced iterative clustering on the basis of the pre-grouping result to obtain a plurality of clustering results.

5. The method of claim 4, further comprising:

6. The method of claim 1, wherein the performing a statistical analysis on the sample clinical data corresponding to the test data in each clustering result comprises:

and performing statistical analysis on at least one statistical item of gender distribution, age distribution, clinical characteristics, complications, positive antibody categories and clinical endpoint events aiming at the sample clinical data corresponding to the test data in each clustering result.

7. The method according to any one of claims 1-6, further comprising:

acquiring clinical data of primary biliary cholangitis to be analyzed;

and determining a classification result and a prognosis characteristic corresponding to the primary biliary cholangitis clinical data to be analyzed according to a target analysis result corresponding to each clustering result.

8. The method according to any one of claims 1 to 6, wherein the preset key index item data includes: antinuclear antibodies, anti-mitochondrial antibodies and/or anti-AMA-M2 antibodies, anti-centromere antibodies and/or anti-centromere B antibodies, anti-Ro52 antibodies, anti-SSA antibodies, anti-SS-B antibodies, anti-Smith antibodies, anti-ribonucleoprotein antibodies, anti-double-stranded DNA antibodies, anti-ribosomal P protein antibodies, anti-histone antibodies, anti-Nuk antibodies, anti-Scl-70 antibodies, anti-Jo-1 antibodies, anti-gp210 antibodies, anti-sp100 antibodies, anti-soluble liver antigen antibodies, anti-liver-kidney microsomal antibodies, and anti-liver cytosol type 1 antibodies.

9. The method according to any one of claims 1 to 6, wherein the predetermined hierarchical clustering algorithm is a BIRCH two-step clustering method.

10. A primary biliary cholangitis clinical examination data analysis device is characterized by comprising: