CN114548259A

CN114548259A - PISA fault identification method based on Semi-supervised Semi-KNN model

Info

Publication number: CN114548259A
Application number: CN202210152424.4A
Authority: CN
Inventors: 于霞; 张占虎; 李鸿儒; 周健; 陆静毅; 马晓静
Original assignee: Northeastern University China; Shanghai Sixth Peoples Hospital
Current assignee: Northeastern University China; Shanghai Sixth Peoples Hospital
Priority date: 2022-02-18
Filing date: 2022-02-18
Publication date: 2022-05-27
Anticipated expiration: 2042-02-18
Also published as: CN114548259B

Abstract

The invention relates to a PISA fault identification method based on a Semi-supervised Semi-KNN model, which comprises the following steps: S10, obtaining blood glucose information to be detected within a preset time period and preprocessing it to obtain preprocessed blood glucose information to be detected; S20, acquiring a constraint relation by means of similarity measurement, based on a pre-established PISA constraint set and the preprocessed blood glucose information to be detected; S30, inputting the preprocessed blood glucose information to be detected and the constraint relation into a pre-trained Semi-supervised Semi-KNN model, which outputs a classification result for the blood glucose information to be detected. The Semi-supervised Semi-KNN model is a semi-supervised model for identifying abnormal blood glucose information, obtained by training a KNN model with a training data set and the PISA constraint set. The method improves the reliability of blood glucose detection and the accuracy of fault diagnosis results, and improves processing efficiency.

Description

PISA fault identification method based on Semi-supervised Semi-KNN model

Technical Field

The invention relates to pressure-induced sensor attenuation (PISA) fault identification technology, and in particular to a PISA fault identification method based on a Semi-supervised Semi-KNN model.

Background

In recent years, continuous glucose monitoring (CGM) systems have attracted increasing attention. Continuous glucose monitoring signals are used as an aid in diagnosing and managing various types of diabetes. These signals are usually analyzed with data-driven methods, which face several problems: the glucose signal is easily affected by noise, glucose monitors are prone to faults, data errors degrade predictive glucose alarms, and accuracy is low. Most fault identification methods for CGM systems suffer from poor performance and high false-positive rates, which limits their clinical utility as an aid.

In recent years digital signal processing has developed rapidly, and the noise problem of CGM signals has been largely addressed by finite impulse response (FIR) and infinite impulse response (IIR) filters. CGM fault detection, however, remains an open challenge and a very active area of research and application.

When the skin around the sensor of a CGM system is subjected to significant pressure, CGM readings fall rapidly. Algorithms based on CGM readings, such as predictive pump shut-off, rely on an estimate of the rate of change of the sensor glucose to suspend the insulin pump and avoid hypoglycemia. However, a PISA event that occurs at night may go unnoticed, which can cause an improper pump shut-off; in addition, the falsely low readings caused by a PISA fault propagate into the prediction algorithm and seriously degrade prediction and early warning. Distinguishing PISA faults from other low-signal events, such as insulin events, hypoglycemic events and exercise events, is therefore a technical problem to be solved, and a semi-supervised PISA fault identification method with execution times fast enough for real-time operation is needed.

Disclosure of Invention

(I) Technical problem to be solved

In view of the above disadvantages and shortcomings of the prior art, the present invention provides a PISA fault identification method based on a Semi-supervised Semi-KNN model.

(II) Technical solution

To achieve the above object, the invention adopts the following main technical solution:

In a first aspect, an embodiment of the present invention provides a PISA fault identification method based on a Semi-supervised Semi-KNN model, including:

S10, obtaining blood glucose information to be detected within a preset time period, and preprocessing it to obtain preprocessed blood glucose information to be detected;

S20, acquiring, by means of similarity measurement, the constraint relation to which the blood glucose information to be detected belongs, based on a pre-established PISA constraint set and the preprocessed blood glucose information to be detected;

the PISA constraint set is constructed from prior knowledge in the training stage of the Semi-supervised Semi-KNN model and contains ML (must-link) constraints and CL (cannot-link) constraints; each element of the set is the first-order difference feature of a blood glucose subsequence;

S30, inputting the preprocessed blood glucose information to be detected and the constraint relation into a pre-trained Semi-supervised Semi-KNN model, which outputs a classification result for the blood glucose information to be detected;

the Semi-supervised Semi-KNN model is a semi-supervised model for identifying abnormal blood glucose information, obtained by training a KNN model with a training data set and the PISA constraint set, the training data set comprising first-order-differenced blood glucose data.

Optionally, before S10, the method further includes:

S01, acquiring a plurality of historical blood glucose data records by means of a CGM device, and preprocessing each record to obtain a blood glucose sequence; each blood glucose sequence comprises blood glucose data with PISA timestamp labels and blood glucose data with non-PISA timestamp labels;

S02, dividing each blood glucose sequence into a plurality of subsequences, and computing the first-order difference of each subsequence to obtain a training data set;

S03, generating a PISA constraint set from prior knowledge and the PISA-labeled training data in the training data set, according to semi-supervised constraint rules;

S04, training the Semi-supervised Semi-KNN model with the training data set and the PISA constraint set to obtain the trained Semi-supervised Semi-KNN model;

the Semi-supervised Semi-KNN model is obtained by modifying the KNN model to operate in a semi-supervised manner.

Optionally, S04 includes:

S04-1, traversing all subsequences of the training data set and constructing an offline K-dimensional search binary tree (K-D tree);

S04-2, traversing the PISA constraint set based on the K-D tree to obtain the anomaly threshold σ of the Semi-supervised Semi-KNN model, whose boundary values are σ1 and σ2, written σ = [σ1, σ2];

calculating, with a DTW (dynamic time warping) similarity measurement function, the average distance between each PISA event and the other events in the PISA constraint set to obtain a distance set;

then obtaining the anomaly threshold σ = [σ1, σ2] according to the following formula (1):

$$\sigma_1 = Q_3 + 1.5\,(Q_3 - Q_1), \qquad \sigma_2 = Q_1 - 1.5\,(Q_3 - Q_1) \qquad (1)$$

If the distance dist between the blood glucose data to be detected and a sample in an ML relation in the PISA constraint set satisfies dist < σ2, the blood glucose data to be detected are determined to be an abnormal sample; if the distance dist between the blood glucose data to be detected and a sample in a CL relation in the PISA constraint set satisfies dist > σ1, the blood glucose data to be detected are determined to be an abnormal sample;

where Q3 is the upper quartile and Q1 is the lower quartile of the distance set.
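Formula (1) and the ML/CL decision rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are hypothetical, and the quartile convention (`statistics.quantiles` with `method="inclusive"`) is an assumption, since the text does not say how Q1 and Q3 are computed.

```python
import statistics


def anomaly_threshold(distances):
    """Return (sigma1, sigma2) from the distance set, following formula (1)."""
    # quantiles(n=4) returns the three cut points [Q1, Q2, Q3].
    # method="inclusive" is an assumed convention; the text does not fix one.
    q1, _, q3 = statistics.quantiles(distances, n=4, method="inclusive")
    iqr = q3 - q1
    sigma1 = q3 + 1.5 * iqr  # upper boundary
    sigma2 = q1 - 1.5 * iqr  # lower boundary
    return sigma1, sigma2


def is_abnormal(dist, relation, sigma1, sigma2):
    """Decision rule: ML-related sample with dist < sigma2, or CL-related with dist > sigma1."""
    if relation == "ML":
        return dist < sigma2
    if relation == "CL":
        return dist > sigma1
    raise ValueError(f"unknown relation: {relation!r}")


s1, s2 = anomaly_threshold([1.0, 2.0, 3.0, 4.0, 5.0])
print(s1, s2)  # 7.0 -1.0
```

With only a few distances the lower boundary σ2 can become negative, in which case the ML rule (dist < σ2) never fires for non-negative distances.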

Optionally, S02 includes:

applying a sliding window of size w to each blood glucose sequence X = {x_1, x_2, …, x_n} to form a plurality of subsequences q_i = {x_i, x_{i+1}, …, x_{i+k}};

the subsequences form the set D = {q_1, q_2, …, q_m}, and the first-order difference of each subsequence q_i is calculated according to equation (2),

where h is the increment in the first-order difference formula, with a value of 0.8 to 1.2;

after the first-order differences of all subsequences are calculated, the first-order difference values of each subsequence are used as the training data set.

Optionally, S10 includes:

acquiring, by means of a CGM device, blood glucose information to be detected covering at least 30 to 45 minutes;

filtering the blood glucose information to be detected and preprocessing it with a sliding window to remove isolated noise points and fill missing values, yielding the blood glucose sequence to be detected. That is, the blood glucose information to be detected is traversed with a sliding window of suitable size; when a small region of outlying values falls within the window, it is replaced by an average, which removes the isolated noise points. When values are missing because of sensor problems, the number of consecutive missing values is determined first, and the gaps are then filled by ordinary linear interpolation, yielding the preprocessed blood glucose sequence to be detected.
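A minimal sketch of this preprocessing, assuming readings arrive as a Python list with `None` marking missing values. The gap limit `max_gap`, the deviation threshold `jump`, the window half-width and the order of the two steps (interpolation before filtering) are hypothetical choices for illustration; the text does not specify them.

```python
from typing import List, Optional


def fill_missing(values: List[Optional[float]], max_gap: int = 3) -> List[Optional[float]]:
    """Linearly interpolate runs of None no longer than max_gap (hypothetical limit).

    Runs touching either end of the series, or longer than max_gap, stay None.
    """
    out = list(values)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        start = i                      # first missing index of this run
        while i < n and out[i] is None:
            i += 1
        run = i - start                # number of consecutive missing values
        if start == 0 or i == n or run > max_gap:
            continue                   # no anchor on one side, or gap too long
        left, right = out[start - 1], out[i]
        for k in range(start, i):
            out[k] = left + (right - left) * (k - start + 1) / (run + 1)
    return out


def remove_isolated_points(values, half_width=1, jump=30.0):
    """Replace a point by the mean of its window neighbours when it differs from
    every neighbour by more than `jump` (hypothetical threshold, same units)."""
    out = list(values)
    for i in range(half_width, len(values) - half_width):
        window = values[i - half_width:i + half_width + 1]
        if any(v is None for v in window):
            continue                   # skip windows that still contain gaps
        others = window[:half_width] + window[half_width + 1:]
        if all(abs(values[i] - v) > jump for v in others):
            out[i] = sum(others) / len(others)
    return out


def preprocess(values):
    # Assumed order: fill gaps first so that each window has values to compare.
    return remove_isolated_points(fill_missing(values))


print(preprocess([100, None, None, 130, 200, 132]))
# [100, 110.0, 120.0, 130, 131.0, 132]
```

Reading from the unmodified input inside `remove_isolated_points` keeps one corrected spike from influencing the test applied to its neighbours.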

Optionally, S20 includes:

when the shape-based distance (SBD) between the blood glucose data A of a sequence in the blood glucose sequence to be detected and a PISA event B in the PISA constraint set is smaller than the threshold λ, i.e., f_SBD(A, B) < λ, determining the constraint relation ML(A, B) and updating the PISA constraint set, where λ is a preset value greater than 0 and f_SBD is the function computing the SBD distance between two sequences;

when the SBD distance, under a CL constraint relation, between the blood glucose data A of a sequence in the blood glucose sequence to be detected and a PISA event B in the PISA constraint set is smaller than the threshold λ, i.e., f_SBD(A, B) < λ, determining the constraint relation CL(A, B) and updating the PISA constraint set;

traversing every sequence in the blood glucose sequence to be detected in this way, and taking the updated PISA constraint set as the constraint relation to which the blood glucose information to be detected belongs.
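The following sketch illustrates S20. SBD is computed here as in the k-Shape literature, 1 minus the maximum normalized cross-correlation over all shifts (without the z-normalization usually applied beforehand). The CL branch is interpreted as "A lies within λ of a sample already in a CL relation with B"; this reading, the dictionary data layout and all function names are assumptions rather than details given in the text.

```python
import math


def sbd(x, y):
    """Shape-based distance: 1 - max over shifts of the normalised cross-correlation."""
    nx = math.sqrt(sum(v * v for v in x))
    ny = math.sqrt(sum(v * v for v in y))
    if nx == 0 or ny == 0:
        return 1.0  # assumed convention for an all-zero sequence
    best = -math.inf
    for s in range(-(len(y) - 1), len(x)):  # every relative shift of y against x
        cc = sum(x[j + s] * yv for j, yv in enumerate(y) if 0 <= j + s < len(x))
        best = max(best, cc)
    return 1.0 - best / (nx * ny)          # range [0, 2]; 0 means identical shape


def propagate(test_seqs, pisa_events, cl_samples, cl_pairs, lam):
    """Return the new ML(A, B) and CL(A, B) pairs found for the test sequences.

    test_seqs / pisa_events / cl_samples: dict name -> first-difference sequence.
    cl_pairs: set of (sample_name, pisa_name) already in a CL relation.
    """
    ml_new, cl_new = set(), set()
    for a_name, a in test_seqs.items():
        for b_name, b in pisa_events.items():
            if sbd(a, b) < lam:                    # close to a PISA event -> must-link
                ml_new.add((a_name, b_name))
        for c_name, b_name in cl_pairs:
            if sbd(a, cl_samples[c_name]) < lam:   # close to B's CL partner -> cannot-link
                cl_new.add((a_name, b_name))
    return ml_new, cl_new


ml, cl = propagate(
    test_seqs={"A1": [0, -4, -9, -6], "A2": [0, 2, 5, 2]},
    pisa_events={"B1": [0, -5, -10, -5]},
    cl_samples={"meal": [0, 3, 6, 3]},
    cl_pairs={("meal", "B1")},
    lam=0.1,
)
print(ml, cl)  # {('A1', 'B1')} {('A2', 'B1')}
```

Because SBD takes the best alignment over all shifts, a PISA-like drop that starts a few samples earlier or later still matches the stored event.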

Optionally, S30 includes:

based on the K-D tree and the blood glucose sequence to be detected, iteratively retrieving the K data points nearest to each datum in the blood glucose sequence to be detected, which yields the use-stage K-D tree;

traversing the constraint relation based on the use-stage K-D tree to obtain the PISA anomaly classification result of the blood glucose sequence to be detected;

calculating, with the DTW similarity measurement function, the actual distance between each PISA event and the other events in the constraint relation, and comparing it with the anomaly threshold σ = [σ1, σ2] to classify events as PISA events or non-PISA events.

Optionally, after the actual distance is compared with the anomaly threshold, the number of samples in the constraint relation that belong to ML constraints is determined, and an anomaly grade for the PISA event is assigned according to that number.

In a second aspect, an embodiment of the present invention further provides an electronic device, which includes: a memory for storing a computer program and a processor for executing the computer program stored in the memory and executing the steps of the method for PISA fault identification based on Semi-supervised Semi-KNN model according to any of the above first aspects.

In a third aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method for PISA fault identification based on Semi-supervised Semi-KNN model as described in any one of the above first aspects.

(III) Beneficial effects

The method of the embodiment of the invention performs anomaly classification based on the Semi-KNN model, which resolves the uncertainty of the KNN model in anomaly detection. Prior knowledge is introduced in the form of constraints for the first time, yielding a semi-supervised anomaly detection method that makes maximal use of the available prior knowledge and improves the reliability of detection results. Grading the results further improves their reliability and helps doctors make clinical judgments.

In the embodiment of the invention, CGM sensor fault diagnosis is performed with a semi-supervised method for the first time, and introducing prior knowledge (such as expert experience) improves the accuracy of the fault diagnosis results. Compared with applying traditional unsupervised fault identification methods to CGM sensor fault diagnosis, the method achieves higher detection accuracy, and the PISA constraint set of the semi-supervised model ensures a higher identification rate for PISA faults and a higher detection confidence for unlabeled anomalies (such as PISA events occurring at night).

In addition, the semi-supervised model of the invention replaces the original Euclidean distance with the DTW and SBD similarity measures, which are suited to time series data, thereby improving the measurement accuracy for time series and speeding up the overall computation.

The method can be applied in a continuous glucose monitoring (CGM) system to improve the quality of CGM data. Fault detection not only enhances the safety of CGM, but also prevents faults from reducing the reliability of tasks such as treatment changes or predictive early warning. A CGM device applying the method of the invention detects pressure-induced sensor attenuation (PISA) artifacts with improved confidence.

Drawings

FIG. 1 is a flowchart of a PISA fault identification method based on a Semi-supervised Semi-KNN model according to an embodiment of the present invention;

FIG. 2(a) is a schematic diagram of the process of constructing a K-dimensional search binary tree (K-d tree) from sample data;

FIG. 2(b) is a schematic diagram of the resulting K-d tree;

FIG. 3 is a representation of a new sample;

FIG. 4 is a schematic diagram of how the constraint relations guide the iterative KNN anomaly detection process;

FIG. 5 is a flowchart of a PISA fault identification method based on a Semi-supervised Semi-KNN model according to another embodiment of the present invention.

Detailed Description

For a better understanding of the present invention, reference will now be made in detail to the present embodiments of the invention, which are illustrated in the accompanying drawings.

Example one

As shown in FIG. 1, which is a flowchart of a PISA fault identification method based on a Semi-supervised Semi-KNN model, the method may be executed by any computer, electronic device or CGM device, and may include the following steps:

S10, obtaining blood glucose information to be detected within a preset time period, and preprocessing it to obtain preprocessed blood glucose information to be detected.

In this embodiment, blood glucose information to be detected covering at least 30 to 45 minutes can be acquired by means of a CGM device; the information is filtered and preprocessed with a sliding window to remove isolated noise points and fill missing values, yielding the blood glucose sequence to be detected.

It should be noted that the time period of the blood glucose information to be detected is adjustable, according to the sliding-window parameters used in preprocessing.

In S20, the PISA constraint set is constructed from prior knowledge in the training stage of the Semi-supervised Semi-KNN model and contains ML constraints and CL constraints; each element of the set is the first-order difference feature of a blood glucose subsequence. The PISA constraint set necessarily contains PISA information: it is created after the blood glucose data of the training phase are preprocessed according to the PISA timestamp labels.

In S30, the Semi-supervised Semi-KNN model is a semi-supervised model for identifying abnormal blood glucose information, obtained by training a KNN model with a training data set and the PISA constraint set, the training data set comprising first-order-differenced blood glucose data. Specifically, the training data set is obtained by applying sliding-window processing to the blood glucose data acquired in the training phase and computing the first-order difference of each resulting subsequence.

The method of this embodiment performs anomaly classification based on the Semi-KNN model, which resolves the uncertainty of the KNN model in anomaly detection. Prior knowledge is introduced in the form of constraints for the first time, yielding a semi-supervised anomaly detection method that makes maximal use of the available prior knowledge and improves the reliability of detection results; grading the results further improves their reliability and helps doctors make clinical judgments.

In practical applications, before the step S10, the method shown in fig. 1 may further include the following steps not shown in the drawings:

S01, acquiring a plurality of historical blood glucose data records by means of a continuous glucose monitoring (CGM) system (i.e., a CGM device), and preprocessing each record to obtain a blood glucose sequence; each blood glucose sequence comprises blood glucose data with PISA timestamp labels and blood glucose data with non-PISA timestamp labels;

For example, S02 may include:

h is the increment in the first-order difference formula, with a value of 0.8 to 1.2, preferably 1;

For example, S04 may include:

$$\sigma_1 = Q_3 + 1.5\,(Q_3 - Q_1), \qquad \sigma_2 = Q_1 - 1.5\,(Q_3 - Q_1) \qquad (1)$$

For better understanding of the above step S20, the step S20 can be specifically described as follows:

when the SBD distance between the blood glucose data A of a sequence in the blood glucose sequence to be detected and a PISA event B in the PISA constraint set is smaller than the threshold λ, i.e., f_SBD(A, B) < λ, determining the constraint relation ML(A, B) and updating the PISA constraint set, where λ is a preset value greater than 0 and f_SBD denotes the function computing the SBD distance between two sequences;

when the SBD distance, under a CL constraint relation, between the blood glucose data A of a sequence in the blood glucose sequence to be detected and a PISA event B in the PISA constraint set is smaller than the threshold λ, i.e., f_SBD(A, B) < λ, determining the constraint relation CL(A, B) and updating the PISA constraint set;

Accordingly, the step S30 may include:

calculating, with the DTW similarity measurement function, the actual distance between each PISA event and the other events in the constraint relation, and comparing it with the anomaly threshold σ = [σ1, σ2] to classify events as PISA events or non-PISA events. Specifically, after the actual distance is compared with the anomaly threshold, the number of samples in the constraint relation that belong to ML constraints is determined, and an anomaly grade for the PISA event is assigned according to that number.

The method of this embodiment can be integrated into an electronic device, such as an anomaly detector capable of identifying PISA anomalies. Used in the care of clinical emergency patients, such a detector can effectively monitor and reliably quantify abnormal patient states, overcoming the lag of the prior art and enabling real-time monitoring and analysis.

In this embodiment, CGM sensor fault diagnosis is performed with a semi-supervised method for the first time, and introducing prior knowledge (such as expert experience) improves the accuracy of the fault diagnosis results. The method also resolves the uncertainty of the KNN algorithm in anomaly detection.

Example two

The method of this embodiment is described in detail below in the order of the preparation phase, the training phase and the use phase, as shown in FIG. 5.

1. Preparation phase-acquisition and preprocessing of historical CGM blood glucose data

A continuous glucose monitoring (CGM) system is one of the key components of an artificial pancreas. The device continuously monitors a patient's blood glucose level, helping patients with type 1 diabetes (T1DM) keep their blood glucose concentration within a safe range.

1.1 CGM blood glucose data acquisition

A CGM system indirectly reflects the blood glucose level by using a glucose sensor to monitor the glucose concentration in subcutaneous interstitial fluid, and can provide continuous, comprehensive and reliable blood glucose information around the clock. The historical blood glucose data obtained should cover three complete days, with a blood glucose value collected every five minutes, i.e., 3 × 288 = 864 values in total. Besides normal physiological activities such as meals, exercise and sleep, the data should include several typical PISA faults induced by experimentally applied pressure, and each typical PISA fault can carry an accurate pressure label for building the subsequent Semi-supervised Semi-KNN model.

1.2 CGM blood glucose data preprocessing

Historical blood glucose data collected by the CGM device are stored in a storage device and can be preprocessed with digital signal analysis techniques such as filtering, missing-value filling and labeling. Filtering removes isolated noise points from the CGM glucose sequence, where an isolated noise point is a value that deviates excessively from the values before and after it. The sequence is traversed with a sliding window, and when such a small region falls within the window it is replaced by the window average. This step is optional; it makes the filtered glucose sequence smoother and easier to process further. Missing-value filling prevents gaps in the blood glucose values; the blood glucose sequence normally used is the one after missing-value filling, since otherwise breaks in the glucose curve easily occur during subsequent processing and make the model output inaccurate.

In actual processing, the filled blood glucose sequence is labeled: the experimental intervals in which pressure was applied are marked and the remaining time is left unmarked, so that PISA events are distinguished from other, unknown events, which effectively ensures the accuracy of the Semi-supervised Semi-KNN model.

The above describes the preprocessing of the blood glucose data collected by CGM. Preprocessing the continuously collected CGM blood glucose sequence containing experimental PISA pressure events yields the filtered, gap-filled blood glucose sequence together with the PISA timestamp labels.

2. Preparation phase-construction of features and addition of initial seeds according to a priori knowledge

From the time periods of the experimentally induced PISA events, some data in the blood glucose sequence can be labeled. For example, if a PISA timestamp label obtained by preprocessing covers the interval 9:00-9:45, the 9 blood glucose values of the sequence that fall within this label are the values recorded while the PISA event occurred, and their first-order difference features are the PISA event features.
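To show how a label interval maps onto 5-minute readings, the sketch below marks the readings that fall inside a PISA label. It assumes the label is a half-open interval [start, end), which is what makes 9:00-9:45 contain exactly 9 readings; the function name, the date and the 8:00 start time are arbitrary illustrations.

```python
from datetime import datetime, timedelta


def pisa_mask(first_reading, n_readings, label_start, label_end, step_min=5):
    """True for each reading whose timestamp t satisfies label_start <= t < label_end.

    The half-open interval is an assumption about how labels are applied.
    """
    mask = []
    for k in range(n_readings):
        t = first_reading + timedelta(minutes=step_min * k)  # one reading every 5 min
        mask.append(label_start <= t < label_end)
    return mask


day = datetime(2022, 2, 18)
mask = pisa_mask(day.replace(hour=8), 24, day.replace(hour=9), day.replace(hour=9, minute=45))
print(sum(mask))  # 9 readings: 9:00, 9:05, ..., 9:40
```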

2.1 Constructing features

In general, a blood glucose sequence contains many values, and a few numbers cannot express the characteristics of the whole sequence, so in this embodiment the blood glucose sequence is processed with a sliding window. Specifically, applying a sliding window of size w to the blood glucose sequence X = {x_1, x_2, …, x_n} forms several subsequences q_i = {x_i, x_{i+1}, …, x_{i+k}}, which make up the sequence subset D = {q_1, q_2, …, q_m}. A first-order difference is then computed for each subsequence q_i according to formula (1), where h is the increment of the first-order difference formula; in the feature construction for blood glucose sequences in this embodiment, h takes a value of 0.8 to 1.2, preferably 1.

The first-order differences of all subsequences are computed and used as the input of the subsequent model, i.e., the Semi-supervised Semi-KNN model. Using first-order difference features in this embodiment prevents differences in the shape of the original blood glucose sequences from reducing the generality of the model, and because first-order differencing effectively suppresses the volatility of a time series, the input of the Semi-supervised Semi-KNN model becomes simpler.
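A sketch of the feature construction, under stated assumptions: the window stride is 1, and because the text does not reproduce the difference formula itself, the difference is assumed to be (x_{j+1} − x_j) / h with the preferred h = 1. All names are hypothetical.

```python
def sliding_windows(x, w, stride=1):
    """Subsequences q_i = (x_i, ..., x_{i+w-1}); stride 1 is an assumption."""
    return [x[i:i + w] for i in range(0, len(x) - w + 1, stride)]


def first_difference(q, h=1.0):
    """Assumed form of the difference formula: (x_{j+1} - x_j) / h."""
    return [(q[j + 1] - q[j]) / h for j in range(len(q) - 1)]


def build_training_set(x, w, h=1.0):
    """Feature vector of each window = its first-order differences."""
    return [first_difference(q, h) for q in sliding_windows(x, w)]


print(build_training_set([100, 104, 101, 95, 96], w=3))
# [[4.0, -3.0], [-3.0, -6.0], [-6.0, 1.0]]
```

The differenced windows no longer carry the absolute glucose level, which is why two sequences at different baselines but with the same drop pattern yield identical features.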

2.2 adding initial seeds according to expert experience (a priori knowledge)

In semi-supervised learning, a few representative data points of a data set are called seeds; they may be represented by the sample points themselves or in the form of constraints. Pairwise constraints consist of must-link (ML) and cannot-link (CL) constraints: the two data points of an ML constraint must be in the same cluster, while the two data points of a CL constraint must be in different clusters.

In this embodiment, the known PISA event labels are defined as the initial seeds of the semi-supervised model: a PISA event and a blood glucose fluctuation caused by another physiological event (such as eating or exercise) form a CL constraint, while two different PISA events form an ML constraint.
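The seeding rule above (distinct PISA events linked by ML, PISA events and other physiological events linked by CL) can be sketched as follows; the segment names and label strings are hypothetical.

```python
from itertools import combinations


def build_constraints(segments):
    """segments: dict name -> event label ("PISA", "meal", "exercise", ...).

    Returns (ml, cl): ML pairs between distinct PISA segments and CL pairs
    between every PISA segment and every non-PISA segment.
    """
    pisa = sorted(name for name, label in segments.items() if label == "PISA")
    other = sorted(name for name, label in segments.items() if label != "PISA")
    ml = set(combinations(pisa, 2))                 # different PISA events
    cl = {(p, o) for p in pisa for o in other}      # PISA vs other physiological events
    return ml, cl


ml, cl = build_constraints({"p1": "PISA", "p2": "PISA", "m1": "meal", "e1": "exercise"})
print(sorted(ml))  # [('p1', 'p2')]
print(sorted(cl))  # [('p1', 'e1'), ('p1', 'm1'), ('p2', 'e1'), ('p2', 'm1')]
```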

In this embodiment, features are constructed for the blood glucose sequence and the initial seeds are added; that is, after the originally collected blood glucose sequence and the PISA timestamp labels are preprocessed, the first-order difference features of the blood glucose sequence and the pairwise constraints contained in the PISA labels are obtained.

That is, the training data set and the PISA constraint set are obtained after the preprocessing.

3. Training phase-training Semi-supervised Semi-KNN model

The preprocessed training data set and the pairwise constraints contained in the PISA constraint set are input into the Semi-KNN model to obtain a semi-supervised fault detection model.

3.1 basic model KNN

The existing KNN is a non-parametric supervised classifier that assigns a test sample to the majority class among its nearest training samples. Taking binary classification of a blood glucose sequence as an example, the problem and its solution are formally defined as follows. The sample set of a given blood glucose sequence is S = {(x_1, y_1), (x_2, y_2), …, (x_N, y_N)}, where x_i ∈ R² is a blood glucose measurement point and y_i ∈ {c_1, c_2} is the class to which the sample belongs. For a new blood glucose sample x, its class y is obtained with equation (2):

$$y = \arg\max_{c_j} \sum_{x_i \in N_K(x)} I(y_i = c_j), \qquad j = 1, 2 \qquad (2)$$

where N_K(x) is the set of the K samples nearest to x, and I(·) is the indicator function, equal to 1 when y_i = c_j and 0 otherwise.
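Equation (2) is the usual majority vote over the K nearest neighbors. The sketch below implements it by brute force with Euclidean distance (`math.dist`, Python 3.8+); it is a plain supervised KNN for illustration, not the Semi-KNN model, and the function name and toy samples are hypothetical.

```python
import math
from collections import Counter


def knn_predict(samples, x, k=3):
    """Majority vote of equation (2): argmax_c sum_{x_i in N_K(x)} I(y_i = c).

    samples: list of (point, label); distance is Euclidean (brute force).
    """
    nearest = sorted(samples, key=lambda s: math.dist(s[0], x))[:k]  # N_K(x)
    votes = Counter(label for _, label in nearest)                    # sums of I(y_i = c)
    return votes.most_common(1)[0][0]


train = [((0, 0), "c1"), ((0, 1), "c1"), ((1, 0), "c1"),
         ((5, 5), "c2"), ((5, 6), "c2"), ((6, 5), "c2")]
print(knn_predict(train, (0.5, 0.5)), knn_predict(train, (5.2, 5.1)))  # c1 c2
```

An odd K avoids ties in the two-class case.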

Using a supervised binary classifier for anomaly detection has the following problems: 1) abnormal samples are extremely rare relative to normal samples, so the classification result is biased; 2) abnormal samples are of many kinds, lumping unknown and known anomalies into one class is not interpretable, and simple classification cannot meet the requirements; 3) data annotation is too costly. A semi-supervised model that requires only a small amount of prior knowledge is therefore better suited to this application.

3.2 Semi-KNN model

In this embodiment, a semi-supervised anomaly detection model is better suited to PISA fault identification. The semi-supervised model replaces the label input of the base model with a partial PISA constraint set obtained from prior knowledge, and its data objects are preprocessed CGM blood glucose sequences.

3.2.1 traversing all data objects of the training data set, constructing an offline K-dimensional search binary tree of the training data set

A K-dimensional search binary tree (K-D tree) is constructed from the samples in the training data set. The construction is illustrated with the simple two-dimensional data {(2,3), (4,7), (5,4), (7,2), (8,1), (9,6)}. First, the variance of the data along the x and y directions is computed; the variance along x is larger, so the splitting dimension is the x axis. Second, sorting the x values 2, 4, 5, 7, 8, 9 and taking the median point (the upper of the two middle values) gives 7, so the splitting hyperplane passes through (7,2) perpendicular to the x axis. The hyperplane then divides the space into a left and a right subspace, as shown in FIG. 2(a): the left subspace contains 3 nodes {(2,3), (4,7), (5,4)}, and the right subspace contains 2 nodes {(8,1), (9,6)}. The recursion continues until each subspace contains only one data point, giving the final K-d tree shown in FIG. 2(b).
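The construction in this example can be reproduced with a short recursive sketch: split on the axis of largest population variance, at the upper median for an even number of points. The nested-dictionary representation and the function name are assumptions made for brevity.

```python
from statistics import pvariance


def build_kdtree(points):
    """Recursively build a K-D tree as nested dicts.

    Split axis = dimension with the largest population variance; split point =
    median after sorting (the upper of the two middle points for an even count).
    """
    if not points:
        return None
    if len(points) == 1:
        return {"point": points[0], "axis": None, "left": None, "right": None}
    axis = max(range(len(points[0])), key=lambda d: pvariance([p[d] for p in points]))
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return {
        "point": ordered[mid],
        "axis": axis,
        "left": build_kdtree(ordered[:mid]),       # left subspace
        "right": build_kdtree(ordered[mid + 1:]),  # right subspace
    }


tree = build_kdtree([(2, 3), (4, 7), (5, 4), (7, 2), (8, 1), (9, 6)])
print(tree["point"], tree["axis"])                    # (7, 2) 0
print(tree["left"]["point"], tree["right"]["point"])  # (5, 4) (9, 6)
```

On the six sample points this yields root (7,2) split on x, a left child (5,4) split on y, and a right child (9,6), matching the partition described above.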

3.2.2 traversing the PISA constraint set to determine the PISA anomaly threshold

The training data set includes PISA event data caused by experimentally applied pressure, so traversing the PISA constraint set distinguishes the PISA anomalies in the training data set from other, normal physiological events, which yields the anomaly threshold σ of the Semi-supervised Semi-KNN model, with boundary values σ1 and σ2, written σ = [σ1, σ2].

The average distance between each PISA event and the other, normal physiological events is calculated. Euclidean distance is commonly used, but to suit blood glucose time series and better capture the shape similarity of time series, a DTW similarity measure is used, which warps the time axis to achieve better alignment. After all average distances are obtained, the upper quartile Q3 and lower quartile Q1 of the distance set are computed, and the anomaly thresholds of the model are σ1 = Q3 + 1.5(Q3 − Q1) and σ2 = Q1 − 1.5(Q3 − Q1); a sample whose distance from the normal samples satisfies dist > σ1 or dist < σ2 is considered abnormal.
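The DTW distance and the per-event average distances that feed the quartile thresholds can be sketched as below. This is the textbook dynamic-programming DTW with absolute difference as the local cost and no warping window; the cost function and the function names are assumptions, not details from the text.

```python
import math
import statistics


def dtw(a, b):
    """Classic DTW distance: cumulative |a_i - b_j| along the best warping path."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, match
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def average_distances(pisa_events, other_events):
    """Distance set: mean DTW distance from each PISA event to the other events."""
    return [statistics.fmean(dtw(p, o) for o in other_events) for p in pisa_events]


print(dtw([0, 1, 2], [0, 1, 1, 2]))                               # 0.0
print(average_distances([[0, 1, 2]], [[0, 1, 1, 2], [0, 0, 0]]))  # [1.5]
```

Unlike Euclidean distance, DTW accepts sequences of different lengths and gives zero distance to a time-stretched copy, as the first call shows.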

That is, if the distance dist between the blood glucose data to be measured and a sample having an ML relationship in the PISA constraint set satisfies dist < σ2, the blood glucose data are determined to be an abnormal sample; and if the distance dist between the blood glucose data to be measured and a sample having a CL relationship in the PISA constraint set satisfies dist > σ1, the blood glucose data are likewise determined to be an abnormal sample.
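The threshold computation and decision rule of step 3.2.2 might be written as below. It is a sketch under assumptions: the input is the list of average DTW distances already computed for the PISA events, quartiles use the `inclusive` interpolation method of Python's `statistics.quantiles` (the text does not name a quartile convention), and the helper names `anomaly_thresholds` and `is_abnormal` are hypothetical.

```python
from statistics import quantiles
from typing import Sequence, Tuple


def anomaly_thresholds(avg_distances: Sequence[float]) -> Tuple[float, float]:
    """Return (sigma1, sigma2) = (Q3 + 1.5*IQR, Q1 - 1.5*IQR)."""
    # Assumed quartile convention: linear interpolation, 'inclusive' method.
    q1, _, q3 = quantiles(avg_distances, n=4, method="inclusive")
    iqr = q3 - q1
    return q3 + 1.5 * iqr, q1 - 1.5 * iqr


def is_abnormal(dist: float, relation: str, sigma1: float, sigma2: float) -> bool:
    """Too close to an ML-related sample, or too far from a CL-related sample."""
    if relation == "ML":
        return dist < sigma2
    if relation == "CL":
        return dist > sigma1
    raise ValueError("relation must be 'ML' or 'CL'")


sigma1, sigma2 = anomaly_thresholds([10, 11, 12, 13, 14])
```

For average distances [10, 11, 12, 13, 14] this gives Q1 = 11 and Q3 = 13, hence σ1 = 16 and σ2 = 8.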

4. Usage phase: obtaining new blood glucose data and performing constraint propagation

After training of the Semi-supervised Semi-KNN model is completed, blood glucose data collected in real time can be screened, and when a new blood glucose sequence contains PISA fault events, they can be identified accurately and distinguished from non-PISA events.

4.1 blood glucose data to be measured

Obtaining continuous blood glucose values covering at least 45 minutes, i.e. 9 blood glucose measurement points, and preprocessing them, e.g. by filtering, removing isolated noise points from the blood glucose information to be detected and filling missing values with a sliding window, to obtain the blood glucose sequence to be detected.

4.2 constraint propagation

A constraint propagation algorithm is executed on the blood glucose sequence to be tested in order to expand the PISA constraint set of the original training data set; the expanded set assists the judgment made with the Semi-supervised Semi-KNN model and makes the result more accurate.

The specific implementation process is as follows:

4.2.1 traverse the PISA constraint set in the original training data set, calculate the SBD distance

Given a sequence subset D = {q1, q2, …, qm} in which every qi regards the other sequences as its nearest neighbors (that is, the distance from the current sequence to any other subsequence ql is less than a threshold λ, which is usually specified manually), the pairwise constraints described above have the following properties:

1) any q ∈ D, ML (q, r) can be generated, wherein r ∈ D;

2) given ML (p, q) and ML (q, r), ML (p, r) can be obtained;

3) given ML (p, q), ML (c, p) and ML (p, r) can be generated, where c ∈ D, r ∈ D;

4) given CL (p, q), CL (c, p) and CL (p, r) can be generated, where c ∈ D, r ∈ D.
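One way to maintain these propagation properties is to treat ML as an equivalence relation (a union-find structure gives the transitivity of property 2 directly) and to let each CL pair apply to the whole ML group on either side, which covers property 4. This is an illustrative sketch under that interpretation; the class `ConstraintSet` and its methods are hypothetical and not taken from the patent.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple


class ConstraintSet:
    """Pairwise ML/CL constraints with propagation (hypothetical helper)."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}  # union-find forest of ML groups
        self.cl_pairs: Set[Tuple[Hashable, Hashable]] = set()  # CL pairs as recorded

    def _find(self, x: Hashable) -> Hashable:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def add_ml(self, a: Hashable, b: Hashable) -> None:
        """ML(a, b): merge the two ML groups (transitivity, property 2)."""
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self.parent[ra] = rb

    def add_ml_subset(self, subset: Iterable[Hashable]) -> None:
        """Property 1: every pair inside a nearest-neighbor subset D is must-linked."""
        items = list(subset)
        for other in items[1:]:
            self.add_ml(items[0], other)

    def add_cl(self, a: Hashable, b: Hashable) -> None:
        self._find(a)
        self._find(b)
        self.cl_pairs.add((a, b))

    def must_link(self, a: Hashable, b: Hashable) -> bool:
        return self._find(a) == self._find(b)

    def cannot_link(self, a: Hashable, b: Hashable) -> bool:
        """A CL pair constrains the whole ML groups of its two members (property 4)."""
        groups = {self._find(a), self._find(b)}
        return any({self._find(x), self._find(y)} == groups for x, y in self.cl_pairs)
```

With this structure, adding ML(p, q2) after `add_ml_subset(["q1", "q2", "q3"])` links p to every member of D, matching property 3.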

The distance measure selected in this embodiment is the SBD distance. SBD is a cross-correlation-based shape similarity measure whose efficiency and parameter-free nature are advantages the DTW measure cannot match, since DTW is highly accurate but computationally expensive. Because the accuracy of SBD is close to that of DTW, SBD is well suited to measuring the similarity between CGM curves and makes online similarity calculation convenient.

4.2.2 performing constraint propagation

When the SBD distance between the blood glucose data A of a sequence in the blood glucose sequence to be detected and a PISA event B in the training data set is smaller than the threshold λ, i.e. f_SBD(A,B) < λ, the constraint relation ML (A, B) is determined, and the constraint set is updated using the constraint propagation properties.

When the SBD distance between the blood glucose data A of a sequence in the blood glucose sequence to be detected and an event B in the training data set that is subject to a CL constraint relation is smaller than the threshold λ, i.e. f_SBD(A,B) < λ, the constraint relation CL (A, B) is determined, and the constraint set is likewise updated using the constraint propagation properties.
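Step 4.2.2 can be sketched as a scan over the training events. The sketch assumes that the events subject to a CL constraint are supplied as a separate collection, and it takes the distance function as a parameter so that an SBD implementation can be plugged in; `propagate_new_sequence` and its arguments are hypothetical names. The returned pairs would then be merged into the PISA constraint set using the propagation properties of step 4.2.1.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Series = Sequence[float]
Distance = Callable[[Series, Series], float]


def propagate_new_sequence(
    new_id: Hashable,
    new_series: Series,
    pisa_events: Dict[Hashable, Series],
    cl_events: Dict[Hashable, Series],
    dist: Distance,
    lam: float,
) -> List[Tuple[str, Hashable, Hashable]]:
    """Return the ML/CL constraints implied for one new sequence."""
    if lam <= 0:
        raise ValueError("lambda must be a preset value greater than 0")
    found: List[Tuple[str, Hashable, Hashable]] = []
    for eid, series in pisa_events.items():
        if dist(new_series, series) < lam:  # f_SBD(A, B) < lambda -> ML(A, B)
            found.append(("ML", new_id, eid))
    for eid, series in cl_events.items():
        if dist(new_series, series) < lam:  # close to a CL-constrained event -> CL(A, B)
            found.append(("CL", new_id, eid))
    return found
```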

5. Usage phase: detecting anomalies and judging the anomaly level with the Semi-supervised Semi-KNN model

After step 4, the blood glucose data of each sequence in the blood glucose sequence to be detected, together with the constraint relations obtained by expanding the PISA constraint set, are input into the Semi-supervised Semi-KNN model established in step 3.2.2 for anomaly detection, and the anomaly level is judged according to the constraint relations. The specific steps are as follows:

5.1 search for k nearest neighbors and calculate the average distance

The blood glucose data of each sequence in the blood glucose sequence to be tested (i.e., the new blood glucose data) are input into the Semi-supervised Semi-KNN model, and the k data points closest to them are obtained through the K-D tree established in step 3.2.1. For example, using the K-D tree built from the sample data in step 3.2.1, assume the blood glucose data of a sequence is (2.1,3.1), as shown in fig. 3. First, the approximate nearest neighbor (2,3) is found by binary search, at a distance of 0.1414. The search then backtracks to the parent node (5,4): a circle centered at (2.1,3.1) with radius 0.1414 does not intersect the hyperplane y = 4, so the right subspace of (5,4) need not be searched. Next it backtracks to the parent node (7,2); the circle does not intersect the hyperplane x = 7 either, so the right subspace of (7,2) need not be entered. When the backtracking finishes, the nearest sample (2,3) is obtained, and repeating this iteration k times yields the k nearest sample points, called the k neighbors. Once the k neighbors are obtained, their average distance to the blood glucose data of the sequence is calculated and used as the anomaly score; when the score satisfies the threshold condition of step 3.2.2, the sample point is regarded as a PISA fault event.
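The backtracking search described above generalizes to k neighbors with a bounded max-heap, as in the minimal Python sketch below. Instead of repeating a 1-nearest-neighbor search k times, it keeps the k best candidates found so far and enters the far side of a splitting hyperplane only when the hypersphere around the query (radius equal to the current k-th best distance) crosses it; this is a standard variant, not necessarily the exact procedure of the embodiment. The tree is rebuilt inside the block with an assumed variance-based split axis and upper-median pivot, and `build` and `knn` are hypothetical names.

```python
import heapq
import math
from statistics import pvariance


def build(points):
    """Return a K-D tree as nested tuples (point, axis, left, right)."""
    if not points:
        return None
    dims = len(points[0])
    axis = max(range(dims), key=lambda d: pvariance([p[d] for p in points]))
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (pts[mid], axis, build(pts[:mid]), build(pts[mid + 1:]))


def knn(tree, query, k):
    """Return the k nearest points to `query` as sorted (distance, point) pairs."""
    best = []  # max-heap of (-distance, point) holding the k best so far

    def consider(point):
        d = math.dist(point, query)
        if len(best) < k:
            heapq.heappush(best, (-d, point))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, point))

    def visit(node):
        if node is None:
            return
        point, axis, left, right = node
        diff = query[axis] - point[axis]
        near, far = (left, right) if diff < 0 else (right, left)
        visit(near)      # descend toward the leaf region containing the query
        consider(point)  # backtrack: check the splitting node itself
        # The far subspace can only hold closer points if the hypersphere crosses the hyperplane.
        if len(best) < k or abs(diff) < -best[0][0]:
            visit(far)

    visit(tree)
    return sorted((-neg_d, p) for neg_d, p in best)


samples = [(2, 3), (4, 7), (5, 4), (7, 2), (8, 1), (9, 6)]
nearest = knn(build(samples), (2.1, 3.1), 1)
```

For the query (2.1,3.1) with k = 1 this returns (2,3) at a distance of about 0.1414, and the far subspaces of (5,4) and (7,2) are never visited, as in the walkthrough.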

It should be noted that, during the iteration, when a constraint relationship exists between the newly arrived sample and the PISA constraint set, the k neighbors must not include any sample with a CL relationship to it: if a sample with a CL relationship would otherwise qualify as one of the k nearest neighbors of the newly arrived sample, it is discarded and the iteration continues. The guiding effect of the constraint relationships on the KNN anomaly detection iteration is illustrated in fig. 4.
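The CL exclusion rule can be illustrated with a brute-force version of the scoring step (a K-D tree would only speed up the neighbor search). This sketch assumes that training samples are addressed by identifiers, that the identifiers holding a CL relation with the new sample are already known from constraint propagation, and that Euclidean distance stands in for the embodiment's distance; `constrained_knn_score` is a hypothetical name.

```python
import heapq
import math
from typing import Dict, List, Sequence, Set, Tuple


def constrained_knn_score(
    query: Sequence[float],
    samples: Dict[str, Sequence[float]],
    k: int,
    cannot_link: Set[str],
) -> Tuple[float, List[str]]:
    """Average distance to the k nearest samples, skipping samples with a CL relation."""
    candidates = (
        (math.dist(query, vec), sid)
        for sid, vec in samples.items()
        if sid not in cannot_link  # a CL sample is discarded even if it is close
    )
    nearest = heapq.nsmallest(k, candidates)
    if not nearest:
        raise ValueError("no admissible neighbors")
    score = sum(d for d, _ in nearest) / len(nearest)  # anomaly score
    return score, [sid for _, sid in nearest]
```

The score would then be compared with the thresholds σ1 and σ2 of step 3.2.2.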

5.2 outputting the abnormal grade according to the constraint relation

When the anomaly detection result is output, the steps above also consider whether the k neighbors carry PISA constraint relations. That is, for the blood glucose data of each sequence, the Semi-supervised Semi-KNN model considers not only how its distance compares with the anomaly threshold but also whether a constraint relationship exists, i.e. the result obtained in the constraint propagation phase described above.

The specific operation is as follows: when no constraint relation is contained, the anomaly level is judged to be 1; when ML relationships from the expanded PISA constraint set are contained, a higher anomaly level of 2, 3, …, n is assigned according to the number of those ML relationships. The more ML constraints there are in the expanded PISA constraint set, the higher the anomaly level.
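The grading rule can be expressed as a small function. The text only states that more ML constraints give a higher level; the linear mapping used here (level = 1 + number of ML constraints, optionally capped at n) is an assumption, and `anomaly_level` is a hypothetical name.

```python
from typing import Optional


def anomaly_level(ml_count: int, max_level: Optional[int] = None) -> int:
    """Map the number of ML constraints from the expanded PISA set to an anomaly level."""
    if ml_count < 0:
        raise ValueError("ml_count must be non-negative")
    level = 1 + ml_count  # assumed linear mapping: no ML constraints -> level 1
    return level if max_level is None else min(level, max_level)
```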

The aforementioned DTW (Dynamic Time Warping) algorithm measures the similarity between two time sequences, stretching or compressing them along the time axis so that they align as closely as possible. In many cases two sequences have very similar overall shapes, but these shapes are not aligned along the x axis. Before their similarity is compared, one (or both) of the sequences must therefore be warped along the time axis to achieve better alignment, and DTW is an effective way to perform this warping.
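A textbook dynamic-programming version of DTW is sketched below in Python. It assumes an absolute-difference local cost and no warping-window constraint, neither of which the text specifies; `dtw_distance` is a hypothetical name. The running time is O(nm) for sequences of lengths n and m, which is the computational cost that motivates using SBD for online matching.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(n*m) dynamic time warping with absolute-difference local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]
```

For example, [1, 2, 3] and [1, 2, 2, 3] have a DTW distance of 0, because the repeated 2 is absorbed by warping, whereas their plain point-by-point comparison is not even defined for unequal lengths.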

The SBD algorithm is a cross-correlation-based shape similarity measure. Its efficiency and parameter-free nature are advantages the DTW measure cannot match, as DTW is highly accurate but computationally expensive. Since the accuracy of SBD is close to that of DTW, SBD is well suited to measuring the similarity between CGM curves and makes online similarity calculation convenient.

Cross-correlation computes the sliding inner product between two time sequences and is therefore inherently robust to phase shift. Given two time sequences x = (x₁, x₂, …, x_m) and y = (y₁, y₂, …, y_m) and a phase difference s, the inner product of the two curves is as follows:
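The equation itself was not preserved in this text. A common formulation, following the k-Shape definition of cross-correlation and written here with the shift s, is given below as a hedged reconstruction; the patent's exact notation may differ.

```latex
CC_s(x, y) =
\begin{cases}
\displaystyle\sum_{l=1}^{m-s} x_{l+s}\, y_l, & s \ge 0,\\[6pt]
CC_{-s}(y, x), & s < 0,
\end{cases}
\qquad s \in \{-(m-1), \dots, m-1\}
```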

The normalized cross-correlation NCC and the distance metric SBD can be calculated as follows:
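The NCC and SBD equations were not preserved in this text. In the standard shape-based distance (as defined for k-Shape), NCC_c(x, y) = CC_s(x, y) / (‖x‖ ‖y‖) and SBD(x, y) = 1 - max_s NCC_c(x, y), so SBD lies in [0, 2] and equals 0 when one sequence is a positive multiple of the other. The Python sketch below computes these quantities directly over all shifts; it assumes equal-length sequences, does not z-normalize them (k-Shape usually does), and uses the hypothetical names `cross_correlation` and `sbd`.

```python
import math
from typing import List, Sequence


def cross_correlation(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """CC_s(x, y) for shifts s = -(m-1), ..., m-1 (equal-length sequences)."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    m = len(x)
    result = []
    for s in range(-(m - 1), m):
        # Pair x[l + s] with y[l] wherever both indices are in range.
        result.append(sum(x[l + s] * y[l] for l in range(m) if 0 <= l + s < m))
    return result


def sbd(x: Sequence[float], y: Sequence[float]) -> float:
    """Shape-based distance: 1 minus the maximum normalized cross-correlation."""
    norm = math.sqrt(sum(v * v for v in x)) * math.sqrt(sum(v * v for v in y))
    if norm == 0:
        raise ValueError("SBD is undefined for an all-zero sequence")
    ncc = [c / norm for c in cross_correlation(x, y)]
    return 1.0 - max(ncc)
```

The brute-force loop is O(m²); the k-Shape formulation computes the same cross-correlation with an FFT in O(m log m), which is what makes SBD cheap enough for online use.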

In this embodiment, these similarity measures were tested. The results show that the SBD similarity measure has strong noise immunity: it effectively distinguishes the waveform difference between a PISA fault event and a normal sequence while remaining unaffected by noise differences among other sequences. The DTW similarity measure is sensitive to shape features, so tiny shape differences are amplified after the distance output is normalized. The Euclidean distance is sensitive to the amplitude of the blood glucose curve, i.e. it fully represents the absolute distance differences in the original sequence distribution.

EXAMPLE III

The present embodiment also provides an electronic device, including a memory and a processor; the processor is configured to execute the computer program stored in the memory to implement the steps of the PISA fault identification method based on the Semi-supervised Semi-KNN model according to either the first embodiment or the second embodiment.

It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the terms first, second, third and the like are for convenience only and do not denote any order. These words are to be understood as part of the name of the component.

Furthermore, it should be noted that in the description of the present specification, the description of the term "one embodiment", "some embodiments", "examples", "specific examples" or "some examples", etc., means that a specific feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, those skilled in the art may combine the various embodiments or examples described in this specification, and the features of different embodiments or examples, provided they are not mutually inconsistent.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, the claims should be construed to include preferred embodiments and all changes and modifications that fall within the scope of the invention.

It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the spirit or scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention should also include such modifications and variations.

Claims

1. A PISA fault identification method based on a Semi-supervised Semi-KNN model is characterized by comprising the following steps:

2. The method according to claim 1, wherein before the S10, the method further comprises:

s01, acquiring a plurality of historical blood glucose data by means of a continuous blood glucose monitoring system (CGM), preprocessing each historical blood glucose data, and obtaining a blood glucose sequence; each blood glucose sequence comprises blood glucose data with PISA time stamp labels and blood glucose data with non-PISA time stamp labels;

3. The method according to claim 2, wherein the S04 includes:

σ1 = Q3 + 1.5(Q3 - Q1), equation (1)

σ2 = Q1 - 1.5(Q3 - Q1);

4. The method according to claim 2, wherein the S02 includes:

One sequence subset is D = {q1, q2, …, qm}; a first-order difference calculation is carried out on each subsequence qi according to formula (2), where n represents the total length of the blood glucose sequence, i takes any value from 1 to n-w and denotes the i-th sequence, and qi is the i-th element of the sequence subset;

h is the variation of the first-order difference formula, and the value of h is 0.8-1.2;

5. The method according to claim 1, wherein the S10 includes:

obtaining at least 30-45 minutes of blood glucose information to be detected by means of CGM;

and performing filtering processing, and preprocessing the blood sugar information to be detected in a sliding window mode to remove isolated noise points in the blood sugar information to be detected and realize missing value filling, so as to obtain a blood sugar sequence to be detected of the blood sugar information to be detected.

6. The method according to claim 5, wherein the S20 includes:

when the SBD distance between the blood glucose data A of each sequence in the blood glucose sequence to be tested and one PISA event B in the PISA constraint set is smaller than a threshold value λ, namely f_SBD(A,B) < λ, determining a constraint relation ML(A, B) and updating the PISA constraint set; λ is a preset value greater than 0; f_SBD is a function that calculates the SBD distance between two sequences;

when the SBD distance of the CL constraint relation between the blood glucose data A of each sequence in the blood glucose sequence to be tested and one PISA event B in the PISA constraint set is smaller than the threshold λ, namely f_SBD(A,B) < λ, determining a constraint relation CL(A, B) and updating the PISA constraint set;

7. The method according to claim 3, wherein the S30 includes:

and calculating the actual distance between each PISA event and the other events in the constraint relation by using a DTW similarity measure function, and comparing the actual distance with the anomaly threshold σ = [σ1, σ2] to obtain the classification results of PISA events and non-PISA events.

8. The method of claim 7, wherein the actual distance is compared to an anomaly threshold to determine an amount of data pertaining to the ML constraint in the constraint relationship, and wherein the determination of the value of the anomaly level pertaining to the PISA event is based on the amount of data.

9. An electronic device, comprising a memory in which a computer program is stored and a processor that executes the computer program stored in the memory and performs the steps of the Semi-supervised Semi-KNN model based PISA fault identification method of any of the preceding claims 1 to 8.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the Semi-supervised Semi-KNN model based PISA fault identification method as set forth in any one of the preceding claims 1 to 8.