CN116211316B

CN116211316B - Type identification method, system and auxiliary system for multi-lead electrocardiosignal

Info

Publication number: CN116211316B
Application number: CN202310400960.6A
Authority: CN
Inventors: 赵韡; 周亚; 袁靖; 刁晓林; 霍燕妮
Original assignee: Fuwai Hospital of CAMS and PUMC
Current assignee: Fuwai Hospital of CAMS and PUMC
Priority date: 2023-04-14
Filing date: 2023-04-14
Publication date: 2023-07-28
Anticipated expiration: 2043-04-14
Also published as: CN116211316A

Abstract

The application discloses a type identification method, a system and an auxiliary system for multi-lead electrocardiosignals. First, a data acquisition module acquires multi-lead electrocardiosignals, together with the patient characteristic information and electrocardiosignal type labels associated with a part of those signals. The data are then passed in turn to a data preprocessing module and a data set dividing module, which complete data preprocessing and data set division respectively. A model generating module then trains and stores an electrocardio self-supervision model. Finally, when a service computing module receives a type identification request for an electrocardiosignal, it automatically calls the trained electrocardio self-supervision model and obtains probability information corresponding to each of the set electrocardiosignal types based on the data carried by the request; an electrocardio interpretation model built into an electrocardio interpretation module can also be called to interpret the electrocardiogram. The method and system can train a type recognition model for multi-lead electrocardiosignals from less label data and improve the model recognition accuracy.

Description

Type identification method, system and auxiliary system for multi-lead electrocardiosignal

Technical Field

The application relates to the technical field of artificial intelligence, in particular to a type identification method, a type identification system and an auxiliary system for multi-lead electrocardiosignals.

Background

Currently, many diagnostic gold standards for cardiovascular diseases are based on imaging examinations, typically ultrasound, CT, nuclear magnetic resonance and interventional radiography. These examinations are relatively expensive, depend on specialists, involve long patient waiting times, and can cause radiation or trauma damage to the patient's body. There is therefore a need for a low-cost, convenient and safe examination method to cope with the large number of cardiovascular patients.

Compared with imaging examination, electrocardiographic (electrocardiosignal) examination is non-invasive, simple to perform, economical and effective. In recent years deep learning has developed remarkably in the electrocardiographic field, and many deep learning models assist with traditional electrocardiographic tasks such as arrhythmia recognition. Research has further shown that deep learning can capture patterns in the electrocardiograms of cardiovascular patients that are difficult to recognize, which improves the accuracy of electrocardiosignal type recognition. However, current methods for type identification based on electrocardiosignals mainly adopt supervised deep learning models. Training such models requires a large number of electrocardiosignals together with the patient electrocardiosignal type label information (for example, disease label information) associated with them. As a result, on the one hand a large amount of electrocardiographic data without type label information is discarded; on the other hand, in many practical situations there is not enough type label information, so recognition models for some electrocardiosignal types are lacking.

There are many electrocardio signal type recognition systems based on deep learning at present, which can better complete the traditional electrocardio tasks such as arrhythmia recognition. However, due to the lack of a large number of multi-mode tag data capable of being matched with electrocardiosignals and the lack of an electrocardiosignal unsupervised learning method, most institutions have difficulty in establishing a deep learning model for identifying the types of the electrocardiosignals such as adult congenital heart disease, valvular disease, coronary heart disease, cardiomyopathy and the like based on the electrocardiosignals. Therefore, there are few systems currently performing type recognition with higher accuracy based on electrocardiographic signals.

Disclosure of Invention

The application provides a type identification method, a type identification system and an auxiliary system for multi-lead electrocardiosignals, which can train to obtain a type identification model based on fewer tag data and improve the accuracy of type identification.

In order to achieve the above purpose, the present application adopts the following technical scheme: a type identification method of multi-lead electrocardiosignals comprises the following steps:

data acquisition: acquiring n multi-lead electrocardiosignals, together with patient characteristic information and electrocardiosignal type labels associated with a part of the n multi-lead electrocardiosignals;

data preprocessing: generating, based on the n multi-lead electrocardiosignals, a multi-lead electrocardiosignal data set D_1 representing all electrocardiosignals, with a sample size of n; and generating, based on the partial multi-lead electrocardiosignals and the patient characteristic information and electrocardiosignal type labels associated with them, an associated data set D_2 with a sample size of m, where m ≤ n;

data set division: dividing the multi-lead electrocardiosignal data set D_1 into a multi-lead electrocardiosignal training set D_{1,train} and a multi-lead electrocardiosignal verification set D_{1,vali}, and dividing the associated data set D_2 into an associated data training set D_{2,train}, an associated data verification set D_{2,vali} and an associated data test set D_{2,test};

model framework construction: constructing an electrocardio self-supervision model framework that is based on Transformer modules and comprises a cutter, a double-classification masker, an encoder, a decoder and a classifier;

model pre-training: initializing the model parameters, then inputting D_{1,train}, D_{1,vali}, D_{2,train} and D_{2,vali} into the model framework and performing self-supervised learning followed by penalized self-supervised learning to obtain a pre-trained electrocardio self-supervision model;

model fine tuning: fine tuning the pre-trained electrocardio self-supervision model based on D_{2,train} and D_{2,vali} to complete the training of the model;

model testing: testing the fine-tuned electrocardio model on the associated data test set D_{2,test} and evaluating the model effect; if the model evaluation result does not meet the preset requirement, adjusting the model parameters and repeating the model pre-training and model fine tuning until the model evaluation result meets the preset requirement;

application: inputting the acquired multi-lead electrocardiosignals and the patient characteristic information into the trained model to obtain probability information corresponding to each of the set electrocardiosignal types.

Preferably, the patient characteristic information includes age, sex and abnormal electrocardiosignal information; the electrocardiosignal type label indicates whether the multimodal data matched with the partial multi-lead electrocardiosignals contain information on a given cardiovascular disease; the multimodal data are CT, ultrasound, contrast imaging or nuclear magnetic resonance data acquired for the same patient; and the cardiovascular disease includes at least one of adult congenital heart disease, valvular disease, coronary heart disease, cardiomyopathy and pulmonary vascular disease.

Preferably, the model pre-training comprises the steps of:

a. randomly initializing model parameters;

b. on the multi-lead electrocardiosignal training set D_{1,train} and the multi-lead electrocardiosignal verification set D_{1,vali}, performing self-supervised learning based on the cutter, the double-classification masker, the encoder and the decoder of the model;

c. on the associated data training set D_{2,train} and the associated data verification set D_{2,vali}, performing penalized self-supervised learning based on the cutter, the double-classification masker, the encoder, the decoder and the classifier of the model.

Preferably, the self-supervised learning based on the cutter, the double-classification masker, the encoder and the decoder of the model comprises the following steps:

forward propagation: passing the electrocardiosignals in D_{1,train} through the cutter and the double-classification masker in turn to obtain a self-training vector group and a transformed self-estimated vector group; splicing the self-training vector group with a classification vector, passing the result through the encoder and the decoder in turn, and outputting a group of prediction vectors as the estimate of the transformed self-estimated vector group; wherein the classification vector is a preset learnable classification vector;

parameter updating: taking as the objective function a self-supervision loss function that reflects the error between the prediction vectors and the transformed self-estimated vector group, and updating all learnable parameters in the encoder and the decoder on D_{1,train} using an optimizer;

hyper-parameter selection: on the verification set D_{1,vali}, selecting the optimal first hyper-parameter combination so that the self-supervision loss function is smallest relative to the total number of learnable parameters in the decoder.

Preferably, the transformation includes at least one of sampling dimension reduction, element-wise power, normalization within a vector, and classification according to a threshold; the self-supervision loss function is one of the l_1 loss, the l_2 loss and the cross-entropy loss; and the first hyper-parameter combination includes the hidden dimension and the number of attention heads of the Transformer sub-block of the decoder in the model.
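
The transformations and regression losses named above can be illustrated with a minimal pure-Python sketch. The function names, the sign-preserving form of the element-wise power and the zero-mean, unit-variance form of the normalization are assumptions made for illustration; the source does not fix these details.

```python
import math

def downsample(vec, step):
    """Sampling dimension reduction: keep every `step`-th element."""
    return vec[::step]

def signed_power(vec, p):
    """Element-wise power; keeping the sign is an assumption of this sketch."""
    return [math.copysign(abs(v) ** p, v) for v in vec]

def normalize(vec):
    """Normalization within a vector (assumed: zero mean, unit variance)."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    std = math.sqrt(var) or 1.0  # avoid dividing by zero for constant vectors
    return [(v - mean) / std for v in vec]

def threshold_classes(vec, threshold):
    """Classification according to a threshold: 1 if above it, else 0."""
    return [1 if v > threshold else 0 for v in vec]

def l1_loss(pred, target):
    """Mean absolute error between a prediction vector and its target."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)

def l2_loss(pred, target):
    """Mean squared error between a prediction vector and its target."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
```

When the threshold transformation is used, the targets become class indices, which is the case in which a cross-entropy loss would be the natural choice among the three listed.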

Preferably, the punishment self-supervision learning comprises the following steps:

forward propagation: passing the electrocardiosignals in D_{2,train} through the cutter and the double-classification masker in turn to obtain a self-training vector group and a transformed self-estimated vector group; splicing the self-training vector group with a classification vector and passing the result through the encoder, which outputs the encoded self-training vector group and the encoded classification vector, wherein the classification vector is a preset learnable classification vector; the encoded self-training vector group and the encoded classification vector enter branch one, and the encoded classification vector and the patient characteristic information in D_{2,train} enter branch two;

branch one: the decoder processes the input encoded self-training vector group to obtain prediction vectors, which are used to estimate the transformed self-estimated vector group;

branch two: the encoded classification vector and the patient characteristic information in D_{2,train} are input into the classifier to obtain the predicted probability of the electrocardiosignal type;

parameter updating: taking the penalty loss function as the objective function, and updating all learnable parameters in the encoder, the decoder and the classifier on D_{2,train} using the optimizer, wherein the penalty loss function = the self-supervision loss function of the prediction vectors and the transformed self-estimated vector group + λ × cross entropy, λ is hyper-parameter ten, and the cross entropy is the cross-entropy loss between the predicted probability of the electrocardiosignal type and the electrocardiosignal type label;

hyper-parameter selection: on the verification set D_{2,vali}, selecting the optimal hyper-parameter λ so that the selected metric for electrocardiosignal type identification is maximized, wherein the selected metric is one of AUC, F_β-score and accuracy.
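
The penalty loss function described above can be sketched as follows. This is a minimal illustration, assuming an l_2 self-supervision term and a single binary electrocardiosignal type; the function names and the clipping constant `eps` are hypothetical and not taken from the source.

```python
import math

def l2_loss(pred_vectors, target_vectors):
    """Self-supervision term: mean squared error over a group of vectors."""
    total, count = 0.0, 0
    for pred, target in zip(pred_vectors, target_vectors):
        for p, t in zip(pred, target):
            total += (p - t) ** 2
            count += 1
    return total / count

def binary_cross_entropy(prob, label, eps=1e-12):
    """Cross entropy between a predicted probability and a 0/1 type label."""
    prob = min(max(prob, eps), 1.0 - eps)  # keep log() finite
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

def penalty_loss(pred_vectors, target_vectors, prob, label, lam):
    """Penalty loss = self-supervision loss + lambda * cross entropy."""
    ssl_term = l2_loss(pred_vectors, target_vectors)
    return ssl_term + lam * binary_cross_entropy(prob, label)
```

With `lam` set to 0 the objective reduces to the plain self-supervision loss of the first pre-training step; larger values weight the classification branch more heavily.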

Preferably, the electrocardiographic self-supervision model comprises:

a cutter, used for cutting each input electrocardiosignal into d_v mutually disjoint sub-matrices with K rows and d_patch columns, and vectorizing the sub-matrices to obtain an electrocardiosignal full vector group {x_1, …, x_{d_v}} with d_v elements, where d_patch is hyper-parameter one and K is the number of electrocardio leads;

a double-classification masker, used for receiving the electrocardiosignal full vector group {x_1, …, x_{d_v}} and randomly drawing T + T' vectors from it with equal probability and without replacement, where T + T' ≤ d_v; the first T vectors form a self-training vector group and the last T' vectors form a self-estimated vector group, T and T' being hyper-parameter three and hyper-parameter four respectively; the output is the self-training vector group and the self-estimated vector group, and the self-estimated vector group is then transformed to obtain a transformed self-estimated vector group;

an encoder, consisting of a projection layer, a position embedding layer, and L sequentially connected Transformer sub-blocks with hidden dimension d_encoder and h_encoder attention heads, where L is hyper-parameter five, d_encoder is hyper-parameter six and h_encoder is hyper-parameter seven; in the pre-training stage, the input of the encoder is a self-training vector group and a classification vector, and the output is the encoded self-training vector group and classification vector; in the fine tuning and testing stages, the input of the encoder is an electrocardiosignal full vector group and a classification vector, and the output is the encoded electrocardiosignal full vector group and classification vector, wherein the classification vector is a manually added learnable classification vector;

a decoder, consisting of a restoring layer, a position embedding layer, one Transformer sub-block with hidden dimension d_decoder and h_decoder attention heads, and one fully connected layer, where d_decoder is hyper-parameter eight and h_decoder is hyper-parameter nine; the decoder is used only in the pre-training stage, takes the encoded self-training vector group and classification vector as input, and decodes them to output the prediction vectors;

a classifier, consisting of a fully connected layer and an activation layer, which takes the encoded classification vector and the patient characteristic information as input and outputs the predicted probability of the electrocardiosignal type.
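
The cutter and the double-classification masker can be sketched in a few lines of pure Python. This is an illustrative sketch only: the signal is assumed to be a list of K lead rows, each sub-matrix is flattened row by row (the source does not specify the vectorization order), and the names `cutter` and `double_masker` are hypothetical.

```python
import random

def cutter(signal, d_patch):
    """Cut a K x N signal into N // d_patch vectors.

    Each vector is one K x d_patch sub-matrix flattened row by row.
    """
    n_cols = len(signal[0])
    if n_cols % d_patch != 0:
        raise ValueError("column count must be divisible by d_patch")
    vectors = []
    for start in range(0, n_cols, d_patch):
        patch = [x for row in signal for x in row[start:start + d_patch]]
        vectors.append(patch)
    return vectors

def double_masker(vectors, T, T_prime, rng):
    """Draw T + T' vectors uniformly without replacement.

    The first T drawn form the self-training group and the last T'
    form the self-estimated group (before any transformation).
    """
    if T + T_prime > len(vectors):
        raise ValueError("T + T' must not exceed the number of vectors")
    picked = rng.sample(range(len(vectors)), T + T_prime)
    self_training = [vectors[i] for i in picked[:T]]
    self_estimated = [vectors[i] for i in picked[T:]]
    return self_training, self_estimated
```

For example, a 2-lead signal with 6 columns and d_patch = 2 yields d_v = 3 vectors of length 4, from which `double_masker(vectors, 1, 1, random.Random(0))` draws one self-training vector and one distinct self-estimated vector.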

Preferably, the data preprocessing includes the steps of:

a. filtering and denoising the electrocardiosignal;

b. standardizing the filtered electrocardiosignals so that the data range is between -1 and 1;

c. padding the standardized electrocardiosignals with columns of value 0 so that the number of columns after padding is divisible by hyper-parameter one, to obtain the electrocardiosignals X_i, i = 1, …, n;

d. performing min-max standardization on the numerical variables in the patient characteristic information and 0-1 coding on the categorical variables in the patient characteristic information, to obtain the patient characteristic information z_j, j = 1, …, m;

e. obtaining the multi-lead electrocardiosignal data set D_1 and the associated data set D_2, where D_1 = {X_i : i = 1, …, n} represents all electrocardiosignals and D_2 = {(X_j, z_j, y_j) : j = 1, …, m} represents the associated data set, in which X_j is a multi-lead electrocardiosignal, z_j is the patient characteristic information associated with that signal, and y_j is the electrocardiosignal type label associated with that signal.
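
Steps b to d above can be sketched as follows; the filtering of step a is omitted. The sketch assumes a signal stored as a list of K lead rows and uses division by the largest absolute value to reach the range -1 to 1, which is one possible choice that the source does not prescribe. All function names are hypothetical.

```python
def scale_to_unit_range(signal):
    """Step b (assumed form): divide by the peak absolute value."""
    peak = max(abs(x) for row in signal for x in row) or 1.0
    return [[x / peak for x in row] for row in signal]

def pad_columns(signal, d_patch):
    """Step c: append zero columns until the count is divisible by d_patch."""
    extra = (-len(signal[0])) % d_patch
    return [row + [0.0] * extra for row in signal]

def min_max(values):
    """Step d: min-max standardization of one numerical variable."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # constant variables map to 0.0
    return [(v - lo) / span for v in values]

def encode_binary(value, positive):
    """Step d: 0-1 coding of one two-level categorical variable."""
    return 1 if value == positive else 0
```

Padding before cutting guarantees that the cutter always receives a column count that is a whole multiple of d_patch.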

Preferably, the model fine tuning comprises the steps of:

forward propagation: inputting the electrocardiosignals in D_{2,train} into the cutter to obtain an electrocardiosignal full vector group, splicing the full vector group with the classification vector, and inputting the result into the encoder to obtain the encoded electrocardiosignal full vector group and the encoded classification vector; inputting the encoded classification vector and the patient characteristic information in D_{2,train} into the classifier to obtain the predicted probability of the electrocardiosignal type;

parameter updating: taking the cross-entropy loss between the predicted probability of the electrocardiosignal type and the electrocardiosignal type labels in D_{2,train} as the objective function, and updating all learnable parameters in the encoder and the classifier on D_{2,train} using the optimizer;

hyper-parameter selection: on the associated data verification set D_{2,vali}, selecting the optimal second hyper-parameter combination so that the selected metric for electrocardiosignal type identification is maximized; the second hyper-parameter combination comprises the number of columns d_patch of each sub-matrix produced by the cutter, the number d_v of sub-matrices produced by the cutter, the number of vectors T in the self-training vector group, the number of vectors T' in the self-estimated vector group, the number L of Transformer sub-blocks in the encoder, the hidden dimension d_encoder of the Transformer sub-blocks in the encoder, and the number of attention heads h_encoder of the Transformer sub-blocks in the encoder.
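
The classifier step and the validation-based selection can be illustrated with a small sketch. It assumes a single binary electrocardiosignal type and a sigmoid activation (the source says only "activation layer"); the weights, the hyper-parameter tuples and the validation scores used in the usage note are made-up values for illustration.

```python
import math

def classifier(class_vector, patient_features, weights, bias):
    """One fully connected layer followed by a sigmoid activation.

    The encoded classification vector and the patient characteristic
    information are concatenated before the linear layer.
    """
    inputs = list(class_vector) + list(patient_features)
    if len(inputs) != len(weights):
        raise ValueError("weights must match the concatenated input size")
    logit = sum(w * x for w, x in zip(weights, inputs)) + bias
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)  # numerically stable branch for negative logits
    return e / (1.0 + e)

def select_best(validation_scores):
    """Return the hyper-parameter combination with the highest metric."""
    return max(validation_scores, key=validation_scores.get)
```

For instance, `select_best({(25, 4): 0.81, (50, 4): 0.84})` returns `(50, 4)`, where each key is a hypothetical (d_patch, L) pair and each value is the metric measured on D_{2,vali}.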

Preferably, the model test comprises the steps of:

evaluating the model effect on the associated data test set D_{2,test} using the selected metric; if the model evaluation result meets the preset requirement, the model is allowed to be used; if it does not, the model parameters are adjusted and the model pre-training and model fine tuning steps are repeated until the model evaluation result meets the preset requirement.
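
Two of the metrics named in this description, AUC and F_β-score, can be computed for a binary electrocardiosignal type as sketched below. The pairwise definition of AUC is used for clarity rather than speed, and the `meets_requirement` helper and its threshold are assumptions standing in for the unspecified preset requirement.

```python
def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def f_beta(labels, preds, beta=1.0):
    """F_beta-score from 0/1 labels and 0/1 predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0

def meets_requirement(metric, threshold):
    """Hypothetical check of the preset requirement on the test set."""
    return metric >= threshold
```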

A type recognition system of multi-lead electrocardiosignals comprises a data acquisition module, a data preprocessing module, a data set dividing module, a model generating module and a service calculating module, wherein

The data acquisition module is used for acquiring training data, and comprises n multi-lead electrocardiosignals, and patient characteristic information and electrocardiosignal type labels associated with part of the multi-lead electrocardiosignals in the n multi-lead electrocardiosignals;

The data preprocessing module is used for generating, based on the n multi-lead electrocardiosignals, a multi-lead electrocardiosignal data set D_1 representing all electrocardiosignals, with a sample size of n; and for removing, based on the n multi-lead electrocardiosignals and the patient characteristic information and electrocardiosignal type labels associated with them, the multi-lead electrocardiosignals whose patient characteristic information or electrocardiosignal type label is missing, thereby generating an associated data set D_2 with a sample size of m, where m ≤ n;

the data set dividing module is used for dividing the multi-lead electrocardiosignal data set D_1 into a multi-lead electrocardiosignal training set D_{1,train} and a multi-lead electrocardiosignal verification set D_{1,vali}, and dividing the associated data set D_2 into an associated data training set D_{2,train}, an associated data verification set D_{2,vali} and an associated data test set D_{2,test};

The model generation module is used for completing model training based on the multi-lead electrocardiosignal, the characteristic information, the label information and the built model frame, and obtaining and storing a trained electrocardio self-supervision model;

the service calculation module is used for receiving the electrocardiosignal type identification request and calling the trained electrocardiosignal self-supervision model to obtain probability information corresponding to the set electrocardiosignal types.

Preferably, the model generating module comprises a sample library, a model training engine and a model library; the sample library is used for receiving and storing the multi-lead electrocardiosignal data set D_1 and the associated data set D_2 sent by the data set dividing module; the model training engine is used for completing model training based on the multi-lead electrocardiosignal data set and the associated data set stored in the sample library; the model library is used for storing the trained electrocardio self-supervision model;

the service computing module comprises a service triggering engine and a model computing engine; the service triggering engine is used for receiving the type identification request of the electrocardiosignal and the data carried by the request, and sending the request to the model calculation engine, wherein the data carried by the request comprises the multi-lead electrocardiosignal and the characteristic information of the patient; the model calculation engine is used for calling a trained electrocardio self-supervision model, obtaining probability information corresponding to various set electrocardio signal types based on multi-lead electrocardio signals carried by the request and the characteristic information of the patient, and completing result storage.
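
The division of work between the service triggering engine and the model calculation engine can be sketched as below. The class and field names mirror the description but are hypothetical, and the model is represented by any callable that maps a signal and patient features to a dictionary of type probabilities.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Request:
    """Data carried by a type identification request."""
    ecg: List[List[float]]          # multi-lead signal, one row per lead
    patient_features: List[float]   # preprocessed characteristic information

@dataclass
class ModelComputeEngine:
    """Calls the trained model and stores every result."""
    model: Callable[[List[List[float]], List[float]], Dict[str, float]]
    results: List[Dict[str, float]] = field(default_factory=list)

    def compute(self, request: Request) -> Dict[str, float]:
        probs = self.model(request.ecg, request.patient_features)
        self.results.append(probs)  # result storage
        return probs

@dataclass
class ServiceTriggerEngine:
    """Receives requests and forwards them to the calculation engine."""
    engine: ModelComputeEngine

    def handle(self, request: Request) -> Dict[str, float]:
        return self.engine.compute(request)
```

In a deployment the callable would wrap the model loaded from the model library; in this sketch it can be any stub function with the same shape.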

Preferably, the type recognition system of the multi-lead electrocardiosignal further comprises a front-end interaction module and a dynamic monitoring module,

The front-end interaction module comprises an identification result presentation sub-module and a label storage sub-module; the recognition result presentation submodule is used for carrying out visual prompt based on probability information corresponding to various electrocardiosignal types obtained by the service calculation module; the label storage sub-module is used for automatically acquiring a final electrocardiosignal type label generated in the application process and completing storage;

the dynamic monitoring module comprises a service monitoring evaluation submodule and a service update trigger engine; the service monitoring evaluation submodule is used for evaluating the model identification effect in real time based on the type labels automatically accumulated during application; the service update trigger engine is used for automatically triggering an update of the model and the service when the model effect does not meet the preset requirement, thereby realizing dynamic optimization and updating of the model.
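
The monitoring and update-trigger logic can be illustrated with a small sketch. Accuracy is used as the monitored metric, and the minimum accuracy and minimum sample count are hypothetical stand-ins for the preset requirement, which the source leaves open.

```python
class ServiceMonitor:
    """Accumulates final labels and decides when an update is needed."""

    def __init__(self, min_accuracy, min_samples):
        self.min_accuracy = min_accuracy  # assumed preset requirement
        self.min_samples = min_samples    # wait for enough labels first
        self.correct = 0
        self.total = 0

    def record(self, predicted_label, final_label):
        """Store one comparison between a prediction and its final label."""
        self.total += 1
        self.correct += int(predicted_label == final_label)

    def accuracy(self):
        """Running accuracy, or None before any label has arrived."""
        return self.correct / self.total if self.total else None

    def needs_update(self):
        """True once enough labels exist and accuracy is below the minimum."""
        if self.total < self.min_samples:
            return False
        return self.accuracy() < self.min_accuracy
```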

Preferably, the type recognition system of the multi-lead electrocardiosignal further comprises an electrocardio interpretation module, wherein an electrocardio interpretation model is arranged in the electrocardio interpretation module and is used for interpreting an electrocardiogram and recognizing the condition that the type of the electrocardiosignal is arrhythmia.

An intelligent electrocardio auxiliary system comprises the type recognition system of the multi-lead electrocardiosignal and a knowledge base, wherein the knowledge base stores processing suggestions, and when the type recognition system of the multi-lead electrocardiosignal gives a type recognition result, the knowledge base is called to output the processing suggestions in the knowledge base that meet preset conditions.

An electronic device, comprising: a processor;

and a memory storing a program configured to implement the type recognition method of multi-lead electrocardiographic signals when executed by the processor.

A non-transitory computer readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the type recognition method of multi-lead electrocardiographic signals.

The invention builds an electrocardio self-supervision learning model, can effectively utilize the electrocardio data which cannot be utilized in the prior art by a self-supervision learning method, improves the electrocardio data utilization rate, properly reduces the sample size requirement on multi-mode data resources, promotes the research of the association relation between the electrocardio data and various electrocardio signal types (such as various disease types) mined from the electrocardio signals, promotes the type identification based on the electrocardio signals, and potentially expands the application boundary of the current electrocardiograph interpretation.

Drawings

FIG. 1 is a flow chart of a method for identifying the type of multi-lead electrocardiosignals in embodiment 1 of the invention;

FIG. 2 is a model pre-training flowchart of embodiment 1 of the present invention;

FIG. 3 is a diagram of a model architecture according to embodiment 1 of the present invention;

FIG. 4 is a schematic diagram of a type recognition system for multi-lead electrocardiosignals according to embodiment 1 of the invention;

FIG. 5 is a schematic diagram of a model generating module according to embodiment 1 of the present invention;

FIG. 6 is a schematic diagram of a type recognition system for multi-lead electrocardiosignals according to embodiment 2 of the invention;

FIG. 7 is a schematic diagram of a type recognition system (with an electrocardiographic interpretation module) of a multi-lead electrocardiographic signal according to embodiment 2 of the present invention;

FIG. 8 is a schematic diagram of an electronic device according to embodiment 2 of the present invention.

Description of the embodiments

In order to make the objects, technical means and advantages of the present application more apparent, the present application is further described in detail below with reference to the accompanying drawings.

The following description of the technical solutions in the embodiments of the present application will be made clearly and completely with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.

The terms "comprising" and "having" and any variations thereof in the description and claims of the present application and in the foregoing drawings are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those elements but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

The technical scheme of the present application is described in detail below with specific examples. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.

Example 1

As shown in FIG. 1, a type recognition method of a multi-lead electrocardiosignal includes the following steps:

data acquisition, namely acquiring n multi-lead electrocardiosignals, and patient characteristic information and electrocardiosignal type label information associated with part of the multi-lead signals in the n multi-lead electrocardiosignals;

During data acquisition, associated electrocardiosignal type label information cannot be obtained for every multi-lead electrocardiosignal; some multi-lead electrocardiosignals lack a type label for the patient. Existing techniques, which rely mainly on supervised learning, cannot use multi-lead electrocardiosignals that have no associated type label information. In the present application, multi-lead electrocardiosignals lacking associated type label information can also be used to train the electrocardio self-supervision model. Accordingly, the patient characteristic information and electrocardiosignal type labels acquired in the data acquisition step may be associated with only m' of the n multi-lead electrocardiosignals, that is, only a part of the n acquired signals have associated patient characteristic information and electrocardiosignal type labels, where m' ≤ n;

data preprocessing: generating, based on the n multi-lead electrocardiosignals, a multi-lead electrocardiosignal data set D_1 representing all electrocardiosignals, with a sample size of n; and, based on the partial multi-lead electrocardiosignals and the patient characteristic information and electrocardiosignal type labels associated with them, removing the multi-lead electrocardiosignals whose type label is missing and generating an associated data set D_2 with a sample size of m, where m ≤ m' ≤ n; that is, in the multi-lead electrocardiosignal data set D_1, the number of multi-lead electrocardiosignals with associated patient characteristic information and type label information is m', and m of those m' signals are used to form the associated data set D_2;

That is, the n multi-lead electrocardiosignals acquired in the data acquisition step are preprocessed to obtain n electrocardiosignals X_i, i = 1, …, n, which make up the multi-lead electrocardiosignal data set D_1 = {X_i : i = 1, …, n}. The m multi-lead electrocardiosignals that have associated type labels among the n acquired signals, together with the patient characteristic information and type label information associated with each of them, are preprocessed to obtain m electrocardiosignals X_j, m items of preprocessed patient characteristic information z_j and m items of electrocardiosignal type label information y_j, which make up the associated data set D_2 = {(X_j, z_j, y_j) : j = 1, …, m}.

In this embodiment, m=8000, n=300000, and m is much smaller than n.

Data set division: dividing the multi-lead electrocardiosignal data set D_1 into a multi-lead electrocardiosignal training set D_{1,train} and a multi-lead electrocardiosignal verification set D_{1,vali}, and dividing the associated data set D_2 into an associated data training set D_{2,train}, an associated data verification set D_{2,vali} and an associated data test set D_{2,test};

During division, the electrocardiosignal data in D_{2,train} belong to D_{1,train}, and the electrocardiosignal data in D_{2,vali} belong to D_{1,vali}. The proportions of each electrocardiosignal type are almost identical across D_{2,train}, D_{2,vali} and D_{2,test}. The divided D_{1,train} and D_{1,vali} include a large number of electrocardiosignals that lack associated patient characteristic information and electrocardiosignal type label information and therefore cannot be used in the prior art. Preferably, D_{1,train} and D_{1,vali} are mutually disjoint, and D_{2,train}, D_{2,vali} and D_{2,test} are mutually disjoint.
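
The division constraints described above (D_{2,train} inside D_{1,train}, D_{2,vali} inside D_{1,vali}, and similar type proportions across the three associated subsets) can be sketched with index lists. The split fractions and the function name are hypothetical. The source does not say where the signals of D_{2,test} sit within D_1; this sketch keeps them out of both D_1 subsets, which is an assumption.

```python
import random

def split_datasets(n, labels, rng, frac=(0.6, 0.2), d1_train_frac=0.8):
    """Split indices 0..n-1 into D1 and D2 subsets.

    labels maps the index of each labelled signal to its type label.
    frac gives the train and vali shares of D2; the rest is the test set.
    """
    d2 = {"train": [], "vali": [], "test": []}
    by_label = {}
    for idx, y in labels.items():
        by_label.setdefault(y, []).append(idx)
    # Stratify: split each type separately so proportions stay similar.
    for idxs in by_label.values():
        idxs = sorted(idxs)
        rng.shuffle(idxs)
        a = round(frac[0] * len(idxs))
        b = a + round(frac[1] * len(idxs))
        d2["train"] += idxs[:a]
        d2["vali"] += idxs[a:b]
        d2["test"] += idxs[b:]
    unlabelled = [i for i in range(n) if i not in labels]
    rng.shuffle(unlabelled)
    cut = round(d1_train_frac * len(unlabelled))
    d1 = {
        "train": unlabelled[:cut] + d2["train"],  # D2,train is inside D1,train
        "vali": unlabelled[cut:] + d2["vali"],    # D2,vali is inside D1,vali
    }
    return d1, d2
```

Called as `split_datasets(100, {i: i % 2 for i in range(20)}, random.Random(0))`, the sketch returns 12, 4 and 4 labelled indices for the training, verification and test subsets of D_2, with each type equally represented in every subset.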

The electrocardio self-supervision model framework is constructed as shown in fig. 3; the model is based on Transformer modules and comprises a cutter, a double-classification masker, an encoder, a decoder and a classifier;

Model testing: based on the associated data test set D_{2,test}, the fine-tuned electrocardio self-supervision model is tested and the model effect is evaluated; if the model evaluation result does not meet the preset requirement, the model parameters are adjusted and the steps of model pre-training and model fine-tuning are repeated until the evaluation result of the electrocardio self-supervision model meets the preset requirement;

in the application stage, the acquired multi-lead electrocardiosignals and associated patient characteristic information are input into a trained electrocardio self-supervision model to obtain probability information corresponding to various set electrocardiosignal types.

The invention builds an electrocardio self-supervised learning model. Through self-supervised learning in model pre-training, it makes use of electrocardiosignal data that the prior art cannot use, such as the electrocardiosignals lacking electrocardiosignal type labels in the multi-lead electrocardiosignal training set D_{1,train} and the multi-lead electrocardiosignal verification set D_{1,vali}. Taking into account both the information in the electrocardiosignals themselves and the electrocardiosignal type information corresponding to them, it uses only a small amount of electrocardiosignal type label information, such as the data in the associated data training set D_{2,train}, the associated data verification set D_{2,vali} and the associated data test set D_{2,test}, to obtain an electrocardio self-supervision model that identifies electrocardiosignal types from electrocardiosignals. The model provided by the invention can identify various electrocardiosignal types that are difficult to detect by the human eye, including the electrocardiosignal types associated with adult congenital heart disease, valvular disease, coronary heart disease, cardiomyopathy and pulmonary vascular disease. Because little electrocardiosignal type label information is used, the invention offers a new solution for the many medium-sized medical institutions and research laboratories whose data resources are limited.

As can be seen, the overall processing flow of the electrocardiosignal type identification method comprises three parts: preparing training data, training the electrocardio self-supervision model, and identifying the type of electrocardiosignals with the trained electrocardio self-supervision model. Data acquisition, data preprocessing and data set division prepare the training data; constructing the structure of the electrocardio self-supervision model, model pre-training, model fine-tuning and model testing together form the complete training process of the electrocardio self-supervision model. Once a suitable electrocardio self-supervision model has been obtained by training, it can be used to identify the type of electrocardiosignals conveniently and effectively. Next, the training process of the electrocardio self-supervision model is described in detail.

In the training of the electrocardiographic self-supervision model, training data used comprises patient characteristic information and electrocardiographic signal type label information associated with part of the multi-lead electrocardiographic signals besides the acquired multi-lead electrocardiographic signals.

The patient characteristic information includes age, sex and abnormal electrocardiosignal conditions. The abnormal electrocardiosignal conditions can be obtained by having an expert manually annotate the electrocardiosignal, and include, but are not limited to, ST-T change, T wave abnormality, left ventricular high voltage, sinus bradycardia, abnormal Q wave, atrial fibrillation, complete right bundle branch block, sinus arrhythmia, ST segment change, ventricular premature beat, P wave abnormality, incomplete right bundle branch block, left axis deviation, first-degree atrioventricular block, sinus tachycardia, non-conducted atrial premature beat, right ventricular hypertrophy, ventricular paced rhythm, nonspecific intraventricular conduction block, complete left bundle branch block and left anterior fascicular block;

the electrocardiosignal type label indicates whether the multimodal data matched with the partial multi-lead electrocardiosignal contain information on a certain cardiovascular disease, that is, the label marks a certain type of electrocardiosignal; alternatively, the label may be determined manually by an expert. The cardiovascular diseases comprise adult congenital heart disease, valvular disease, coronary heart disease, cardiomyopathy and pulmonary vascular disease. Valvular diseases include, but are not limited to, aortic stenosis, aortic insufficiency, mitral stenosis and mitral insufficiency. Congenital heart diseases include, but are not limited to, atrial septal defects and ventricular septal defects. Cardiomyopathy includes, but is not limited to, hypertrophic cardiomyopathy. Coronary heart disease includes, but is not limited to, acute myocardial infarction and angina pectoris. Pulmonary vascular diseases include, but are not limited to, pulmonary arterial hypertension and pulmonary embolism. For valvular disease, cardiomyopathy, coronary heart disease and pulmonary vascular disease, the multimodal data matched with a patient's electrocardiosignals are CT, ultrasound, radiography or magnetic resonance data acquired from the same patient within 90 days before or after the electrocardiosignals were acquired; for adult congenital heart disease, the multimodal data matched with a patient's electrocardiosignals are ultrasound data of the same patient acquired at any time.

As described above, the training of the electrocardio self-supervision model comprises constructing the model structure, model pre-training, model fine-tuning and model testing. The three processes of model pre-training, model fine-tuning and model testing are each described in detail below in combination with the constructed model structure.

As shown in fig. 2, the pre-training process may specifically include the following steps:

a. randomly initializing model parameters;

b. on the multi-lead electrocardiosignal training set D_{1,train} and the multi-lead electrocardiosignal verification set D_{1,vali}, self-supervised learning is performed based on the cutter, the double-classification masker, the encoder and the decoder of the electrocardio self-supervision model;

as described above, before the self-supervision model is trained, the training data for the electrocardio self-supervision model are prepared by data acquisition, data preprocessing and data set division. After this processing, the multi-lead electrocardiosignal training set D_{1,train}, the multi-lead electrocardiosignal verification set D_{1,vali}, the associated data training set D_{2,train}, the associated data verification set D_{2,vali} and the associated data test set D_{2,test} are obtained;

c. on the associated data training set D_{2,train} and the associated data verification set D_{2,vali}, penalty self-supervised learning is performed based on the cutter, the double-classification masker, the encoder, the decoder and the classifier of the electrocardio self-supervision model.

Step a may comprise using Xavier uniform initialization for all Transformer blocks and normal-distribution initialization for the other parameters.

The self-supervised learning process of step b may specifically include the steps of:

Forward propagation: the electrocardiosignals in D_{1,train} pass through the cutter and the double-classification masker in sequence to obtain a self-training vector group and a transformed self-estimated vector group. The self-training vector group is then spliced with a classification vector (the splicing may follow the approach of the Vision Transformer, for example by treating the classification vector as one more vector of the self-training vector group) and passed through the encoder and the decoder in sequence to obtain a group of prediction vectors output by the decoder. These prediction vectors are used to estimate the transformed self-estimated vector group, that is, the prediction vectors output by the decoder serve as the estimated values of the transformed self-estimated vector group. The classification vector is a preset (specifically, it may be added manually) learnable classification vector of length K × d_patch, each component of which is a learnable (trainable) parameter; the vectors cut from the electrocardiosignals in D_{1,train} also have dimension K × d_patch, where K is the number of electrocardio leads and d_patch is hyperparameter one. The learnable parameters can be regarded as weight parameters of the neural network and, like the other weight parameters in the network, are updated iteratively during model training.

Parameter updating: a self-supervised loss function reflecting the error between the prediction vectors (namely, the estimated values of the transformed self-estimated vector group output by the decoder) and the actual values of the transformed self-estimated vector group (namely, the transformed self-estimated vector group output by the double-classification masker) is taken as the objective function. On D_{1,train}, an optimizer updates all the learnable parameters in the encoder and the decoder so as to reduce the deviation between the prediction vectors and the actual values of the transformed self-estimated vector group, and synchronously updates the classification vector that will be spliced with the self-training vector group and input into the encoder in the next training iteration. The optimizer used in this embodiment is an AdamW optimizer with a cosine learning rate scheduler, with base learning rate 0.001, weight decay 0.05, batch size 256, optimizer momentum β_1 = 0.9 and β_2 = 0.95, 40 warm-up iterations and 400 total iterations;
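
The learning rate schedule named in this embodiment can be sketched as follows. The source gives only the base learning rate, the warm-up count and the total iteration count; the linear shape of the warm-up and the decay to exactly zero at the last iteration are assumptions, and `learning_rate` is a hypothetical helper name.

```python
import math

def learning_rate(step, base_lr=0.001, warmup=40, total=400):
    """Cosine schedule with warm-up (assumed linear), for step = 0..total."""
    if step < warmup:
        # Ramp up linearly to the base learning rate during warm-up.
        return base_lr * (step + 1) / warmup
    # Cosine decay from base_lr down to 0 over the remaining iterations.
    progress = (step - warmup) / max(1, total - warmup)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

Calling the same function with `warmup=10, total=100` or `warmup=5, total=50` would correspond to the iteration counts given for the later training stages.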

The foregoing forward propagation and parameter updating are iterated until the training termination condition is satisfied (i.e., the set maximum number of iterations is reached), yielding a trained model; the process of obtaining a trained model through these iterations is referred to as the self-supervised learning training process.

Next, on the verification set D_{1,vali}, using the outputs of the double-classification masker, the optimal first hyperparameter combination (the first hyperparameter combination comprises hyperparameter eight d_decoder and hyperparameter nine h_decoder) is selected so that the comparison of the self-supervised loss function with the total number of learnable parameters in the decoder is minimized. This comparison is chosen as the selection criterion for the optimal combination of hyperparameter eight and hyperparameter nine for the following reason: the larger the total number of learnable parameters in the decoder, the larger the decoder and the smaller the self-supervised loss function, so the model estimates the transformed self-estimated vector group more accurately in the pre-training stage, but the amount of computation grows accordingly. Further, numerical experiments show that model generalization in the fine-tuning stage decreases when the growth of the decoder far exceeds the accompanying decrease in the self-supervised loss function. On this basis, the present application balances the self-supervised loss against the number of decoder parameters, which on the one hand balances the self-supervised error against the computation of the pre-training stage and on the other hand helps to obtain a model that generalizes better.

Further, the specific operations of performing self-supervised learning training on the training set and selecting the optimal first hyperparameter combination on the verification set are described. Assume there are N candidate first hyperparameter combinations. For each first hyperparameter combination, the complete self-supervised learning training process (namely, the process of producing a model by iterating forward propagation and parameter updating many times) is executed on the training set D_{1,train} to obtain a corresponding model, for which the total number of learnable parameters in the decoder is determined. Over all first hyperparameter combinations, N models are obtained, and the total numbers of learnable decoder parameters of these models may differ. For each of the N models, the signals in the verification set D_{1,vali} are input into the model and processed to obtain the value of the self-supervised loss function, which is then compared with the total number of learnable parameters in that model's decoder to obtain a comparison result. The smallest of the N comparison results is selected; the first hyperparameter combination of the corresponding model is the optimal first hyperparameter combination, and the selected model is taken as the initial model for penalty self-supervised training in the next step.

In the above self-supervised learning process, the transformation of the self-estimated vectors may be one of down-sampling for dimension reduction, element-wise power, normalization within the vector, and classification according to a threshold; the self-supervised loss function may be an l_1 loss, an l_2 loss or a cross-entropy loss function, all of which are common knowledge.

The hyperparameters of the electrocardio self-supervision model comprise hyperparameters one to ten, each of which is described later.

Note that, because the double-classification masker draws randomly from the electrocardiosignal full vector group, the self-training vector group and the self-estimated vector group corresponding to each electrocardiosignal sample need to be fixed in the verification stage. That is, the double-classification masker is applied only once to each electrocardiosignal in the verification set, and the resulting self-training vector group and self-estimated vector group are stored; the stored groups are then used in the pre-training verification stage to select the optimal first hyperparameter combination, namely hyperparameter eight and hyperparameter nine;

the penalty self-supervised learning of step c comprises the following steps:

Forward propagation: the electrocardiosignals in D_{2,train} pass through the cutter and the double-classification masker in sequence to obtain a self-training vector group and a transformed self-estimated vector group. The self-training vector group is spliced with a classification vector (the splicing may be the same as in self-supervised learning) and input into the encoder, which outputs an encoded self-training vector group and an encoded classification vector; the classification vector is a preset (specifically, it may be added manually) learnable classification vector. The encoded self-training vector group and the encoded classification vector enter branch one for processing, and the encoded classification vector and the patient characteristic information in D_{2,train} enter branch two for processing. In penalty self-supervised learning, the transformation of the self-estimated vector group is the same as in self-supervised learning and is prior art.

Branch one: a prediction vector is obtained through the decoder and used to estimate the transformed self-estimated vector group, that is, the prediction vector output by the decoder serves as the estimated value of the transformed self-estimated vector group;

Branch two: the encoded classification vector and the patient characteristic information in D_{2,train} are passed through the classifier to obtain the predicted probabilities corresponding to the preset electrocardiosignal types;

Parameter updating: the penalty loss function is taken as the objective function and, on D_{2,train}, an optimizer updates all the learnable parameters in the encoder, the decoder and the classifier. The penalty loss function is: self-supervised loss function + λ · CrossEntropy, where the self-supervised loss function is the same as in the self-supervised learning process, λ is hyperparameter ten, and CrossEntropy denotes the cross-entropy loss between the predicted probability of the electrocardiosignal type (namely, the disease prediction probability) and the electrocardiosignal type label information;

the optimizer used in this embodiment is an AdamW optimizer with a cosine learning rate scheduler, with base learning rate 0.001, weight decay 0.05, batch size 256, optimizer momentum β_1 = 0.9 and β_2 = 0.999, 10 warm-up iterations and 100 total iterations;

λ is the penalty weight in the penalty loss function and can take a preset value;
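
As a minimal numerical sketch of the penalty loss function described above, the following Python code adds a λ-weighted cross-entropy term to a self-supervised loss. The choice of the l_2 (mean squared error) form for the self-supervised term, the binary form of the cross entropy, and the function names are assumptions for illustration; the source allows other loss forms.

```python
import math

def l2_loss(pred, target):
    """Mean squared error between predicted and actual vector groups."""
    total = sum((p - t) ** 2
                for pv, tv in zip(pred, target)
                for p, t in zip(pv, tv))
    count = sum(len(tv) for tv in target)
    return total / count

def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean cross entropy between predicted probabilities and 0/1 labels."""
    terms = [-(y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps)))
             for p, y in zip(probs, labels)]
    return sum(terms) / len(terms)

def penalty_loss(pred, target, probs, labels, lam=0.1):
    """Self-supervised loss plus lam times the classification cross entropy."""
    return l2_loss(pred, target) + lam * binary_cross_entropy(probs, labels)
```

With `lam=0` the objective reduces to the plain self-supervised loss, which makes the role of λ as a weight on the label information explicit.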

the foregoing forward propagation and parameter updating are iterated until the training termination condition is satisfied (for example, the set maximum number of iterations is reached), yielding a trained model; the process of obtaining a trained model through these iterations is called the penalty self-supervised learning training process.

Next, on the verification set D_{2,vali}, the optimal hyperparameter ten λ is selected so that the selection metric for electrocardiosignal classification is maximized; the selection metric is one of AUC, F_β-score and accuracy. The value range of λ may be 0.00001-10; in this embodiment, the candidate values of λ are 0.05, 0.1 and 0.5;

wherein AUC (Area Under Curve) is the area enclosed between the receiver operating characteristic (ROC) curve and the coordinate axis below it, and F_β-score is defined as follows:

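$$F_\beta = \frac{(1+\beta^2)\cdot \mathrm{precision}\cdot \mathrm{recall}}{\beta^2\cdot \mathrm{precision} + \mathrm{recall}},$$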

wherein β is 0.5, 1 or 2; precision is the proportion of samples that actually belong to a given electrocardiosignal type among the samples the model judges to belong to that type; recall is the proportion of samples the model judges to belong to a given electrocardiosignal type among the samples that actually belong to that type; accuracy refers to the accuracy of disease classification. The above metrics are all common knowledge.
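
The F_β-score can be computed from precision and recall with the standard definition F_β = (1 + β²) · precision · recall / (β² · precision + recall). The sketch below assumes binary 0/1 labels and predictions for a single electrocardiosignal type; the function name `f_beta` and the convention of returning 0 when there are no true positives are illustrative choices.

```python
def f_beta(y_true, y_pred, beta=1.0):
    """F-beta score for binary labels (1 = belongs to the type)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # Precision or recall is zero (or undefined); report 0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

A value of β below 1 weights precision more heavily, and a value above 1 weights recall more heavily.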

Further, the complete operation of performing penalty self-supervised learning training on the training set and selecting the optimal hyperparameter on the verification set is described. Similarly to the selection of the optimal first hyperparameter combination described above, assume there are M candidate values of hyperparameter ten λ. For each value, the model finally selected in self-supervised learning is used as the initial model, and the complete penalty self-supervised learning training process (namely, the process of producing a model by iterating forward propagation and parameter updating many times) is executed on the training set D_{2,train} to obtain a corresponding model. Over all λ values, M models are obtained. For each of the M models, the electrocardiosignals in the verification set D_{2,vali} are input into the cutter and processed to obtain the electrocardiosignal full vector group, which is spliced with the classification vector and input into the encoder to obtain the encoded electrocardiosignal full vector group and the encoded classification vector; by the structure of the Transformer, the encoded classification vector depends on the electrocardiosignal full vector group. Finally, the encoded classification vector and the patient characteristic information in D_{2,vali} are input into the classifier and processed to obtain the predicted probability of the electrocardiosignal type, and all the predicted probabilities are combined to determine the value of the selection metric for the electrocardiosignal type. The largest of the selection metric values of the M models is selected; the value of hyperparameter ten λ of the corresponding model is the optimal λ, and the selected model is taken as the initial model for the model fine-tuning in the next step.

As described above and shown in fig. 3, the electrocardio self-supervision model constructed in the present application includes a cutter, a double-classification masker, an encoder, a decoder and a classifier. The specific function of each component and the ten hyperparameters are described in detail below:

Cutter: cuts each input electrocardiosignal into d_v mutually disjoint submatrices, each with K rows and d_patch columns, and vectorizes the submatrices to obtain the electrocardiosignal full vector group {x_1, …, x_{d_v}} with d_v elements, where d_patch is hyperparameter one, d_v is hyperparameter two, and K is the number of electrocardio leads;

optionally, K is 12, hyperparameter one d_patch is 10-200, and hyperparameter two d_v is 25-500; in this embodiment, K = 12, d_patch = 25 and d_v = 200.
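
The cutter can be sketched in a few lines of Python. The representation of a signal as K lists of S samples, the function name `cut_signal` and the lead-by-lead (row-major) flattening order are assumptions; the source only states that each K × d_patch submatrix is vectorized.

```python
def cut_signal(signal, d_patch):
    """Cut a K x S signal into S / d_patch vectors of length K * d_patch.

    signal is a list of K leads, each a list of S samples, with S
    divisible by d_patch (padding is assumed to have been done already).
    """
    k = len(signal)
    s = len(signal[0])
    if s % d_patch != 0:
        raise ValueError("number of columns must be divisible by d_patch")
    vectors = []
    for start in range(0, s, d_patch):
        # Flatten the K x d_patch submatrix lead by lead (assumed order).
        vec = [signal[lead][col]
               for lead in range(k)
               for col in range(start, start + d_patch)]
        vectors.append(vec)
    return vectors
```

With the values of this embodiment, a 12 × 5000 signal and d_patch = 25 give d_v = 200 vectors of length 300.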

Double-classification masker: receives the electrocardiosignal full vector group {x_1, …, x_{d_v}} and draws T + T' vectors from it at random with equal probability and without replacement, where T + T' ≤ d_v. The first T vectors form the self-training vector group and the last T' vectors form the self-estimated vector group, where T and T' are hyperparameter three and hyperparameter four respectively; the output is the self-training vector group and the self-estimated vector group. The self-estimated vector group is then transformed to obtain the transformed self-estimated vector group;

Hyperparameter three T is 5-400 and hyperparameter four T' is 5-400; in this embodiment, T = 50 and T' = 100.
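
The random draw performed by the double-classification masker can be sketched as below. The function name `mask_vectors`, the returned index lists and the optional `seed` argument are illustrative assumptions; the transformation of the self-estimated vector group is left out.

```python
import random

def mask_vectors(full_group, t, t_prime, seed=None):
    """Draw t + t_prime vectors without replacement and split them.

    Returns (self_training, self_estimated, train_idx, est_idx).
    """
    d_v = len(full_group)
    if t + t_prime > d_v:
        raise ValueError("T + T' must not exceed d_v")
    rng = random.Random(seed)
    # random.sample draws distinct positions with equal probability.
    picked = rng.sample(range(d_v), t + t_prime)
    train_idx, est_idx = picked[:t], picked[t:]
    self_training = [full_group[i] for i in train_idx]
    self_estimated = [full_group[i] for i in est_idx]
    return self_training, self_estimated, train_idx, est_idx
```

Passing a fixed `seed` per sample, or storing the returned groups, is one way to keep the draw fixed for each verification sample, as the verification stage requires.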

Encoder: consists of a projection layer, a position embedding layer and L Transformer sub-blocks with hidden dimension d_encoder and h_encoder attention heads, connected in sequence, where L is hyperparameter five, d_encoder is hyperparameter six and h_encoder is hyperparameter seven. In the pre-training stage, the input of the encoder is the self-training vector group and the classification vector, and the output is the encoded self-training vector group and the encoded classification vector; in the fine-tuning and testing stages, the input of the encoder is the electrocardiosignal full vector group and the classification vector, and the output is the encoded electrocardiosignal full vector group and the encoded classification vector, where the classification vector is a manually added learnable classification vector;

wherein hyperparameter five L is 1-32, hyperparameter six d_encoder is 128-1280, and hyperparameter seven h_encoder is 3-16; in this embodiment, L = 12, d_encoder = 384, h_encoder = 6, or L = 12, d_encoder = 256, h_encoder = 4.

Decoder: consists of a restoring layer, a position embedding layer, one Transformer sub-block with hidden dimension d_decoder and h_decoder attention heads, and one fully connected layer, where d_decoder is hyperparameter eight and h_decoder is hyperparameter nine. The decoder is used only in the pre-training stage; its input is the encoded self-training vector group and the encoded classification vector, and it decodes them and outputs the prediction vectors;

Hyperparameter eight d_decoder is 128-1280 and hyperparameter nine h_decoder is 4-10; in this embodiment, d_decoder = 256, h_decoder = 8, or d_decoder = 128, h_decoder = 4.

Classifier: consists of one fully connected layer and one activation layer with sigmoid as the activation function; its input is the encoded classification vector output by the encoder together with the patient characteristic information, and its output is the predicted probability value of the electrocardiosignal type.
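
A fully connected layer followed by a sigmoid can be written out directly, as in the sketch below. It follows the forward-propagation description, in which the encoded classification vector and the patient characteristic information enter the classifier; joining the two by simple concatenation, and the names `classify`, `weights` and `biases`, are assumptions.

```python
import math

def classify(encoded_cls, patient_features, weights, biases):
    """One fully connected layer plus sigmoid, one output per type.

    weights holds one row per electrocardiosignal type; each row has
    len(encoded_cls) + len(patient_features) entries.
    """
    # Assumption: the two inputs are joined by concatenation.
    x = list(encoded_cls) + list(patient_features)
    probs = []
    for w_row, b in zip(weights, biases):
        logit = sum(w * v for w, v in zip(w_row, x)) + b
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    return probs
```

Because each output passes through its own sigmoid, the probabilities for different types are independent and need not sum to one.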

In addition, the data preprocessing can be performed according to the following steps:

a. filtering and denoising the electrocardiosignal;

b. normalizing the filtered electrocardiosignals so that the data range is between -1 and 1;

c. padding the normalized electrocardiosignals with zero-valued columns so that the number of columns after padding is divisible by hyperparameter one, to obtain the electrocardiosignals X_i, i = 1, …, n;

d. performing min-max normalization on the numerical variables in the patient characteristic information and 0-1 encoding on the categorical variables in the characteristic information, to obtain the patient characteristic information z_j, j = 1, …, m;

e. obtaining the multi-lead electrocardiosignal data set D_1 and the associated data set D_2, where D_1 = {X_i : i = 1, …, n} represents all electrocardiosignals, and D_2 = {(X_j, z_j, y_j) : j = 1, …, m} represents the electrocardiosignals, patient characteristic information and electrocardiosignal type labels that remain after the multi-lead electrocardiosignals with missing patient characteristic information or missing electrocardiosignal type labels are removed; here X_j is a remaining electrocardiosignal, z_j is its patient characteristic information, and y_j is its electrocardiosignal type label.

The multi-lead electrocardiosignal is a K × S numerical matrix, where K is the number of leads and S is the number of collected sample points. The electrocardiosignal is filtered and denoised; the filtered electrocardiosignal is normalized so that the data range is between -1 and 1; the normalized electrocardiosignal is padded with zero-valued columns so that the number of columns after padding is divisible by hyperparameter one, giving the electrocardiosignals X_i, i = 1, …, n. Min-max normalization is applied to the numerical variables in the characteristic information and 0-1 encoding to the categorical variables, giving the patient characteristic information z_j; a numerical variable is one whose values are numerical data, such as age, and a categorical variable is one whose values are category names, such as sex. The multi-lead electrocardiosignal data set D_1 and the associated data set D_2 are then obtained, where D_1 = {X_i : i = 1, …, n} represents all electrocardiosignal observations and D_2 = {(X_j, z_j, y_j) : j = 1, …, m} represents the electrocardiosignals, patient characteristic information and electrocardiosignal type label information that remain after the multi-lead electrocardiosignals with missing patient characteristic information or missing electrocardiosignal type label information are removed; y_j is the electrocardiosignal type label.

For example, for i = 1, X_1 is a 12 × 5000 numerical matrix; for j = 1, z_1 is a one-dimensional array (0.5,1,1,1,0,1,1,1,0,1,1,1,0,1,1,1,0,1,1,1,0,1,0), where 0.5 represents age, the second entry 1 represents male, and each following 1 or 0 indicates whether a given electrocardiosignal abnormality is present; y_1 = 0 indicates that there is no cardiovascular disease.
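
The preprocessing steps after filtering can be sketched as follows. The source does not specify how the signal is brought into the range -1 to 1, so division by the largest absolute value is an assumption, as are the age range of 0 to 100 used for min-max normalization and all function names.

```python
def scale_to_unit_range(signal):
    """Scale a K x S signal into [-1, 1] by its largest absolute value."""
    peak = max(abs(v) for row in signal for v in row)
    if peak == 0:
        return [list(row) for row in signal]
    return [[v / peak for v in row] for row in signal]

def pad_columns(signal, d_patch):
    """Append zero columns until the column count is divisible by d_patch."""
    extra = (-len(signal[0])) % d_patch
    return [list(row) + [0.0] * extra for row in signal]

def min_max(value, lo, hi):
    """Min-max normalization of one numerical variable."""
    return (value - lo) / (hi - lo)

def encode_patient(age, is_male, abnormalities, age_range=(0, 100)):
    """Build z_j: normalized age, 0-1 coded sex, 0-1 coded abnormalities."""
    z = [min_max(age, age_range[0], age_range[1]), 1 if is_male else 0]
    z.extend(1 if a else 0 for a in abnormalities)
    return z
```

Under the assumed age range, a 50-year-old male patient yields a vector that starts with 0.5 and 1, in the same layout as z_1 in the example above.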

The model fine tuning comprises the following steps:

Forward propagation: the electrocardiosignals in D_{2,train} are input into the cutter and processed to obtain the electrocardiosignal full vector group, which is spliced with the classification vector and input into the encoder to obtain the encoded electrocardiosignal full vector group and the encoded classification vector; the encoded classification vector and the patient characteristic information in D_{2,train} are input into the classifier and processed to obtain the predicted probability of the electrocardiosignal type;

Parameter updating: the cross-entropy loss function between the predicted probability of the electrocardiosignal type and the electrocardiosignal type labels in D_{2,train} is taken as the objective function and, on D_{2,train}, the optimizer updates all the learnable parameters in the encoder and the classifier;

the foregoing process of forward propagation and parameter updating is iterated continuously until the training termination condition is satisfied (for example, the set maximum number of iterations is reached), a training model is obtained, and the process of obtaining a training model through multiple iteration processes is referred to as a fine tuning training process.

Next, on the associated data verification set D_{2,vali}, the optimal second hyperparameter combination (specifically comprising hyperparameter one d_patch, hyperparameter two d_v, hyperparameter three T, hyperparameter four T', hyperparameter five L, hyperparameter six d_encoder and hyperparameter seven h_encoder) is selected so that the selection metric for electrocardiosignal classification is maximized.

The cross-entropy loss function is common knowledge, and the selection metric is one of AUC, F_β-score and accuracy. The optimizer used in this embodiment is an AdamW optimizer with a cosine learning rate scheduler, with base learning rate 0.001, weight decay 0.05, batch size 256, optimizer momentum β_1 = 0.9 and β_2 = 0.999, 5 warm-up iterations and 50 total iterations.

In addition, the complete operation of pre-training and fine-tuning is described in connection with the pre-training described above. Assume there are X candidate second hyperparameter combinations. For each combination, the training sets D_{1,train} and D_{2,train} and the verification sets D_{1,vali} and D_{2,vali} are used in the pre-training process to obtain the optimal first hyperparameter combination, the optimal hyperparameter ten and the corresponding model A produced by penalty self-supervised learning. Model A is taken as the initial model, and the complete fine-tuning training process (namely, the process of producing a model by iterating forward propagation and parameter updating many times during fine-tuning) is executed on the training set to obtain a corresponding model. Over all second hyperparameter combinations, X models are obtained. For each of the X models, the signals in the verification set D_{2,vali} are input into the encoder and classifier of the model and processed to obtain the value of the selection metric for electrocardiosignal classification. The largest of the selection metric values of the X models is selected; the second hyperparameter combination of the corresponding model is the optimal second hyperparameter combination, and the selected model is taken as the model to be tested in the subsequent test processing;

The model test comprises the following steps:

On the associated data test set D_{2,test}, the model effect is evaluated by the selection metric. If the model evaluation result meets the preset requirement, the model may be used; if it does not, the model parameters are adjusted and the model pre-training and fine-tuning steps are repeated until the model evaluation result meets the preset requirement.

In the fine-tuning and testing stages, the input of the encoder is the electrocardiosignal full vector group and a classification vector, and the output of the encoder is the encoded electrocardiosignal full vector group and the encoded classification vector, wherein the classification vector is an artificially added learnable classification vector. The encoded classification vector enters the classifier together with the patient characteristic information, and the output is the predicted probability of the electrocardiosignal type. The predicted probability of the electrocardiosignal type is compared with the true value, and the model effect is evaluated by the selection metric. The preset requirement is an AUC of 0.9 or more, an F1-score of 0.75 or more, or an accuracy of 80% or more. In this embodiment, the requirement is an AUC of 0.9 or more.
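The evaluation against the preset requirement can be sketched as follows. This is an illustrative pure-Python version for a single binary electrocardiosignal type; the function names, the 0.5 decision threshold and the example data are assumptions, and AUC is computed through its rank interpretation (the probability that a positive sample is scored above a negative one, ties counting one half).

```python
from typing import Optional, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted type equals the true type."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_beta(y_true: Sequence[int], y_pred: Sequence[int], beta: float = 1.0) -> float:
    """F_beta-score computed from precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both positive and negative samples are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def meets_preset_requirement(
    auc_value: Optional[float] = None,
    f1: Optional[float] = None,
    acc: Optional[float] = None,
) -> bool:
    """True if any supplied metric reaches its threshold from the description."""
    return (
        (auc_value is not None and auc_value >= 0.9)
        or (f1 is not None and f1 >= 0.75)
        or (acc is not None and acc >= 0.8)
    )


# Made-up labels and predicted probabilities for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.8, 0.25, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # assumed decision threshold of 0.5

auc_value = auc(y_true, scores)
f1_value = f_beta(y_true, y_pred, beta=1.0)
acc_value = accuracy(y_true, y_pred)
```

Passing only `auc_value` to `meets_preset_requirement` corresponds to the choice made in this embodiment, where AUC alone is used as the selection metric.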

As shown in FIG. 4, the type recognition system for multi-lead electrocardiosignals comprises a data acquisition module, a data preprocessing module, a data set dividing module, a model generation module and a service computing module, wherein:

the data acquisition module is used for acquiring training data, the training data comprising n multi-lead electrocardiosignals and the patient characteristic information and electrocardiosignal type labels associated with part of those multi-lead electrocardiosignals;

the data preprocessing module is used for generating, based on the n multi-lead electrocardiosignals, a multi-lead electrocardiosignal data set D_1 representing all electrocardiosignals, the corresponding sample size being n; and, based on the n multi-lead electrocardiosignals and the patient characteristic information and electrocardiosignal type label information associated with them, removing the multi-lead electrocardiosignals whose patient characteristic information or electrocardiosignal type label information is missing to generate an associated data set D_2, the corresponding sample size being m, wherein m ≤ n;

the model generation module is used for completing model training based on the multi-lead electrocardiosignals, the characteristic information, the label information and the built model framework to obtain a trained electrocardio self-supervision model.
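The construction of the two data sets by the data preprocessing module can be sketched as below. Representing a missing field as `None`, the record layout and the function name `build_datasets` are assumptions for illustration; filtering, denoising and standardization of the signals are omitted.

```python
from typing import List, Optional, Sequence, Tuple

Signal = List[List[float]]   # K leads x number of samples
Record = Tuple[Signal, Optional[Sequence[float]], Optional[int]]


def build_datasets(records: List[Record]):
    """Return (D1, D2): all signals, and the complete (signal, features, label) triples."""
    d1 = [signal for signal, _, _ in records]
    d2 = [
        (signal, features, label)
        for signal, features, label in records
        if features is not None and label is not None
    ]
    return d1, d2


# Made-up records: one complete, one without features, one without a label.
records = [
    ([[0.1, 0.2]], [0.5, 1.0], 1),
    ([[0.3, 0.4]], None, 0),
    ([[0.5, 0.6]], [0.2, 0.0], None),
]
D1, D2 = build_datasets(records)
n, m = len(D1), len(D2)
```

Because D_2 is obtained only by removing incomplete records, m ≤ n holds by construction.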

Based on the electrocardio self-supervision model, the type identification system uses self-supervised learning to exploit electrocardiosignal data that cannot be utilized in the prior art. With only a small amount of multi-modal data, it can obtain probability information corresponding to various electrocardiosignal types (such as electrocardiosignal types associated with various cardiovascular diseases), and it can be used for early screening of cardiovascular diseases that cannot be identified in the prior art, such as adult heart disease, valvular disease, coronary heart disease, cardiomyopathy and pulmonary heart disease.

As shown in FIG. 5, the model generation module comprises a sample library, a model training engine and a model library. The sample library is used for receiving and storing the multi-lead electrocardiosignal data set D_1 and the associated data set D_2 generated by the data acquisition module, the data preprocessing module and the data set dividing module; the model training engine is used for completing model training based on the multi-lead electrocardiosignal data set and the associated data set stored in the sample library; the model library is used for storing the trained electrocardio self-supervision model.

The service computing module comprises a service triggering engine and a model calculation engine. The service triggering engine is used for receiving a type identification request for an electrocardiosignal and sending it to the data acquisition module; the data acquisition module is used for automatically acquiring the data required for model prediction corresponding to the received request, including the multi-lead electrocardiosignal and the characteristic information, and sending the data to the model calculation engine; the model calculation engine is used for calling the trained electrocardio self-supervision model, obtaining probability information corresponding to the various set electrocardiosignal types based on the multi-lead electrocardiosignal and the characteristic information, and storing the result.

Example 2

As shown in FIG. 6, the type recognition system for multi-lead electrocardiosignals in this embodiment is basically the same as that in Example 1, except that it further includes a front-end interaction module and a dynamic monitoring module. The front-end interaction module includes a recognition result presentation submodule and a label information storage submodule. The recognition result presentation submodule is used for displaying the probability information corresponding to the various electrocardiosignal types obtained by the service computing module. The label information storage submodule is used for determining the final electrocardiosignal type label according to that probability information; in particular, the electrocardiosignal type labels generated while the model is in use are dynamically updated into the sample library of the model generation module, so that new data accumulate continuously and the updating, optimization and iteration of subsequent models are facilitated.

The dynamic monitoring module comprises a service monitoring and evaluation submodule and a service update trigger engine. The service monitoring and evaluation submodule is used for evaluating the model recognition effect in real time based on the electrocardiosignal type label information automatically accumulated during use. The service update trigger engine is used for automatically triggering an update of the model and the service when the model effect does not meet the preset requirement, realizing dynamic optimization and updating of the model.

The front-end interaction module provides a visualized type recognition result for the clinician and assists the clinician in diagnosis. At the same time, it dynamically updates the electrocardiosignal type labels generated while the model is in use into the sample library of the model generation module in real time, so that new data accumulate continuously and the updating, optimization and iteration of subsequent models are facilitated.

The dynamic monitoring module evaluates the model recognition effect in real time based on the probability information and the label information of the electrocardiosignal type, and automatically triggers an update of the model and the service when the model effect does not meet the preset requirement, realizing dynamic optimization and updating of the model. The preset requirement is an AUC of 0.9 or more, an F1-score of 0.75 or more, or an accuracy of 80% or more. In this embodiment, the requirement is an AUC of 0.9 or more.
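The behaviour of the dynamic monitoring module can be sketched as a small accumulator that re-evaluates AUC and raises an update flag when the preset requirement is not met. The class name `ServiceMonitor`, its methods and the example data are hypothetical; how the update of the model and the service is actually carried out is not shown.

```python
from typing import List, Tuple

AUC_THRESHOLD = 0.9   # preset requirement used in this embodiment


def auc(pairs: List[Tuple[float, int]]) -> float:
    """AUC from (probability, label) pairs via pairwise comparison; ties count one half."""
    pos = [p for p, y in pairs if y == 1]
    neg = [p for p, y in pairs if y == 0]
    if not pos or not neg:
        raise ValueError("both positive and negative labels are required")
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


class ServiceMonitor:
    """Accumulates predictions with confirmed labels and flags when an update is needed."""

    def __init__(self) -> None:
        self.pairs: List[Tuple[float, int]] = []
        self.update_triggered = False

    def record(self, probability: float, label: int) -> None:
        self.pairs.append((probability, label))

    def evaluate(self) -> float:
        value = auc(self.pairs)
        # The update of the model and the service is triggered below the threshold.
        self.update_triggered = value < AUC_THRESHOLD
        return value


monitor = ServiceMonitor()
for probability, label in [(0.9, 1), (0.2, 0), (0.4, 1), (0.6, 0)]:   # made-up data
    monitor.record(probability, label)
current_auc = monitor.evaluate()
```

In this toy run one positive sample is scored below a negative one, so the AUC falls under 0.9 and the update flag is set.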

As shown in FIG. 7, the type identification system for multi-lead electrocardiosignals further comprises an electrocardiogram interpretation module with a built-in electrocardiogram interpretation model, so that the electrocardiogram can be interpreted and arrhythmia identified.

The electrocardiogram interpretation model is prior art, and the arrhythmias include, but are not limited to, sinus arrhythmia, atrial premature beat, ventricular premature beat, atrioventricular block and atrial fibrillation.

The intelligent electrocardio auxiliary system comprises the above type recognition system for multi-lead electrocardiosignals and a knowledge base in which processing suggestions are stored. When the type recognition system gives a type recognition result, the knowledge base is called to output the processing suggestions that meet a preset condition. The preset condition is that the processing suggestion matches the electrocardiosignal type recognition result.
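The knowledge-base lookup can be illustrated with a minimal sketch. The type names, the suggestion texts, the 0.5 probability threshold used to turn probabilities into a recognition result, and the function name `matching_suggestions` are all placeholders assumed for illustration; the description only requires that the output suggestions match the recognition result.

```python
from typing import Dict, List

# Placeholder knowledge base: type names and suggestion texts are hypothetical.
KNOWLEDGE_BASE: Dict[str, List[str]] = {
    "type_a": ["placeholder suggestion A1", "placeholder suggestion A2"],
    "type_b": ["placeholder suggestion B1"],
}


def matching_suggestions(
    probabilities: Dict[str, float],
    knowledge_base: Dict[str, List[str]],
    threshold: float = 0.5,   # assumed cut-off for treating a type as recognized
) -> Dict[str, List[str]]:
    """Return the stored suggestions for every type whose probability reaches the threshold."""
    recognized = [name for name, p in probabilities.items() if p >= threshold]
    return {name: knowledge_base.get(name, []) for name in recognized}


result = matching_suggestions({"type_a": 0.83, "type_b": 0.12}, KNOWLEDGE_BASE)
```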

As shown in fig. 8, an electronic device includes: a processor;

A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the type recognition method for multi-lead electrocardiosignals.

The foregoing describes only preferred embodiments of the invention and is not intended to limit it; any modification, equivalent replacement, improvement or the like made within the spirit and principles of the invention is intended to be included within its scope.

The flowcharts and block diagrams in the figures of the present application illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present application may be combined and/or integrated in various ways, even if such combinations or integrations are not explicitly recited in the present application. In particular, the features recited in the various embodiments and/or claims of the present application may be combined and/or integrated in various ways without departing from the spirit and teachings of the application, and all such combinations fall within the scope of the disclosure.

The principles and embodiments of the present application are described herein with reference to specific examples, which are provided to assist in understanding the methods and core concepts of the present application and are not intended to be limiting. It will be apparent to those skilled in the art that variations can be made to the specific embodiments and to the scope of application in light of the spirit and principles of this application, and any modifications, equivalents, improvements and the like are intended to be included within the scope of this application.

Claims

1. A type identification method for multi-lead electrocardiosignals, characterized by comprising the following steps:

data preprocessing: generating, based on the n multi-lead electrocardiosignals, a multi-lead electrocardiosignal data set D_1 representing all electrocardiosignals, the corresponding sample size being n; generating an associated data set D_2 based on the partial multi-lead electrocardiosignals and the patient characteristic information and electrocardiosignal type labels associated with the partial multi-lead electrocardiosignals, the corresponding sample size being m, wherein m ≤ n;

data set dividing: dividing the multi-lead electrocardiosignal data set D_1 into a multi-lead electrocardiosignal training set D_{1,train} and a multi-lead electrocardiosignal verification set D_{1,vali}; dividing the associated data set D_2 into an associated data training set D_{2,train}, an associated data verification set D_{2,vali} and an associated data test set D_{2,test};

model pre-training: initializing model parameters, then inputting the multi-lead electrocardiosignal training set D_{1,train}, the multi-lead electrocardiosignal verification set D_{1,vali}, the associated data training set D_{2,train} and the associated data verification set D_{2,vali} into the model framework, and performing self-supervised learning and penalized self-supervised learning to obtain a pre-trained electrocardio self-supervision model;

model fine-tuning: fine-tuning the pre-trained electrocardio self-supervision model based on the associated data training set D_{2,train} and the associated data verification set D_{2,vali} to complete the training of the model;

model testing: testing the fine-tuned electrocardio self-supervision model based on the associated data test set D_{2,test} and evaluating the model effect; if the model evaluation result does not meet the preset requirement, adjusting the model parameters and repeating the model pre-training and the model fine-tuning until the model evaluation result meets the preset requirement;

application stage: inputting the acquired multi-lead electrocardiosignals and patient characteristic information into the trained model to obtain probability information corresponding to the various set electrocardiosignal types;

wherein, on the multi-lead electrocardiosignal training set D_{1,train} and the multi-lead electrocardiosignal verification set D_{1,vali}, the self-supervised learning is performed based on the cutter, the dual-classification masker, the encoder and the decoder of the model, the self-supervised learning comprising the following steps:

forward propagation: passing the electrocardiosignals in D_{1,train} sequentially through the cutter and the dual-classification masker to obtain a self-training vector group and a transformed self-estimation vector group; splicing the self-training vector group with a classification vector and passing the result sequentially through the encoder and the decoder to output a group of prediction vectors as the estimate of the transformed self-estimation vector group; wherein the classification vector is a preset learnable classification vector;

parameter updating: taking a self-supervised loss function reflecting the error between the prediction vectors and the transformed self-estimation vector group as the objective function, and updating all learnable parameters in the encoder and the decoder on D_{1,train} using the optimizer;

on the verification set D_{1,vali}, selecting the optimal first hyperparameter combination so that the self-supervised loss function is minimal relative to the total number of all learnable parameters in the decoder;

on the associated data training set D_{2,train} and the associated data verification set D_{2,vali}, the penalized self-supervised learning is performed based on the cutter, the dual-classification masker, the encoder, the decoder and the classifier of the model, the penalized self-supervised learning comprising the following steps:

forward propagation: passing the electrocardiosignals in D_{2,train} sequentially through the cutter and the dual-classification masker to obtain a self-training vector group and a transformed self-estimation vector group; splicing the self-training vector group with a classification vector and passing the result through the encoder to output an encoded self-training vector group and an encoded classification vector, wherein the classification vector is a preset learnable classification vector; the encoded self-training vector group and the encoded classification vector enter branch one, and the encoded classification vector and the patient characteristic information in D_{2,train} enter branch two;

branch two: inputting the encoded classification vector and the patient characteristic information in D_{2,train} into the classifier for processing to obtain the prediction probability of the electrocardiosignal type;

parameter updating: taking the penalty loss function as the objective function, and updating all learnable parameters in the encoder, the decoder and the classifier on D_{2,train} using the optimizer, wherein the penalty loss function is the sum of the self-supervised loss function between the prediction vectors and the transformed self-estimation vector group and λ times the cross entropy, the cross entropy being the cross-entropy loss between the prediction probability of the electrocardiosignal type and the electrocardiosignal type label, and λ being hyperparameter ten;

on the verification set D_{2,vali}, selecting the optimal hyperparameter λ so as to maximize the selection metric of electrocardiosignal type identification, wherein the selection metric is one of AUC, F_β-score and accuracy; wherein AUC is the area under the receiver operating characteristic curve, enclosed by the curve and the coordinate axis; F_β = (1 + β²) · precision · recall / (β² · precision + recall), with β taken as 0.5, 1 or 2; precision is the proportion of samples truly belonging to a given electrocardiosignal type among the samples judged by the model to belong to that type; and recall is the proportion of samples judged by the model to belong to a given electrocardiosignal type among the samples truly belonging to that type.
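For illustration, the penalty loss function of claim 1 (self-supervised loss plus λ times the cross entropy) can be sketched for a single sample as follows. The choice of the l_2 loss in mean-squared form, the binary label, the numerical clipping constant and all function names are assumptions; the numbers are made up.

```python
import math
from typing import Sequence


def l2_loss(pred: Sequence[Sequence[float]], target: Sequence[Sequence[float]]) -> float:
    """Mean squared error over all elements of the two vector groups (assumed form)."""
    total, count = 0.0, 0
    for p_vec, t_vec in zip(pred, target):
        for p, t in zip(p_vec, t_vec):
            total += (p - t) ** 2
            count += 1
    return total / count


def binary_cross_entropy(prob: float, label: int, eps: float = 1e-12) -> float:
    """Cross entropy between a predicted probability and a 0/1 type label."""
    prob = min(max(prob, eps), 1.0 - eps)   # avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def penalty_loss(pred, target, prob: float, label: int, lam: float) -> float:
    """Self-supervised loss plus lambda times the classification cross entropy."""
    return l2_loss(pred, target) + lam * binary_cross_entropy(prob, label)


prediction_vectors = [[1.0, 2.0], [0.0, 1.0]]     # decoder output (made up)
transformed_targets = [[1.0, 0.0], [0.0, 1.0]]    # transformed self-estimation vectors (made up)
loss = penalty_loss(prediction_vectors, transformed_targets, prob=0.5, label=1, lam=0.1)
```

With λ = 0 the expression reduces to the plain self-supervised loss, which is why λ is tuned on the verification set D_{2,vali} against the classification metric rather than against the loss itself.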

2. The method of claim 1, wherein the patient characteristic information includes age, gender, and an electrocardiographic anomaly, the electrocardiographic type tag representing information whether multimodal data matching the partial multi-lead electrocardiographic signal contains a cardiovascular disease, the multimodal data being CT, ultrasound, contrast, or nuclear magnetic data acquired for the same patient, the cardiovascular disease including at least one of adult coronary heart disease, valvular disease, coronary heart disease, cardiomyopathy, pulmonary vascular disease.

3. The method according to claim 2, characterized in that the model pre-training comprises the steps of:

a. randomly initializing model parameters;

b. on the multi-lead electrocardiosignal training set D_{1,train} and the multi-lead electrocardiosignal verification set D_{1,vali}, performing self-supervised learning based on the cutter, the dual-classification masker, the encoder and the decoder of the model;

c. on the associated data training set D_{2,train} and the associated data verification set D_{2,vali}, performing penalized self-supervised learning based on the cutter, the dual-classification masker, the encoder, the decoder and the classifier of the model.

4. The method of claim 3, wherein the transformation comprises at least one of sampling dimension reduction, element-wise exponentiation, normalization within a vector, and classification according to a threshold; the self-supervised loss function is one of the l_1 loss, the l_2 loss and the cross-entropy loss function; and the first hyperparameter combination includes the hidden dimension and the number of attention heads of the Transformer sub-blocks of the decoder in the model.

5. The method of claim 3, wherein the electrocardiographic self-monitoring model comprises:

a cutter for cutting each input electrocardiosignal into d_v mutually exclusive submatrices, each with K rows and d_patch columns, and vectorizing the submatrices to obtain an electrocardiosignal full vector group with d_v elements, wherein d_patch is hyperparameter one and K is the number of electrocardiogram leads;

a dual-classification masker for receiving the electrocardiosignal full vector group and randomly drawing T + T' vectors from it with equal probability and without replacement, wherein T + T' ≤ d_v; the first T vectors form the self-training vector group and the last T' vectors form the self-estimation vector group, T and T' being hyperparameter three and hyperparameter four respectively; the output is the self-training vector group and the self-estimation vector group, and the self-estimation vector group is then transformed to obtain the transformed self-estimation vector group;
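The cutter and the dual-classification masker can be sketched as follows. This is an illustrative pure-Python version: the row-major flattening order, the function names `cut` and `mask`, and the example signal are assumptions, and the transformation subsequently applied to the self-estimation vector group is omitted.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]   # K rows (leads) x columns (time samples)


def cut(signal: Matrix, d_patch: int) -> List[List[float]]:
    """Cut a K x (d_v * d_patch) signal into d_v vectorized K x d_patch submatrices."""
    n_cols = len(signal[0])
    if n_cols % d_patch != 0:
        raise ValueError("the number of columns must be divisible by d_patch")
    vectors = []
    for start in range(0, n_cols, d_patch):
        # Row-major flattening of one submatrix (the ordering is an assumption).
        vectors.append([x for row in signal for x in row[start:start + d_patch]])
    return vectors


def mask(
    vectors: List[List[float]], t: int, t_prime: int, rng: random.Random
) -> Tuple[List[List[float]], List[List[float]]]:
    """Draw T + T' vectors without replacement; the first T train, the last T' are estimated."""
    if t + t_prime > len(vectors):
        raise ValueError("T + T' must not exceed d_v")
    drawn = rng.sample(vectors, t + t_prime)   # equal probability, no replacement
    return drawn[:t], drawn[t:]


signal = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]   # K = 2 leads, 6 samples (made up)
full_vectors = cut(signal, d_patch=2)                   # d_v = 3 vectors of length K * d_patch
self_training, self_estimation = mask(full_vectors, t=1, t_prime=1, rng=random.Random(0))
```

Passing an explicit `random.Random` instance keeps the draw reproducible, which is convenient when comparing hyperparameter combinations.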

6. The method according to claim 1 or 5, wherein the data preprocessing comprises the steps of:

a. filtering and denoising the electrocardiosignal;

c. padding the standardized electrocardiosignal with columns of value 0 so that the number of columns after padding is divisible by hyperparameter one, to obtain the electrocardiosignals X_i, i = 1, ..., n;

d. performing min-max standardization on the numerical variables in the patient characteristic information and 0-1 coding on the categorical variables in the patient characteristic information, to obtain the patient characteristic information z_j, j = 1, ..., m;

e. obtaining the multi-lead electrocardiosignal data set D_1 = {X_i}, i = 1, ..., n, and the associated data set D_2 = {(X_j, z_j, y_j)}, j = 1, ..., m, wherein D_1 represents all of the electrocardiosignals and D_2 represents the associated data set, X_j representing a multi-lead electrocardiosignal, z_j representing the patient characteristic information associated with the multi-lead electrocardiosignal, and y_j representing the electrocardiosignal type label associated with the multi-lead electrocardiosignal.
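Steps c and d of this preprocessing can be illustrated with a short sketch. Padding at the end of the signal, treating 0-1 coding as a binary indicator of one chosen category, the function names and the example values are all assumptions; filtering, denoising and signal standardization are not shown.

```python
from typing import List, Sequence


def pad_columns(signal: List[List[float]], d_patch: int) -> List[List[float]]:
    """Append zero columns so that the column count is divisible by d_patch."""
    n_cols = len(signal[0])
    pad = (-n_cols) % d_patch   # number of zero columns still needed
    return [row + [0.0] * pad for row in signal]


def min_max(values: Sequence[float]) -> List[float]:
    """Min-max standardization of a numerical variable to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]   # constant variable: map everything to 0
    return [(v - lo) / (hi - lo) for v in values]


def encode_binary(values: Sequence[str], positive: str) -> List[int]:
    """0-1 coding of a two-category variable (1 for the chosen category)."""
    return [1 if v == positive else 0 for v in values]


# Made-up example values.
padded = pad_columns([[1.0, 2.0, 3.0, 4.0, 5.0]], d_patch=4)
ages = min_max([30, 50, 70])
genders = encode_binary(["F", "M", "F"], positive="M")
```

A categorical variable with more than two categories would need one indicator per category; that case is not covered by this sketch.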

7. The method of claim 6, wherein the model fine tuning comprises the steps of:

forward propagation: inputting the electrocardiosignals in D_{2,train} into the cutter for processing to obtain the electrocardiosignal full vector group; splicing the electrocardiosignal full vector group with the classification vector and inputting the result into the encoder for processing to obtain the encoded electrocardiosignal full vector group and the encoded classification vector; inputting the encoded classification vector and the patient characteristic information in D_{2,train} into the classifier for processing to obtain the prediction probability of the electrocardiosignal type;

parameter updating: taking the cross-entropy loss between the prediction probability of the electrocardiosignal type and the electrocardiosignal type labels in D_{2,train} as the objective function, and updating all learnable parameters in the encoder and the classifier on D_{2,train} using the optimizer;

on the associated data verification set D_{2,vali}, selecting the optimal second hyperparameter combination so as to maximize the selection metric of electrocardiosignal type identification; the second hyperparameter combination comprises the number of columns d_patch into which the cutter cuts the electrocardiosignal, the number d_v of submatrices obtained after the cutter cuts the electrocardiosignal, the number of vectors T of the self-training vector group, the number of vectors T' of the self-estimation vector group, the number L of Transformer sub-blocks included in the encoder, the hidden dimension d_encoder of the Transformer sub-blocks in the encoder, and the number of attention heads h_encoder of the Transformer sub-blocks in the encoder.

8. The method of claim 7, wherein the model test comprises the steps of:

on the associated data test set D_{2,test}, evaluating the model effect by the selection metric; if the model evaluation result meets the preset requirement, the model is allowed to be used; if the model evaluation result does not meet the preset requirement, adjusting the model parameters and repeating the model pre-training and fine-tuning steps until the model evaluation result meets the preset requirement.

9. A type recognition system for multi-lead electrocardiosignals, characterized by comprising a data acquisition module, a data preprocessing module, a data set dividing module, a model generation module and a service computing module, wherein:

the data preprocessing module is used for generating, based on the n multi-lead electrocardiosignals, a multi-lead electrocardiosignal data set D_1 representing all electrocardiosignals, the corresponding sample size being n; and, based on the n multi-lead electrocardiosignals and the patient characteristic information and electrocardiosignal type labels associated with the multi-lead electrocardiosignals, removing the multi-lead electrocardiosignals with missing patient characteristic information or electrocardiosignal type labels to generate an associated data set D_2, the corresponding sample size being m, wherein m ≤ n;

the data set dividing module is used for dividing the multi-lead electrocardiosignal data set D_1 into a multi-lead electrocardiosignal training set D_{1,train} and a multi-lead electrocardiosignal verification set D_{1,vali}, and dividing the associated data set D_2 into an associated data training set D_{2,train}, an associated data verification set D_{2,vali} and an associated data test set D_{2,test};

the model generation module is used for initializing model parameters, then inputting the multi-lead electrocardiosignal training set D_{1,train}, the multi-lead electrocardiosignal verification set D_{1,vali}, the associated data training set D_{2,train} and the associated data verification set D_{2,vali} into the model framework, and performing self-supervised learning and penalized self-supervised learning to obtain a pre-trained electrocardio self-supervision model; is also used for fine-tuning the pre-trained electrocardio self-supervision model based on the associated data training set D_{2,train} and the associated data verification set D_{2,vali} to complete the training of the electrocardio self-supervision model; and is also used for testing the fine-tuned electrocardio self-supervision model based on the associated data test set D_{2,test} and evaluating the model effect, and, if the model evaluation result does not meet the preset requirement, adjusting the model parameters and repeating the model pre-training and the model fine-tuning until the model evaluation result meets the preset requirement, so as to obtain and store the trained electrocardio self-supervision model; the electrocardio self-supervision model framework is based on Transformer modules and comprises a cutter, a dual-classification masker, an encoder, a decoder and a classifier;

the service computing module is used for receiving an electrocardiosignal type identification request and calling the trained electrocardio self-supervision model to obtain probability information corresponding to the various set electrocardiosignal types;

wherein, in the model generation module, on the multi-lead electrocardiosignal training set D_{1,train} and the multi-lead electrocardiosignal verification set D_{1,vali}, the self-supervised learning is performed based on the cutter, the dual-classification masker, the encoder and the decoder of the model, the self-supervised learning comprising the following steps:

forward propagation: passing the electrocardiosignals in D_{1,train} sequentially through the cutter and the dual-classification masker to obtain a self-training vector group and a transformed self-estimation vector group; splicing the self-training vector group with a classification vector and passing the result sequentially through the encoder and the decoder to output a group of prediction vectors as the estimate of the transformed self-estimation vector group; wherein the classification vector is a preset learnable classification vector;

on the verification set D_{1,vali}, selecting the optimal first hyperparameter combination so that the self-supervised loss function is minimal relative to the total number of all learnable parameters in the decoder;

on the associated data training set D_{2,train} and the associated data verification set D_{2,vali}, the penalized self-supervised learning is performed based on the cutter, the dual-classification masker, the encoder, the decoder and the classifier of the model, the penalized self-supervised learning comprising the following steps:

forward propagation: passing the electrocardiosignals in D_{2,train} sequentially through the cutter and the dual-classification masker to obtain a self-training vector group and a transformed self-estimation vector group; splicing the self-training vector group with a classification vector and passing the result through the encoder to output an encoded self-training vector group and an encoded classification vector, wherein the classification vector is a preset learnable classification vector; the encoded self-training vector group and the encoded classification vector enter branch one, and the encoded classification vector and the patient characteristic information in D_{2,train} enter branch two;

parameter updating: taking the penalty loss function as the objective function, and updating all learnable parameters in the encoder, the decoder and the classifier on D_{2,train} using the optimizer, wherein the penalty loss function is the sum of the self-supervised loss function between the prediction vectors and the transformed self-estimation vector group and λ times the cross entropy, the cross entropy being the cross-entropy loss between the prediction probability of the electrocardiosignal type and the electrocardiosignal type label, and λ being hyperparameter ten;

on the verification set D_{2,vali}, selecting the optimal hyperparameter λ so as to maximize the selection metric of electrocardiosignal type identification, wherein the selection metric is one of AUC, F_β-score and accuracy; wherein AUC is the area under the receiver operating characteristic curve, enclosed by the curve and the coordinate axis; F_β = (1 + β²) · precision · recall / (β² · precision + recall), with β taken as 0.5, 1 or 2; precision is the proportion of samples truly belonging to a given electrocardiosignal type among the samples judged by the model to belong to that type; and recall is the proportion of samples judged by the model to belong to a given electrocardiosignal type among the samples truly belonging to that type.

10. The system of claim 9, wherein the model generation module comprises a sample library, a model training engine and a model library, wherein the sample library is used for receiving the multi-lead electrocardiosignal data set D_1 and the associated data set D_2 from the data set dividing module and storing them; the model training engine is used for completing model training based on the multi-lead electrocardiosignal data set and the associated data set stored in the sample library; the model library is used for storing the trained electrocardio self-supervision model;

11. The multi-lead electrocardiographic signal type recognition system of claim 10 further comprising a front-end interaction module, a dynamic monitoring module,

12. The system for identifying the type of the multi-lead electrocardiosignal according to claim 11, further comprising an electrocardio-interpretation module, wherein the electrocardio-interpretation module is internally provided with an electrocardio-interpretation model for interpreting an electrocardiogram and identifying the condition that the type of the electrocardiosignal is arrhythmia.

13. An intelligent electrocardio auxiliary system, characterized by comprising the type recognition system for multi-lead electrocardiosignals according to any one of claims 9-12 and a knowledge base, wherein the knowledge base stores processing suggestions, and when the type recognition system for multi-lead electrocardiosignals gives a type recognition result, the knowledge base is called to output the processing suggestions in the knowledge base that meet a preset condition.

14. An electronic device, comprising: a processor;

A memory storing a program configured to implement the type recognition method of a multi-lead electrocardiographic signal according to any one of claims 1-8 when executed by the processor.

15. A non-transitory computer readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the type recognition method of a multi-lead electrocardiographic signal according to any one of claims 1-8.