CN114972266A - Lymphoma ultrasonic image semantic segmentation method based on self-attention mechanism and stable learning - Google Patents
- Publication number
- CN114972266A (application CN202210604099.0A)
- Authority
- CN
- China
- Prior art keywords
- image
- module
- lymphoma
- self
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/60—Editing figures and text; Combining figures or text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/10—Image enhancement or restoration using non-spatial domain filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/42—Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/52—Scale-space analysis, e.g. wavelet analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10132—Ultrasound image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20092—Interactive image processing based on input by user
- G06T2207/20104—Interactive definition of region of interest [ROI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20112—Image segmentation details
- G06T2207/20132—Image cropping
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30096—Tumor; Lesion
Abstract
The invention relates to a lymphoma ultrasonic image semantic segmentation method based on a self-attention mechanism and stable learning. A data preprocessing module crops a region of interest from the annotated instrument-scanned images and adjusts the pixel spacing, producing a new data set for training the model. A self-attention mechanism realizes non-local interaction among the encoder's coding features, alleviating the information degradation caused by repeated sampling and yielding more accurate structural boundaries for the segmented target object. A stable learning method eliminates the dependence between environmental features and essential features by means of random Fourier features and sample weighting, reducing spurious correlations and improving segmentation accuracy. A counterfactual interpretation method explains the model's results at the instance level, improving the credibility of the model.
Description
Technical Field
The invention relates to the field of medical image processing, in particular to a lymphoma ultrasonic image semantic segmentation method based on a self-attention mechanism and stable learning.
Background
Lymphoma is a fatal cancer arising from abnormal mutation of immune-system cells. It has many different histological subtypes, whose diagnosis is usually based on sampling (biopsy). Lymphoma originates in lymph nodes and lymphoid tissue, can occur in any part of the body, and has varied clinical manifestations. The disease typically presents as painless lymph-node enlargement, and can also invade extranodal organs and damage them. Lymphomas are mainly classified into Hodgkin and non-Hodgkin lymphoma.
Lymphoma is diagnosed by combining the patient's clinical manifestations with physical, laboratory, imaging, and pathological examination results. These different methods play different roles in determining the stage and type of lymphoma, and imaging techniques are particularly important for staging and typing. Commonly used imaging methods include computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography combined with CT (PET-CT), ultrasound, endoscopy, and so on.
Because their imaging principles differ, different modalities produce images of differing diagnostic value for lymphoma. Compared with other imaging technologies, ultrasound examination is safe and free of radiation damage: it is completely non-destructive, non-invasive, and radiation-free. Because the ultrasound probe can be positioned freely, sections can be taken at multiple positions and angles during examination; lesions can be flexibly localized and measured, and the relation between a lesion and the surrounding tissue can be determined clearly. Ultrasound also provides real-time dynamic display: for the heart in particular, the beating at different phases and changes in Doppler blood flow can be observed, allowing hemodynamics to be assessed. Results are available immediately, and examinations can be repeated many times. Ultrasound is widely available (including at the bedside) and relatively low in cost. Ultrasound examination therefore plays an important role in the early diagnosis of lymphoma.
However, due to the inherent acoustic characteristics of ultrasonic imaging, the obtained images have high noise, low contrast, and poor imaging quality. The images need to be segmented, which is the key to ultrasound image analysis. Typically this segmentation is performed manually by the clinician, which reduces the objectivity of diagnosis and is labor-intensive; even experts' delineations differ slightly with their experience and skill. Therefore, for many medical image applications, correct segmentation of lesion regions by a model is the key to successful application. An automatic segmentation model that accurately captures the region of interest (ROI) in an image can provide a basis for clinicians' diagnosis or pathology studies. However, existing lymphoma image segmentation work mostly focuses on three-dimensional positron emission tomography (PET) and computed tomography (CT) images; lymphoma ultrasound images have rarely been segmented.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a lymphoma ultrasonic image semantic segmentation method based on a self-attention mechanism and stable learning, which can effectively segment a lymphoma ultrasonic image under the condition of a small amount of data.
A lymphoma ultrasonic image semantic segmentation method based on a self-attention mechanism and stable learning comprises the following steps:
acquiring a lymphoma ultrasonic image, and processing the image to be used as a training sample;
constructing a lymphoma ultrasonic image segmentation network: the lymphoma ultrasonic image segmentation network comprises a data preprocessing module, a feature extraction network module, a self-attention mechanism module, a stable learning module and a counterfactual interpretation module;
the data preprocessing module is used for cropping and annotating all lymphoma ultrasonic image samples, with the preprocessed samples serving as input data for the segmentation network;
the feature extraction network module is used for extracting the spatial information and global information of the image, fusing the extracted features, and capturing clearer object boundaries;
the self-attention mechanism module is used for realizing non-local interaction among the features and alleviating the information degradation caused by repeated sampling;
the stable learning module is used for eliminating the dependency between environmental features and essential features through random Fourier features and sample weighting, achieving generalization;
and the counterfactual interpretation module is used for giving instance-level interpretations of the model results and improving the credibility of the model.
Preferably, the operation steps of the data preprocessing module of the present invention include:
step S11, cropping the ultrasound images scanned by the ultrasound instrument to remove the extra content added by the scanning instrument;
step S12, uploading the cropped images to the PLAbel annotation system for free annotation; after annotation is finished, submitting the annotated images to an experienced medical expert for review and correction, and exporting the final annotation result;
step S13, reading the exported json file to generate the ground-truth mask image;
step S14, adjusting the pixel spacing of the annotated image and the original image to obtain images with a resolution of 256 × 256, then cropping the region-of-interest image and adjusting its resolution to 512 × 512.
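The resizing and cropping in step S14 can be sketched as follows. This is a minimal NumPy sketch: the function names `resize_nearest` and `preprocess` are illustrative, and nearest-neighbor interpolation stands in for whatever resampling the actual pipeline uses.

```python
import numpy as np

def resize_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbor resize of a 2-D (or 2-D + channel) image array."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]

def preprocess(scan: np.ndarray, roi_box: tuple) -> tuple:
    """Produce the 256x256 full-image sample and the 512x512 ROI crop (step S14)."""
    y0, x0, y1, x1 = roi_box
    stage1 = resize_nearest(scan, 256, 256)                # coarse-localization input
    stage2 = resize_nearest(scan[y0:y1, x0:x1], 512, 512)  # boundary-refinement input
    return stage1, stage2
```

The two returned arrays correspond to the two data sets used in the two training stages described later.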
Preferably, the operation steps of the feature extraction network module of the present invention include:
step S21, down-sampling with the encoder structure to extract low-level features, so that accurate segmentation can be performed using the extracted spatial information and global information;
step S22, the decoder recovers spatial information step by step through up-sampling, fusing the features extracted during encoding to capture clearer object boundaries.
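One decoder stage of steps S21-S22 can be sketched as below. The function names, the nearest-neighbor upsampling, and the choice of channel concatenation as the fusion operation are simplifying assumptions, not the patent's exact network.

```python
import numpy as np

def upsample2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbor 2x upsampling of an (h, w, c) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def decoder_stage(deep_feat: np.ndarray, skip_feat: np.ndarray) -> np.ndarray:
    """Upsample the deeper features and fuse them with the encoder's skip features."""
    up = upsample2x(deep_feat)                       # recover spatial resolution
    return np.concatenate([up, skip_feat], axis=-1)  # fuse to sharpen object boundaries
```

Each such stage doubles the spatial resolution while mixing decoder features with same-resolution encoder features.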
Preferably, the operation steps of the self-attention mechanism module of the present invention include:
step S31, inputting the last layer of features encoded by the encoder into the TSA module; the TSA module applies different linear transformations to the input feature map together with the generated position-embedding vector, producing three vectors, query (Q), key (K), and value (V), which are used to compute attention;
step S32, multiplying Q by the transpose of K to obtain the similarity between the elements of Q and K, dividing by the square root of the dimension d_k of the vector Q to keep the softmax gradient well-behaved, and normalizing by softmax to obtain a contextual attention map A, which is multiplied by V to obtain the attention-weighted value; the formula is as follows:

A = softmax(QK^T / √d_k), Attention(Q, K, V) = A·V;

step S33, adding the return value obtained from the attention mechanism module element-wise to the last-layer features to obtain a fused feature map F as the input of the decoder.
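The computation in steps S31-S33 can be sketched as single-head scaled dot-product attention over the flattened last-layer features. The projection matrices `Wq`, `Wk`, `Wv` and the single-head formulation are simplifying assumptions about the TSA module.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def tsa_block(feat, pos, Wq, Wk, Wv):
    """feat, pos: (n, d) flattened features and position embedding; W*: (d, d) projections."""
    x = feat + pos
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    d_k = Q.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d_k))  # contextual attention map A
    out = A @ V                          # attention-weighted value
    return feat + out                    # step S33: element-wise fusion -> F
```

The residual addition in the last line is what feeds the fused map F to the decoder.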
Preferably, the operation steps of the stable learning module of the present invention include:
step S41, inputting the feature map into the stable learning module, mapping the input features from a low-dimensional to a high-dimensional space through random Fourier features, and eliminating the correlation among the features; the random Fourier feature formula is as follows:

h(x) = √2 cos(ωx + φ),

where h is the high-dimensional feature obtained by applying the random Fourier transform to the input low-dimensional feature x, ω is a random variable sampled from a standard normal distribution, ωx denotes multiplying x by ω for the transformation, and φ is a random variable sampled from a uniform distribution;
step S42, obtaining the optimal sample weighting w* by minimizing the covariance between pairs of weighted variables; the formula is as follows:

w* = argmin_{w ∈ Δ_n} Σ_{1≤i<j≤n} cov(w_i X_i, w_j X_j),

where w ranges over the sample-weight simplex Δ_n, n is the number of samples in the input batch, i.e. the number of incoming sample feature maps, X_i and X_j are feature maps of different samples in the sample space, and w_i and w_j are the weighting weights of samples X_i and X_j respectively;
step S43, since learning sample weights and features globally over all samples would incur a huge overhead in deep learning, the sample weights must be stored and reloaded; a learnable parameter α_i is used to update the global weights and features, with the update formulas:

X′_Gi = α_i X_Gi + (1 − α_i) X_L,
W′_Gi = α_i W_Gi + (1 − α_i) W_L,

where X_Gi and X_L are the global sample features and the current sample features, and W_Gi and W_L are the global sample weights and the current sample weights respectively;
step S44, multiplying the computed optimal weights by the sample loss values to obtain a new loss for training the model; the loss update formula is as follows:

loss = SoftDiceLoss(SR, GT).view(1, -1).mm(w*).view(1),

where SR is the prediction given by the model and GT is the ground truth.
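Steps S41-S42 can be sketched as below. The RFF mapping follows the formula in step S41; the decorrelation objective is a simplified reading in which covariances are computed between feature columns of the sample-weighted batch, and the function names and this simplification are assumptions rather than the patent's exact procedure.

```python
import numpy as np

def random_fourier_features(x: np.ndarray, n_out: int, rng) -> np.ndarray:
    """h(x) = sqrt(2) * cos(x @ omega + phi); omega ~ N(0, 1), phi ~ U[0, 2*pi]."""
    omega = rng.standard_normal((x.shape[1], n_out))
    phi = rng.uniform(0.0, 2.0 * np.pi, size=n_out)
    return np.sqrt(2.0) * np.cos(x @ omega + phi)

def decorrelation_objective(w: np.ndarray, X: np.ndarray) -> float:
    """Sum of squared pairwise covariances of the weighted features (minimized over w)."""
    Xw = w[:, None] * X                 # apply per-sample weights
    C = np.cov(Xw, rowvar=False)        # feature-feature covariance matrix
    return float(np.sum(np.triu(C, 1) ** 2))
```

In training, a weight vector minimizing `decorrelation_objective` would play the role of w* and rescale the per-sample losses as in step S44.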
Preferably, the operation steps of the counterfactual explanation module of the present invention include:
step S51, generating over-segmented fragments of the image with the quick-shift image segmentation algorithm;
step S52, finding an irreducible set of fragments whose masking reduces the model's IoU score the most, as shown below:

T(I\S) < T(I) (IoU reduced),

where I denotes the image with its generated fragments, S denotes a set of over-segmented fragments, and T denotes the segmentation model;
step S53, mapping the set of fragments back onto the original image to generate an instance-level interpretation of the model result.
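A minimal sketch of the masking search in steps S51-S53, using a greedy single-fragment scan in place of the full irreducible-set search; the greedy strategy, the `model` interface, and the function names are assumptions for illustration.

```python
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks."""
    union = np.logical_or(pred, gt).sum()
    return float(np.logical_and(pred, gt).sum() / union) if union else 1.0

def counterfactual_fragment(image, gt, segments, model):
    """Return the fragment id whose masking lowers the model's IoU most, with the drop."""
    base = iou(model(image), gt)
    best_id, best_drop = None, 0.0
    for seg_id in np.unique(segments):
        masked = image.copy()
        masked[segments == seg_id] = 0        # mask out one over-segmented fragment
        drop = base - iou(model(masked), gt)  # compare T(I\S) against T(I)
        if drop > best_drop:
            best_id, best_drop = seg_id, drop
    return best_id, best_drop
```

Mapping the returned fragment back onto the original image gives the instance-level explanation of step S53.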
For lymphoma ultrasound scan data, the method preprocesses the data into two data sets for training the model, so that the model achieves higher performance. The lymphoma ultrasound images are segmented automatically and semantically within a deep learning framework: the self-attention module and the stable learning module address the blurred boundaries of ultrasound images and the generalization problem, while the counterfactual interpretation module provides instance-level explanations of the model, improving both segmentation accuracy and model credibility. The automatic segmentation requires no human intervention, eliminates the interference of subjective factors, saves a large amount of time and labor, and improves the efficiency and precision of diagnosis.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic flow diagram of the data preprocessing module according to the present invention;
FIG. 3 is a sample structure diagram of a normative dataset according to the present invention;
FIG. 4 is a diagram of a neural network architecture according to the present invention;
FIG. 5 is a schematic diagram of a feature extraction network module architecture according to the present invention;
FIG. 6 is a schematic diagram of a self-attention mechanism module architecture of the present invention;
FIG. 7 is a block diagram of a stable learning module according to the present invention;
FIG. 8 is a diagram of a counterfactual explanation module architecture according to the present invention;
FIG. 9 is a schematic diagram of a mapping process in which the IoU changes from 0.89105 to 0.84559;
FIG. 10 is a schematic diagram of a mapping process in which the IoU changes from 0.95403 to 0.91483.
Detailed Description
A lymphoma ultrasonic image semantic segmentation method based on a self-attention mechanism and stable learning comprises the following steps:
acquiring a lymphoma ultrasonic image, and processing the image to be used as a training sample;
constructing a lymphoma ultrasonic image segmentation network: the lymphoma ultrasonic image segmentation network comprises a data preprocessing module, a feature extraction network module, a self-attention mechanism module, a stable learning module and a counterfactual interpretation module;
the data preprocessing module is used for cropping and annotating all lymphoma ultrasonic image samples, with the preprocessed samples serving as input data for the segmentation network;
the feature extraction network module is used for extracting the spatial information and global information of the image, fusing the extracted features, and capturing clearer object boundaries;
the self-attention mechanism module is used for realizing non-local interaction among the features and alleviating the information degradation caused by repeated sampling;
the stable learning module is used for eliminating the dependency between environmental features and essential features through random Fourier features and sample weighting, achieving generalization;
and the counterfactual interpretation module is used for giving instance-level interpretations of the model results and improving the credibility of the model.
As shown in FIG. 1, the flow framework of the invention mainly consists of data preprocessing, a deep neural network, and counterfactual interpretation. First, patients' lymphoma ultrasound images serve as the original data set, which is preprocessed to obtain the data set for neural-network training; the trained neural network then provides a segmentation result for an input picture; and the counterfactual interpretation method, combined with the deep neural network, gives an instance-level interpretation.
As shown in fig. 2, a preprocessing module based on a patient lymphoma ultrasound image is implemented, and the operation steps include:
step S11, cropping the ultrasound images scanned by the ultrasound instrument to remove the extra content added by the scanning instrument;
step S12, uploading the cropped images to the PLAbel annotation system for free annotation; after annotation is finished, submitting the annotated images to an experienced medical expert for review and correction, and exporting the final annotation result;
step S13, reading the exported json file to generate the ground-truth mask image;
step S14, adjusting the pixel spacing of the annotated image and the original image to obtain images with a resolution of 256 × 256, then cropping the region-of-interest image and adjusting its resolution to 512 × 512.
For the preprocessed data set, a training set and a test set are divided; the training set is used to train the neural network, and the test set is used to evaluate the accuracy of the model. Referring to FIG. 3, the preprocessed images can be seen: the stage1 images are used in the first training stage of the deep neural network for coarse localization of the lesion area, and the stage2 images are the region-of-interest images used in the second training stage for refining the target boundary of the segmented object.
As shown in fig. 4 and 5, a feature extraction network module based on patient lymphoma ultrasound images is implemented. The module extracts information through atrous (dilated) convolution in the encoder and recovers spatial information through up-sampling in the decoder to obtain the image segmentation result. The operation steps include:
step S21, down-sampling with the encoder structure to extract low-level features, so that accurate segmentation can be performed using the extracted spatial information and global information;
step S22, the decoder recovers spatial information step by step through up-sampling, fusing the features extracted during encoding to capture clearer object boundaries.
As shown in fig. 6, a self-attention mechanism module based on patient lymphoma ultrasound images is implemented. The module adds position embedding to the features of the lymphoma image and obtains an attention-weighted feature map through the self-attention mechanism. The operation steps include:
step S31, inputting the last layer of features encoded by the encoder into the TSA module; the TSA module applies different linear transformations to the input feature map together with the generated position-embedding vector, producing three vectors, query (Q), key (K), and value (V), which are used to compute attention;
step S32, multiplying Q by the transpose of K to obtain the similarity between the elements of Q and K, dividing by the square root of the dimension d_k of the vector Q to keep the softmax gradient well-behaved, and normalizing by softmax to obtain a contextual attention map A, which is multiplied by V to obtain the attention-weighted value, as shown below:

A = softmax(QK^T / √d_k), Attention(Q, K, V) = A·V;

step S33, adding the return value obtained from the attention mechanism module element-wise to the last-layer features to obtain a fused feature map F as the input of the decoder.
As shown in fig. 7, a stable learning module based on patient lymphoma ultrasound images is implemented. It obtains sample weights for updating the loss function by applying a random Fourier feature transform (RFF) to the sample features of the patient lymphoma ultrasound image and learning sample weighting for decorrelation (LSWD). The operation steps include:
step S41, inputting the feature map into the stable learning module, mapping the input features from a low-dimensional to a high-dimensional space through random Fourier features, and eliminating the correlations among features, where the random Fourier feature mapping is:

h(x) = √2 · cos(ωx + φ),

where h is the high-dimensional feature obtained by applying the random Fourier transform to the low-dimensional input feature x, ω is a random variable sampled from a standard normal distribution, ωx denotes multiplying x by the random variable ω, and φ is a random variable sampled from a uniform distribution;
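The random Fourier feature mapping of step S41 can be sketched in numpy as follows (the feature count and seed are illustrative assumptions, and φ is drawn from Uniform[0, 2π) as is conventional for RFF):

```python
import numpy as np

def random_fourier_features(x, n_features=128, seed=0):
    """h(x) = sqrt(2) * cos(omega * x + phi),
    with omega ~ N(0, 1) and phi ~ Uniform[0, 2*pi):
    maps each scalar feature into an n_features-dimensional space."""
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal(n_features)        # omega ~ N(0, 1)
    phi = rng.uniform(0.0, 2 * np.pi, n_features)  # phi ~ Uniform[0, 2*pi)
    return np.sqrt(2.0) * np.cos(np.outer(x, omega) + phi)

x = np.linspace(-1, 1, 5)
h = random_fourier_features(x)   # one 128-dim embedding per input value
```

Each input value becomes a bounded high-dimensional vector, which is what allows the subsequent weighting step to measure and suppress nonlinear dependencies between features.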
step S42, obtaining the optimal sample weights w* by minimizing the sum of covariances between pairs of weighted sample features:

w* = argmin_{w∈Δn} Σ_{1≤i<j≤n} cov(w_i X_i, w_j X_j),

where Δn denotes the set of admissible weight vectors (the n-dimensional simplex), n is the number of input batches, i.e. the number of incoming sample feature maps, X_i and X_j are feature maps of different samples in the sample space, and w_i and w_j are the weighting weights of samples X_i and X_j, respectively;
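As a toy stand-in for this weighting step (not the patented algorithm), one can learn simplex-constrained sample weights that shrink the off-diagonal weighted covariances between feature dimensions, using finite-difference projected gradient descent. All names, step counts, and the exact decorrelation objective here are our own illustrative assumptions:

```python
import numpy as np

def weighted_cov(X, w):
    """Weighted covariance matrix of the feature columns of X under sample weights w."""
    mu = (w[:, None] * X).sum(0) / w.sum()
    Xc = X - mu
    return (w[:, None] * Xc).T @ Xc / w.sum()

def decorrelation_loss(X, w):
    """Sum of squared off-diagonal weighted covariances (pairs i < j)."""
    C = weighted_cov(X, w)
    iu = np.triu_indices_from(C, k=1)
    return float((C[iu] ** 2).sum())

def learn_sample_weights(X, steps=150, lr=0.5, eps=1e-4):
    """Projected gradient descent on the simplex with finite-difference gradients."""
    n = X.shape[0]
    w = np.full(n, 1.0 / n)
    best_w, best_loss = w.copy(), decorrelation_loss(X, w)
    I = np.eye(n)
    for _ in range(steps):
        base = decorrelation_loss(X, w)
        g = np.array([(decorrelation_loss(X, w + eps * I[i]) - base) / eps
                      for i in range(n)])
        w = np.clip(w - lr * g, 1e-6, None)
        w /= w.sum()                          # project back onto the simplex
        loss = decorrelation_loss(X, w)
        if loss < best_loss:
            best_w, best_loss = w.copy(), loss
    return best_w

rng = np.random.default_rng(1)
X = rng.standard_normal((8, 3))   # 8 samples, 3 features
w_star = learn_sample_weights(X)
```

The returned weights sum to one and give a decorrelation loss no worse than uniform weighting, which is the behaviour the patent's w* optimization aims for.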
step S43, learning sample weights and features globally over all samples would incur a huge overhead in deep learning, so the sample weights must be stored and reloaded; a learnable parameter α_i is used to perform the global weight and feature updates, with the update formulas:

X′_Gi = α_i X_Gi + (1 − α_i) X_L,
W′_Gi = α_i W_Gi + (1 − α_i) W_L,

where X_Gi and X_L are the global sample feature and current sample feature, and W_Gi and W_L are the global sample weight and current sample weight, respectively;
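The update in step S43 is a convex blend of stored global statistics with the current batch; a one-line numpy sketch (with a fixed α for illustration, though the patent makes it learnable) is:

```python
import numpy as np

def fuse_global(x_global, x_local, w_global, w_local, alpha):
    """X'_G = alpha * X_G + (1 - alpha) * X_L, and likewise for the weights:
    blend stored global features/weights with the current batch so the
    weights need not be relearned over the whole dataset each step."""
    return (alpha * x_global + (1 - alpha) * x_local,
            alpha * w_global + (1 - alpha) * w_local)

x_g, w_l = np.ones(4), np.full(4, 0.5)
x_g2, w_g2 = fuse_global(x_g, np.zeros(4), np.ones(4), w_l, alpha=0.9)
# with alpha=0.9 the global state moves 10% of the way toward the batch values
```

A large α keeps the global state stable across batches; a small α lets the current batch dominate.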
step S44, multiplying the computed optimal weights by the per-sample loss values to obtain a new loss for model training, where the loss update formula is:

loss = SoftDiceLoss(SR, GT).view(1, -1).mm(w*).view(1),

where SR and GT are the prediction given by the model and the ground truth, respectively.
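The PyTorch-style formula above is just a dot product of per-sample soft Dice losses with the learned weights; a numpy equivalent (with an illustrative smoothing term `eps` of our own choosing) is:

```python
import numpy as np

def soft_dice_loss(pred, target, eps=1e-6):
    """Per-sample soft Dice loss: 1 - 2|P∩G| / (|P| + |G|)."""
    axes = tuple(range(1, pred.ndim))
    inter = (pred * target).sum(axes)
    denom = pred.sum(axes) + target.sum(axes)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

def weighted_loss(pred, target, w):
    """Dot product of per-sample Dice losses with sample weights w* --
    the numpy analogue of SoftDiceLoss(SR, GT).view(1, -1).mm(w*).view(1)."""
    return float(soft_dice_loss(pred, target) @ w)

pred = np.array([[0.9, 0.1], [0.2, 0.8]])   # soft predictions SR
gt = np.array([[1.0, 0.0], [0.0, 1.0]])     # ground truth GT
w = np.array([0.5, 0.5])                    # learned sample weights
loss = weighted_loss(pred, gt, w)
```

Per-sample losses here are 0.1 and 0.2, so with uniform weights the combined loss is 0.15; skewing w would make the harder sample count more or less, which is exactly how the learned w* reweights training.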
As shown in fig. 8, a counterfactual interpretation module based on the patient lymphoma ultrasound image is implemented; the counterfactual interpretation module processes the patient lymphoma ultrasound image through the fast-shift image segmentation algorithm SETC and a counterfactual computation module to generate instance-level counterfactual explanations, and the operation steps include:
step S51, generating over-segmentation segments of the image with the fast-shift image segmentation algorithm;
step S52, finding an irreducible set of segments whose masking reduces the model's IoU score the most, that is:

T(I\S) < T(I) (IoU reduced),

where I represents the generated image segments, S represents an over-segmentation segment, and T represents the segmentation model;
step S53, mapping the selected set of segments back onto the original image to generate an instance-level explanation of the model result.
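Steps S51 to S53 can be sketched with a greedy masking loop. This is a toy stand-in, not the SETC pipeline: the thresholding "model", the 2×2 image, and the segment layout are all illustrative assumptions, and it ranks segments by how much masking each one alone drops the IoU.

```python
import numpy as np

def iou(pred, gt):
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

def counterfactual_segments(image, gt, segments, model):
    """Mask one over-segmentation segment at a time and keep those whose
    removal lowers the model's IoU, i.e. T(I \\ S) < T(I), ranked by drop."""
    base = iou(model(image), gt)
    drops = []
    for seg_id in np.unique(segments):
        masked = image.copy()
        masked[segments == seg_id] = 0          # mask out segment S
        drop = base - iou(model(masked), gt)
        if drop > 0:
            drops.append((drop, int(seg_id)))
    return [s for _, s in sorted(drops, reverse=True)]

# toy "segmentation model": thresholding the intensity image
model = lambda img: img > 0.5
image = np.array([[0.9, 0.9], [0.1, 0.9]])
gt = np.array([[True, True], [False, True]])
segments = np.array([[0, 1], [2, 3]])           # one segment per pixel
ranked = counterfactual_segments(image, gt, segments, model)
```

Segments whose masking leaves the IoU unchanged (here, the background pixel) are excluded, matching the idea that the counterfactual explanation keeps only segments the model actually relies on.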
As shown in fig. 9 and 10, which present the counterfactual explanation results generated by the counterfactual module: fig. 9 is a schematic diagram of a case whose IoU changes from 0.89105 to 0.84559, and fig. 10 of a case whose IoU changes from 0.95403 to 0.91483. In fig. 9 and 10, a is the input original image; b is a visualization of the segments generated by the SETC algorithm; c is the counterfactual explanation generated by the counterfactual module; d is the counterfactual explanation mapped back onto the original image.
Experimental results show that the method makes efficient use of patient lymphoma ultrasound images and effectively improves segmentation accuracy for lymphoma patients; the model alleviates the boundary-blurring and generalization problems, and the counterfactual module provides instance-level explanations of the model, so that the resulting segmentations carry more credibility for doctor-assisted diagnosis.
Claims (6)
1. A lymphoma ultrasonic image semantic segmentation method based on a self-attention mechanism and stable learning is characterized by comprising the following steps:
acquiring a lymphoma ultrasonic image, and processing the image to be used as a training sample;
constructing a lymphoma ultrasonic image segmentation network: the lymphoma ultrasonic image segmentation network comprises a data preprocessing module, a feature extraction network module, a self-attention mechanism module, a stable learning module and a counterfactual interpretation module;
the data preprocessing module is used for cutting and labeling all lymphoma ultrasonic image samples and taking the preprocessed samples as input data of a segmentation network;
the feature extraction network module is used for extracting the spatial information and the global information of the image, fusing the extracted features, and capturing clearer object boundaries;
the self-attention mechanism module is used for realizing non-local interaction among features and alleviating the information degradation caused by repeated downsampling;
the stable learning module is used for eliminating the dependency relationship between the environmental characteristics and the essential characteristics in a random Fourier characteristic and sample weighting mode to realize generalization;
and the counterfactual interpretation module is used for carrying out example level interpretation on the model result and improving the credibility of the model.
2. The method for semantic segmentation of lymphoma ultrasound images based on self-attention mechanism and stable learning according to claim 1, wherein the operation steps of said data preprocessing module comprise:
step S11, cropping the ultrasound image scanned by the ultrasound instrument to remove the extra content added by the ultrasound scanning instrument;
step S12, uploading the cropped images to the PLAbel annotation system for annotation; after annotation is finished, submitting the images annotated in the system to an experienced medical expert for review and correction, and exporting the final annotation result;
step S13, reading the exported json file to generate the ground-truth mask image;
step S14, adjusting the pixel spacing between the annotation image and the original image to obtain an image with a resolution of 256 × 256, cropping out the region-of-interest image, and adjusting its resolution to 512 × 512.
3. The lymphoma ultrasound image semantic segmentation method based on self-attention mechanism and stable learning according to claim 2, wherein the feature extraction network module operating step comprises:
step S21, using the encoder structure to perform downsampling to extract low-level features, thereby performing accurate segmentation using the extracted spatial information and global information;
step S22, the decoder recovers spatial information step by step through upsampling, fusing the features extracted during encoding to capture clearer object boundaries.
4. The lymphoma ultrasound image semantic segmentation method based on self-attention mechanism and stable learning according to claim 3, wherein the self-attention mechanism module is operated by the following steps:
step S31, inputting the last layer of features output by the encoder into a TSA module; the TSA module applies different linear transformations to the input feature map and the generated position embedding vector to produce three vectors, query (Q), key (K) and value (V), which are used to compute attention;
step S32, multiplying Q by the transpose of K to obtain the similarity between elements of Q and K, dividing by the square root of the vector dimension d_k to keep the softmax gradient stable, and normalizing with softmax to obtain a contextual attention map A, which is multiplied by V to obtain the attention-weighted value:

Attention(Q, K, V) = softmax(QK^T / √d_k) · V;

step S33, adding the value returned by the attention mechanism module element-wise to the last-layer feature to obtain a fused feature map F used as the decoder input.
5. The lymphoma ultrasound image semantic segmentation method based on self-attention mechanism and stable learning according to claim 4, wherein the stable learning module is operated by the following steps:
step S41, inputting the feature map into the stable learning module, mapping the input features from a low-dimensional to a high-dimensional space through random Fourier features, and eliminating the correlations among features, where the random Fourier feature mapping is:

h(x) = √2 · cos(ωx + φ),

where h is the high-dimensional feature obtained by applying the random Fourier transform to the low-dimensional input feature x, ω is a random variable sampled from a standard normal distribution, ωx denotes multiplying x by the random variable ω, and φ is a random variable sampled from a uniform distribution;
step S42, obtaining the optimal sample weights w* by minimizing the sum of covariances between pairs of weighted sample features:

w* = argmin_{w∈Δn} Σ_{1≤i<j≤n} cov(w_i X_i, w_j X_j),

where Δn denotes the set of admissible weight vectors (the n-dimensional simplex), n is the number of input batches, i.e. the number of incoming sample feature maps, X_i and X_j are feature maps of different samples in the sample space, and w_i and w_j are the weighting weights of samples X_i and X_j, respectively;
step S43, learning sample weights and features globally over all samples would incur a huge overhead in deep learning, so the sample weights must be stored and reloaded; a learnable parameter α_i is used to perform the global weight and feature updates, with the update formulas:

X′_Gi = α_i X_Gi + (1 − α_i) X_L,
W′_Gi = α_i W_Gi + (1 − α_i) W_L,

where X_Gi and X_L are the global sample feature and current sample feature, and W_Gi and W_L are the global sample weight and current sample weight, respectively;
step S44, multiplying the computed optimal weights by the per-sample loss values to obtain a new loss for training the model, where the loss update formula is:

loss = SoftDiceLoss(SR, GT).view(1, -1).mm(w*).view(1),

where SR and GT are the prediction given by the model and the ground truth, respectively.
6. The method for semantic segmentation of lymphoma ultrasound images based on self-attention mechanism and stable learning according to claim 5, wherein the operation steps of said counterfactual interpretation module comprise:
step S51, generating over-segmentation segments of the image with the fast-shift image segmentation algorithm;
step S52, finding an irreducible set of segments whose masking reduces the model's IoU score the most, that is:

T(I\S) < T(I) (IoU reduced),

where I represents the generated image segments, S represents an over-segmentation segment, and T represents the segmentation model;
step S53, mapping the set of segments back onto the original image to generate an instance-level interpretation of the model result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210604099.0A CN114972266A (en) | 2022-05-31 | 2022-05-31 | Lymphoma ultrasonic image semantic segmentation method based on self-attention mechanism and stable learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210604099.0A CN114972266A (en) | 2022-05-31 | 2022-05-31 | Lymphoma ultrasonic image semantic segmentation method based on self-attention mechanism and stable learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114972266A true CN114972266A (en) | 2022-08-30 |
Family
ID=82957378
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210604099.0A Pending CN114972266A (en) | 2022-05-31 | 2022-05-31 | Lymphoma ultrasonic image semantic segmentation method based on self-attention mechanism and stable learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114972266A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116012586A (en) * | 2023-01-06 | 2023-04-25 | 阿里巴巴(中国)有限公司 | Image processing method, storage medium and computer terminal |
CN117351183A (en) * | 2023-10-09 | 2024-01-05 | 广州医科大学附属第一医院(广州呼吸中心) | Intelligent identification method and system for endometrial cancer lymph node metastasis |
CN117351183B (en) * | 2023-10-09 | 2024-06-04 | 广州医科大学附属第一医院(广州呼吸中心) | Intelligent identification method and system for endometrial cancer lymph node metastasis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||