CN116862765A - Medical image super-resolution reconstruction method and system - Google Patents

Medical image super-resolution reconstruction method and system

Info

Publication number
CN116862765A
Authority
CN
China
Prior art keywords
module
feature map
resolution
channel
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310594669.7A
Other languages
Chinese (zh)
Inventor
张瑜
马涵骁
刘丽霞
张友梅
李彬
张明亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qilu University of Technology
Original Assignee
Qilu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qilu University of Technology filed Critical Qilu University of Technology
Priority to CN202310594669.7A priority Critical patent/CN116862765A/en
Publication of CN116862765A publication Critical patent/CN116862765A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T3/4076 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution using the original low-resolution images to iteratively correct the high-resolution images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a medical image super-resolution reconstruction method and system, relating to the technical field of computer vision. A low-resolution medical image to be reconstructed is acquired, and the low-resolution image is processed by a trained super-resolution reconstruction model to generate and output a final high-resolution medical image. The super-resolution reconstruction model comprises a shallow feature extraction module, a deep feature extraction module and a reconstruction module. In the deep feature extraction module, a parallel convolution module and a local window self-attention module dynamically calculate and share channel weights and spatial weights through a weight sharing module, a spatial feature map and a channel feature map are generated based on the dynamically calculated weights, and the deep feature map is obtained after channel splicing. Compared with Transformer-based and CNN-based models, the method provided by the application can better recover important information of the black background and the central area, recover more detailed information of the medical image, and improve the super-resolution reconstruction precision of medical images.

Description

Medical image super-resolution reconstruction method and system
Technical Field
The application belongs to the technical field of computer vision, and particularly relates to a medical image super-resolution reconstruction method and system.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
With the advancement of medical imaging technology, magnetic resonance imaging (MRI) plays an increasingly important role in routine clinical diagnostic evaluation, and high-resolution (HR) MRI provides richer detail of greater diagnostic value. However, it is difficult to capture high-quality, high-resolution MRI images with imaging devices because of time constraints, the required signal-to-noise ratio, and body motion. In recent years, image super-resolution (SR) has been applied to reconstruct a high-resolution (HR) image from a low-resolution (LR) image, and has therefore attracted attention as a practical way to improve the quality of MRI images.
Image super-resolution methods reconstruct HR images by learning the mapping between LR and HR images. Many conventional interpolation methods, such as bicubic interpolation and Lanczos-σ, have been proposed for image super-resolution, but their performance is not satisfactory. With the vigorous development of deep learning, deep neural networks, particularly convolutional neural networks (CNNs) and Transformers, have come to dominate the image super-resolution task, and many methods have been proposed for processing MRI images. MRI images typically contain large background areas that are much less informative than the central region.
Most CNN-based methods have two disadvantages. First, CNNs have difficulty capturing long-range dependencies, so the similarity of large regions is hard to exploit fully, which reduces the efficiency of modeling information. Second, most previous CNN-based methods use the same kernel for all spatial pixels of the image, which limits their representation capability. Transformer-based approaches perform impressively in image super-resolution (SR), owing to their strong ability to capture global information; SwinIR, proposed by Liang et al. and based on the Swin Transformer, achieves good results in SR. However, using a Transformer alone greatly increases computation time, which is not tolerable in MRI.
Disclosure of Invention
In order to overcome the defects in the prior art, the application provides a medical image super-resolution reconstruction method and system which, compared with existing Transformer-based and CNN-based models, better recover important information of the black background and the central area, recover more detailed information of the medical image, and improve medical image super-resolution reconstruction precision.
To achieve the above object, one or more embodiments of the present application provide the following technical solutions:
the first aspect of the application provides a medical image super-resolution reconstruction method;
a medical image super-resolution reconstruction method comprises the following steps:
acquiring a low-resolution medical image to be reconstructed;
processing the low-resolution image through the trained super-resolution reconstruction model to generate and output a final high-resolution medical image;
the super-resolution reconstruction model comprises a shallow feature extraction module, a deep feature extraction module and a reconstruction module, wherein in the deep feature extraction module, a parallel convolution module and a local window self-attention module perform dynamic calculation and sharing of channel weights and space weights through a weight sharing module, a space feature map and a channel feature map are generated based on the dynamically calculated weights, and the deep feature map is obtained after channel splicing.
Further, the shallow feature extraction module extracts a shallow feature map from the low resolution medical image using a 3×3 convolution layer.
Further, the processing procedure of the deep feature extraction module is specifically as follows: the input shallow feature map passes through a plurality of sequentially connected shared residual Swin Transformer modules and a 3×3 convolution layer to obtain the deep feature map.
Further, the processing procedure of the shared residual Swin Transformer module is specifically as follows: the input feature map passes through a plurality of sequentially connected shared Swin Transformer modules to obtain a shared feature map, and the shared feature map is residual-connected with the input feature map to obtain a shared residual feature map.
Further, the shared Swin Transformer module consists of a channel splitting operation, a local window self-attention module, a convolution module with a 3×3 kernel, a weight sharing module, a channel splicing operation and a feed-forward neural network, and its processing procedure specifically comprises the following steps:
splitting an input feature diagram into two feature subgraphs through channel splitting operation, and respectively inputting the two feature subgraphs into a local window self-attention module and a convolution module;
the local window self-attention module and the convolution module generate a space feature map and a channel feature map based on the dynamic weight shared by the weight sharing module;
the space feature map, the channel feature map and the input feature map are spliced through channel splicing operation;
and inputting the spliced characteristic images into a feedforward neural network to obtain a shared characteristic image.
Further, the specific generation process of the spatial feature map and the channel feature map is as follows:
(1) The convolution module extracts a convolution feature map and an initial channel feature map X'_C from the split feature subgraph X_1;
(2) Based on the convolution feature map, the channel weight sharing sub-module calculates the channel weight W_C and shares it with the local window self-attention module;
(3) Based on the channel weight W_C, the local window self-attention module extracts the spatial feature map X_A from the split feature subgraph X_2;
(4) Based on the spatial feature map X_A, the spatial weight sharing sub-module calculates the spatial weight W_A and shares it with the convolution module;
(5) Based on the spatial weight W_A, the convolution module applies spatial weighting to the initial channel feature map X'_C to obtain the channel feature map X_C.
Further, the reconstruction module uses an up-sampling layer composed of a sub-pixel convolution layer and a 3×3 convolution to up-sample the feature map obtained by adding the deep feature map M_D and the shallow feature map M_S, reshaping the information stream into a feature map at the specified up-sampling magnification to obtain the high-resolution image I_SR.
The second aspect of the application provides a medical image super-resolution reconstruction system.
A medical image super-resolution reconstruction system comprises an acquisition module and a generation module:
an acquisition module configured to: acquiring a low-resolution medical image to be reconstructed;
a generation module configured to: processing the low-resolution image through the trained super-resolution reconstruction model to generate and output a final high-resolution medical image;
the super-resolution reconstruction model comprises a shallow feature extraction module, a deep feature extraction module and a reconstruction module, wherein in the deep feature extraction module, a parallel convolution module and a local window self-attention module perform dynamic calculation and sharing of channel weights and space weights through a weight sharing module, a space feature map and a channel feature map are generated based on the dynamically calculated weights, and the deep feature map is obtained after channel splicing.
A third aspect of the present application provides a computer readable storage medium having stored thereon a program which when executed by a processor performs steps in a medical image super-resolution reconstruction method according to the first aspect of the present application.
In a fourth aspect, the present application provides an electronic device, including a memory, a processor and a program stored in the memory and executable on the processor, the processor implementing the steps in a medical image super-resolution reconstruction method according to the first aspect of the present application when the program is executed by the processor.
The one or more of the above technical solutions have the following beneficial effects:
the application adopts the parallel processing of the convolution module and the local window self-attention module, splits the feature map into two feature subgraphs through a channel separation technology, respectively inputs the two feature subgraphs into the two parallel modules, realizes the calculation and sharing of dynamic weights between the two modules, and simultaneously acquires the long-distance dependence of the feature map and the detail information of a central area.
The application dynamically calculates the channel weight and the spatial weight through the channel weight sharing sub-module and the spatial weight sharing sub-module of the weight sharing module to generate the spatial feature map and the channel feature map, and the deep feature map is obtained after channel splicing. In this way the attention mechanism is combined with convolution, improving the convolution's ability to acquire channel features as well as its ability to acquire spatial features; the difficulty that the convolution module, limited by its receptive field, struggles to obtain long-range dependencies is overcome, and the ability of the local window self-attention to obtain image details is enhanced.
Additional aspects of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application.
Fig. 1 is a block diagram of a super-resolution reconstruction model according to a first embodiment.
Fig. 2 is a block diagram of the shared residual Swin Transformer module RSSTB according to the first embodiment.
Fig. 3 is a block diagram of the shared Swin Transformer module SACB according to the first embodiment.
Detailed Description
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Example 1
In one or more embodiments, a medical image super-resolution reconstruction method is disclosed, comprising the steps of:
(1) A low resolution medical image to be reconstructed is acquired.
(2) The low-resolution image is processed by the trained super-resolution reconstruction model to generate and output a final high-resolution medical image.
The super-resolution reconstruction model comprises a shallow feature extraction module, a deep feature extraction module and a reconstruction module, wherein in the deep feature extraction module, a parallel convolution module and a local window self-attention module perform dynamic calculation and sharing of channel weights and space weights through a weight sharing module, a space feature map and a channel feature map are generated based on the dynamically calculated weights, and the deep feature map is obtained after processing.
The following describes the implementation of the medical image super-resolution reconstruction method in detail in terms of super-resolution reconstruction model construction, training and use, namely the following three aspects:
1. Construct a super-resolution reconstruction model based on convolution and self-attention, and realize weight sharing between the convolution and the self-attention.
Fig. 1 is a block diagram of the super-resolution reconstruction model. As shown in Fig. 1, the super-resolution reconstruction model includes a shallow feature extraction module, a deep feature extraction module, and a reconstruction module. The shallow feature extraction module extracts a shallow feature map from the input low-resolution medical image; the deep feature extraction module is formed by connecting a plurality of shared residual Swin Transformer modules (RSSTB) and convolution layers, and extracts a deep feature map; the reconstruction module consists of a sub-pixel convolution layer and a 3×3 convolution layer, and is responsible for up-sampling the feature map obtained by adding the deep feature map and the shallow feature map to obtain the final high-resolution medical image.
Shallow feature extraction module
Shallow feature extraction is carried out on the input low-resolution medical image, using a 3×3 convolution layer to extract the shallow feature map. The specific operation is:
M_S = F_S(I_LR)   (1)
where F_S denotes the shallow feature extraction module, I_LR denotes the input low-resolution medical image, and M_S denotes the shallow feature map.
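To make Eq. (1) concrete, the following is a minimal PyTorch sketch of the shallow feature extraction; the 64-channel feature width and the single-channel input are illustrative assumptions, not values fixed by the text.

```python
import torch
import torch.nn as nn

# A minimal sketch of the shallow feature extraction F_S in Eq. (1): a single
# 3x3 convolution mapping the low-resolution image I_LR to the shallow feature
# map M_S. The feature width (64) and the single input channel are assumptions.
shallow_extractor = nn.Conv2d(in_channels=1, out_channels=64, kernel_size=3, padding=1)

I_LR = torch.randn(1, 1, 64, 64)   # one single-channel low-resolution slice (assumed)
M_S = shallow_extractor(I_LR)      # shallow feature map M_S, shape (1, 64, 64, 64)
```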
Deep feature extraction module
As shown in the deep feature extraction part of Fig. 1, a deep feature extraction module composed of n shared residual Swin Transformer modules RSSTB and a 3×3 convolution layer performs feature fusion on the shallow feature map and extracts the deep feature map, where each shared residual Swin Transformer module RSSTB contains m shared Swin Transformer modules SACB. In this embodiment, n = 3 and m = 3. The structure of the deep feature extraction module is expressed as:
M_i = F_RSSTB(M_{i-1}), i = 1, 2, ..., n
M_D = F_conv(M_n)   (2)
where F_RSSTB denotes a shared residual Swin Transformer module, n denotes the number of shared residual Swin Transformer modules, M_i, M_{i-1} and M_n denote the shared residual feature maps output by the i-th, (i-1)-th and n-th shared residual Swin Transformer modules respectively, F_conv denotes a convolution operation with a 3×3 kernel, and M_D denotes the deep feature map.
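A minimal sketch of Eq. (2) in PyTorch follows; the module chains n RSSTB blocks and a 3×3 convolution. The class name, the default feature width and the placeholder used when no RSSTB implementation is passed in are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

class DeepFeatureExtraction(nn.Module):
    # Sketch of Eq. (2): n shared residual Swin Transformer modules (RSSTB) in
    # sequence followed by a 3x3 convolution. `rsstb_factory` stands in for the
    # RSSTB sketched further below; by default a plain 3x3 convolution is used
    # as a placeholder so this module runs on its own.
    def __init__(self, num_feat=64, n=3, rsstb_factory=None):
        super().__init__()
        make = rsstb_factory or (lambda: nn.Conv2d(num_feat, num_feat, 3, padding=1))
        self.blocks = nn.ModuleList([make() for _ in range(n)])
        self.conv = nn.Conv2d(num_feat, num_feat, 3, padding=1)

    def forward(self, M_S):
        M = M_S
        for block in self.blocks:   # M_i = F_RSSTB(M_{i-1}), i = 1..n
            M = block(M)
        return self.conv(M)         # M_D = F_conv(M_n)
```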
Fig. 2 is a block diagram of the shared residual Swin Transformer module RSSTB. As shown in Fig. 2, the shared residual Swin Transformer module includes a plurality of shared Swin Transformer modules SACB and a residual connection. The i-th shared residual Swin Transformer module RSSTB is expressed as:
M_t = F_{3×SACB}(M_{i-1}), i = 1, 2, ..., n
M_i = M_t + M_{i-1}   (3)
where F_{3×SACB} denotes passing through three shared Swin Transformer modules SACB in sequence, M_t denotes the shared feature map output by the three shared Swin Transformer modules, and M_i and M_{i-1} denote the shared residual feature maps output by the i-th and (i-1)-th shared residual Swin Transformer modules respectively.
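The residual structure of Eq. (3) can be sketched directly; the SACB internals are sketched later, so a placeholder factory is used here. The class and parameter names are assumptions for illustration.

```python
import torch
import torch.nn as nn

class RSSTB(nn.Module):
    # Sketch of Eq. (3): m shared Swin Transformer modules (SACB) in sequence
    # plus a residual connection back to the block input. `sacb_factory` stands
    # in for the SACB sketched further below; a 3x3 convolution is used as a
    # placeholder by default so the block runs on its own.
    def __init__(self, num_feat=64, m=3, sacb_factory=None):
        super().__init__()
        make = sacb_factory or (lambda: nn.Conv2d(num_feat, num_feat, 3, padding=1))
        self.sacbs = nn.Sequential(*[make() for _ in range(m)])

    def forward(self, M_prev):
        M_t = self.sacbs(M_prev)    # M_t = F_{3xSACB}(M_{i-1})
        return M_t + M_prev         # M_i = M_t + M_{i-1}
```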
Fig. 3 is a block diagram of the shared Swin Transformer module SACB. As shown in Fig. 3, the shared Swin Transformer module SACB consists of a channel splitting operation, a local window self-attention module, a convolution module with a 3×3 kernel, a weight sharing module, a channel splicing operation and a feed-forward neural network. Its processing procedure is as follows:
(1) The input feature map X is split into two feature subgraphs X_1 and X_2 through the channel splitting operation, which are input into the local window self-attention module and the convolution module respectively;
(2) The local window self-attention module and the convolution module generate the spatial feature map X_A and the channel feature map X_C based on the dynamic weights shared through the weight sharing module;
(3) The spatial feature map X_A, the channel feature map X_C and the input feature map X are spliced through the channel splicing operation;
(4) The spliced feature map is input into the feed-forward neural network to obtain the shared feature map.
In the generation of the spatial feature map X_A and the channel feature map X_C in step (2), the parallel convolution module and local window self-attention module dynamically calculate and share the channel weights and spatial weights through the weight sharing module, and the spatial feature map and the channel feature map are generated based on the dynamically calculated weights.
Specifically, the weight sharing module includes a channel weight sharing sub-module and a spatial weight sharing sub-module. The local window self-attention module and the convolution module share weights through these two sub-modules, and the feature maps are generated as follows:
(1) The convolution module extracts a convolution feature map and an initial channel feature map X'_C from the split feature subgraph X_1;
(2) Based on the convolution feature map, the channel weight sharing sub-module calculates the channel weight W_C and shares it with the local window self-attention module;
(3) Based on the channel weight W_C, the local window self-attention module extracts the spatial feature map X_A from the split feature subgraph X_2;
(4) Based on the spatial feature map X_A, the spatial weight sharing sub-module calculates the spatial weight W_A and shares it with the convolution module;
(5) Based on the spatial weight W_A, the convolution module applies spatial weighting to the initial channel feature map X'_C to obtain the channel feature map X_C.
The channel weight sharing sub-module calculates the channel weight W_C as follows: the convolution feature map extracted by the convolution module is passed sequentially through global average pooling (GAP), a convolution with a 1×1 kernel, a Leaky ReLU activation function, another convolution with a 1×1 kernel and a sigmoid activation function to obtain the channel weight W_C. The spatial weight sharing sub-module calculates the spatial weight W_A as follows: the spatial feature map X_A is passed sequentially through a convolution with a 1×1 kernel, a Leaky ReLU activation function, another convolution with a 1×1 kernel and a sigmoid activation function to obtain the spatial weight W_A.
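The two weight sharing sub-modules described above can be sketched as small PyTorch modules. The class names, the intermediate width (no channel reduction) and the choice of a single-channel spatial weight map are assumptions; the text only names the layer sequences.

```python
import torch
import torch.nn as nn

class ChannelWeightShare(nn.Module):
    # Sketch of the channel weight sharing sub-module: GAP -> 1x1 conv ->
    # Leaky ReLU -> 1x1 conv -> sigmoid, producing channel weights W_C from
    # the convolution branch's feature map.
    def __init__(self, channels=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),            # global average pooling (GAP)
            nn.Conv2d(channels, channels, 1),
            nn.LeakyReLU(inplace=True),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, conv_feat):
        return self.body(conv_feat)             # W_C: (B, C, 1, 1)

class SpatialWeightShare(nn.Module):
    # Sketch of the spatial weight sharing sub-module: 1x1 conv -> Leaky ReLU
    # -> 1x1 conv -> sigmoid, producing a spatial weight map W_A from the
    # spatial feature map X_A. Collapsing to one channel is an assumption.
    def __init__(self, channels=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 1),
            nn.LeakyReLU(inplace=True),
            nn.Conv2d(channels, 1, 1),
            nn.Sigmoid(),
        )

    def forward(self, X_A):
        return self.body(X_A)                    # W_A: (B, 1, H, W)
```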
Based on the above description, the generation of the spatial feature map X_A and the channel feature map X_C is expressed as:
X_1, X_2 = F_SP(F_{1×1}(X))
W_C = F_Sig(F_{1×1}(F_LRelu(F_{1×1}(GAP(F_{3×3}(X_1))))))
Q = X_2 P_Q
K = X_2 P_K
V = X_2 P_V W_C
X_A = Attention(Q, K, V) = SoftMax(QK^T/√d + B)V
W_A = F_Sig(F_{1×1}(F_LRelu(F_{1×1}(X_A))))
X_C = X'_C W_A
Y = FFN(F_CAT(X_A, X_C, X))
where X denotes the input feature map; X_1 and X_2 are the two feature subgraphs split from X; F_SP denotes the channel splitting operation; F_{1×1} and F_{3×3} denote convolution operations with 1×1 and 3×3 kernels respectively; F_LRelu denotes the Leaky ReLU activation function; X'_C denotes the initial channel feature map extracted from X_1 by the convolution module; W_C denotes the channel weight; GAP denotes the global average pooling operation; F_Sig denotes the sigmoid activation function; Q, K and V denote the query, key and value matrices respectively; P_Q, P_K and P_V denote the corresponding transform matrices; X_A denotes the spatial feature map; Attention denotes the self-attention computation; SoftMax denotes the SoftMax function; B denotes a learnable relative position code; d denotes the dimension of the query and key vectors; W_A denotes the spatial weight; X_C denotes the channel feature map; F_CAT denotes the channel splicing operation; FFN denotes the feed-forward network; and Y denotes the shared feature map, i.e. the output of the shared Swin Transformer module SACB.
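The weight-shared two-branch computation above can be summarized in a single sketch. The following is a minimal PyTorch rendering of the SACB under stated assumptions: non-shifted, single-head window attention without the relative position code B, two separate 3×3 convolutions producing the convolution feature map and X'_C, a single-channel spatial weight, and a 1×1-convolution FFN; the 64-channel width and 8×8 window are likewise assumptions, not values from the text.

```python
import torch
import torch.nn as nn

class SACB(nn.Module):
    # Sketch of the shared Swin Transformer module SACB following the listed
    # equations, with the simplifications named in the lead-in above.
    def __init__(self, channels=64, window=8):
        super().__init__()
        assert channels % 2 == 0
        half = channels // 2
        self.window = window
        self.pre = nn.Conv2d(channels, channels, 1)              # F_1x1 before the channel split F_SP
        self.conv_feat = nn.Conv2d(half, half, 3, padding=1)     # convolution feature map from X_1
        self.conv_channel = nn.Conv2d(half, half, 3, padding=1)  # initial channel feature map X'_C
        self.channel_weight = nn.Sequential(                     # W_C shared to the attention branch
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(half, half, 1),
            nn.LeakyReLU(inplace=True), nn.Conv2d(half, half, 1), nn.Sigmoid())
        self.spatial_weight = nn.Sequential(                     # W_A shared to the convolution branch
            nn.Conv2d(half, half, 1), nn.LeakyReLU(inplace=True),
            nn.Conv2d(half, 1, 1), nn.Sigmoid())
        self.to_qkv = nn.Linear(half, half * 3, bias=False)      # P_Q, P_K, P_V
        self.ffn = nn.Sequential(                                # FFN after the channel splicing F_CAT
            nn.Conv2d(channels * 2, channels * 2, 1), nn.GELU(),
            nn.Conv2d(channels * 2, channels, 1))

    def window_attention(self, x, W_C):
        # Local window self-attention on X_2; the value matrix is weighted by
        # the shared channel weight W_C (V = X_2 P_V W_C).
        B, C, H, W = x.shape
        w = self.window                                           # H and W must be divisible by w
        qkv = self.to_qkv(x.permute(0, 2, 3, 1))                  # (B, H, W, 3C)
        qkv = qkv.reshape(B, H // w, w, W // w, w, 3, C)
        qkv = qkv.permute(5, 0, 1, 3, 2, 4, 6).reshape(3, -1, w * w, C)
        q, k, v = qkv[0], qkv[1], qkv[2]                          # (B*nw, w*w, C) each
        nw = (H // w) * (W // w)
        v = v * W_C.reshape(B, 1, 1, C).expand(B, nw, 1, C).reshape(-1, 1, C)
        attn = torch.softmax(q @ k.transpose(-2, -1) / C ** 0.5, dim=-1)
        out = (attn @ v).reshape(B, H // w, W // w, w, w, C)
        return out.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)  # spatial feature map X_A

    def forward(self, X):
        X_1, X_2 = self.pre(X).chunk(2, dim=1)                    # channel split F_SP
        conv_feat = self.conv_feat(X_1)                           # convolution feature map
        X_c_init = self.conv_channel(X_1)                         # initial channel feature map X'_C
        W_C = self.channel_weight(conv_feat)                      # channel weight W_C
        X_A = self.window_attention(X_2, W_C)                     # spatial feature map X_A
        W_A = self.spatial_weight(X_A)                            # spatial weight W_A
        X_C = X_c_init * W_A                                      # channel feature map X_C
        return self.ffn(torch.cat([X_A, X_C, X], dim=1))          # Y = FFN(F_CAT(X_A, X_C, X))
```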
Reconstruction module
The reconstruction module uses an up-sampling layer composed of a sub-pixel convolution layer and a 3×3 convolution to up-sample the feature map obtained by adding the deep feature map M_D and the shallow feature map M_S, reshaping the information stream into a feature map at the specified up-sampling magnification to obtain the high-resolution image I_SR. The process is described as:
I_SR = F_UP(M_D + M_S)   (6)
where I_SR denotes the super-resolution reconstructed high-resolution image and F_UP denotes the reconstruction module.
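A minimal sketch of Eq. (6) using a pixel-shuffle (sub-pixel) up-sampler follows; the feature width, scale factor and single output channel are assumptions.

```python
import torch
import torch.nn as nn

class Reconstruction(nn.Module):
    # Sketch of Eq. (6): the reconstruction module F_UP up-samples M_D + M_S
    # with a sub-pixel (pixel-shuffle) convolution followed by a 3x3 convolution.
    def __init__(self, num_feat=64, scale=2, out_channels=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(num_feat, num_feat * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),                      # sub-pixel convolution layer
            nn.Conv2d(num_feat, out_channels, 3, padding=1),
        )

    def forward(self, M_D, M_S):
        return self.body(M_D + M_S)                      # I_SR = F_UP(M_D + M_S)
```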
2. Train the super-resolution reconstruction model using the cross-sectional MRI OASIS data set, and test the training effect of the model.
Specifically, N low-resolution images and their corresponding high-resolution images are obtained by processing the cross-sectional MRI OASIS data set, and the images are scaled at different factors to obtain the training data set.
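One way to build such a pair is sketched below; bicubic down-sampling is an assumption, since the text only states that the images are processed at different scaling factors.

```python
import torch
import torch.nn.functional as F

def make_lr_hr_pair(hr, scale=2):
    # Sketch of building one training pair from a high-resolution OASIS slice:
    # the low-resolution input is obtained by bicubic down-sampling (assumed)
    # at the chosen scale.
    lr = F.interpolate(hr, scale_factor=1.0 / scale, mode="bicubic", align_corners=False)
    return lr, hr

hr = torch.rand(1, 1, 128, 128)        # one normalized HR slice (assumed single-channel)
lr, _ = make_lr_hr_pair(hr, scale=2)   # lr has shape (1, 1, 64, 64)
```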
The super-resolution reconstruction model is trained with the training data set, and the model is optimized with a loss function during training. The loss function is constructed with the L2 loss, with the specific formula:
L2 = (1 / (H·W)) Σ_{i=1}^{H} Σ_{j=1}^{W} (I_sr(i, j) − I_hr(i, j))²
where I_sr denotes the recovered super-resolution image, I_hr denotes the original high-resolution image, i and j index the corresponding pixel points in the image matrix, H and W denote the height and width of the image respectively, and L2 denotes the L2 loss function.
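As a sketch, the loss above reduces to a mean squared error over pixels:

```python
import torch

def l2_loss(I_sr, I_hr):
    # Pixel-wise L2 loss between the reconstructed image I_sr and the
    # ground-truth high-resolution image I_hr, averaged over the H x W pixels.
    return ((I_sr - I_hr) ** 2).mean()

I_sr = torch.rand(1, 1, 128, 128)
I_hr = torch.rand(1, 1, 128, 128)
loss = l2_loss(I_sr, I_hr)             # scalar tensor, suitable for loss.backward()
```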
After training is completed, the training effect of the model is tested through the PSNR and SSIM, and the higher the values of the two indexes, the better the training effect.
PSNR, the Peak Signal-to-Noise Ratio, is an objective standard for evaluating image quality and is often used to compute the visual error between two images, such as the difference between an original image and a compressed image, or between an image generated by a rain-removal network and the real image. The calculation method is as follows:
Assuming that the processed image is I, the real image is K, and both are of size m×n, the peak signal-to-noise ratio is defined as:
PSNR = 10 · log10(MAX² / MSE)
where MAX denotes the maximum pixel value of the image; since the image output by the network is normalized, it is set to 1 in this embodiment. MSE is defined as:
MSE = (1 / (m·n)) Σ_{i=1}^{m} Σ_{j=1}^{n} (I(i, j) − K(i, j))²
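A short sketch of this metric, with MAX = 1.0 for normalized network outputs:

```python
import torch

def psnr(I, K, max_val=1.0):
    # PSNR between the processed image I and the real image K:
    # 10 * log10(MAX^2 / MSE), with max_val = 1.0 for normalized images.
    mse = ((I - K) ** 2).mean()
    return 10.0 * torch.log10(max_val ** 2 / mse)

I = torch.rand(1, 1, 128, 128)
K = torch.rand(1, 1, 128, 128)
print(psnr(I, K).item())
```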
the SSIM is generally called Structural Similarity, i.e. structural similarity, and is used for evaluating the similarity index of two images, and is commonly used for measuring the similarity between the images before distortion and after distortion, and also for measuring the authenticity of the generated image of the model, such as image rain removal, image defogging, image harmony, etc., and the calculation method specifically comprises:
the calculation of the SSIM is realized based on a sliding window, namely, a window with the size of N multiplied by N is taken from the picture in each calculation, SSIM indexes are calculated based on the window, and the values of all the windows are averaged after the whole image is traversed to be used as the SSIM indexes of the whole image.
Let x denote the data in the first image window and y denote the data in the second image window. The similarity of the images consists of three parts: luminance, contrast and structure.
The luminance term is calculated as:
l(x, y) = (2·μ_x·μ_y + c_1) / (μ_x² + μ_y² + c_1)
The contrast term is calculated as:
c(x, y) = (2·σ_x·σ_y + c_2) / (σ_x² + σ_y² + c_2)
The structure term is calculated as:
s(x, y) = (σ_xy + c_3) / (σ_x·σ_y + c_3)
where μ_x and μ_y denote the means of x and y, σ_x and σ_y denote the standard deviations of x and y, σ_xy denotes the covariance between x and y, and c_1 = (k_1·L)², c_2 = (k_2·L)² and c_3 are three constants used to avoid a zero denominator; k_1 and k_2 default to 0.01 and 0.03 respectively, L denotes the range of the image pixel values, and L is set to 1 in this embodiment.
Finally, the SSIM is calculated as:
SSIM(x, y) = [l(x, y)]^α · [c(x, y)]^β · [s(x, y)]^γ   (13)
With α, β and γ set to 1 in this embodiment, this becomes:
SSIM(x, y) = ((2·μ_x·μ_y + c_1)(2·σ_xy + c_2)) / ((μ_x² + μ_y² + c_1)(σ_x² + σ_y² + c_2))
which is also the commonly used SSIM calculation formula.
3. Perform actual medical image super-resolution reconstruction using the trained super-resolution reconstruction model.
After model training is completed, the trained super-resolution model is packaged for actual reconstruction. Specifically, the low-resolution medical image to be reconstructed is input into the trained super-resolution model to generate and output the final high-resolution medical image.
Example 2
In one or more embodiments, a medical image super-resolution reconstruction system is disclosed, comprising an acquisition module and a generation module:
an acquisition module configured to: acquiring a low-resolution medical image to be reconstructed;
a generation module configured to: processing the low-resolution image through the trained super-resolution reconstruction model to generate and output a final high-resolution medical image;
the super-resolution reconstruction model comprises a shallow feature extraction module, a deep feature extraction module and a reconstruction module, wherein in the deep feature extraction module, a parallel convolution module and a local window self-attention module perform dynamic calculation and sharing of channel weights and space weights through a weight sharing module, a space feature map and a channel feature map are generated based on the dynamically calculated weights, and the deep feature map is obtained after channel splicing.
Example 3
An object of the present embodiment is to provide a computer-readable storage medium.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of a medical image super-resolution reconstruction method according to an embodiment of the present disclosure.
Example 4
An object of the present embodiment is to provide an electronic apparatus.
The electronic device comprises a memory, a processor and a program stored in the memory and capable of running on the processor, wherein the processor realizes the steps in the medical image super-resolution reconstruction method according to the first embodiment of the disclosure when executing the program.
The above description is only of the preferred embodiments of the present application and is not intended to limit the present application, but various modifications and variations can be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. The medical image super-resolution reconstruction method is characterized by comprising the following steps of:
acquiring a low-resolution medical image to be reconstructed;
processing the low-resolution image through the trained super-resolution reconstruction model to generate and output a final high-resolution medical image;
the super-resolution reconstruction model comprises a shallow feature extraction module, a deep feature extraction module and a reconstruction module, wherein in the deep feature extraction module, a parallel convolution module and a local window self-attention module perform dynamic calculation and sharing of channel weights and space weights through a weight sharing module, a space feature map and a channel feature map are generated based on the dynamically calculated weights, and the deep feature map is obtained after channel splicing.
2. The method of claim 1, wherein the shallow feature extraction module extracts the shallow feature map from the low resolution medical image using a 3x3 convolution layer.
3. The medical image super-resolution reconstruction method as set forth in claim 1, wherein the processing procedure of the deep feature extraction module is specifically as follows: the input shallow feature map passes through a plurality of sequentially connected shared residual Swin Transformer modules and a 3×3 convolution layer to obtain the deep feature map.
4. The medical image super-resolution reconstruction method as set forth in claim 1, wherein the processing procedure of the shared residual Swin Transformer module is specifically as follows: the input feature map passes through a plurality of sequentially connected shared Swin Transformer modules to obtain a shared feature map, and the shared feature map is residual-connected with the input feature map to obtain a shared residual feature map.
5. The medical image super-resolution reconstruction method as set forth in claim 4, wherein the shared Swin Transformer module comprises a channel splitting operation, a local window self-attention module, a convolution module with a 3×3 kernel, a weight sharing module, a channel splicing operation and a feedforward neural network, and the processing procedure is specifically as follows:
splitting an input feature diagram into two feature subgraphs through channel splitting operation, and respectively inputting the two feature subgraphs into a local window self-attention module and a convolution module;
the local window self-attention module and the convolution module generate a space feature map and a channel feature map based on the dynamic weight shared by the weight sharing module;
the space feature map, the channel feature map and the input feature map are spliced through channel splicing operation;
and inputting the spliced characteristic images into a feedforward neural network to obtain a shared characteristic image.
6. The medical image super-resolution reconstruction method as set forth in claim 1, wherein the specific generation process of the spatial feature map and the channel feature map is as follows:
(1) The convolution module extracts a convolution feature map and an initial channel feature map X'_C from the split feature subgraph X_1;
(2) Based on the convolution feature map, the channel weight sharing sub-module calculates the channel weight W_C and shares it with the local window self-attention module;
(3) Based on the channel weight W_C, the local window self-attention module extracts the spatial feature map X_A from the split feature subgraph X_2;
(4) Based on the spatial feature map X_A, the spatial weight sharing sub-module calculates the spatial weight W_A and shares it with the convolution module;
(5) Based on the spatial weight W_A, the convolution module applies spatial weighting to the initial channel feature map X'_C to obtain the channel feature map X_C.
7. The medical image super-resolution reconstruction method as set forth in claim 1, wherein the reconstruction module uses an up-sampling layer composed of a sub-pixel convolution layer and a 3×3 convolution to up-sample the feature map obtained by adding the deep feature map M_D and the shallow feature map M_S, reshaping the information stream into a feature map at the specified up-sampling magnification to obtain the high-resolution image I_SR.
8. The medical image super-resolution reconstruction system is characterized by comprising an acquisition module and a generation module:
an acquisition module configured to: acquiring a low-resolution medical image to be reconstructed;
a generation module configured to: processing the low-resolution image through the trained super-resolution reconstruction model to generate and output a final high-resolution medical image;
the super-resolution reconstruction model comprises a shallow feature extraction module, a deep feature extraction module and a reconstruction module, wherein in the deep feature extraction module, a parallel convolution module and a local window self-attention module perform dynamic calculation and sharing of channel weights and space weights through a weight sharing module, a space feature map and a channel feature map are generated based on the dynamically calculated weights, and the deep feature map is obtained after channel splicing.
9. An electronic device, comprising:
a memory for non-transitory storage of computer readable instructions; and
a processor for executing the computer-readable instructions,
wherein the computer readable instructions, when executed by the processor, perform the method of any of the preceding claims 1-7.
10. A storage medium, characterized in that it non-transitorily stores computer-readable instructions, wherein the steps of the method of any one of claims 1-7 are performed when the non-transitory computer-readable instructions are executed by a computer.
CN202310594669.7A 2023-05-23 2023-05-23 Medical image super-resolution reconstruction method and system Pending CN116862765A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310594669.7A CN116862765A (en) 2023-05-23 2023-05-23 Medical image super-resolution reconstruction method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310594669.7A CN116862765A (en) 2023-05-23 2023-05-23 Medical image super-resolution reconstruction method and system

Publications (1)

Publication Number Publication Date
CN116862765A true CN116862765A (en) 2023-10-10

Family

ID=88232898

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310594669.7A Pending CN116862765A (en) 2023-05-23 2023-05-23 Medical image super-resolution reconstruction method and system

Country Status (1)

Country Link
CN (1) CN116862765A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117649344A (en) * 2024-01-29 2024-03-05 之江实验室 Magnetic resonance brain image super-resolution reconstruction method, device, equipment and storage medium
CN117649344B (en) * 2024-01-29 2024-05-14 之江实验室 Magnetic resonance brain image super-resolution reconstruction method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN112507997B (en) Face super-resolution system based on multi-scale convolution and receptive field feature fusion
CN111192200A (en) Image super-resolution reconstruction method based on fusion attention mechanism residual error network
CN111598778B (en) Super-resolution reconstruction method for insulator image
CN109872305B (en) No-reference stereo image quality evaluation method based on quality map generation network
CN111861884B (en) Satellite cloud image super-resolution reconstruction method based on deep learning
Chen et al. Single image super-resolution using deep CNN with dense skip connections and inception-resnet
CN113538246B (en) Remote sensing image super-resolution reconstruction method based on unsupervised multi-stage fusion network
CN111696038A (en) Image super-resolution method, device, equipment and computer-readable storage medium
CN113298718A (en) Single image super-resolution reconstruction method and system
CN111833261A (en) Image super-resolution restoration method for generating countermeasure network based on attention
CN115170410A (en) Image enhancement method and device integrating wavelet transformation and attention mechanism
CN116862765A (en) Medical image super-resolution reconstruction method and system
Yang et al. A survey of super-resolution based on deep learning
CN115578262A (en) Polarization image super-resolution reconstruction method based on AFAN model
CN116934592A (en) Image stitching method, system, equipment and medium based on deep learning
CN116468605A (en) Video super-resolution reconstruction method based on time-space layered mask attention fusion
CN115526777A (en) Blind over-separation network establishing method, blind over-separation method and storage medium
CN117593199A (en) Double-flow remote sensing image fusion method based on Gaussian prior distribution self-attention
CN117333750A (en) Spatial registration and local global multi-scale multi-modal medical image fusion method
CN117408924A (en) Low-light image enhancement method based on multiple semantic feature fusion network
CN116385265B (en) Training method and device for image super-resolution network
CN111696167A (en) Single image super-resolution reconstruction method guided by self-example learning
CN116563110A (en) Blind image super-resolution reconstruction method based on Bicubic downsampling image space alignment
Chilukuri et al. Analysing Of Image Quality Computation Models Through Convolutional Neural Network
CN114511470A (en) Attention mechanism-based double-branch panchromatic sharpening method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination