CN118154468A - Universal blurred image enhancement method based on hidden space diffusion model - Google Patents

Universal blurred image enhancement method based on hidden space diffusion model

Info

Publication number
CN118154468A
Authority
CN
China
Prior art keywords
image
denoising
encoder
fine tuning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410582588.XA
Other languages
Chinese (zh)
Other versions
CN118154468B (en)
Inventor
李慧琦
韩浩楠
杨邴予
张蔚航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202410582588.XA
Priority claimed from CN202410582588.XA
Publication of CN118154468A
Application granted
Publication of CN118154468B
Legal status: Active
Anticipated expiration

Landscapes

  • Image Processing (AREA)

Abstract

The invention discloses a universal blurred image enhancement method based on a hidden space diffusion model, which relates to the field of image enhancement and comprises the following steps: S1: construct a reconstruction training set and a reconstruction test set; S2: construct a fine-tuning training set and a fine-tuning test set; S3: construct a hidden space diffusion model; S4: enhance the blurred images in the fine-tuning test set with the trained hidden space diffusion model to obtain the enhancement results. The hidden space diffusion model is trained with a reconstruction-then-fine-tuning strategy, and the trained model enhances blurred images directly. Compared with existing image enhancement methods, the enhancement restores image detail well and the generated results are faithful and reliable. Only unpaired images are needed in the reconstruction stage, and the paired image data needed in the fine-tuning stage is small in scale, which largely avoids the high cost and difficulty of acquiring paired data.

Description

Universal blurred image enhancement method based on hidden space diffusion model
Technical Field
The invention relates to the technical field of image enhancement, and in particular to a universal blurred image enhancement method based on a hidden space (latent space) diffusion model.
Background
Image enhancement is a technique that improves the definition of high-frequency details and the visibility of low-frequency content in an image. It can effectively improve visual quality and benefits subsequent image processing such as object detection, image segmentation, and object recognition. Common image enhancement tasks include low-light image enhancement, underwater image enhancement, and medical image enhancement.
Traditional enhancement algorithms rely on extensive prior knowledge and, targeting the weak contrast of blurred images and the low brightness of the imaged region, improve global contrast and brightness. They can be divided into three categories: methods based on transform functions, filter-based methods, and statistical-histogram-based methods. Although these methods can increase the contrast of the background and of high-frequency details, the enhanced image loses important color information, and local noise is amplified along with the contrast, which harms downstream tasks such as object detection in underwater images and computer-aided diagnosis in medical images. In addition, most traditional enhancement algorithms depend on prior knowledge and require complex hand-designed models; they struggle to achieve consistent enhancement across different datasets and generalize poorly.
Deep-learning-based enhancement algorithms are usually trained on paired images from the target domain; representative models include variational autoencoders, energy-based models, and generative adversarial networks. Although these models achieve reasonable results on image enhancement tasks, two major problems remain: poor generalization and difficult fine-tuning. Because the performance of a deep-learning-based enhancement algorithm is tied to its training data, enhancement quality tends to degrade under distribution shift, for example when the test data come from a different dataset, when the blur type differs from that of the training data, or when the training and test modalities differ. At present, retraining, or full fine-tuning from pre-trained weights, is the usual remedy for poor generalization; yet most deep learning methods do not plan for retraining or fine-tuning at design time, so fine-tuning demands large computing resources and is costly to use.
Therefore, a universal blurred image enhancement method based on a hidden space diffusion model is needed to solve the above problems.
Disclosure of Invention
The invention aims to provide a universal blurred image enhancement method based on a hidden space diffusion model, addressing two problems of existing deep-learning-based image enhancement methods: first, when the distribution of the image to be enhanced is inconsistent with the distribution of the training data behind the weights in use, the enhancement effect is poor, the algorithm performs below expectation, and the model generalizes badly; second, when a deep learning image algorithm is retrained or fine-tuned, the whole model must be optimized, consuming substantial resources.
In order to achieve the above object, the present invention provides a universal blurred image enhancement method based on a hidden space diffusion model, comprising the following steps:
S1: construct a reconstruction training set and a reconstruction test set;
S2: construct a fine-tuning training set and a fine-tuning test set;
S3: construct a hidden space diffusion model;
S4: enhance the blurred images in the fine-tuning test set with the trained hidden space diffusion model to obtain the enhancement results.
Preferably, step S1 specifically includes the following steps:
S11: select images to construct the reconstruction training set and the reconstruction test set, the images being clear image pairs matched at pixel level;
S12: adjust the size of the selected images to 2^K×2^K using nearest-neighbor interpolation, where K is an integer in [7,9].
Preferably, in step S2, the fine-tuning training set consists of pixel-level matched clear-blurred image pairs, where each blurred image is generated by degrading a clear image; constructing the fine-tuning training set specifically includes the following steps:
S21: adjust the size of each clear image to 2^K×2^K using nearest-neighbor interpolation, where K is an integer in [7,9];
S22: for the degradation type "dark image edges", select a brightness adjustment factor α, where α ranges over [0,1];
S23: for the degradation type "bright image edges", perform the following steps:
S231: randomly select a point within the central quarter of the image as the circle centre, and randomly select a value close to the smaller of the image's height and width as the diameter, forming a random centre circle;
S232: build a blur mask region between the random centre circle and the image edge, filled with the image mean;
S233: take the absolute difference between the diameter of the random centre circle and the maximum distance from the centre to the boundary as the Gaussian blur radius, take two thirds of this radius as the Gaussian blur standard deviation, and apply Gaussian blur to the mask region;
S234: expand the Gaussian-blurred mask region to three channels and weight its brightness, obtaining a brightness-weighted mask region;
S235: add the brightness-weighted mask region to the clear image, obtaining a blurred image with bright edges;
S24: for the degradation type "low contrast", perform the following steps:
S241: convolve the three channels of the clear image with a two-dimensional Gaussian kernel of size 70×70 and standard deviation between 1 and π;
S242: weight the convolved channels by a transparency parameter and add them channel by channel to the blur-degraded image.
Preferably, the fine-tuning dataset is constructed from real blurred images or from blurred images generated by degrading clear images as in step S242.
Preferably, in step S3, the hidden space diffusion model includes an autoencoder, a hidden space denoising network, and loss functions, and constructing the hidden space diffusion model specifically includes the following steps:
S31: construct a locally fine-tunable vector-quantized variational autoencoder as the autoencoder, comprising a fine-tunable encoder and a decoder;
S32: construct a locally fine-tunable hidden space denoising network, comprising a denoising downsampling module, an intermediate module, and a denoising upsampling module;
S33: construct the loss functions of the hidden space diffusion model;
S34: train the hidden space diffusion model, update the parameters, and save them.
Preferably, in step S31, constructing the fine-tunable encoder includes the following steps:
S311: using a convolution layer with kernel size S×S and stride 1, extract a feature map of spatial dimension 2^K×2^K×128 from a 2^K×2^K×3 image, where K is an integer in [7,9] and S is an integer in [3,5];
S312: perform three downsamplings with the encoder downsampling module, which comprises an encoder locally fine-tunable residual network and an encoder downsampling layer, converting the feature map from 2^K×2^K×128 to 2^(K-3)×2^(K-3)×512, where K is an integer in [7,9];
the encoder locally fine-tunable residual network comprises two serial encoder local fine-tuning residual modules; each module consists of three serial groups of a normalization layer and a convolution layer with kernel size S×S and stride 1; the bottommost normalization and convolution layers of each module form the fine-tuning layer; the downsampling layer uses an S×S kernel with stride 2; S is an integer in [3,5];
S313: integrate features with the encoder feature-integration module, which comprises an encoder locally fine-tunable residual network and an encoder grouped attention module; the grouped attention module consists of a normalization layer and several convolution layers with kernel size 1×1 and stride 1;
S314: output the hidden space code with a convolution layer of kernel size S×S and stride 1, of size 2^(K-3)×2^(K-3)×3, where K is an integer in [7,9] and S is an integer in [3,5];
the decoder is constructed by the following steps:
S301: using a convolution layer with kernel size S×S and stride 1, obtain a hidden-space-encoded image feature map of spatial dimension 2^(K-3)×2^(K-3)×512, where K is an integer in [7,9] and S is an integer in [3,5];
S302: integrate features with the decoder feature-integration module, which comprises a decoder residual network and a decoder grouped attention module;
the decoder residual network comprises two decoder residual modules, each formed by two serial groups of a normalization layer and a convolution layer with kernel size S×S and stride 1, where S is an integer in [3,5];
the decoder grouped attention module consists of a normalization layer and several convolution layers with kernel size 1×1 and stride 1;
S303: perform three upsamplings with the decoder upsampling module, which comprises a decoder residual network and a decoder upsampling layer, converting the feature map from 2^(K-3)×2^(K-3)×512 to 2^K×2^K×512, where K is an integer in [7,9];
the end of the first decoder residual module in the decoder residual network is connected in series with a convolution layer serving as a shortcut connection, with a 1×1 kernel and stride 1; the remaining decoder residual modules have the same structure as in step S302; the decoder upsampling layer uses an S×S kernel with stride 1, where S is an integer in [3,5];
S304: output the decoded blurred or clear image with a convolution layer of kernel size S×S and stride 1, of size 2^K×2^K×3, where K is an integer in [7,9] and S is an integer in [3,5];
the fine-tunable autoencoder encodes the image into the hidden space, and the encoder locally fine-tunable residual network provides the local fine-tuning.
Preferably, in step S32, constructing the denoising downsampling module specifically includes the following steps:
S321: concatenate the hidden space codes of the clear and blurred image pair channel-wise and add noise, forming a noisy feature map of spatial dimension 2^(K-3)×2^(K-3)×6; then, using a convolution layer with kernel size S×S and stride 1, extract a feature map of spatial dimension 2^(K-3)×2^(K-3)×160, where K is an integer in [7,9] and S is an integer in [3,5];
S322: integrate features with two denoising local fine-tuning residual modules; each consists of three serial groups of a normalization layer and a convolution layer with kernel size S×S and stride 1, where S is an integer in [3,5]; the bottommost normalization and convolution layers of each module form the fine-tuning layer and can be selectively trained;
S323: perform three downsamplings with the denoising downsampling module, which comprises a denoising downsampling layer and two groups of locally fine-tunable denoising self-attention residual networks, converting the feature map from 2^(K-3)×2^(K-3)×160 to 2^(K-6)×2^(K-6)×640, where K is an integer in [7,9];
the denoising downsampling layer uses an S×S kernel with stride 2, where S is an integer in [3,5];
the locally fine-tunable denoising self-attention residual network consists of a denoising local fine-tuning residual module, a skip connection, and a denoising self-attention module; the skip connection is realized by a convolution layer with a 1×1 kernel and stride 1; the denoising self-attention module consists of a normalization layer and several convolution layers with kernel size 1×1 and stride 1;
the intermediate module consists of two denoising local fine-tuning residual modules and a denoising self-attention module;
three upsamplings are performed with the denoising upsampling module, which consists of a denoising upsampling layer and two groups of denoising upsampling self-attention residual networks, converting the feature map from 2^(K-6)×2^(K-6)×640 to 2^(K-3)×2^(K-3)×160, where K is an integer in [7,9];
S324: output the decoded feature map, of size 2^(K-3)×2^(K-3)×3, with a convolution layer of kernel size S×S and stride 1, where K is an integer in [7,9] and S is an integer in [3,5].
Preferably, in step S33, the loss functions include a loss function for the reconstruction training process and a loss function for the fine-tuning training process;
the loss function for the reconstruction training process consists of the fine-tunable autoencoder's perceptual loss, patch-based adversarial loss, and pixel-space L1 loss, together with the fine-tunable denoising network's diffusion model loss;
the loss function for the fine-tuning training process consists of the fine-tunable autoencoder's fine-tuning loss and the fine-tunable denoising network's fine-tuning loss;
the fine-tuning loss of the fine-tunable autoencoder, denoted L_E, is calculated as:

L_E = E[ ||E_rec(x_h) - E_ft(x_l)||_1 + ||E_rec(x_h) - E_ft(x_h)||_1 ]    (1)

where x_h is a high-quality image, x_l is a blurred image, E_rec is the reconstruction encoder trained on the reconstruction training set with the fine-tuning layers disabled, and E_ft is the fine-tuning encoder trained on the fine-tuning training set with the fine-tuning layers participating in training; ||E_rec(x_h) - E_ft(x_l)||_1 is the 1-norm between the high-quality image passed through the reconstruction encoder and the blurred image passed through the fine-tuning encoder; ||E_rec(x_h) - E_ft(x_h)||_1 is the 1-norm between the high-quality image passed through the reconstruction encoder and the same image passed through the fine-tuning encoder; E[·] denotes the mean;
the fine-tuning loss of the fine-tunable denoising network, denoted L_D, is calculated as:

L_D = E[ ||ε_t - ε_θ(z_t, t)||_2^2 ]    (2)

where t is the denoising time step, z_t is the noisy hidden space feature at time t, ε_t is the noise actually added at time t, ε_θ(z_t, t) is the noise predicted by the denoising network for time t, ||·||_2 is the Euclidean norm between actual and predicted noise, and E[·] denotes the mean.
Preferably, step S34 includes the following sub-steps:
S341: train the fine-tunable autoencoder with its fine-tuning layers disabled, using the reconstruction training set, specifically:
S3410: input the clear images of the reconstruction training set into the reconstruction autoencoder and generate reconstructed clear images by forward propagation;
S3411: calculate the perceptual loss, the patch-based adversarial loss, and the pixel-space L1 loss from the reconstructed clear image and the input clear image;
S3412: perform back propagation with an initial learning rate Q, where Q lies in [5e-6, 5e-4], and optimize the parameters;
S3413: repeat steps S3410 to S3412, recording the loss values output in step S3411; each full pass over the reconstruction training set counts as one epoch, and loss curves over the epochs are drawn from the recorded values;
S3414: save the reconstruction autoencoder;
S342: train the fine-tunable denoising network with its fine-tuning layers disabled, using the reconstruction training set, specifically:
S3420: input the clear images of the reconstruction training set into the reconstruction autoencoder trained in step S3414 to obtain the reconstruction hidden space feature map z_0;
S3421: following the DDPM algorithm, define a Markov chain of length T with noise schedule {β_t} and add noise step by step according to formula (3) to generate the noisy feature map:

z_t = √(ᾱ_t)·z_0 + √(1 - ᾱ_t)·ε,  with ᾱ_t = ∏_{s=1}^{t} (1 - β_s)    (3)

where ε is random noise, t is the time step, and z_0 is the initial state of the diffusion process;
S3422: input the noisy hidden space feature map generated in step S3421 into the reconstruction denoising network and generate the denoised reconstruction hidden space feature map by forward propagation;
S3423: calculate the diffusion model loss from the noisy hidden space feature map and the denoised hidden space feature map;
S3424: perform back propagation with an initial learning rate Q, where Q lies in [5e-6, 5e-4], and optimize the parameters;
S3425: repeat steps S3420 to S3424, recording the loss values output in step S3423; each full pass over the reconstruction training set counts as one epoch, and loss curves over the epochs are drawn from the recorded values;
S3426: save the trained reconstruction denoising network;
S343: train the fine-tunable autoencoder with its fine-tuning layers enabled, using the fine-tuning training set, specifically:
S3430: load the trained reconstruction encoder weights, and copy the weights of the last group of normalization and convolution layers of each local fine-tuning residual module (trained with the fine-tuning layer disabled) to the fine-tuning layer;
S3431: input the blurred images into the fine-tuning autoencoder and generate fine-tuning enhanced clear images by forward propagation;
S3432: calculate the fine-tuning loss of the fine-tunable autoencoder from the fine-tuning enhanced image and the corresponding clear and blurred images;
S3433: perform back propagation with an initial learning rate Q, where Q lies in [5e-6, 5e-4], and optimize the parameters;
S3434: repeat steps S3430 to S3433, recording the loss values output in step S3432; each full pass over the fine-tuning training set counts as one epoch, and loss curves over the epochs are drawn from the recorded values;
S3435: save the trained fine-tuning autoencoder;
S344: train the fine-tunable denoising network with its fine-tuning layers enabled, using the fine-tuning training set, specifically:
S3440: load the trained reconstruction denoising network weights, and copy the weights of the last group of normalization and convolution layers of each local fine-tuning residual module (trained with the fine-tuning layer disabled) to the fine-tuning layer;
S3441: input the blurred images into the fine-tuning autoencoder trained in step S3435 to obtain the hidden space feature map;
S3442: generate the noisy hidden space feature map as in step S3421, input it into the fine-tuning denoising network, and generate the denoised hidden space feature map by forward propagation;
S3443: calculate the fine-tuning loss of the fine-tuning denoising network from the noisy hidden space feature map and the denoised hidden space feature map;
S3444: perform back propagation with an initial learning rate Q, where Q lies in [5e-6, 5e-4], and optimize the parameters;
S3445: repeat steps S3440 to S3444, recording the loss values output in step S3443; each full pass over the fine-tuning training set counts as one epoch, and loss curves over the epochs are drawn from the recorded values;
S3446: save the trained fine-tuning denoising network.
Therefore, the universal blurred image enhancement method based on the hidden space diffusion model has the following beneficial effects:
(1) The hidden space diffusion model is trained with a reconstruction-then-fine-tuning strategy, and the trained model enhances blurred images directly. Compared with existing image enhancement methods, the enhancement restores image detail well and the generated results are faithful and reliable.
(2) The local fine-tuning module keeps the number of trainable parameters small and convergence fast, greatly reducing the computational cost of fine-tuning; the algorithm generalizes efficiently across enhancement tasks on different datasets, degradation types, and image modalities, achieving good universality together with a strong enhancement effect.
(3) Only unpaired images are needed in the reconstruction stage, and the paired image data needed in the fine-tuning stage is small in scale, which largely avoids the high cost and difficulty of acquiring paired data.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
FIG. 1 is a schematic diagram of the overall model of the universal blurred image enhancement method based on the hidden space diffusion model according to the present invention;
FIG. 2 shows the training process of the reconstruction autoencoder and the reconstruction denoising network in the reconstruction training phase of the present invention;
FIG. 3 shows the training process of the fine-tuning autoencoder of the present invention;
FIG. 4 shows the training process of the fine-tuning denoising network of the present invention;
FIG. 5 shows the general structure of the locally fine-tunable module in the fine-tuning training phase of the present invention;
FIG. 6 is a schematic flow chart of an embodiment of the invention.
Detailed Description
The technical scheme of the invention is further described below through the attached drawings and the embodiments.
Unless defined otherwise, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this invention belongs.
As used herein, the word "comprising" or "comprises" means that the elements preceding the word encompass the elements recited after the word, without excluding the possibility of also encompassing other elements. The terms "inner", "outer", "upper", "lower", and the like are used for convenience in describing and simplifying the description based on the orientation or positional relationship shown in the drawings; they do not denote or imply that the devices or elements referred to must have a specific orientation or be constructed and operated in a specific orientation, and thus should not be construed as limiting the invention; the relative positional relationship may change accordingly when the absolute position of the described object changes. In the present invention, unless explicitly specified and limited otherwise, the term "attached" and the like should be construed broadly: the attachment may be fixed, detachable, or integral; it may be direct or indirect through an intermediate medium, and may involve internal communication between two elements or an interaction between them. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
As shown in FIGS. 1-6, the present invention provides a universal blurred image enhancement method based on a hidden space diffusion model, where E_rec and E_ft denote, respectively, the reconstruction encoder whose fine-tuning layers do not participate in training during the reconstruction stage and the fine-tuning encoder whose fine-tuning layers participate in training during the fine-tuning stage; D denotes the decoder of the fine-tunable autoencoder; ε_rec and ε_ft denote, respectively, the reconstruction denoising network of the reconstruction training stage and the fine-tuning denoising network of the fine-tuning training stage; x denotes an image in the reconstruction training set, x_l denotes a blurred image in the fine-tuning training set, and x_h denotes a clear image in the fine-tuning training set; z_0 denotes the hidden space feature at the start of the diffusion process, z_T the hidden space feature at the end of the diffusion process, and z_t the hidden space feature at time t of the diffusion process. The method comprises the following steps:
S1: construct a reconstruction training set and a reconstruction test set; step S1 specifically includes the following steps:
S11: select images to construct the reconstruction training set and the reconstruction test set, the images being clear image pairs matched at pixel level;
S12: adjust the size of the selected images to 2^K×2^K using nearest-neighbor interpolation, where K is an integer in [7,9].
S2: construct a fine-tuning training set and a fine-tuning test set; in step S2, the fine-tuning training set consists of pixel-level matched clear-blurred image pairs, where each blurred image is generated by degrading a clear image; constructing the fine-tuning training set specifically includes the following steps:
S21: adjust the size of each clear image to 2^K×2^K using nearest-neighbor interpolation, where K is an integer in [7,9];
S22: for the degradation type "dark image edges", select a brightness adjustment factor α, where α ranges over [0,1];
S23: for the degradation type "bright image edges", perform the following steps:
S231: randomly select a point within the central quarter of the image as the circle centre, and randomly select a value close to the smaller of the image's height and width as the diameter, forming a random centre circle;
S232: build a blur mask region between the random centre circle and the image edge, filled with the image mean;
S233: take the absolute difference between the diameter of the random centre circle and the maximum distance from the centre to the boundary as the Gaussian blur radius, take two thirds of this radius as the Gaussian blur standard deviation, and apply Gaussian blur to the mask region;
S234: expand the Gaussian-blurred mask region to three channels and weight its brightness, obtaining a brightness-weighted mask region;
S235: add the brightness-weighted mask region to the clear image, obtaining a blurred image with bright edges;
S24: for the degradation type "low contrast", perform the following steps:
S241: convolve the three channels of the clear image with a two-dimensional Gaussian kernel of size 70×70 and standard deviation between 1 and π;
S242: weight the convolved channels by a transparency parameter and add them channel by channel to the blur-degraded image.
The fine-tuning dataset is then constructed from real blurred images or from clear images degraded into blurred images as in step S242; a code sketch of these degradations follows.
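The degradation operations of steps S22-S24 map almost one-to-one onto common image processing primitives. Below is a minimal NumPy/OpenCV sketch under stated assumptions: the function names, the brightness weight of 0.5 in S234, the exact sampling of the circle centre, and the default transparency parameter are illustrative choices rather than values from the patent.

```python
import numpy as np
import cv2

def dark_edges(img: np.ndarray, factor: float) -> np.ndarray:
    """S22: darken the image with a brightness adjustment factor in [0, 1]."""
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)

def bright_edges(img: np.ndarray) -> np.ndarray:
    """S23: add an over-bright rim outside a random centre circle (S231-S235)."""
    h, w = img.shape[:2]
    # S231: random centre inside the central quarter; diameter near min(h, w)
    cy = h // 2 + np.random.randint(-h // 8, h // 8)
    cx = w // 2 + np.random.randint(-w // 8, w // 8)
    radius = min(h, w) // 2
    # S232: mask covering the edge region outside the circle, filled with the mean
    mask = np.full((h, w), float(img.mean()), dtype=np.float32)
    cv2.circle(mask, (cx, cy), radius, 0.0, thickness=-1)
    # S233: blur radius = |diameter - max centre-to-boundary distance|,
    #       standard deviation = two thirds of that radius
    far = float(np.hypot(max(cx, w - cx), max(cy, h - cy)))
    sigma = max(2.0 * abs(2 * radius - far) / 3.0, 1.0)
    mask = cv2.GaussianBlur(mask, (0, 0), sigma)
    # S234: expand to three channels and weight the brightness (0.5 assumed)
    mask3 = cv2.merge([mask] * 3) * 0.5
    # S235: add the weighted mask to the clear image
    return np.clip(img.astype(np.float32) + mask3, 0, 255).astype(np.uint8)

def low_contrast(img: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """S241-S242: blur each channel with a large Gaussian kernel, then blend."""
    sigma = np.random.uniform(1.0, np.pi)             # std. dev. in [1, pi]
    blurred = cv2.GaussianBlur(img, (71, 71), sigma)  # 70x70 rounded to odd 71
    # S242: weight by the transparency parameter, add channel by channel
    return cv2.addWeighted(blurred, alpha, img, 1.0 - alpha, 0.0)
```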
S3: construct a hidden space diffusion model; in step S3, the hidden space diffusion model includes an autoencoder, a hidden space denoising network, and loss functions, and its construction specifically includes the following steps:
S31: construct a locally fine-tunable vector-quantized variational autoencoder as the autoencoder, comprising a fine-tunable encoder and a decoder; in step S31, constructing the fine-tunable encoder specifically includes the following steps:
S311: using a convolution layer with kernel size S×S and stride 1, extract a hidden space feature map of spatial dimension 2^K×2^K×128 from a 2^K×2^K×3 image, where K is an integer in [7,9] and S is an integer in [3,5];
S312: perform three downsamplings with the encoder downsampling module, which comprises an encoder locally fine-tunable residual network and an encoder downsampling layer, converting the feature map from 2^K×2^K×128 to 2^(K-3)×2^(K-3)×512, where K is an integer in [7,9];
the encoder locally fine-tunable residual network comprises two serial encoder local fine-tuning residual modules; each module consists of three serial groups of a normalization layer and a convolution layer with kernel size S×S and stride 1; the bottommost normalization and convolution layers of each module form the fine-tuning layer; the downsampling layer uses an S×S kernel with stride 2; S is an integer in [3,5];
S313: integrate features with the encoder feature-integration module, which comprises an encoder locally fine-tunable residual network and an encoder grouped attention module; the grouped attention module consists of a normalization layer and several convolution layers with kernel size 1×1 and stride 1;
S314: output the hidden space code with a convolution layer of kernel size S×S and stride 1, of size 2^(K-3)×2^(K-3)×3, where K is an integer in [7,9] and S is an integer in [3,5];
the decoder is constructed by the following steps:
S301: using a convolution layer with kernel size S×S and stride 1, obtain a hidden-space-encoded image feature map of spatial dimension 2^(K-3)×2^(K-3)×512, where K is an integer in [7,9] and S is an integer in [3,5];
S302: integrate features with the decoder feature-integration module, which comprises a decoder residual network and a decoder grouped attention module;
the decoder residual network comprises two decoder residual modules, each formed by two serial groups of a normalization layer and a convolution layer with kernel size S×S and stride 1, where S is an integer in [3,5];
the decoder grouped attention module consists of a normalization layer and several convolution layers with kernel size 1×1 and stride 1;
S303: perform three upsamplings with the decoder upsampling module, which comprises a decoder residual network and a decoder upsampling layer, converting the feature map from 2^(K-3)×2^(K-3)×512 to 2^K×2^K×512, where K is an integer in [7,9];
the end of the first decoder residual module of the decoder residual network is connected in series with a convolution layer serving as a shortcut connection, with a 1×1 kernel and stride 1; the remaining decoder residual modules have the same structure as in step S302; the decoder upsampling layer uses an S×S kernel with stride 1, where S is an integer in [3,5];
S304: output the decoded blurred or clear image with a convolution layer of kernel size S×S and stride 1, of size 2^K×2^K×3, where K is an integer in [7,9] and S is an integer in [3,5];
the fine-tunable autoencoder encodes the image into the hidden space, and the encoder locally fine-tunable residual network provides the local fine-tuning, as illustrated in the sketch below.
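To make the locally fine-tunable design concrete, the following PyTorch sketch shows one plausible reading of the local fine-tuning residual module: three serial (normalization, convolution) groups, where the bottom group has a fine-tunable twin that stays disabled during reconstruction training and, at fine-tuning time (steps S3430/S3440 below), is initialized from the bottom group's weights and trained alone. GroupNorm, SiLU, and all names are assumptions; the patent specifies only normalization layers and convolution layers.

```python
import torch
import torch.nn as nn

class LocalFineTuneResBlock(nn.Module):
    """Residual module with a selectively trainable bottom group."""

    def __init__(self, ch: int, k: int = 3):
        super().__init__()
        def group():
            return nn.Sequential(
                nn.GroupNorm(32, ch), nn.SiLU(),
                nn.Conv2d(ch, ch, k, stride=1, padding=k // 2))
        self.g1, self.g2, self.last = group(), group(), group()
        self.ft = group()          # fine-tuning twin of the bottom group
        self.finetune = False      # disabled during reconstruction training

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.g2(self.g1(x))
        h = self.ft(h) if self.finetune else self.last(h)
        return x + h               # residual connection

    def enable_finetune(self) -> None:
        # S3430/S3440: copy the bottom group's weights to the fine-tuning
        # layer, freeze everything else, then train only the fine-tuning layer.
        self.ft.load_state_dict(self.last.state_dict())
        for p in self.parameters():
            p.requires_grad = False
        for p in self.ft.parameters():
            p.requires_grad = True
        self.finetune = True
```

During reconstruction only the three main groups receive gradients; after enable_finetune() the optimizer sees only the small ft group, which is what keeps the number of fine-tuning parameters low.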
S32: construct a locally fine-tunable hidden space denoising network, comprising a denoising downsampling module, an intermediate module, and a denoising upsampling module; in step S32, constructing the denoising downsampling module specifically includes the following steps:
S321: concatenate the hidden space codes of the clear and blurred image pair channel-wise and add noise, forming a noisy feature map of spatial dimension 2^(K-3)×2^(K-3)×6; then, using a convolution layer with kernel size S×S and stride 1, extract a hidden space feature map of spatial dimension 2^(K-3)×2^(K-3)×160, where K is an integer in [7,9] and S is an integer in [3,5];
S322: integrate features with two denoising local fine-tuning residual modules; each consists of three serial groups of a normalization layer and a convolution layer with kernel size S×S and stride 1, where S is an integer in [3,5]; the bottommost normalization and convolution layers of each module form the fine-tuning layer and can be selectively trained;
S323: perform three downsamplings with the denoising downsampling module, which comprises a denoising downsampling layer and two groups of locally fine-tunable denoising self-attention residual networks, converting the feature map from 2^(K-3)×2^(K-3)×160 to 2^(K-6)×2^(K-6)×640, where K is an integer in [7,9];
the denoising downsampling layer uses an S×S kernel with stride 2, where S is an integer in [3,5];
the locally fine-tunable denoising self-attention residual network consists of a denoising local fine-tuning residual module, a skip connection, and a denoising self-attention module; the skip connection is realized by a convolution layer with a 1×1 kernel and stride 1; the denoising self-attention module consists of a normalization layer and several convolution layers with kernel size 1×1 and stride 1 (a code sketch of such an attention module follows this step);
the intermediate module consists of two denoising local fine-tuning residual modules and a denoising self-attention module;
three upsamplings are performed with the denoising upsampling module, which consists of a denoising upsampling layer and two groups of denoising upsampling self-attention residual networks, converting the feature map from 2^(K-6)×2^(K-6)×640 to 2^(K-3)×2^(K-3)×160, where K is an integer in [7,9];
S324: output the decoded feature map, of size 2^(K-3)×2^(K-3)×3, with a convolution layer of kernel size S×S and stride 1, where K is an integer in [7,9] and S is an integer in [3,5].
S33: construct the loss functions of the hidden space diffusion model; in step S33, the loss functions include a loss function for the reconstruction training process and a loss function for the fine-tuning training process;
the loss function for the reconstruction training process consists of the fine-tunable autoencoder's perceptual loss, patch-based adversarial loss, and pixel-space L1 loss, together with the fine-tunable denoising network's diffusion model loss;
the loss function for the fine-tuning training process consists of the fine-tunable autoencoder's fine-tuning loss and the fine-tunable denoising network's fine-tuning loss;
the fine-tuning loss of the fine-tunable autoencoder, denoted L_E, is calculated as:

L_E = E[ ||E_rec(x_h) - E_ft(x_l)||_1 + ||E_rec(x_h) - E_ft(x_h)||_1 ]    (1)

where x_h is a high-quality image, x_l is a blurred image, E_rec is the reconstruction encoder trained on the reconstruction training set with the fine-tuning layers disabled, and E_ft is the fine-tuning encoder trained on the fine-tuning training set with the fine-tuning layers participating in training; ||E_rec(x_h) - E_ft(x_l)||_1 is the 1-norm between the high-quality image passed through the reconstruction encoder and the blurred image passed through the fine-tuning encoder; ||E_rec(x_h) - E_ft(x_h)||_1 is the 1-norm between the high-quality image passed through the reconstruction encoder and the same image passed through the fine-tuning encoder; E[·] denotes the mean;
the fine-tuning loss of the fine-tunable denoising network, denoted L_D, is calculated as:

L_D = E[ ||ε_t - ε_θ(z_t, t)||_2^2 ]    (2)

where t is the denoising time step, z_t is the noisy hidden space feature at time t, ε_t is the noise actually added at time t, ε_θ(z_t, t) is the noise predicted by the denoising network for time t, ||·||_2 is the Euclidean norm between actual and predicted noise, and E[·] denotes the mean.
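The two fine-tuning losses transcribe directly into code. Below is a PyTorch sketch of formulas (1) and (2) with illustrative names; note that .abs().mean() averages over all tensor elements, which matches E[||·||_1] up to a constant factor.

```python
import torch

def encoder_finetune_loss(enc_rec, enc_ft, x_h, x_l):
    """Formula (1): L_E = E[||E_rec(x_h) - E_ft(x_l)||_1
                          + ||E_rec(x_h) - E_ft(x_h)||_1]."""
    with torch.no_grad():
        target = enc_rec(x_h)       # frozen reconstruction encoder
    return ((target - enc_ft(x_l)).abs().mean()
            + (target - enc_ft(x_h)).abs().mean())

def denoise_finetune_loss(eps_true, eps_pred):
    """Formula (2): L_D = E[||eps_t - eps_theta(z_t, t)||_2^2]."""
    return ((eps_true - eps_pred) ** 2).mean()
```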
S34: train the hidden space diffusion model, update the parameters, and save them. Step S34 includes the following sub-steps:
S341: train the fine-tunable autoencoder with its fine-tuning layers disabled, using the reconstruction training set, specifically:
S3410: input the clear images of the reconstruction training set into the reconstruction autoencoder and generate reconstructed clear images by forward propagation;
S3411: calculate the perceptual loss, the patch-based adversarial loss, and the pixel-space L1 loss from the reconstructed clear image and the input clear image;
S3412: perform back propagation with an initial learning rate Q, where Q lies in [5e-6, 5e-4], and optimize the parameters;
S3413: repeat steps S3410 to S3412, recording the loss values output in step S3411; each full pass over the reconstruction training set counts as one epoch, and loss curves over the epochs are drawn from the recorded values;
S3414: save the reconstruction autoencoder;
S342: train the fine-tunable denoising network with its fine-tuning layers disabled, using the reconstruction training set, specifically:
S3420: input the clear images of the reconstruction training set into the reconstruction autoencoder trained in step S3414 to obtain the reconstruction hidden space feature map z_0;
S3421: following the DDPM algorithm, define a Markov chain of length T with noise schedule {β_t} and add noise step by step according to formula (3) to generate the noisy feature map (a code sketch of this noising step is given after step S3426):

z_t = √(ᾱ_t)·z_0 + √(1 - ᾱ_t)·ε,  with ᾱ_t = ∏_{s=1}^{t} (1 - β_s)    (3)

where ε is random noise, t is the time step, and z_0 is the initial state of the diffusion process;
S3422: input the noisy hidden space feature map generated in step S3421 into the reconstruction denoising network and generate the denoised reconstruction hidden space feature map by forward propagation;
S3423: calculate the diffusion model loss from the noisy hidden space feature map and the denoised hidden space feature map;
S3424: perform back propagation with an initial learning rate Q, where Q lies in [5e-6, 5e-4], and optimize the parameters;
S3425: repeat steps S3420 to S3424, recording the loss values output in step S3423; each full pass over the reconstruction training set counts as one epoch, and loss curves over the epochs are drawn from the recorded values;
S3426: save the trained reconstruction denoising network;
S343: train the fine-tunable autoencoder with its fine-tuning layers enabled, using the fine-tuning training set (see the training-loop sketch after step S3446), specifically:
S3430: load the trained reconstruction encoder weights, and copy the weights of the last group of normalization and convolution layers of each local fine-tuning residual module (trained with the fine-tuning layer disabled) to the fine-tuning layer;
S3431: input the blurred images into the fine-tuning autoencoder and generate fine-tuning enhanced clear images by forward propagation;
S3432: calculate the fine-tuning loss of the fine-tunable autoencoder from the fine-tuning enhanced image and the corresponding clear and blurred images;
S3433: perform back propagation with an initial learning rate Q, where Q lies in [5e-6, 5e-4], and optimize the parameters;
S3434: repeat steps S3430 to S3433, recording the loss values output in step S3432; each full pass over the fine-tuning training set counts as one epoch, and loss curves over the epochs are drawn from the recorded values;
S3435: save the trained fine-tuning autoencoder;
S344: train the fine-tunable denoising network with its fine-tuning layers enabled, using the fine-tuning training set, specifically:
S3440: load the trained reconstruction denoising network weights, and copy the weights of the last group of normalization and convolution layers of each local fine-tuning residual module (trained with the fine-tuning layer disabled) to the fine-tuning layer;
S3441: input the blurred images into the fine-tuning autoencoder trained in step S3435 to obtain the hidden space feature map;
S3442: generate the noisy hidden space feature map as in step S3421, input it into the fine-tuning denoising network, and generate the denoised hidden space feature map by forward propagation;
S3443: calculate the fine-tuning loss of the fine-tuning denoising network from the noisy hidden space feature map and the denoised hidden space feature map;
S3444: perform back propagation with an initial learning rate Q, where Q lies in [5e-6, 5e-4], and optimize the parameters;
S3445: repeat steps S3440 to S3444, recording the loss values output in step S3443; each full pass over the fine-tuning training set counts as one epoch, and loss curves over the epochs are drawn from the recorded values;
S3446: save the trained fine-tuning denoising network.
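Steps S343 and S344 share the same outer loop: load the reconstruction-stage weights, copy the bottom-group weights into the fine-tuning layers, freeze everything else, and optimize only the fine-tuning parameters. The sketch below assumes the enable_finetune() helper from the residual-module sketch above; the weight file name, optimizer choice, and learning rate are illustrative.

```python
import torch

def finetune(model, loader, loss_fn, epochs: int, lr: float = 5e-5,
             device: str = "cuda"):
    model.load_state_dict(torch.load("reconstruction_weights.pt"))  # S3430/S3440
    model.to(device)
    for m in model.modules():
        if hasattr(m, "enable_finetune"):
            m.enable_finetune()                 # copy weights, freeze the rest
    trainable = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.Adam(trainable, lr=lr)    # lr chosen within [5e-6, 5e-4]
    history = []
    for _ in range(epochs):                     # one epoch = one full pass
        for x_l, x_h in loader:                 # blurred / clear image pairs
            loss = loss_fn(model(x_l.to(device)), x_h.to(device))
            opt.zero_grad()
            loss.backward()
            opt.step()
            history.append(loss.item())         # recorded for the loss curves
    return history
```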
S4: enhance the blurred images in the fine-tuning test set with the trained hidden space diffusion model to obtain the enhancement results; a sketch of this inference step follows.
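The patent does not spell out the sampling procedure for S4, so the sketch below assumes standard DDPM ancestral sampling in the hidden space, conditioning the denoising network on the blurred-image latent by channel concatenation as in step S321; all names and the denoiser's call signature are illustrative.

```python
import torch

@torch.no_grad()
def enhance(x_blur, enc_ft, denoiser, decoder, betas, alpha_bar):
    z_cond = enc_ft(x_blur)                    # blurred-image hidden code
    z = torch.randn_like(z_cond)               # z_T: start from pure noise
    for t in reversed(range(len(betas))):
        eps = denoiser(torch.cat([z, z_cond], dim=1), t)   # 6-channel input
        ab, b = alpha_bar[t], betas[t]
        z = (z - b / (1.0 - ab).sqrt() * eps) / (1.0 - b).sqrt()  # DDPM mean
        if t > 0:
            z = z + b.sqrt() * torch.randn_like(z)   # add sigma_t * noise
    return decoder(z)                          # decoded enhanced image
```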
Examples
This embodiment applies the universal blurred image enhancement method based on the hidden space diffusion model to the retinal domain; the flow of the blurred fundus image enhancement algorithm is shown in FIG. 6 and specifically includes the following steps:
Step one: construct a reconstruction training set and a reconstruction test set;
Select clear retinal fundus images from clinically collected retinal fundus images, removing fundus images whose structures are invisible because the blur level is too high or the image is too dark; 10000 clear retinal fundus images, covering normal fundi and diabetic retinopathy of different disease grades, are screened out and resized to 512×512, forming a pixel-level matched reconstruction training set of 10000 pairs.
Likewise, 100 clear retinal fundus images, covering normal fundi and diabetic retinopathy of different disease grades, are screened out and resized to 512×512, forming a pixel-level matched reconstruction test set of 100 pairs.
Step two: construct a fine-tuning training set and a fine-tuning test set;
Step 2.1: select clear retinal fundus images
Select clear retinal fundus images from clinically collected retinal fundus images, removing fundus images whose structures are invisible because the blur level is too high or the image is too dark; screen out 50 clear retinal fundus images, covering normal fundi and glaucoma of different disease grades, and resize them to 512×512;
step 2.2: RIO mask for generating clear retinal fundus image
Converting the image into a gray image, normalizing the gray image to be within the range of 0,1 pixel by pixel, and binarizing the gray image into a black-and-white image with the threshold value of 5/51; performing dilation and erosion operations on the binary image using morphological operations to remove noise and fill holes; expanding the generated ROI mask channel number into three channels, adjusting the size to 512 multiplied by 512 pixels, and storing the ROI mask;
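A sketch of this mask-generation step; the 5/51 threshold, the morphological clean-up, the three-channel expansion, and the 512×512 target size come from the text, while the structuring-element size is an assumption.

```python
import numpy as np
import cv2

def make_roi_mask(img_bgr: np.ndarray) -> np.ndarray:
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
    mask = (gray > 5.0 / 51.0).astype(np.uint8)       # binarize at 5/51
    kernel = np.ones((5, 5), np.uint8)                # assumed 5x5 element
    mask = cv2.dilate(mask, kernel)                   # fill holes
    mask = cv2.erode(mask, kernel)                    # remove noise
    mask3 = cv2.merge([mask] * 3) * 255               # expand to three channels
    return cv2.resize(mask3, (512, 512), interpolation=cv2.INTER_NEAREST)
```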
Step 2.3: degrade to produce blurred retinal fundus images with over-bright edges
Randomly select a point within the central quarter of the image as the circle centre, and randomly select a value close to the smaller of the image's height and width as the diameter, forming a random centre circle; build a blur mask region between the random centre circle and the image edge, filled with the image mean; take the absolute difference between the circle diameter and the maximum distance from the centre to the boundary as the Gaussian blur radius, take two thirds of it as the Gaussian blur standard deviation, and apply Gaussian blur to the mask region; expand the blurred mask region to three channels and weight its brightness, obtaining a brightness-weighted mask region; add the brightness-weighted mask region to the original image and multiply by the image's corresponding ROI mask; save the degraded blurred image.
Step three: construct the fine-tunable autoencoder; the specific steps are as follows:
Step 3.1: construct the fine-tunable encoder:
extract a hidden space retinal fundus feature map of spatial dimension 512×512×128 from a 512×512×3 retinal fundus image, using a 3×3 convolution kernel with stride 1;
perform three downsamplings with a downsampling module consisting of a locally fine-tunable residual network and a downsampling layer, converting the feature map from 512×512×128 to 128×128×512; the locally fine-tunable residual network consists of two serial local fine-tuning residual modules, each formed by three serial groups of a normalization layer and a 3×3, stride-1 convolution layer; the bottommost normalization and convolution layers of each module form the fine-tuning layer; the downsampling layer uses a 3×3 kernel with stride 2;
further integrate features with a locally fine-tunable residual network and a grouped attention module, where the grouped attention module consists of a normalization layer and several 1×1, stride-1 convolution layers;
output hidden space features of spatial dimension 128×128×3 with a 3×3, stride-1 convolution layer;
Step 3.2: construct the decoder:
extract a hidden-space-encoded image feature map of spatial dimension 128×128×512 with a 3×3 convolution kernel and stride 1; further integrate features with a residual network and a grouped attention module, where the residual network consists of two serial residual modules, each formed by two serial groups of a normalization layer and a 3×3, stride-1 convolution layer, and the grouped attention module consists of a normalization layer and several 1×1, stride-1 convolution layers;
perform three upsamplings with a residual network and an upsampling layer, converting the feature map from 128×128×512 to 512×512×512; the end of the first residual module of the residual network is connected in series with a convolution layer serving as a shortcut connection, using a 1×1 kernel with stride 1; the upsampling layer uses a 3×3 kernel with stride 1;
output the decoded image, of size 512×512×3, with a 3×3, stride-1 convolution layer;
Step 3.3: construct the loss function
The reconstruction loss consists of the fine-tunable autoencoder's perceptual loss, patch-based adversarial loss, and pixel-space L1 loss; the fine-tuning loss consists of the fine-tunable autoencoder's fine-tuning loss;
the fine-tuning loss of the fine-tunable autoencoder, denoted L_E, improves the quality of the generated image while ensuring that blurred image features are well extracted; L_E is calculated as shown in (4):

L_E = E[ ||E_rec(x_h) - E_ft(x_l)||_1 + ||E_rec(x_h) - E_ft(x_h)||_1 ]    (4)

where x_h is a high-quality image, x_l is a blurred image, E_rec is the reconstruction encoder trained on the reconstruction training set with the fine-tuning layers disabled, and E_ft is the fine-tuning encoder trained on the fine-tuning training set with the fine-tuning layers participating in training; ||E_rec(x_h) - E_ft(x_l)||_1 is the 1-norm between the high-quality image passed through the reconstruction encoder and the blurred image passed through the fine-tuning encoder; ||E_rec(x_h) - E_ft(x_h)||_1 is the 1-norm between the high-quality image passed through the reconstruction encoder and the same image passed through the fine-tuning encoder; E[·] denotes the mean;
Step four: constructing the fine-tuning denoising network, with the following specific steps:
Step 4.1: constructing the downsampling module:
Concatenating the hidden space codes of the clear and blurred retinal fundus images channel-wise and adding noise to form a noise-containing feature map with a spatial dimension of 128×128×6; extracting a retinal fundus hidden space feature map with a spatial dimension of 128×128×160 through a convolution layer with a 3×3 convolution kernel and a stride of 1;
Further integrating features using two local fine-tuning residual modules, each composed of three groups of serially connected normalization and convolution layers with a 3×3 convolution kernel and a stride of 1; the bottommost normalization layer and convolution layer of each module are the fine-tuning layers and can be selectively trained;
Performing three downsamplings using a downsampling layer and two groups of downsampling local fine-tuning self-attention residual networks, changing the spatial dimension of the feature map from 128×128×160 to 32×32×640; the downsampling layer uses a 3×3 convolution kernel with a stride of 2; each downsampling local fine-tuning self-attention residual network consists of a downsampling convolution layer, a local fine-tuning residual module, a skip connection, and a self-attention module; the downsampling convolution layer uses a 3×3 convolution kernel with a stride of 2; the local fine-tuning residual module matches that of the fine-tunable autoencoder; the skip connection is realized by a convolution layer with a 1×1 convolution kernel and a stride of 1; the self-attention module consists of a normalization layer and multiple convolution layers with a 1×1 convolution kernel and a stride of 1;
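The self-attention module described above (one normalization layer followed by 1×1 convolutions) can be sketched as follows; single-head attention and the scaling factor are standard choices assumed here, not specified by the patent:

```python
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    """Spatial self-attention built from one norm layer and 1x1 convolutions."""
    def __init__(self, channels: int):
        super().__init__()
        self.norm = nn.GroupNorm(32, channels)
        self.q = nn.Conv2d(channels, channels, 1, stride=1)
        self.k = nn.Conv2d(channels, channels, 1, stride=1)
        self.v = nn.Conv2d(channels, channels, 1, stride=1)
        self.proj = nn.Conv2d(channels, channels, 1, stride=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        n = self.norm(x)
        q = self.q(n).flatten(2).transpose(1, 2)   # (b, h*w, c)
        k = self.k(n).flatten(2)                   # (b, c, h*w)
        v = self.v(n).flatten(2).transpose(1, 2)   # (b, h*w, c)
        attn = torch.softmax(q @ k * (c ** -0.5), dim=-1)  # (b, h*w, h*w)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + self.proj(out)                  # residual connection
```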
Step 4.2: building an intermediate module:
The intermediate module consists of two local fine-tuning residual modules and a self-attention module, constructed identically to those used in the downsampling module;
Step 4.3: constructing the upsampling module:
Upsampling the feature map from a spatial dimension of 32×32×640 to 128×128×160 using a residual network, a skip connection, and a self-attention module; the residual network consists of two groups of serially connected normalization and convolution layers with a 3×3 convolution kernel and a stride of 1; the skip connection is realized by a convolution layer with a 1×1 convolution kernel and a stride of 1; the self-attention module consists of a normalization layer and multiple convolution layers with a 1×1 convolution kernel and a stride of 1;
Outputting the decoded image, of size 128×128×3, using a convolution layer with a 3×3 convolution kernel and a stride of 1;
Step 4.4: constructing the loss function:
The loss function of the fine-tunable denoising network consists of a diffusion-model loss function used for reconstruction and a fine-tuning loss function used for fine-tuning;
The fine-tuning loss function of the fine-tunable denoising network is denoted $\mathcal{L}_{ft\text{-}DN}$; its goal is to predict the noise at step $t$ more accurately. $\mathcal{L}_{ft\text{-}DN}$ is calculated as shown in formula (5):

$$\mathcal{L}_{ft\text{-}DN}=\mathbb{E}\left[\left\|\epsilon_t-\epsilon_\theta(z_t,t)\right\|_2^2\right] \tag{5}$$

where $t$ is the time of the denoising step, $z_t$ is the noise-containing hidden space feature at time $t$, $\epsilon_t$ is the noise actually added at time $t$, $\epsilon_\theta(z_t,t)$ is the noise at time $t$ predicted by the denoising network, $\left\|\cdot\right\|_2^2$ is the squared Euclidean norm between the actual and predicted noise, and $\mathbb{E}$ denotes the mean.
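A minimal sketch of formula (5), assuming `denoiser(z_t, t)` returns the predicted noise (names are illustrative):

```python
import torch

def fine_tune_dn_loss(denoiser, z_t, t, eps_true):
    """L_ft-DN, eq. (5): mean squared Euclidean distance between the noise
    actually added at step t and the denoising network's prediction."""
    eps_pred = denoiser(z_t, t)
    return ((eps_true - eps_pred) ** 2).mean()
```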
Step five: training a model;
Step 5.1: training the reconstruction autoencoder using the reconstruction training set:
Inputting a high-quality fundus image from the reconstruction dataset into the reconstruction autoencoder, forward-propagating to generate a reconstructed fundus image, and calculating the reconstruction loss function from the reconstructed and input fundus images; back-propagating and optimizing parameters with an initial learning rate of 5e-5; finally, saving the trained reconstruction autoencoder; the training platform is Ubuntu 16.04 with the PyTorch deep learning framework, using a GPU to accelerate training;
Step 5.2: training the reconstruction denoising network using the reconstruction training set:
Inputting the high-quality fundus image into the trained reconstruction autoencoder to obtain a hidden space feature map; adding noise to obtain a noise-containing hidden space feature map; inputting the noise-containing hidden space feature map into the reconstruction denoising network and forward-propagating to generate a denoised reconstruction hidden space feature map; calculating the diffusion-model loss function from the noise-containing and denoised hidden space feature maps; back-propagating and optimizing parameters with an initial learning rate of 5e-5; finally, saving the trained reconstruction denoising network; the training platform is Ubuntu 16.04 with the PyTorch deep learning framework, using a GPU to accelerate training;
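The noise-addition step can be sketched as follows, using the standard DDPM formulation that matches formula (3) in the claims; the linear beta schedule and its endpoints are common defaults assumed here, and all names are illustrative:

```python
import torch

def make_alphas_bar(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule of length T."""
    betas = torch.linspace(beta_start, beta_end, T)
    return torch.cumprod(1.0 - betas, dim=0)

def add_noise(z0: torch.Tensor, t: torch.Tensor, alphas_bar: torch.Tensor):
    """Forward diffusion: z_t = sqrt(abar_t) * z0 + sqrt(1 - abar_t) * eps."""
    eps = torch.randn_like(z0)
    a_bar = alphas_bar[t].view(-1, 1, 1, 1)  # broadcast over (B, C, H, W)
    z_t = a_bar.sqrt() * z0 + (1.0 - a_bar).sqrt() * eps
    return z_t, eps  # eps is the training target for the denoising network
```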
Step 5.3: training the fine-tuning autoencoder using the fine-tuning training set:
Loading the trained reconstruction autoencoder weights, and copying the weights of the last group of normalization and convolution layers of each local fine-tuning residual module onto the fine-tuning layers; inputting the blurred fundus images from the fine-tuning training set into the fine-tuning autoencoder and forward-propagating to generate enhanced fundus images; calculating the fine-tuning loss of the fine-tuning autoencoder from the enhanced fundus images and the corresponding clear and blurred fundus images; back-propagating and optimizing parameters with an initial learning rate of 5e-5; finally, saving the trained fine-tuning autoencoder; the training platform is Ubuntu 16.04 with the PyTorch deep learning framework, using a GPU to accelerate training;
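One plausible reading of this initialization step, sketched in PyTorch: load the reconstruction checkpoint, freeze all parameters, then unfreeze only the fine-tuning layers (here the `ft_norm`/`ft_conv` parameters from the residual-block sketch above). The checkpoint path and parameter names are illustrative assumptions:

```python
import torch

def prepare_fine_tuning(model, rec_ckpt_path: str, lr: float = 5e-5):
    """Initialise a fine-tunable model from reconstruction weights and build
    an optimizer over the fine-tuning layers only (initial learning rate 5e-5)."""
    state = torch.load(rec_ckpt_path, map_location="cpu")
    model.load_state_dict(state, strict=False)  # fine-tuning layers start from copied weights
    for name, p in model.named_parameters():
        # Train only the bottommost norm+conv (fine-tuning) layers.
        p.requires_grad = ("ft_norm" in name) or ("ft_conv" in name)
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=lr)
```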
Step 5.4: training the fine-tuning denoising network using the fine-tuning training set:
Loading the trained fine-tuning autoencoder weights and the trained reconstruction denoising network weights, and copying the weights of the last group of normalization and convolution layers of each local fine-tuning residual module in the reconstruction denoising network onto the fine-tuning layers; inputting the low-quality fundus image into the fine-tuning autoencoder to obtain a hidden space feature map; adding noise to obtain a noise-containing feature map, inputting it into the fine-tuning denoising network, and forward-propagating to generate a denoised hidden space feature map; calculating the fine-tuning loss of the fine-tuning denoising network from the noise-containing and denoised hidden space feature maps; back-propagating and optimizing parameters with an initial learning rate of 5e-5; finally, saving the trained fine-tuning denoising network; the training platform is Ubuntu 16.04 with the PyTorch deep learning framework, using a GPU to accelerate training.
Step six: during testing, loading the trained fine-tuning autoencoder and fine-tuning denoising network, and inputting the blurred retinal fundus image into the model to obtain an enhanced fundus image.
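Test-time enhancement can be sketched as below. The reverse-diffusion update shown is a deterministic DDIM-style step (a common choice, not dictated by the patent), and the conditioning by channel concatenation follows the 6-channel input described in step 4.1; all names are illustrative:

```python
import torch

@torch.no_grad()
def enhance(blurred, encoder, denoiser, decoder, alphas_bar):
    """Encode the blurred image, run reverse diffusion in the hidden space
    conditioned on its code, then decode the denoised code."""
    z_cond = encoder(blurred)          # hidden-space code of the blurred input
    z = torch.randn_like(z_cond)       # start from pure noise
    T = alphas_bar.shape[0]
    for t in reversed(range(T)):
        t_batch = torch.full((z.shape[0],), t, device=z.device, dtype=torch.long)
        eps = denoiser(torch.cat([z, z_cond], dim=1), t_batch)  # 6-channel conditioned input
        a_bar = alphas_bar[t]
        a_prev = alphas_bar[t - 1] if t > 0 else torch.tensor(1.0)
        z0 = (z - (1.0 - a_bar).sqrt() * eps) / a_bar.sqrt()    # predicted clean code
        z = a_prev.sqrt() * z0 + (1.0 - a_prev).sqrt() * eps    # DDIM (eta = 0) step
    return decoder(z)
```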
The above realizes the complete pipeline for enhancing blurred retinal fundus images. Experiments show that the method effectively enhances blurred fundus images, restores details such as retinal vasculature well, and produces realistic and reliable results. Test results show that the method can be fine-tuned at low computational cost, achieves a better enhancement effect than other deep-learning-based methods, efficiently generalizes the enhancement algorithm across different datasets, degradation types, and image modalities, and therefore has good universality.
In summary, the above embodiments are only preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within its scope of protection.
Therefore, the invention adopts a universal blurred-image enhancement method based on the hidden space diffusion model: the hidden space diffusion model is trained with a reconstruction-plus-fine-tuning strategy, and the trained model enhances blurred images directly. Compared with existing image enhancement methods, the result effectively enhances the blurred image, restores image details well, and is realistic and reliable; only unpaired images are needed during reconstruction, and the paired image data needed in the fine-tuning stage is small in scale, which well avoids the high cost and difficulty of paired data acquisition.
Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solution of the present invention; although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art will understand that the technical solution of the present invention may still be modified or equivalently replaced without departing from its spirit and scope.

Claims (9)

1. A general enhancement method for blurred images based on a hidden space diffusion model, characterized by comprising the following steps:
S1: constructing a reconstruction training set and a reconstruction testing set;
S2: constructing a fine-tuning training set and a fine-tuning test set;
S3: constructing a hidden space diffusion model;
S4: enhancing the blurred images in the fine-tuning test set using the trained hidden space diffusion model to obtain the enhancement results.
2. The general enhancement method for blurred images based on the hidden space diffusion model as claimed in claim 1, wherein: the step S1 specifically comprises the following steps:
S11: selecting images to construct a reconstruction training set and a reconstruction testing set, wherein the images are clear image pairs matched at pixel level;
S12: adjusting the size of the selected images to 2^K × 2^K using nearest-neighbor interpolation, where K is an integer in [7,9].
3. The general enhancement method for blurred images based on the hidden space diffusion model as claimed in claim 1, wherein: in step S2, the fine-tuning training set consists of pixel-level-matched clear-blurred image pairs, the blurred images being generated by degrading clear images; constructing the fine-tuning training set specifically comprises the steps of:
S21: adjusting the size of the clear image to 2^K × 2^K using nearest-neighbor interpolation, where K is an integer in [7,9];
S22: selecting the degradation type of the blurred image as dark image edges, with a brightness adjustment factor whose value range is [0,1];
S23: selecting the degradation type of the blurred image as bright image edges, and executing the following steps:
S231: randomly selecting a point within the central quarter of the blurred image as the circle center, and randomly selecting a diameter up to the minimum of the blurred image's length and width, to form a random central circle;
S232: constructing a blur-mask region in the blurred-image edge outside the random central circle, filled with the mean value of the blurred image;
S233: using the absolute difference between the diameter of the random central circle and the maximum center-to-boundary distance as the Gaussian blur radius, with two thirds of the Gaussian blur radius as the Gaussian blur standard deviation, and adding Gaussian blur to the blur-mask region;
S234: expanding the Gaussian-blurred blur-mask region to three channels and applying brightness weighting to obtain a brightness-weighted mask region;
S235: adding the brightness-weighted mask region to the clear image to obtain a blurred image with bright edges;
S24: selecting the degradation type of the blurred image as low contrast, comprising the following steps (see the sketch after this claim):
S241: convolving the three channels of the clear image with a two-dimensional Gaussian kernel of size 70×70 and a standard deviation in the range of 1 to π;
S242: weighting the convolved channels by a transparency parameter and adding them channel by channel to the blur-degraded image.
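For illustration, the low-contrast degradation of S241-S242 might be sketched in OpenCV/NumPy as below; the sigma and transparency values are illustrative picks within the stated ranges, and OpenCV requires odd kernel sizes, so 71 stands in for the 70×70 kernel:

```python
import numpy as np
import cv2

def low_contrast_degrade(clear: np.ndarray, sigma: float = 2.0, alpha: float = 0.5) -> np.ndarray:
    """S241-S242 sketch: Gaussian-blur all three channels, then blend the
    blurred result back into the image with transparency weight alpha."""
    assert 1.0 <= sigma <= np.pi, "standard deviation chosen in [1, pi]"
    blurred = cv2.GaussianBlur(clear, (71, 71), sigmaX=sigma)  # 70x70 in the claim; OpenCV needs odd sizes
    return cv2.addWeighted(blurred, alpha, clear, 1.0 - alpha, 0)
```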
4. A general enhancement method for blurred images based on a hidden space diffusion model according to claim 3, characterized in that: blurred or clear images are selected, and the fine-tuning dataset is constructed from the blurred images generated by degradation in step S242.
5. The general enhancement method for blurred images based on the hidden space diffusion model as claimed in claim 1, wherein: in step S3, the hidden space diffusion model comprises an autoencoder, a hidden space denoising network, and a loss function, and constructing the hidden space diffusion model specifically comprises the following steps:
S31: constructing a locally fine-tunable vector-quantized variational autoencoder as the autoencoder, including a fine-tunable encoder and a decoder;
S32: constructing a locally fine-tunable hidden space denoising network, the fine-tuning denoising network comprising a denoising downsampling module, an intermediate module, and a denoising upsampling module;
S33: constructing a loss function of the hidden space diffusion model;
S34: training a hidden space diffusion model, updating parameters and storing.
6. The universal enhancement method for blurred images based on the hidden space diffusion model as claimed in claim 5, characterized in that: in step S31, constructing the fine-tunable encoder specifically comprises the steps of:
S311: extracting a feature map with a spatial dimension of 2^K × 2^K × 128 from a 2^K × 2^K × 3 image using a convolution kernel of size S×S with a stride of 1, where K is an integer in [7,9] and S is an integer in [3,5];
S312: performing three downsamplings through the encoder downsampling module, which comprises an encoder local fine-tuning residual network and a downsampling layer, converting the spatial dimension of the feature map from 2^K × 2^K × 128 to 2^(K-3) × 2^(K-3) × 512, where K is an integer in [7,9];
the encoder local fine-tuning residual network comprises two groups of serially connected encoder local fine-tuning residual modules, each composed of three groups of serially connected normalization and convolution layers with a convolution kernel of size S×S and a stride of 1; the bottommost normalization layer and convolution layer of each encoder local fine-tuning residual module are the fine-tuning layers; the downsampling layer uses a convolution kernel of size S×S with a stride of 2; where S is an integer in [3,5];
S313: integrating features using the encoder feature-integration module, which comprises an encoder local fine-tuning residual network and an encoder grouped attention module; the encoder grouped attention module consists of a normalization layer and multiple convolution layers with a 1×1 convolution kernel and a stride of 1;
S314: outputting the hidden space code using a convolution layer with a kernel of size S×S and a stride of 1, the output image having a size of 2^(K-3) × 2^(K-3) × 3, where K is an integer in [7,9] and S is an integer in [3,5];
The decoder is constructed by the following steps:
S301: obtaining a hidden space code feature map with a spatial dimension of 2^(K-3) × 2^(K-3) × 512 using a convolution kernel of size S×S with a stride of 1, where K is an integer in [7,9] and S is an integer in [3,5];
S302: integrating features using the decoder feature-integration module, which comprises a decoder residual network and a decoder grouped attention module;
the decoder residual network comprises two groups of decoder residual modules, each formed by two groups of serially connected normalization and convolution layers with a convolution kernel of size S×S and a stride of 1, where S is an integer in [3,5];
the decoder grouped attention module consists of a normalization layer and multiple convolution layers with a 1×1 convolution kernel and a stride of 1;
S303: performing three upsamplings through the decoder upsampling module, which comprises a decoder residual network and a decoder upsampling layer, converting the spatial dimension of the feature map from 2^(K-3) × 2^(K-3) × 512 to 2^K × 2^K × 512, where K is an integer in [7,9];
a convolution layer serving as the shortcut connection is appended in series to the first decoder residual module of the decoder residual network, using a 1×1 convolution kernel with a stride of 1; the remaining decoder residual modules have the same structure as in step S302; the upsampling layer uses a convolution kernel of size S×S with a stride of 1, where S is an integer in [3,5];
S304: outputting the decoded blurred or clear image, of size 2^K × 2^K × 3, using a convolution layer with a kernel of size S×S and a stride of 1, where K is an integer in [7,9] and S is an integer in [3,5];
the fine-tunable autoencoder encodes the image into the hidden space, and the encoder local fine-tuning residual network is used for local fine-tuning.
7. The universal enhancement method for blurred images based on the hidden space diffusion model as claimed in claim 6, characterized in that: in step S32, constructing the denoising downsampling module specifically comprises the steps of:
S321: concatenating the clear-blurred image pair channel-wise and adding noise to form a noise-containing feature map with a spatial dimension of 2^(K-3) × 2^(K-3) × 6, then extracting a hidden space feature map with a spatial dimension of 2^(K-3) × 2^(K-3) × 160 using a convolution kernel of size S×S with a stride of 1, where K is an integer in [7,9] and S is an integer in [3,5];
S322: integrating features using two denoising local fine-tuning residual modules, each composed of three groups of serially connected normalization and convolution layers with a convolution kernel of size S×S and a stride of 1, where S is an integer in [3,5]; the bottommost normalization layer and convolution layer of each denoising local fine-tuning residual module are the fine-tuning layers and can be selectively trained;
S323: performing three downsamplings using the denoising downsampling module, which comprises a denoising downsampling layer and two groups of denoising-downsampling local fine-tuning self-attention residual networks, converting the spatial dimension of the feature map from 2^(K-3) × 2^(K-3) × 160 to 2^(K-6) × 2^(K-6) × 640, where K is an integer in [7,9];
the denoising downsampling layer uses a convolution kernel of size S×S with a stride of 2, where S is an integer in [3,5];
each denoising-downsampling local fine-tuning self-attention residual network consists of a denoising local fine-tuning residual module, a skip connection, and a denoising self-attention module; the skip connection is realized by a convolution layer with a 1×1 convolution kernel and a stride of 1; the denoising self-attention module consists of a normalization layer and multiple convolution layers with a 1×1 convolution kernel and a stride of 1;
the intermediate module consists of two denoising local fine-tuning residual modules and a denoising self-attention module;
performing three upsamplings using the denoising upsampling module, which consists of a denoising upsampling layer and two groups of denoising-upsampling self-attention residual networks, converting the spatial dimension of the feature map from 2^(K-6) × 2^(K-6) × 640 to 2^(K-3) × 2^(K-3) × 160, where K is an integer in [7,9];
S324: outputting the decoded image, of size 2^(K-3) × 2^(K-3) × 3, using a convolution layer with a kernel of size S×S and a stride of 1, where K is an integer in [7,9] and S is an integer in [3,5].
8. The universal enhancement method for blurred images based on the hidden space diffusion model as claimed in claim 7, wherein: in step S33, the loss functions include a loss function for the reconstruction training process and a loss function for the fine-tuning training process;
the loss function used for the reconstruction training process consists of the perceptual loss function of the fine-tunable autoencoder, the patch-based adversarial loss function, the pixel-space L1 loss function, and the diffusion-model loss function of the fine-tunable denoising network;
the loss function used for the fine-tuning training process consists of the fine-tuning loss function of the fine-tunable autoencoder and the fine-tuning loss function of the fine-tunable denoising network;
the fine-tuning loss function of the fine-tunable autoencoder is denoted $\mathcal{L}_{ft\text{-}AE}$ and is calculated as follows:

$$\mathcal{L}_{ft\text{-}AE}=\mathbb{E}\left[\left\|E_{rec}(x_h)-E_{ft}(x_l)\right\|_1+\left\|E_{rec}(x_h)-E_{ft}(x_h)\right\|_1\right] \tag{1}$$

where $x_h$ is the high-quality image, $x_l$ is the blurred image, $E_{rec}$ is the reconstruction encoder trained on the reconstruction training set with the fine-tuning layers disabled, and $E_{ft}$ is the fine-tuning encoder trained on the fine-tuning training set with the fine-tuning part participating in training; $\left\|E_{rec}(x_h)-E_{ft}(x_l)\right\|_1$ is the 1-norm between the high-quality image passed through the reconstruction encoder and the blurred image passed through the fine-tuning encoder; $\left\|E_{rec}(x_h)-E_{ft}(x_h)\right\|_1$ is the 1-norm between the high-quality image passed through the reconstruction encoder and the high-quality image passed through the fine-tuning encoder; $\mathbb{E}$ denotes the mean;
the fine-tuning loss function of the fine-tunable denoising network is denoted $\mathcal{L}_{ft\text{-}DN}$ and is calculated as follows:

$$\mathcal{L}_{ft\text{-}DN}=\mathbb{E}\left[\left\|\epsilon_t-\epsilon_\theta(z_t,t)\right\|_2^2\right] \tag{2}$$

where $t$ is the time of the denoising step, $z_t$ is the noise-containing hidden space feature at time $t$, $\epsilon_t$ is the noise actually added at time $t$, $\epsilon_\theta(z_t,t)$ is the noise at time $t$ predicted by the denoising network, $\left\|\cdot\right\|_2^2$ is the squared Euclidean norm between the actual and predicted noise, and $\mathbb{E}$ denotes the mean.
9. The universal enhancement method for blurred images based on the hidden space diffusion model as claimed in claim 8, characterized in that: step S34 comprises the following sub-steps:
S341: training the fine-tunable autoencoder with the fine-tuning layers disabled, using the reconstruction training set, specifically comprising the steps of:
S3410: inputting the clear images of the reconstruction training set into the reconstruction autoencoder and forward-propagating to generate reconstructed clear images;
S3411: calculating the perceptual loss function, the patch-based adversarial loss function, and the pixel-space L1 loss function from the reconstructed clear image and the input clear image;
S3412: back-propagating and optimizing parameters with an initial learning rate Q, where Q ranges over [5e-6, 5e-4];
S3413: repeating steps S3410 to S3412, recording the loss function values output in step S3411; traversing all images of the reconstruction training set once constitutes one epoch, and loss curves over the epochs are drawn from the recorded loss values;
S3414: saving the reconstruction autoencoder;
S342: training the fine-tunable denoising network with the fine-tuning layers disabled, using the reconstruction training set, specifically comprising the steps of:
S3420: inputting the clear images of the reconstruction training set into the reconstruction autoencoder trained in step S3414 to obtain reconstruction hidden space feature maps;
S3421: according to the DDPM algorithm, defining a Markov chain of length T with noise intensities $\bar{\alpha}_t$, and gradually adding noise as defined in formula (3) to generate a noise-containing feature map:

$$z_t=\sqrt{\bar{\alpha}_t}\,z_0+\sqrt{1-\bar{\alpha}_t}\,\epsilon \tag{3}$$

where $\epsilon$ is random noise, $t$ is the time step, and $z_0$ is the initial state of the diffusion process;
S3422: generating the hidden space noise-containing feature map according to step S3421, inputting it into the reconstruction denoising network, and forward-propagating to generate a denoised reconstruction hidden space feature map;
S3423: calculating the diffusion-model loss function from the noise-containing and denoised hidden space feature maps;
S3424: back-propagating and optimizing parameters with an initial learning rate Q, where Q ranges over [5e-6, 5e-4];
S3425: repeating steps S3420 to S3424, recording the loss function values output in step S3423; traversing all images of the reconstruction training set once constitutes one epoch, and loss curves over the epochs are drawn from the recorded loss values;
S3426: saving the trained reconstruction denoising network;
S343: training the fine-tunable autoencoder with the fine-tuning layers enabled, using the fine-tuning training set, specifically comprising the steps of:
S3430: loading the trained reconstruction encoder model weights, and copying the weights of the last group of normalization and convolution layers of each local fine-tuning residual module (whose fine-tuning layers were disabled) onto the fine-tuning layers;
S3431: inputting the blurred image into the fine-tuning autoencoder and forward-propagating to generate a fine-tuning-enhanced clear image;
S3432: calculating the fine-tuning loss function of the fine-tunable autoencoder from the fine-tuning-enhanced image and the corresponding clear and blurred images;
S3433: back-propagating and optimizing parameters with an initial learning rate Q, where Q ranges over [5e-6, 5e-4];
S3434: repeating steps S3430 to S3433, recording the loss function values output in step S3432; traversing all images of the fine-tuning training set once constitutes one epoch, and loss curves over the epochs are drawn from the recorded loss values;
S3435: saving the trained fine-tuning autoencoder;
S344: training the fine-tunable denoising network with the fine-tuning layers enabled, using the fine-tuning training set, specifically comprising the steps of:
S3440: loading the trained reconstruction denoising network model weights, and copying the weights of the last group of normalization and convolution layers of each local fine-tuning residual module (whose fine-tuning layers were disabled) onto the fine-tuning layers;
S3441: inputting the blurred image into the fine-tuning autoencoder trained in step S3435 to obtain a hidden space feature map;
S3442: generating the hidden space noise-containing feature map according to step S3421, inputting it into the fine-tuning denoising network, and forward-propagating to generate a denoised hidden space feature map;
S3443: calculating the fine-tuning loss function of the fine-tuning denoising network from the noise-containing and denoised hidden space feature maps;
S3444: back-propagating and optimizing parameters with an initial learning rate Q, where Q ranges over [5e-6, 5e-4];
S3445: repeating steps S3440 to S3444, recording the loss function values output in step S3443; traversing all images of the fine-tuning training set once constitutes one epoch, and loss curves over the epochs are drawn from the recorded loss values;
S3446: saving the trained fine-tuning denoising network.