CN113763300A - Multi-focus image fusion method combining depth context and convolution condition random field - Google Patents
- Publication number: CN113763300A
- Application number: CN202111047787.3A
- Authority: CN (China)
- Legal status: Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4007—Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention provides a multi-focus image fusion method combining depth context and a convolutional conditional random field, addressing the problem that traditional methods cannot fully mine image focus correlation information, which causes distortion of fused details. The method exploits the feature-reuse advantage of dense convolutional neural networks, integrating the multi-focus source images to realize cooperative focus-feature detection. A multi-scale pyramid pooling strategy aggregates global context information from the different focus areas, strengthening the discrimination between focused and defocused regions and yielding a coarse fusion probability decision map. A convolutional conditional random field then refines this map, and the refined probability decision map produces a detail-preserving fused image. The fusion method is evaluated subjectively and objectively on public data sets; experimental results show that the method achieves a good fusion effect, fully mines focus correlation information, and retains sufficient image detail.
Description
Technical Field
The invention relates to the technical field of deep-learning image processing, and in particular to a multi-focus image fusion method combining depth context and a convolutional conditional random field.
Background
In optical imaging, the limited depth of field of a lens means that only a local area of the scene can be in focus, and a single image that is sharp over the whole scene is difficult to obtain. Multi-focus image fusion extracts complementary information from several partially focused images and fuses them into one all-in-focus image, enhancing image quality, aiding visual understanding, and improving the utilization of image information. Multi-focus image fusion is now widely applied in medical microscopic imaging, machine-vision measurement, machine recognition, military security, and other fields.
Multi-focus image fusion methods generally fall into three categories: transform-domain methods, spatial-domain methods, and deep-learning methods. Image fusion based on Multi-Scale Transform (MST) includes algorithms built on the Laplacian pyramid, on wavelet transforms, on the Non-Subsampled Contourlet Transform (NSCT), and so on. The fusion process has three main steps: a. decompose each source image into high- and low-frequency components according to its multi-scale characteristics; b. apply separate fusion rules to obtain the fused high- and low-frequency maps; c. apply the inverse MST to obtain the final fused image. However, MST-based methods suffer from spatial inconsistency during the transform-and-fuse process, which readily causes distortion of varying degrees. Spatial-domain methods fuse images mainly through linear combination and can generally be divided into three classes: pixel-based, block-based, and target-region-based. Because they rely on gradient-related information of pixels or image blocks, they tend to introduce artifact blocks into the fusion result, degrading quality. Typical spatial-domain methods include Guided-Filtering image Fusion (GFF) and Image-Matting-based Fusion (IFM); although these perform well in feature extraction and detail expression, it is difficult to hand-design an ideal fusion rule. In recent years, deep-learning-based multi-focus image fusion methods have emerged, which fully exploit strong learning ability, strong generalization ability, and good portability.
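The three MST steps (a–c) above can be sketched with a minimal two-band decomposition. This is an illustrative sketch, not the patent's method: a box blur stands in for the multi-scale transform, and the function names are invented for the example.

```python
import numpy as np

def box_blur(img, radius=1):
    # Crude low-pass filter standing in for a multi-scale decomposition level.
    p = np.pad(img, radius, mode='edge')
    k = 2 * radius + 1
    h, w = img.shape
    return sum(p[i:i + h, j:j + w] for i in range(k) for j in range(k)) / (k * k)

def mst_style_fusion(a, b):
    # Step a: decompose each source into low- and high-frequency components.
    low_a, low_b = box_blur(a), box_blur(b)
    high_a, high_b = a - low_a, b - low_b
    # Step b: fusion rules -- average the low bands; for the high bands,
    # keep the coefficient with the larger absolute value (the sharper source).
    low_f = 0.5 * (low_a + low_b)
    high_f = np.where(np.abs(high_a) >= np.abs(high_b), high_a, high_b)
    # Step c: "inverse transform" -- recombine the fused bands.
    return low_f + high_f
```

Fusing an image with itself returns the image unchanged, which is a quick sanity check of the decompose/recombine round trip.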
For example, the Convolutional Neural Network (CNN) based method of Liu et al. (Yu Liu et al., 2017) and the spatial-pyramid-pooling method of Mei et al. (Lihua Mei et al., 2017) fuse by image blocks, which makes the operation complex and causes blocking artifacts at image edges. Furthermore, Guo et al. proposed a method based on a fully convolutional neural network (Xiaoping Guo et al., 2018); although it largely solves the blocking problem, it does not fully consider the correlation between context information, so features suited to the global scale are neglected.
To address the distortion of fused details caused by insufficient mining of focus correlation information in traditional methods, multi-focus image fusion is treated here as a context-constrained binary segmentation problem, i.e., distinguishing focused from defocused regions, and a multi-focus image fusion method combining depth context and convolutional conditional random fields is proposed. The invention uses a deep dense convolutional neural network to fuse deep features and mine focus information, and learns context information through multi-scale spatial pyramid pooling. Furthermore, Convolutional Conditional Random Fields (ConvCRFs) are introduced into the process of separating defocused from focused areas, optimizing the accuracy of the network's probability prediction map and further enhancing the fusion effect. Finally, the method is compared experimentally with 7 mainstream fusion methods, and its efficiency and superiority are verified through both subjective visual evaluation and objective comparative evaluation.
Disclosure of Invention
In order to solve the above problems, the present invention provides a multi-focus image fusion method combining depth context and a convolutional conditional random field, characterized by comprising the following steps:
step 1, integrating two registered multi-focus source images I_A and I_B into one multi-channel image, and inputting the multi-channel image into a deep dense convolutional neural network for focus detection to obtain a multi-dimensional feature map;
step 2, extracting feature information of the multi-dimensional feature map obtained in the step 1 by using a pyramid pooling model to obtain a plurality of feature maps, and then merging the feature maps to obtain a rough two-classification probability decision map;
step 3, refining the two-classification probability decision map obtained in step 2 with a convolutional conditional random field, and performing fusion according to the refined probability decision map and the fusion calculation rule to obtain the final multi-focus image fusion result;
step 4, training the overall network formed by the deep dense convolutional neural network of step 1 and the pyramid pooling model of step 2;
and step 5, fusing the multi-focus images with the trained overall network.
Further, the deep dense convolutional neural network comprises a plurality of dense blocks and transition layers. Each dense block contains a number of 1 × 1 and 3 × 3 convolutions, where the 1 × 1 convolutions reduce the number of feature maps and the 3 × 3 convolutions extract features; each transition layer sits between two dense blocks, serves as a connection, and consists of a convolution layer and a pooling layer.
Further, the pyramid pooling model processes the feature map as follows:
First, the feature map is pooled to each target size, and each pooled result is passed through a 1 × 1 convolution that reduces its channels to 1/N of the original, where N is the number of levels of the pyramid pooling model. Each of these feature maps is then up-sampled by bilinear interpolation to the size of the original feature map, and the original feature map is concatenated with the up-sampled feature maps along the channel dimension, giving twice the channels of the original feature map. Finally, a 1 × 1 convolution reduces the channels back to the original number, so the final feature map has the same size and channel count as the original feature map.
Further, the specific process of optimizing the two-classification probability decision map with the convolutional conditional random field in step 3 is as follows:
The coarse two-classification probability decision map obtained in step 2 is taken as input and denoted O. Processing with the convolutional conditional random field amounts to solving for the optimized probability decision map x̂ = argmax_x P(X = x | O), with the distribution

P(X = x | O) = (1/Z(O)) · exp(−E(x | O))

where X = {X_1, …, X_N} denotes the random field, x̂ is the optimized probability decision map when the input image is O, Z(O) is the distribution (partition) function, and the energy is

E(x | O) = Σ_i ψ_u(x_i) + Σ_{i≠j} ψ_p(x_i, x_j)

In the above formula, N is the number of random-field variables, i is an index smaller than N, and j is an index not equal to i and smaller than N. The unary potential ψ_u(x_i) measures, when the observed value of the current pixel i is x_i, the probability that the pixel belongs to its class label in O; it is output by the back end of the overall network. The binary potential ψ_p(x_i, x_j) measures the probability of two events occurring simultaneously and is calculated as:

ψ_p(x_i, x_j) = μ(x_i, x_j) · Σ_m ω_m · k_m(f_i, f_j)

μ(x_i, x_j) is the label-compatibility term; it constrains conduction between pixels, so that energy is conducted only under the same-label condition, i.e., μ(x_i, x_j) = 0 when the labels of x_i and x_j are the same and μ(x_i, x_j) = 1 otherwise. In the summation term, ω_m is a weight parameter and k_m is a feature function of f_i and f_j, the feature vectors of pixels i and j in an arbitrary feature space, as in the following formula:

k(f_i, f_j) = W^(1) · exp(−‖p_i − p_j‖² / (2θ_α²) − ‖I_i − I_j‖² / (2θ_β²)) + W^(2) · exp(−‖p_i − p_j‖² / (2θ_γ²))

This formula expresses the "affinity" between different pixels in feature space. The first term is called the surface kernel and the second the smoothing kernel; W^(1), θ_α, θ_β are the surface-kernel parameters and W^(2), θ_γ the smoothing-kernel parameters. These are model parameters obtained through training, and p and I denote the actual position and color of a pixel, respectively.
Further, the fusion calculation rule in step 3 is designed as follows:
Assume the binary image matrix refined from the probability decision map is W_A and the complementary binary image is W_B, i.e., W_B = 1 − W_A. With source images I_A and I_B, the calculation rule for the final fused image F is:
F = W_A · I_A + (1 − W_A) · I_B.
Further, the specific implementation of step 4 is as follows:
a) Data set preparation: the data set used is the VOC2012 data set, which covers 4 general categories: vehicles, household objects, animals, and people, subdivided into 20 classes; with the background added there are 21 classes, containing 17125 images in total. To simulate multi-focus images, the image generation method from multi-focus image fusion by image matting in dynamic scenes is adopted, with Gaussian blur simulating real multi-focus conditions; the synthesized multi-focus images are obtained in five steps: Gaussian blur, image transformation, image inversion, pixel-by-pixel multiplication, and pixel-by-pixel addition;
b) Adjusting training parameters: the multi-focus images I_A and I_B are combined into a 6-channel input and fed into the overall network for training. The training image size is 256 × 256; Adam is used as the gradient optimizer with a learning rate of 0.001 and a regularization term of 0.9; 1 image is fed to the network at a time; the total number of training iterations is n; the loss function of the overall network is the binary cross entropy. In the testing stage, the input image keeps its original size. The binary cross entropy is

L = −(1/N) · Σ_{i=1}^{N} [ y_i · log(ŷ_i) + (1 − y_i) · log(1 − ŷ_i) ]

over the samples, where ŷ_i is the actual output for sample i and y_i is the desired output.
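The binary cross entropy above can be computed directly. This is a minimal numpy sketch; the clipping constant and the sample values are illustrative additions, not from the patent.

```python
import numpy as np

def binary_cross_entropy(y_hat, y, eps=1e-12):
    # Mean of -[y*log(y_hat) + (1-y)*log(1-y_hat)] over all samples.
    # Clipping keeps log() finite when the network outputs exactly 0 or 1.
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))

y     = np.array([1.0, 0.0, 1.0, 0.0])   # desired outputs y_i
y_hat = np.array([0.9, 0.1, 0.8, 0.2])   # actual network outputs
print(round(binary_cross_entropy(y_hat, y), 4))  # 0.1643
```

Confident predictions close to the targets give a loss near zero, which is why this loss suits the focused/defocused two-class decision map.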
The invention provides a multi-focus image fusion method combining depth context and a convolutional conditional random field. The method exploits feature reuse in a dense convolutional neural network and integrates the multi-focus source images for cooperative focus-feature detection; multi-scale pyramid pooling then aggregates global context information of the different focus areas, sharpening the discrimination between focused and defocused regions and producing a coarse fusion probability decision map. A convolutional conditional random field further optimizes this coarse map into an accurate probability decision map, from which a fusion result with well-preserved details is generated. The final experimental results show that the proposed method achieves better visual results and the best scores on four quantitative indexes, fully proving its effectiveness; it can be applied effectively to automatic imaging vision tasks.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
FIG. 2 is a pyramid pooling multi-scale information extraction diagram of the present invention.
FIG. 3 is a probability decision diagram for multi-scale pyramid pooling cooperative detection, showing the source image, and the effect of the coarse segmentation diagram and the fine segmentation diagram thereof.
FIG. 4 shows the Lytro-3 image fusion results.
FIG. 5 compares the Lytro-17 experimental results.
FIG. 6 is a Lytro-17 image residual pseudo-color comparison diagram.
Detailed Description
The technical scheme of the invention can be implemented as an automatic process by computer software. The technical scheme is explained in detail below with reference to the drawings and an embodiment. As shown in fig. 1, the flow of the embodiment comprises the following steps:
step 1, detecting an image by utilizing a dense network to focus cooperatively:
firstly for the input source image IAAnd IBConsider two registered multi-focus source images IAAnd IBThe method integrates the correlation information into a multi-channel image, and performs the cooperative detection operation based on the dense convolutional network on the basis.
The core of dense-network cooperative focus detection is a deep dense convolutional neural network, which builds on the convolutional neural network and introduces the dense-connection concept. Its most important components are the transition layers and the dense blocks; the dense connections between layers reduce gradient loss and better exploit the network's feature information. Inside each dense block, the input of each layer is the splice of the outputs of all previous layers, where splicing refers to concatenation at the channel level. For example, splicing a 56 × 56 × 64 tensor with a 56 × 56 × 32 tensor yields 56 × 56 × 96, where 96 is the sum of 64 and 32. The fixed number of channels each layer contributes is called the growth rate. Each dense block contains a number of substructures; taking the first dense block as an example, each substructure is first a bottleneck layer, a 1 × 1 convolution whose purpose is to reduce the number of feature maps, followed by a 3 × 3 convolution layer for feature extraction. The network contains 4 dense blocks, with the structure shown in the following table:
TABLE 1 dense block Structure Table
The transition layer is arranged between two dense blocks, serves as a connection, and consists of a convolution layer and a pooling layer. As shown in fig. 1, the module has four dense blocks. In general, the deeper a deep-learning network, the more severe the vanishing-gradient problem, and dense blocks are introduced to mitigate it. Compared with a residual network, the deep dense convolutional neural network is structurally richer: with dense connections added, every layer has access to the features of all previous layers, so feature reuse is realized effectively and the transmission of multi-focus image feature information across layers is optimized. After the multi-channel image passes through dense-network cooperative focus detection, a multi-dimensional feature map is obtained.
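The channel-level splicing and growth-rate bookkeeping described above can be sketched in a few lines. Numpy stands in for the framework's tensor operations; the shapes follow the 56 × 56 example in the text, and the helper name is illustrative.

```python
import numpy as np

def dense_block_channels(c_in, num_layers, growth_rate):
    # Each layer receives the concatenation of all previous outputs and
    # contributes `growth_rate` new channels, so channels grow linearly.
    return c_in + num_layers * growth_rate

# Channel-level splice from the text: 56x56x64 + 56x56x32 -> 56x56x96.
x = np.zeros((56, 56, 64))
y = np.zeros((56, 56, 32))
z = np.concatenate([x, y], axis=-1)
print(z.shape)  # (56, 56, 96)
```

With, say, 4 layers and growth rate 32 on a 64-channel input, the block's output would carry 64 + 4 × 32 = 192 channels, which is why the 1 × 1 bottleneck and the transition layers are needed to keep channel counts manageable.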
Step 2, extracting pyramid pooling multi-scale information, which specifically comprises the following contents:
and (3) performing multi-scale feature information extraction on the multi-dimensional feature map obtained after the detection in the step (1) by utilizing pyramid pooling. Considering that the most difficult detection points of a multi-focus image are a focus area and a non-focus area, important global prior knowledge is not sufficiently obtained in a high-level convolutional neural network, and high-level features contain more semantics and less position information, in order to further reduce the loss of context information between different sub-areas, as shown in fig. 1, a pyramid pooling model is introduced, which is very common in the traditional machine learning feature extraction, and the main idea is to divide a sub-image into blocks of a plurality of scales, for example, one image is divided into 1 part, 4 parts, 8 parts and the like. The features are then extracted for each block and then fused together so that features of multiple scales are compatible.
As shown in FIG. 2, the method adopts 4 different pyramid scales. The number of pyramid pooling levels and the size of each level can be modified; in the invention the pyramid pooling module has 4 levels, with sizes 1 × 1, 2 × 2, 3 × 3 and 6 × 6. First, the feature map is pooled to each target size by treating it as N × N blocks and pooling each block (in the first figure, the red block is the feature map after 1 × 1 pooling); each pooled result then passes through a 1 × 1 convolution that reduces its channels to 1/N of the original, where N = 4. Each of these feature maps is then up-sampled by bilinear interpolation to the size of the original feature map and concatenated with the original feature map along the channel dimension. The result has twice the channels of the original feature map, and a final 1 × 1 convolution reduces the channels back to the original number, so the final feature map has the same size and channel count as the original.
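The pool–reduce–upsample–concatenate pipeline can be sketched end to end in numpy. This is a shape-level illustration under stated simplifications: random matrices stand in for trained 1 × 1 convolution weights, nearest-neighbour repetition stands in for the bilinear interpolation the patent uses, and the input size (12 × 12, divisible by every scale) is chosen for convenience.

```python
import numpy as np

def adaptive_avg_pool(x, n):
    # x: (C, H, W) with H, W divisible by n -> average-pooled (C, n, n).
    c, h, w = x.shape
    return x.reshape(c, n, h // n, n, w // n).mean(axis=(2, 4))

def conv1x1(x, w):
    # A 1x1 convolution is a per-pixel matmul over the channel dimension.
    return np.einsum('oc,chw->ohw', w, x)

def pyramid_pooling(x, scales=(1, 2, 3, 6), rng=np.random.default_rng(0)):
    c, h, w = x.shape
    n = len(scales)
    branches = []
    for s in scales:
        p = adaptive_avg_pool(x, s)                        # pool to s x s
        p = conv1x1(p, rng.standard_normal((c // n, c)))   # reduce to C/N channels
        p = p.repeat(h // s, axis=1).repeat(w // s, axis=2)  # upsample (nearest, in lieu of bilinear)
        branches.append(p)
    y = np.concatenate([x] + branches, axis=0)             # 2C channels in total
    return conv1x1(y, rng.standard_normal((c, 2 * c)))     # back to C channels

x = np.random.default_rng(1).standard_normal((8, 12, 12))
print(pyramid_pooling(x).shape)  # (8, 12, 12)
```

The output keeps the input's size and channel count, matching the text: four C/4-channel branches plus the original C channels give 2C before the final 1 × 1 reduction.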
The pyramid pooling model is an effective global-context prior: it contains information among sub-regions of different scales, effectively improves the network's ability to use global context information, lets the network fully mine the boundary information of focused and unfocused regions, embeds the context features of scenes that are difficult to fuse, and improves the fusion effect. Finally, the feature maps are merged to obtain the coarse two-classification probability decision map.
Step 3, refining the convolution conditional random field probability decision diagram, which specifically comprises the following steps:
although the depth-dense convolutional neural network in step 1 and the pyramid pooling model in step 2 have a good effect on the extraction of the global context information of the source image, there are pixels that are misclassified in the probability map, as shown in fig. 3. Therefore, in order to obtain more accurate and excellent segmentation capability, the probabilistic decision graph is optimized by using the convolution conditional random fields (ConvCRFs). The convolution Conditional Random Field is optimized based on the Fully connected Conditional Random Field (fullrfs). The coarse binary probability decision graph O input by step 2 can be passed throughTo perform the solution of the problem to be solved,in order to optimize the probability decision graph, the specific analytical formula is as follows:
where K is { K ═ K1,…,KnThe symbol represents the random field and the symbol represents the random field,is a random field ofAn optimized probability decision graph when the input image is O, Z (O) is a distribution function,
in the above formula, N is the number of random fields, i is a random number smaller than N, and j is a random number not equal to i but smaller than N. Wherein the function of unitary potentialThe observed value for measuring the current pixel point i isAnd (3) the probability that the pixel point belongs to the class label in the O is output from the rear end of the whole network formed by the deep dense convolutional neural network in the step (1) and the pyramid pooling model in the step (2). Binary potential functionThen, for measuring the probability of two events occurring simultaneously, the calculation formula is as follows:
for label-compatible terms, it constrains the conditions of conduction between pixels, energy being conducted to each other only under the same label conditions, i.e. whenAndwhen the labels are the same, the label is the same,otherwiseIn the latter addition term, ωmIs a weight value parameter that is a function of,is a characteristic function, fi,fjIs the feature vector of pixels i and j in an arbitrary feature space, as shown in the following formula:
the above formula represents the "intimacy" between different pixels in terms of the travel of the feature, the first term of the formula being referred to as the surface kernel and the second term being referred to as the smoothing kernel. W(1),θα,θβAs surface nuclear parameter, W(2),θγFor smoothing the kernel parameters, these parameters are model parameters, and are obtained by segment training. p and I respectively represent the actual position and color value of the pixel point, and the color value is the pixel value.
ConvCRFs add a conditional-independence assumption to the FullCRFs framework, which lets the GPU perform inference efficiently through convolution operations. This effectively combines the feature-extraction ability of a CNN with the modeling ability of a random field, so the convolutional conditional random field can propagate information effectively.
To achieve accurate saliency detection of the target region, the global, local and boundary information of the probability map is integrated through the convolutional conditional random field, effectively yielding a refined probability decision map. During optimization, multiple kinds of feature information of the probability map are computed via convolution operations and fused by the CRFs into a refined map, and the accurate map optimized by ConvCRFs is finally obtained.
After refinement, assume that the binary image matrix after probability decision graph refinement is WAThe other half of the binary image is WBI.e. WB=1-WA. Source image is IAAnd IBTherefore, the calculation rule of the final fusion image F is:
$$F = W_A \cdot I_A + (1 - W_A) \cdot I_B$$
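The fusion rule is a per-pixel weighted combination and can be sketched directly (the broadcasting of the single-channel map over color images is an implementation detail assumed here):

```python
import numpy as np

def fuse(I_A, I_B, W_A):
    """Final fusion rule F = W_A * I_A + (1 - W_A) * I_B, applied
    element-wise; W_A is the refined binary decision map (1 where I_A
    is in focus, 0 where I_B is)."""
    W_A = W_A.astype(float)
    if I_A.ndim == 3:  # broadcast a single-channel map over color channels
        W_A = W_A[..., None]
    return W_A * I_A + (1.0 - W_A) * I_B
```

Because $W_A$ is binary after refinement, every output pixel is copied verbatim from whichever source image is in focus there, so no new pixel values are invented by the fusion step.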
step 4, performing network training on the deep dense neural network and the pyramid network, specifically comprising the following steps:
a) Data set preparation: the data set used here is the VOC2012 data set, which is divided into 4 general categories (vehicles, household objects, animals, humans) and further into 20 classes (21 including the background), containing 17125 images in total. To simulate multi-focus images, the invention adopts the image generation method of Li et al. (Shutao Li et al., 2013) for image-matting-based multi-focus image fusion in dynamic scenes, and uses Gaussian blur to simulate real multi-focus conditions. A synthesized multi-focus image is obtained in five steps: Gaussian blur, image transformation, image inversion, pixel-by-pixel multiplication and pixel-by-pixel addition.
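The five synthesis steps can be sketched as follows. This is a simplified stand-in, not the cited method: the box blur approximates the Gaussian blur step, and the binary mask plays the role of the transformed focus map:

```python
import numpy as np

def box_blur(img, k=5):
    """Cheap separable box blur standing in for the Gaussian blur step
    (repeated box blurs approximate a Gaussian)."""
    pad = k // 2
    out = img.astype(float)
    kernel = np.ones(k) / k
    for axis in (0, 1):
        p = np.pad(out, [(pad, pad) if a == axis else (0, 0) for a in (0, 1)],
                   mode="edge")
        out = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"),
                                  axis, p)
    return out

def make_multifocus_pair(img, mask):
    """Synthesize a complementary multi-focus pair from an all-in-focus
    image: blur, mask transformation, inversion, pixel-by-pixel
    multiplication and pixel-by-pixel addition.
    img: (H, W) all-in-focus image, mask: (H, W) binary focus map."""
    blurred = box_blur(img)            # 1. blur
    m = mask.astype(float)             # 2. mask as a weight map
    inv = 1.0 - m                      # 3. inversion
    I_A = m * img + inv * blurred      # 4+5. multiply, then add
    I_B = inv * img + m * blurred      # complementary focus
    return I_A, I_B
```

With an all-ones mask, $I_A$ reproduces the sharp image and $I_B$ is fully blurred, matching the intent that the pair covers complementary focus regions.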
b) Adjusting training parameters: the multi-focus images $I_A$ and $I_B$ are combined into a 6-channel input and fed into the whole network for training. The image size in the training stage is 256 × 256; Adam is used as the gradient optimizer, with a learning rate of 0.001 and a regularization term of 0.9. One image is fed into the network at a time, the total number of training epochs is 30, and the loss function adopts binary cross entropy. In the testing stage, the input images keep their original size. The binary cross entropy formula is as follows:
$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log \hat{y}_i + (1-y_i)\log(1-\hat{y}_i)\right]$$

for a single sample, where $\hat{y}_i$ is the actual output of the sample and $y_i$ is the desired output.
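The binary cross entropy above can be computed directly; the `eps` clipping is a standard numerical safeguard assumed here, not part of the formula:

```python
import numpy as np

def binary_cross_entropy(y_hat, y, eps=1e-12):
    """Binary cross entropy between actual outputs y_hat and desired
    outputs y, averaged over all elements; eps guards against log(0)."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))
```

For an uninformative prediction of 0.5 on every pixel the loss is $\ln 2 \approx 0.693$, and it approaches 0 as the predictions match the binary targets.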
Step 5, comparing the method of the invention with mainstream methods, testing and analyzing, specifically comprising the following steps:
a) Comparison methods: to demonstrate the superiority and effectiveness of the proposed fusion method, the Lytro multi-focus color image data set is selected for the experiments. Seven mainstream image fusion methods are selected for comparison: the non-subsampled contourlet transform fusion method (NSCT), the guided-filter-based fusion method (GFF), the image-matting-based multi-focus image fusion method (IFM), the bilateral-filter-based fusion method (CBF), the discrete cosine harmonic wavelet transform fusion method (DCHWT), the convolutional-neural-network-based fusion method (CNN), and the pyramid-pooling-network-based fusion method (PSPF).
b) Qualitative analysis: Fig. 4 shows the visual fusion results of Lytro-3 for the different fusion methods. It can be seen that the edge of the boy's ear becomes more blurred in most of the comparison methods, while the edge structure in the fusion result of the proposed method remains relatively clear, which demonstrates its better edge information extraction capability. To further verify the effect of the method, the fusion results of Lytro-17 and the pseudo-color difference maps between the fused images and source image A are shown in Figs. 5 and 6. The fewer traces of the focused area left in the difference map, the more image information of the focused part has been extracted into the fused image, which means better fusion performance. As can be seen from Figs. 5 and 6, the proposed algorithm leaves less residual information, whereas the CNN and IFM methods are both deficient in the boundary regions; this indicates that the proposed fusion method handles edge regions better and makes full use of image context information to detect the focus boundary. Compared with the comparison methods, the proposed method achieves a better fusion effect in subjective visual evaluation.
c) Quantitative comparison: to objectively demonstrate the effectiveness of the method, four mainstream multi-focus image evaluation indexes are adopted as quantitative metrics, namely mutual information ($Q_{MI}$), nonlinear correlation information entropy ($Q_{NCIE}$), edge retention ($Q_{AB/F}$) and visual information fidelity ($Q_{VIF}$), and the method is compared with the 7 mainstream methods. The experimental results are shown in Tables 2 and 3, where Table 2 shows the objective evaluation of the fusion results of 5 images and Table 3 shows the average objective evaluation over 20 images. Bold values indicate the optimal results and underlined values the suboptimal results.
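As a sketch of the mutual-information family of metrics, the shared information between a fused image and each source can be estimated from a joint histogram. This is a simplified sum form for illustration; the published $Q_{MI}$ additionally normalizes by the marginal entropies:

```python
import numpy as np

def mutual_information(a, b, bins=64):
    """Mutual information between two images, estimated from their
    joint intensity histogram, in bits."""
    h, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p = h / h.sum()
    px = p.sum(axis=1, keepdims=True)   # marginal of a
    py = p.sum(axis=0, keepdims=True)   # marginal of b
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / (px @ py)[nz])))

def q_mi(I_A, I_B, F, bins=64):
    """Q_MI-style score: information the fused image F shares with each
    source image (unnormalized sum form, assumed for illustration)."""
    return mutual_information(I_A, F, bins) + mutual_information(I_B, F, bins)
```

A fused image that copies its pixels from the sources (as the binary decision rule does) shares more histogram structure with them and therefore scores higher.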
TABLE 2 Objective evaluation comparison of fusion results
TABLE 3 average objective evaluation comparison of fusion results
As can be seen from Table 2, the proposed method achieves good results on all four objective evaluation indexes. It is optimal in most of the scores for mutual information, nonlinear correlation information entropy, edge retention and visual information fidelity, which means that the proposed dense convolutional network cooperative detection method can mine the depth features of multi-focus images well for focus estimation, and that the adopted spatial pyramid pooling module constrains context information so that edge focus detection improves; the overall fusion quality of the algorithm is therefore higher, and more focus information of the source images is retained. As can be seen from the average evaluation results in Table 3, the method obtains a suboptimal result, second only to the CNN method, on the visual information fidelity index. This also indicates that, taken together, the method of the present invention has a better fusion effect than the other methods.
d) Summary analysis: the invention provides a multi-focus image fusion method combining a depth context and a convolutional conditional random field. The fusion method exploits the feature-reuse advantage of a dense convolutional neural network, integrating the multi-focus source images for cooperative focus feature detection; it then pools global information through a multi-scale pyramid, aggregates the context information of different focus areas, improves the discrimination of scattered focus regions and obtains a rough fusion probability decision map. A convolutional conditional random field then optimizes further, obtaining an accurate probability decision map from the rough one, and finally a fusion result image with good detail is generated. The final experimental results show that the proposed method obtains better visual results and the best results on the four quantitative indexes, which fully proves its effectiveness; the method can be effectively applied to automatic imaging vision tasks.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.
Claims (6)
1. A multi-focus image fusion method combining depth context and convolution conditional random fields is characterized by comprising the following steps:
step 1, integrating two registered multi-focus source images $I_A$ and $I_B$ into a multi-channel image, and inputting the multi-channel image into a deep dense convolutional neural network for focus detection to obtain a multi-dimensional feature map;
step 2, extracting feature information of the multi-dimensional feature map obtained in the step 1 by using a pyramid pooling model to obtain a plurality of feature maps, and then merging the feature maps to obtain a rough two-classification probability decision map;
step 3, for the two-classification probability decision diagrams obtained in the step 2, a convolution conditional random field is used for realizing probability decision diagram refinement, and fusion is carried out according to the obtained refined probability decision diagrams and fusion calculation rules to obtain a final multi-focus image fusion result;
step 4, training the whole network formed by the deep dense convolutional neural network of step 1 and the pyramid pooling model of step 2;
and 5, fusing the multi-focus images by using the trained integral network.
2. The method of claim 1 for multi-focus image fusion combining depth context with convolutional conditional random fields, comprising: the deep dense convolutional neural network comprises a plurality of dense blocks and a transition layer, wherein the dense blocks comprise a plurality of convolutions of 1 × 1 and 3 × 3, the 1 × 1 convolution is used for reducing the number of feature maps, and the 3 × 3 convolution is used for extracting features; the transition layer is arranged between the two dense blocks, plays a role of connection and consists of a convolution layer and a pooling layer.
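As an illustrative, non-limiting sketch of the dense-block feature reuse described in this claim: each layer sees the concatenation of all previous feature maps, a 1 × 1-style mixing reduces channels, and a new set of features is appended. Per-pixel random linear maps stand in for the learned 1 × 1 and 3 × 3 convolutions:

```python
import numpy as np

def dense_block(x, num_layers=4, growth_rate=12, rng=None):
    """Channel-level sketch of a dense block. x: (H, W, C0) feature map.
    Uses random linear maps per pixel instead of learned convolutions,
    so it only demonstrates the connectivity pattern, not a trained net."""
    rng = np.random.default_rng(0) if rng is None else rng
    feats = x
    for _ in range(num_layers):
        c_in = feats.shape[-1]
        # 1x1-style bottleneck: reduce the concatenated channels
        w1 = rng.standard_normal((c_in, 4 * growth_rate)) * 0.1
        # stand-in for the 3x3 feature-extraction convolution
        w3 = rng.standard_normal((4 * growth_rate, growth_rate)) * 0.1
        new = np.maximum(feats @ w1, 0) @ w3     # ReLU non-linearity
        feats = np.concatenate([feats, new], axis=-1)  # feature reuse
    return feats
```

The output carries $C_0 + L \cdot g$ channels for $L$ layers of growth rate $g$, and the original input channels pass through unchanged, which is the feature-multiplexing property the description relies on.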
3. The method of claim 1 for multi-focus image fusion combining depth context with convolutional conditional random fields, comprising: the pyramid pooling model is processed as follows;
firstly, the feature map is pooled to each target size, and a 1 × 1 convolution is applied to each pooled result to reduce its channels to 1/N of the original, where N is the number of levels of the pyramid pooling model; then each of these feature maps is up-sampled by bilinear interpolation to the size of the original feature map; the original feature map and the up-sampled feature maps are then concatenated along the channel dimension, giving twice the channels of the original feature map; finally a 1 × 1 convolution reduces the channels back to the original number, so the final feature map has the same size and channels as the original feature map.
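A minimal single-channel sketch of the pooling-and-upsampling stage described in this claim, under simplifying assumptions: block averaging stands in for adaptive average pooling, nearest-neighbour upsampling stands in for bilinear interpolation, the 1 × 1 convolutions are omitted, and H and W are assumed divisible by every pyramid level:

```python
import numpy as np

def pyramid_pooling(feat, levels=(1, 2, 4)):
    """Pool a (H, W) map to each n-by-n grid, upsample back, and stack
    with the original map along a leading "channel" axis."""
    H, W = feat.shape
    outs = [feat]
    for n in levels:
        bh, bw = H // n, W // n
        # adaptive average pooling to an n-by-n grid via block means
        pooled = feat.reshape(n, bh, n, bw).mean(axis=(1, 3))
        # nearest-neighbour upsampling back to (H, W)
        up = np.kron(pooled, np.ones((bh, bw)))
        outs.append(up)
    return np.stack(outs, axis=0)  # (1 + len(levels), H, W)
```

The coarsest level (n = 1) carries the global average of the map, while finer levels keep increasingly local context, which is how the pyramid aggregates context at multiple scales before the channel-reducing convolution.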
4. The method of claim 1 for multi-focus image fusion combining depth context with convolutional conditional random fields, comprising: the specific process of optimizing the binary probability decision diagram by using the convolution conditional random field in the step 3 is as follows;
inputting the rough two-classification probability decision map obtained in step 2, denoted $O$; the processing by the convolutional conditional random field can be solved through $\hat{O}=\arg\max_{\hat{k}} P(K=\hat{k}\mid O)$, where $\hat{O}$ is the optimized probability decision map; the specific analytical formula is as follows:

$$P(K=\hat{k}\mid O)=\frac{1}{Z(O)}\exp\!\left(-E(\hat{k}\mid O)\right)$$
where $K=\{K_1,\dots,K_n\}$ represents the random field, $P(K=\hat{k}\mid O)$ represents the probability of the optimized probability decision map of the random field when the input image is $O$, $Z(O)=\sum_{O'}\exp\!\left(-E(O'\mid O)\right)$ is the partition function, and $O'$ is an image of a random class on $O$;
the energy function is

$$E(\hat{k}\mid O)=\sum_{i\le N}\psi_u(\hat{k}_i)+\sum_{i\ne j\le N}\psi_p(\hat{k}_i,\hat{k}_j)$$

in the above formula, $N$ is the number of random fields, $i$ is a random number smaller than $N$, and $j$ is a random number not equal to $i$ and smaller than $N$; the unary potential function $\psi_u(\hat{k}_i)$ measures, when the observed value of the current pixel point $i$ is $\hat{k}_i$, the probability that the pixel point belongs to the class label in $O$, as output by the back end of the whole network; the binary potential function $\psi_p(\hat{k}_i,\hat{k}_j)$ measures the probability of two events occurring simultaneously, and its calculation formula is as follows:

$$\psi_p(\hat{k}_i,\hat{k}_j)=\mu(\hat{k}_i,\hat{k}_j)\sum_{m}\omega_m k_m(f_i,f_j)$$
$\mu(\hat{k}_i,\hat{k}_j)$ is the label-compatibility term: it constrains the conduction of energy between pixels, so that energy is conducted between two pixels only when $\hat{k}_i$ and $\hat{k}_j$ carry the same label, i.e., $\mu=1$ when the labels are the same and $\mu=0$ otherwise; in the summation term, $\omega_m$ is a weight parameter, $k_m(\cdot,\cdot)$ is a characteristic function, and $f_i$, $f_j$ are the feature vectors of pixels $i$ and $j$ in an arbitrary feature space, as shown in the following formula:

$$k(f_i,f_j)=W^{(1)}\exp\!\left(-\frac{|p_i-p_j|^2}{2\theta_\alpha^2}-\frac{|I_i-I_j|^2}{2\theta_\beta^2}\right)+W^{(2)}\exp\!\left(-\frac{|p_i-p_j|^2}{2\theta_\gamma^2}\right)$$
the above formula expresses the "affinity" between different pixels in the feature space: the first term is called the surface kernel and the second term the smoothing kernel; $W^{(1)}$, $\theta_\alpha$, $\theta_\beta$ are the surface kernel parameters and $W^{(2)}$, $\theta_\gamma$ the smoothing kernel parameters, all model parameters obtained through training; $p$ and $I$ respectively represent the actual position and the color value of a pixel point, the color value being the pixel value.
5. The method of claim 1 for multi-focus image fusion combining depth context with convolutional conditional random fields, comprising: step 3, fusing calculation rules and designing as follows;
assuming that the binary image matrix refined from the probability decision map is $W_A$ and the complementary binary image is $W_B$, i.e., $W_B = 1 - W_A$, and the source images are $I_A$ and $I_B$, the calculation rule of the final fused image $F$ is:
$$F = W_A \cdot I_A + (1 - W_A) \cdot I_B.$$
6. the method of claim 1 for multi-focus image fusion combining depth context with convolutional conditional random fields, comprising: the specific implementation manner in the step 4 is as follows;
a) data set preparation: the data set used is the VOC2012 data set, which is divided into 4 general categories (vehicles, household objects, animals, humans) and further classified into 20 classes, 21 including the background, containing 17125 images in total; to simulate multi-focus images, an image generation method from image-matting-based multi-focus image fusion in dynamic scenes is adopted, Gaussian blur is used to simulate real multi-focus conditions, and a synthesized multi-focus image is obtained in five steps: Gaussian blur, image transformation, image inversion, pixel-by-pixel multiplication and pixel-by-pixel addition;
b) adjusting training parameters: the multi-focus images $I_A$ and $I_B$ are combined into a 6-channel input and fed into the overall network for training; the image size in the training stage is 256 × 256, Adam is used as the gradient optimizer with a learning rate of 0.001 and a regularization term of 0.9; one image is fed into the network at a time, the total number of training epochs is n, the loss function of the overall network adopts binary cross entropy, and in the testing stage the input images keep their original size; the binary cross entropy formula is as follows:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log \hat{y}_i + (1-y_i)\log(1-\hat{y}_i)\right]$$
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111047787.3A CN113763300B (en) | 2021-09-08 | 2021-09-08 | Multi-focusing image fusion method combining depth context and convolution conditional random field |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113763300A true CN113763300A (en) | 2021-12-07 |
CN113763300B CN113763300B (en) | 2023-06-06 |
Family
ID=78793683
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111047787.3A Active CN113763300B (en) | 2021-09-08 | 2021-09-08 | Multi-focusing image fusion method combining depth context and convolution conditional random field |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113763300B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017190574A1 (en) * | 2016-05-04 | 2017-11-09 | 北京大学深圳研究生院 | Fast pedestrian detection method based on aggregation channel features |
CN107369148A (en) * | 2017-09-20 | 2017-11-21 | 湖北工业大学 | Based on the multi-focus image fusing method for improving SML and Steerable filter |
US20190236411A1 (en) * | 2016-09-14 | 2019-08-01 | Konica Minolta Laboratory U.S.A., Inc. | Method and system for multi-scale cell image segmentation using multiple parallel convolutional neural networks |
CN110334779A (en) * | 2019-07-16 | 2019-10-15 | 大连海事大学 | A kind of multi-focus image fusing method based on PSPNet detail extraction |
CN110533623A (en) * | 2019-09-06 | 2019-12-03 | 兰州交通大学 | A kind of full convolutional neural networks multi-focus image fusing method based on supervised learning |
CN111368707A (en) * | 2020-03-02 | 2020-07-03 | 佛山科学技术学院 | Face detection method, system, device and medium based on feature pyramid and dense block |
CN111429393A (en) * | 2020-04-15 | 2020-07-17 | 四川警察学院 | Multi-focus image fusion method based on convolution elastic network |
US20200273192A1 (en) * | 2019-02-26 | 2020-08-27 | Baidu Usa Llc | Systems and methods for depth estimation using convolutional spatial propagation networks |
US20200327679A1 (en) * | 2019-04-12 | 2020-10-15 | Beijing Moviebook Science and Technology Co., Ltd. | Visual target tracking method and apparatus based on deeply and densely connected neural network |
CN112949579A (en) * | 2021-03-30 | 2021-06-11 | 上海交通大学 | Target fusion detection system and method based on dense convolution block neural network |
CN113159236A (en) * | 2021-05-26 | 2021-07-23 | 中国工商银行股份有限公司 | Multi-focus image fusion method and device based on multi-scale transformation |
Non-Patent Citations (3)
Title |
---|
YI LI ET AL.: "Pyramid Pooling Dense Convolutional Neural Network for Multi-focus Image Fusion", 2019 IEEE 6th International Conference on Cloud Computing and Intelligence Systems (CCIS) * |
REN Kun; HUANG Long; FAN Chunqi; GAO Xuejin: "Real-time small traffic sign detection algorithm based on multi-scale pixel feature fusion", Signal Processing, no. 09 * |
LI Heng et al.: "Multi-focus image fusion algorithm based on a supervised-learning fully convolutional neural network", Laser & Optoelectronics Progress, vol. 57, no. 08 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114707427A (en) * | 2022-05-25 | 2022-07-05 | 青岛科技大学 | Personalized modeling method of graph neural network based on effective neighbor sampling maximization |
CN114707427B (en) * | 2022-05-25 | 2022-09-06 | 青岛科技大学 | Personalized modeling method of graph neural network based on effective neighbor sampling maximization |
CN115984104A (en) * | 2022-12-05 | 2023-04-18 | 南京大学 | Multi-focus image fusion method and device based on self-supervision learning |
CN115984104B (en) * | 2022-12-05 | 2023-09-22 | 南京大学 | Multi-focus image fusion method and device based on self-supervision learning |
Also Published As
Publication number | Publication date |
---|---|
CN113763300B (en) | 2023-06-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wang et al. | An experimental-based review of image enhancement and image restoration methods for underwater imaging | |
CN108986050B (en) | Image and video enhancement method based on multi-branch convolutional neural network | |
CN107909560A (en) | A kind of multi-focus image fusing method and system based on SiR | |
CN111784620B (en) | Light field camera full-focusing image fusion algorithm for guiding angle information by space information | |
CN113763300B (en) | Multi-focusing image fusion method combining depth context and convolution conditional random field | |
Ding et al. | U 2 D 2 Net: Unsupervised unified image dehazing and denoising network for single hazy image enhancement | |
Cong et al. | Discrete haze level dehazing network | |
Chang | Single underwater image restoration based on adaptive transmission fusion | |
Zhang et al. | Photo-realistic dehazing via contextual generative adversarial networks | |
Qiao et al. | Layered input GradiNet for image denoising | |
Guan et al. | NCDCN: multi-focus image fusion via nest connection and dilated convolution network | |
Guo et al. | Low-light image enhancement with joint illumination and noise data distribution transformation | |
Yu et al. | Multi-focus image fusion based on L1 image transform | |
Liu et al. | Underwater optical image enhancement based on super-resolution convolutional neural network and perceptual fusion | |
Hou et al. | Joint learning of image deblurring and depth estimation through adversarial multi-task network | |
CN117495882A (en) | Liver tumor CT image segmentation method based on AGCH-Net and multi-scale fusion | |
Zhang et al. | MFFE: multi-scale feature fusion enhanced net for image dehazing | |
Pang et al. | Underwater image enhancement via variable contrast and saturation enhancement model | |
Wang et al. | New insights into multi-focus image fusion: A fusion method based on multi-dictionary linear sparse representation and region fusion model | |
Moghimi et al. | A joint adaptive evolutionary model towards optical image contrast enhancement and geometrical reconstruction approach in underwater remote sensing | |
CN112508828A (en) | Multi-focus image fusion method based on sparse representation and guided filtering | |
CN116757979A (en) | Embryo image fusion method, device, electronic equipment and storage medium | |
Kumar et al. | Underwater image enhancement using deep learning | |
Lee et al. | An image-guided network for depth edge enhancement | |
Xu et al. | Interactive algorithms in complex image processing systems based on big data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||