CN113763300A - Multi-focus image fusion method combining depth context and convolutional conditional random field - Google Patents

Multi-focus image fusion method combining depth context and convolutional conditional random field

Info

Publication number
CN113763300A
CN113763300A, CN202111047787.3A, CN202111047787A, CN 113763300 A
Authority
CN
China
Prior art keywords
image
fusion
focus
convolution
images
Prior art date
Legal status
Granted
Application number
CN202111047787.3A
Other languages
Chinese (zh)
Other versions
CN113763300B (en)
Inventor
徐川
杨威
刘畅
叶志伟
张欢
Current Assignee
Hubei University of Technology
Original Assignee
Hubei University of Technology
Priority date
Filing date
Publication date
Application filed by Hubei University of Technology
Priority to CN202111047787.3A
Publication of CN113763300A
Application granted
Publication of CN113763300B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/50 - Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 - Geometric image transformations in the plane of the image
    • G06T3/40 - Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007 - Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a multi-focus image fusion method combining depth context and a convolutional conditional random field, addressing the problem that traditional methods cannot fully mine image focus correlation information, which causes distortion of fused details. The method exploits the feature-reuse advantage of dense convolutional neural networks and integrates the multi-focus source images to achieve cooperative focus-feature detection. A multi-scale pyramid pooling strategy aggregates the global context information of different focus regions, strengthens the ability to distinguish focused from defocused areas, and yields a coarse fusion probability decision map. The map is then further refined with a convolutional conditional random field, and the refined probability decision map finally produces a fused image with preserved details. The fusion method was evaluated subjectively and objectively on public data sets; the experimental results show that it achieves a good fusion effect, fully mines focus correlation information, and retains sufficient image detail.

Description

Multi-focus image fusion method combining depth context and convolutional conditional random field
Technical Field
The invention relates to the technical field of deep-learning image processing, in particular to a multi-focus image fusion method combining depth context and a convolutional conditional random field.
Background
In optical imaging, the limited depth of field of a lens means that only a local region of an image is in focus, and it is difficult to obtain a single image that is sharp over the whole scene. Multi-focus image fusion extracts the complementary information of several locally focused images and fuses them into one all-in-focus, sharp image, which enhances image quality, aids visual understanding, and improves the utilization of image information. Multi-focus image fusion is now widely applied in medical microscopic imaging, machine-vision measurement, machine recognition, military security, and other fields.
Generally, multi-focus image fusion methods fall into three classes: transform-domain methods, spatial-domain methods, and deep-learning methods. Image fusion methods based on the Multi-Scale Transform (MST) include algorithms based on the Laplacian pyramid, on wavelet transforms, and on the Non-Subsampled Contourlet Transform (NSCT), among others. Their fusion process has three main steps: a. decompose the source images into high-frequency and low-frequency components according to the multi-scale characteristics of the image; b. select different fusion rules to obtain the high-frequency and low-frequency fusion maps; c. obtain the final fused image through the inverse MST. However, MST-based methods suffer from spatial inconsistency during the transform-fusion process, which tends to cause distortion of varying degrees. Spatial-domain image fusion methods mainly fuse images through linear combination and can generally be divided into three categories: pixel-based, block-based, and target-region-based. Because fusion relies on gradient-related information of pixels or image blocks, artifact blocks are easily introduced into the fusion result, degrading the effect. Typical spatial-domain methods include fusion based on Guided Filtering (GFF) and on Image Matting (IFM); although they perform well in feature extraction and detail expression, it is difficult to set an ideal fusion rule by hand. In recent years, multi-focus image fusion methods based on deep learning have emerged, which fully exploit their strong learning ability, strong generalization ability, and good portability. For example, the Convolutional Neural Network (CNN) based method of Liu et al. (Yu Liu et al., 2017) and the spatial-pyramid-pooling based method of Mei et al. (Lihua Mei et al., 2017) fuse by image blocks, which leads to complex computation and blocking effects at image edges. Guo et al. proposed a method based on a fully convolutional neural network (Xiaoping Guo et al., 2018); although it better solves the blocking problem, it does not fully consider the correlation between context information, so that globally relevant features of image blocks are neglected.
To address the problem that traditional methods cannot fully mine image focus correlation information and therefore distort fused details, multi-focus image fusion is treated here as a binary segmentation problem with context-correlation constraints, i.e. distinguishing focused from unfocused regions, and a multi-focus image fusion method combining depth context and a convolutional conditional random field is proposed. The invention fuses deep features with a deep dense convolutional network to mine focus information, and learns context information through multi-scale spatial pyramid pooling. Furthermore, a Convolutional Conditional Random Field (ConvCRFs) is introduced when separating unfocused from focused regions, which improves the accuracy of the network's probability prediction map and further enhances the fusion effect. Finally, the method is compared experimentally with 7 mainstream fusion methods, and its efficiency and superiority are verified in terms of both subjective visual evaluation and objective comparative evaluation.
Disclosure of Invention
In order to solve the above problems, the present invention provides a multi-focus image fusion method combining depth context and a convolutional conditional random field, characterized by comprising the following steps:
Step 1, integrating two registered multi-focus source images I_A and I_B into a multi-channel image, and inputting the multi-channel image into a deep dense convolutional neural network for focus detection to obtain a multi-dimensional feature map;
Step 2, extracting feature information from the multi-dimensional feature map obtained in step 1 with a pyramid pooling model to obtain a plurality of feature maps, and then merging the feature maps to obtain a coarse binary probability decision map;
Step 3, refining the binary probability decision map obtained in step 2 with a convolutional conditional random field, and fusing according to the refined probability decision map and the fusion calculation rule to obtain the final multi-focus image fusion result;
Step 4, training the overall network formed by the deep dense convolutional neural network of step 1 and the pyramid pooling model of step 2;
Step 5, fusing the multi-focus images with the trained overall network (a structural sketch of how steps 1 to 3 compose is given after this list).
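As an illustration of how steps 1 to 3 compose, the following is a minimal structural sketch in PyTorch. The module and layer choices (a small plain-convolution backbone and a single 1 × 1 classifier head) are illustrative stand-ins for the deep dense network and pyramid pooling model described below, not the claimed architecture, and the convolutional-CRF refinement of step 3 is only indicated by a comment.

```python
import torch
import torch.nn as nn

class FusionPipelineSketch(nn.Module):
    """Structural sketch of steps 1-3; the sub-modules are illustrative stand-ins."""
    def __init__(self):
        super().__init__()
        # Step 1 stand-in: the two RGB source images are stacked into a 6-channel
        # input and passed through a (here greatly simplified) convolutional backbone.
        self.backbone = nn.Sequential(
            nn.Conv2d(6, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        )
        # Step 2 stand-in: a 1x1 head producing the two-class (focused / defocused)
        # probability decision map.
        self.classifier = nn.Conv2d(64, 2, kernel_size=1)

    def forward(self, img_a, img_b):
        x = torch.cat([img_a, img_b], dim=1)                   # 6-channel input (step 1)
        features = self.backbone(x)                            # multi-dimensional feature map
        coarse = torch.softmax(self.classifier(features), 1)   # coarse decision map (step 2)
        w_a = coarse[:, 1:2]                                   # P(pixel of image A is in focus)
        # Step 3 would refine w_a with a convolutional CRF; the coarse map is used
        # directly here for brevity.
        fused = w_a * img_a + (1.0 - w_a) * img_b              # F = W_A*I_A + (1 - W_A)*I_B
        return fused, w_a

# usage with random tensors standing in for two registered 256x256 source images
img_a = torch.rand(1, 3, 256, 256)
img_b = torch.rand(1, 3, 256, 256)
fused, decision = FusionPipelineSketch()(img_a, img_b)
```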
Further, the deep dense convolutional neural network comprises a plurality of dense blocks and transition layers; each dense block comprises a number of 1 × 1 and 3 × 3 convolutions, where the 1 × 1 convolutions reduce the number of feature maps and the 3 × 3 convolutions extract features. The transition layer sits between two dense blocks, serves as a connection, and consists of a convolutional layer and a pooling layer.
Further, the pyramid pooling model proceeds as follows.
First, the feature map is pooled to each target size, and a 1 × 1 convolution is applied to each pooled result to reduce its channels to 1/N of the original, where N is the number of pyramid pooling levels. Next, each feature map from the previous step is upsampled with bilinear interpolation to the size of the original feature map, and the original feature map and the upsampled feature maps are concatenated along the channel dimension, giving twice the original number of channels. Finally, a 1 × 1 convolution reduces the channels back to the original number, so the final feature map has the same size and channel count as the original feature map.
Further, the specific process of optimizing the binary probability decision map with the convolutional conditional random field in step 3 is as follows.
The coarse binary probability decision map obtained in step 2, denoted O, is taken as input. Processing with the convolutional conditional random field solves for the refined probability decision map x̂ through the Gibbs distribution

P(X = x̂ | O) = exp(-E(x̂ | O)) / Z(O)

where K = {K_1, ..., K_n} denotes the random field, x̂ is the optimized probability decision map when the input image is O, Z(O) = Σ_{O'} exp(-E(O' | O)) is the partition function, and O' is a random labeling (class assignment) over O.

The energy function E(x̂ | O) is expressed as

E(x̂ | O) = Σ_i ψ_u(x̂_i) + Σ_{i≠j} ψ_p(x̂_i, x̂_j)

where N is the number of random-field variables, i indexes variables with i < N, and j indexes variables with j ≠ i and j < N. The unary potential ψ_u(x̂_i) measures, given the observed value x̂_i of the current pixel i, the probability that the pixel belongs to its class label in O; it is output by the back end of the overall network. The binary (pairwise) potential ψ_p(x̂_i, x̂_j) measures the probability of two events occurring simultaneously and is computed as

ψ_p(x̂_i, x̂_j) = μ(x̂_i, x̂_j) Σ_m ω_m k_m(f_i, f_j)

Here μ(x̂_i, x̂_j) is the label compatibility term. It constrains conduction between pixels: energy is conducted only under the same-label condition, i.e. μ(x̂_i, x̂_j) = 1 when x̂_i and x̂_j carry the same label and μ(x̂_i, x̂_j) = 0 otherwise. In the summation, ω_m is a weight parameter and k_m(f_i, f_j) is a feature (kernel) function of the feature vectors f_i and f_j of pixels i and j in an arbitrary feature space:

k(f_i, f_j) = W^(1) · exp(-|p_i - p_j|² / (2θ_α²) - |I_i - I_j|² / (2θ_β²)) + W^(2) · exp(-|p_i - p_j|² / (2θ_γ²))

This expression measures the "affinity" between different pixels in terms of their features. The first term is called the surface kernel and the second the smoothing kernel; W^(1), θ_α, θ_β are the surface kernel parameters and W^(2), θ_γ the smoothing kernel parameters, all of which are model parameters obtained through training. p and I denote the actual position and the colour of a pixel respectively.
Further, the fusion calculation rule of step 3 is designed as follows.
Let W_A be the binary image matrix refined from the probability decision map and W_B the complementary binary image, i.e. W_B = 1 - W_A. With source images I_A and I_B, the final fused image F is computed as:
F = W_A · I_A + I_B · (1 - W_A).
Further, step 4 is implemented as follows.
a) Data set preparation: the VOC2012 data set is used. It is divided into 4 broad categories (vehicles, household objects, animals, people) and 20 classes, 21 including the background, and contains 17,125 images in total. To simulate multi-focus images, the image generation method used for multi-focus image fusion by image matting in dynamic scenes is adopted: Gaussian blur simulates the real multi-focus condition, and the synthetic multi-focus images are obtained in five steps, namely Gaussian blur, image transformation, image inversion, pixel-by-pixel multiplication and pixel-by-pixel addition.
b) Training parameter adjustment: the multi-focus images I_A and I_B are combined into a 6-channel input and fed into the overall network for training. The image size in the training stage is 256 × 256, Adam is used as the gradient optimizer with a learning rate of 0.001 and a regularization term of 0.9, 1 image is fed into the network at a time, the total number of training epochs is n, and the loss function of the overall network is the binary cross entropy; in the testing stage the input images keep their original size. For a single sample the binary cross entropy is

L = -[y_i · log(ŷ_i) + (1 - y_i) · log(1 - ŷ_i)]

where ŷ_i is the actual output of the sample and y_i is the desired output.
The invention provides a multi-focus image fusion method combining depth context and a convolutional conditional random field. The method exploits the feature-reuse advantage of dense convolutional neural networks and integrates the multi-focus source images to achieve cooperative focus-feature detection; it then pools global information with a multi-scale pyramid, aggregates the context information of different focus regions, strengthens the ability to distinguish focused from defocused areas, and obtains a coarse fusion probability decision map. A convolutional conditional random field then refines the coarse map into an accurate probability decision map, from which a fusion result with well-preserved details is generated. The experimental results show that the proposed method achieves better visual results and the best scores on four quantitative indexes, which fully demonstrates its effectiveness; it can be applied effectively to automatic imaging vision tasks.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
FIG. 2 is a pyramid pooling multi-scale information extraction diagram of the present invention.
FIG. 3 shows the probability decision maps of multi-scale pyramid pooling cooperative detection: the source image together with its coarse and refined segmentation maps.
FIG. 4 shows the results of the Lytro-3 image fusion;
FIG. 5 is a graph comparing the results of the Lytro-17 experiment;
FIG. 6 is a Lytro-17 image residual pseudo-color contrast diagram.
Detailed Description
The technical solution of the invention can be implemented as an automated process using computer software. The technical solution is explained in detail below with reference to the drawings and an embodiment. As shown in Fig. 1, the flow of the embodiment includes the following steps:
Step 1, cooperative focus detection with a dense network:
Two registered multi-focus source images I_A and I_B are taken as input; the method integrates their correlated information into a multi-channel image, on which the dense-convolutional-network cooperative detection is performed.
The core of dense-network cooperative focus detection is a deep dense convolutional neural network, which builds on the convolutional neural network and introduces the deep dense concept. Its most important components are the transition layers and the dense blocks; the dense connections between layers reduce gradient vanishing and make better use of the network's feature information. Inside each dense block, the input of each layer is the concatenation of the outputs of all previous layers, where concatenation is along the channel dimension. For example, concatenating a 56 × 56 × 64 tensor with a 56 × 56 × 32 tensor yields 56 × 56 × 96, where 96 is the sum of 64 and 32. The growth rate defines how many channels each layer contributes to this concatenation. Each dense block contains several sub-structures; taking the first one as an example, the sub-structure is first a bottleneck layer, a 1 × 1 convolution whose purpose is to reduce the number of feature maps, followed by a 3 × 3 convolutional layer for feature extraction. The network contains 4 dense blocks, whose structure is shown in the table below:
Table 1. Dense block structure
The transition layer sits between two dense blocks, serves as a connection, and consists of a convolutional layer and a pooling layer. As shown in Fig. 1, the module has four dense blocks. In general, the deeper a deep-learning network becomes, the more serious gradient vanishing is, and dense blocks are introduced to mitigate it. Compared with a residual network, the structure of the deep dense convolutional neural network is more elaborate: after dense connections are added, every layer has access to the features of all previous layers, so feature reuse is achieved effectively and the propagation of multi-focus image feature information across layers is better exploited. After the multi-channel image passes through dense-network cooperative focus detection, a multi-dimensional feature map is obtained (a sketch of the dense block and transition layer is given below).
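Since Table 1 is reproduced only as an image in the original publication, the following is a hedged PyTorch sketch of the two building blocks described above: a 1 × 1 bottleneck followed by a 3 × 3 convolution inside a dense block, and a convolution-plus-pooling transition layer. The growth rate, bottleneck width and pooling type are illustrative assumptions, not values taken from the patent.

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """One sub-structure of a dense block: a 1x1 bottleneck that limits the number
    of feature maps, followed by a 3x3 convolution for feature extraction."""
    def __init__(self, in_channels, growth_rate=32, bottleneck_width=4):
        super().__init__()
        inter = bottleneck_width * growth_rate
        self.bottleneck = nn.Sequential(
            nn.BatchNorm2d(in_channels), nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, inter, kernel_size=1, bias=False))
        self.conv3x3 = nn.Sequential(
            nn.BatchNorm2d(inter), nn.ReLU(inplace=True),
            nn.Conv2d(inter, growth_rate, kernel_size=3, padding=1, bias=False))

    def forward(self, x):
        out = self.conv3x3(self.bottleneck(x))
        # dense connectivity: concatenate the new features onto all previous ones
        return torch.cat([x, out], dim=1)

class TransitionLayer(nn.Module):
    """Placed between two dense blocks: a 1x1 convolution plus 2x2 average pooling."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        return self.pool(self.reduce(x))

def dense_block(in_channels, num_layers, growth_rate=32):
    """Stack several DenseLayers; returns the block and its output channel count."""
    layers, ch = [], in_channels
    for _ in range(num_layers):
        layers.append(DenseLayer(ch, growth_rate))
        ch += growth_rate                 # each layer adds `growth_rate` channels
    return nn.Sequential(*layers), ch
```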
Step 2, pyramid pooling multi-scale information extraction, which comprises the following:
and (3) performing multi-scale feature information extraction on the multi-dimensional feature map obtained after the detection in the step (1) by utilizing pyramid pooling. Considering that the most difficult detection points of a multi-focus image are a focus area and a non-focus area, important global prior knowledge is not sufficiently obtained in a high-level convolutional neural network, and high-level features contain more semantics and less position information, in order to further reduce the loss of context information between different sub-areas, as shown in fig. 1, a pyramid pooling model is introduced, which is very common in the traditional machine learning feature extraction, and the main idea is to divide a sub-image into blocks of a plurality of scales, for example, one image is divided into 1 part, 4 parts, 8 parts and the like. The features are then extracted for each block and then fused together so that features of multiple scales are compatible.
As shown in Fig. 2, 4 different pyramid scales are adopted. The number of pyramid pooling levels and the size of each level can be modified; in this invention the pyramid pooling module has 4 levels with sizes 1 × 1, 2 × 2, 3 × 3 and 6 × 6. First, the feature map is pooled to each target size by treating the map as N × N blocks and pooling each block (in Fig. 2, for example, the red block is the feature map after 1 × 1 pooling); a 1 × 1 convolution is then applied to each pooled result to reduce its channels to 1/N of the original, where N = 4. Next, each feature map from the previous step is upsampled with bilinear interpolation to the size of the original feature map, and the original feature map and the upsampled maps are concatenated along the channel dimension. The resulting channel count is twice that of the original feature map, and a final 1 × 1 convolution reduces it back to the original number, so the final feature map has the same size and channel count as the original. (A sketch of this module is given below.)
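The following is a hedged PyTorch sketch of the pyramid pooling module described above, using the bin sizes 1, 2, 3 and 6 from the text; adaptive average pooling is assumed for the per-block pooling, which the patent does not specify.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPoolingSketch(nn.Module):
    """Pools the input feature map to 1x1, 2x2, 3x3 and 6x6 bins, reduces each
    pooled map to channels/N with a 1x1 convolution, upsamples it back to the
    input size with bilinear interpolation, concatenates everything with the
    original map, and projects back to the original channel count."""
    def __init__(self, channels, bins=(1, 2, 3, 6)):
        super().__init__()
        n = len(bins)
        self.stages = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(b),
                          nn.Conv2d(channels, channels // n, kernel_size=1, bias=False))
            for b in bins])
        # after concatenation the channel count is doubled; project it back
        self.project = nn.Conv2d(channels + (channels // n) * n, channels, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[2:]
        pyramids = [x]
        for stage in self.stages:
            p = stage(x)
            pyramids.append(F.interpolate(p, size=(h, w), mode='bilinear',
                                          align_corners=False))
        return self.project(torch.cat(pyramids, dim=1))

# usage: a 512-channel feature map keeps its size and channel count
feat = torch.rand(1, 512, 32, 32)
out = PyramidPoolingSketch(512)(feat)   # -> torch.Size([1, 512, 32, 32])
```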
This module forms an effective global context prior: it contains information from different sub-regions at different scales, effectively improves the network's ability to use global context information, lets the network fully mine the boundary information between focused and unfocused regions, embeds the context features of scenes that are hard to fuse, and improves the fusion effect. Finally, the feature maps are merged to obtain a coarse binary probability decision map.
Step 3, probability decision map refinement with the convolutional conditional random field, as follows:
although the depth-dense convolutional neural network in step 1 and the pyramid pooling model in step 2 have a good effect on the extraction of the global context information of the source image, there are pixels that are misclassified in the probability map, as shown in fig. 3. Therefore, in order to obtain more accurate and excellent segmentation capability, the probabilistic decision graph is optimized by using the convolution conditional random fields (ConvCRFs). The convolution Conditional Random Field is optimized based on the Fully connected Conditional Random Field (fullrfs). The coarse binary probability decision graph O input by step 2 can be passed through
Figure BDA0003251540060000071
To perform the solution of the problem to be solved,
Figure BDA0003251540060000072
in order to optimize the probability decision graph, the specific analytical formula is as follows:
Figure BDA0003251540060000073
where K is { K ═ K1,…,KnThe symbol represents the random field and the symbol represents the random field,
Figure BDA0003251540060000074
is a random field of
Figure BDA0003251540060000075
An optimized probability decision graph when the input image is O, Z (O) is a distribution function,
Figure BDA0003251540060000076
function of energy
Figure BDA0003251540060000077
The expression is as follows:
Figure BDA0003251540060000078
in the above formula, N is the number of random fields, i is a random number smaller than N, and j is a random number not equal to i but smaller than N. Wherein the function of unitary potential
Figure BDA0003251540060000081
The observed value for measuring the current pixel point i is
Figure BDA0003251540060000082
And (3) the probability that the pixel point belongs to the class label in the O is output from the rear end of the whole network formed by the deep dense convolutional neural network in the step (1) and the pyramid pooling model in the step (2). Binary potential function
Figure BDA0003251540060000083
Then, for measuring the probability of two events occurring simultaneously, the calculation formula is as follows:
Figure BDA0003251540060000084
Figure BDA0003251540060000085
for label-compatible terms, it constrains the conditions of conduction between pixels, energy being conducted to each other only under the same label conditions, i.e. when
Figure BDA0003251540060000086
And
Figure BDA0003251540060000087
when the labels are the same, the label is the same,
Figure BDA0003251540060000088
otherwise
Figure BDA0003251540060000089
In the latter addition term, ωmIs a weight value parameter that is a function of,
Figure BDA00032515400600000810
is a characteristic function, fi,fjIs the feature vector of pixels i and j in an arbitrary feature space, as shown in the following formula:
Figure BDA00032515400600000811
the above formula represents the "intimacy" between different pixels in terms of the travel of the feature, the first term of the formula being referred to as the surface kernel and the second term being referred to as the smoothing kernel. W(1),θα,θβAs surface nuclear parameter, W(2),θγFor smoothing the kernel parameters, these parameters are model parameters, and are obtained by segment training. p and I respectively represent the actual position and color value of the pixel point, and the color value is the pixel value.
ConvCRFs add a conditional-independence assumption to the FullCRFs framework, so that inference can be carried out efficiently on the GPU as convolution operations; this effectively combines the feature-extraction capability of the CNN with the modeling capability of the random field, allowing the convolutional conditional random field to propagate information effectively.
To detect the target regions accurately, the global, local and boundary information of the probability map is integrated through the convolutional conditional random field, which yields the refined probability decision map. During optimization, several kinds of feature information of the probability map are computed by convolution operations, the features are fused by the CRF into a refined map, and the accurate map optimized by ConvCRFs is finally obtained (a simplified sketch of one mean-field update follows).
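For illustration, the following is a minimal NumPy sketch of one mean-field update of a fully connected CRF built from the surface and smoothing kernels defined above. It is a dense O(N²) toy version intended for small images, not the convolutional (truncated-window, GPU) formulation of ConvCRFs, and it uses the Potts label-compatibility convention common in FullCRFs implementations (messages are exchanged between different labels) rather than the same-label convention stated above; the kernel weights and θ values are arbitrary illustrative parameters, not trained ones.

```python
import numpy as np

def crf_mean_field_step(unary, image, w1=3.0, w2=1.0,
                        theta_alpha=30.0, theta_beta=10.0, theta_gamma=3.0):
    """One mean-field update of a fully connected binary CRF.

    unary : (2, H, W) negative log-probabilities from the network (focused / defocused).
    image : (H, W, 3) colour image used by the surface (appearance) kernel.
    Returns the updated label probabilities Q of shape (2, H, W).
    """
    L, H, W = unary.shape
    N = H * W
    ys, xs = np.mgrid[0:H, 0:W]
    pos = np.stack([ys, xs], axis=-1).reshape(N, 2).astype(np.float64)
    col = image.reshape(N, 3).astype(np.float64)

    d_pos = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)   # |p_i - p_j|^2
    d_col = ((col[:, None, :] - col[None, :, :]) ** 2).sum(-1)   # |I_i - I_j|^2
    surface = w1 * np.exp(-d_pos / (2 * theta_alpha ** 2)
                          - d_col / (2 * theta_beta ** 2))       # surface kernel
    smooth = w2 * np.exp(-d_pos / (2 * theta_gamma ** 2))        # smoothing kernel
    K = surface + smooth
    np.fill_diagonal(K, 0.0)                                     # no self-message

    Q = np.exp(-unary.reshape(L, N))
    Q /= Q.sum(axis=0, keepdims=True)                            # initial softmax

    msg = K @ Q.T                                                # (N, L) message passing
    # Potts compatibility: each label is penalised by messages from the other label
    pairwise = msg[:, ::-1].T                                    # (L, N)
    Q_new = np.exp(-unary.reshape(L, N) - pairwise)
    Q_new /= Q_new.sum(axis=0, keepdims=True)
    return Q_new.reshape(L, H, W)
```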
After refinement, let W_A be the binary image matrix obtained from the refined probability decision map and W_B the complementary binary image, i.e. W_B = 1 - W_A. With source images I_A and I_B, the final fused image F is computed as:
F = W_A · I_A + I_B · (1 - W_A)
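A direct rendering of this rule, assuming the refined decision map is thresholded into the binary matrix W_A (the 0.5 threshold is an assumption):

```python
import numpy as np

def fuse(decision_a, img_a, img_b, threshold=0.5):
    """Apply the rule F = W_A * I_A + (1 - W_A) * I_B.

    decision_a : (H, W) refined probability that a pixel of image A is in focus.
    img_a, img_b : (H, W, 3) registered source images.
    """
    w_a = (decision_a >= threshold).astype(img_a.dtype)[..., None]  # binary map W_A
    return w_a * img_a + (1.0 - w_a) * img_b                        # W_B = 1 - W_A
```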
Step 4, network training of the deep dense neural network and the pyramid network, as follows:
c) Data set preparation: the data set used here is the VOC2012 data set, which is divided into 4 broad categories (vehicles, household objects, animals, people) and 20 classes (21 including the background), containing 17,125 images in total. To simulate multi-focus images, the invention adopts the image generation method of Li et al. for multi-focus image fusion by image matting in dynamic scenes (Shutao Li et al., 2013), using Gaussian blur to simulate the real multi-focus condition. The synthetic multi-focus images are obtained in five steps, namely Gaussian blur, image transformation, image inversion, pixel-by-pixel multiplication and pixel-by-pixel addition; a sketch of this synthesis is given below.
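The following is a hedged sketch of the five-step synthesis for one training pair, assuming a single all-in-focus image and a binary focus mask as inputs; the blur strength and the use of scipy's Gaussian filter are assumptions made for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def synthesize_pair(clear_img, mask, sigma=2.0):
    """Create a synthetic multi-focus pair from one all-in-focus image.

    clear_img : (H, W, 3) float array in [0, 1].
    mask      : (H, W) binary array, 1 where image A should be in focus.
    """
    blurred = np.stack([gaussian_filter(clear_img[..., c], sigma)   # 1. Gaussian blur
                        for c in range(clear_img.shape[-1])], axis=-1)
    m = mask.astype(clear_img.dtype)[..., None]                     # 2. mask as weight map
    inv = 1.0 - m                                                   # 3. inverted mask
    img_a = clear_img * m + blurred * inv                           # 4+5. multiply, then add
    img_b = clear_img * inv + blurred * m                           # complementary focus
    return img_a, img_b
```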
d) Training parameter adjustment: the multi-focus images I_A and I_B are combined into a 6-channel input and fed into the overall network for training. The image size in the training stage is 256 × 256, Adam is used as the gradient optimizer with a learning rate of 0.001 and a regularization term of 0.9, 1 image is fed into the network at a time, the total number of training epochs is 30, and the loss function is the binary cross entropy. In the testing stage the input images keep their original size. For a single sample the binary cross entropy is

L = -[y_i · log(ŷ_i) + (1 - y_i) · log(1 - ŷ_i)]

where ŷ_i is the actual output of the sample and y_i is the desired output.
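A minimal PyTorch training-loop sketch matching the stated settings (Adam, learning rate 0.001, one 256 × 256 pair per step, binary cross entropy). The data loader and the exact meaning of the 0.9 regularization term are not specified here and are left as assumptions; the model is assumed to output per-pixel probabilities.

```python
import torch
import torch.nn as nn

def train_sketch(model, loader, epochs=30, lr=1e-3, device='cpu'):
    """`loader` is assumed to yield (img_a, img_b, mask) tensors of size 256x256;
    `model` is assumed to map a 6-channel input to a 1-channel focus probability map."""
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.BCELoss()                      # binary cross entropy
    model.train()
    for epoch in range(epochs):
        for img_a, img_b, mask in loader:
            x = torch.cat([img_a, img_b], dim=1).to(device)   # 6-channel input
            target = mask.to(device)                          # ground-truth focus map
            pred = model(x)                                   # probabilities in (0, 1)
            loss = criterion(pred, target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```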
Step 5, comparing the method of the invention with mainstream methods, with testing and analysis as follows:
a) Comparison methods: to demonstrate the superiority and effectiveness of the proposed fusion method, experiments are carried out on the Lytro multi-focus colour image data set. Seven mainstream image fusion methods are selected for comparison: the non-subsampled contourlet transform fusion method (NSCT), the guided-filtering-based fusion method (GFF), the image-matting-based multi-focus image fusion method (IFM), the bilateral-filter-based fusion method (CBF), the discrete cosine harmonic wavelet transform fusion method (DCHWT), the convolutional-neural-network-based fusion method (CNN), and the pyramid-pooling-network-based fusion method (PSPF).
b) Qualitative analysis: Fig. 4 shows the visual fusion results of Lytro-3 for the different fusion methods. It can be seen that the edge of the boy's ear becomes more blurred in most of the comparison methods, while the edge structure in the fusion result of the proposed method is relatively clear, which demonstrates its better ability to extract edge information. To further verify the effect of the method, the fusion results for Lytro-17 and the pseudo-colour difference maps between the fused image and source image A are shown in Figs. 5 and 6. The fewer traces of the focused region that remain in the difference map, the more image information from the focused part has been transferred into the fused image, indicating better fusion performance. As can be seen from Figs. 5 and 6, the proposed algorithm leaves less residual information, while the CNN and IFM methods are deficient in the boundary regions; this shows that the proposed fusion method handles edge regions better and makes full use of image context information to detect the focus boundary. Compared with the comparison methods, the proposed method therefore achieves a better fusion effect in the subjective visual evaluation.
c) Quantitative comparison: to demonstrate the effectiveness of the method objectively, four mainstream multi-focus image evaluation indexes are adopted as quantitative metrics, namely mutual information (Q_MI), nonlinear correlation information entropy (Q_NCIE), edge retention (Q_AB/F) and visual information fidelity (Q_VIF), and the method is compared with the 7 mainstream methods. The experimental results are shown in Tables 2 and 3: Table 2 gives the objective evaluation of the fusion results of 5 images, and Table 3 the average objective evaluation over 20 images. Bold values indicate the optimal result and underlined values the suboptimal result.
Table 2. Objective evaluation comparison of fusion results
Table 3. Average objective evaluation comparison of fusion results
As can be seen from Table 2, the proposed method achieves better results on all four objective evaluation indexes. It is optimal in most cases for mutual information, nonlinear correlation information entropy, edge retention and visual information fidelity, which means that the proposed dense-convolutional-network cooperative detection mines the deep features of multi-focus images well for focus estimation, and that the spatial pyramid pooling module constrains the context information so that edge focus detection improves; the overall fusion quality of the algorithm is therefore higher and more focus information from the source images is retained. From the average evaluation in Table 3, the method obtains a sub-optimal result, second only to the CNN method, on the visual information fidelity index. Overall, the proposed method achieves a better fusion effect than the other methods.
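As an illustration of the mutual-information index used above, the following sketch computes one common (unnormalized) form of Q_MI from joint grey-level histograms; published definitions of Q_MI often add a normalization term, so this is only indicative.

```python
import numpy as np

def mutual_information(x, y, bins=256):
    """Mutual information between two grayscale images via a joint histogram."""
    hist, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    pxy = hist / hist.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0                                  # avoid log(0)
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

def q_mi(src_a, src_b, fused):
    """A common form of Q_MI: the information the fused image shares with each source."""
    return mutual_information(src_a, fused) + mutual_information(src_b, fused)
```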
d) Summary analysis: the invention provides a multi-focus image fusion method combining depth context and a convolutional conditional random field. The method exploits feature reuse in a dense convolutional neural network and integrates the multi-focus source images for cooperative focus-feature detection; multi-scale pyramid pooling then aggregates the context information of different focus regions, strengthens the ability to distinguish focused from defocused areas, and yields a coarse fusion probability decision map, which the convolutional conditional random field refines into an accurate probability decision map, finally generating a fusion result with well-preserved details. The experimental results show that the proposed method achieves better visual results and the best scores on the four quantitative indexes, which fully demonstrates its effectiveness and its applicability to automatic imaging vision tasks.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.

Claims (6)

1. A multi-focus image fusion method combining depth context and a convolutional conditional random field, characterized by comprising the following steps:
step 1, integrating two registered multi-focus source images I_A and I_B into a multi-channel image, and inputting the multi-channel image into a deep dense convolutional neural network for focus detection to obtain a multi-dimensional feature map;
step 2, extracting feature information from the multi-dimensional feature map obtained in step 1 by using a pyramid pooling model to obtain a plurality of feature maps, and then merging the feature maps to obtain a coarse binary probability decision map;
step 3, refining the binary probability decision map obtained in step 2 by using a convolutional conditional random field, and performing fusion according to the refined probability decision map and a fusion calculation rule to obtain the final multi-focus image fusion result;
step 4, training the overall network formed by the deep dense convolutional neural network of step 1 and the pyramid pooling model of step 2;
and step 5, fusing the multi-focus images by using the trained overall network.
2. The multi-focus image fusion method combining depth context and a convolutional conditional random field according to claim 1, wherein the deep dense convolutional neural network comprises a plurality of dense blocks and transition layers; each dense block comprises a plurality of 1 × 1 and 3 × 3 convolutions, the 1 × 1 convolutions being used to reduce the number of feature maps and the 3 × 3 convolutions being used to extract features; the transition layer is arranged between two dense blocks, serves as a connection, and consists of a convolutional layer and a pooling layer.
3. The multi-focus image fusion method combining depth context and a convolutional conditional random field according to claim 1, wherein the pyramid pooling model is processed as follows:
first, the feature map is pooled to each target size, and a 1 × 1 convolution is applied to each pooled result to reduce its channels to 1/N of the original, where N is the number of pyramid pooling levels; then each feature map from the previous step is upsampled with bilinear interpolation to the size of the original feature map, the original feature map and the upsampled feature maps are concatenated along the channel dimension to give twice the original number of channels, and finally a 1 × 1 convolution reduces the channels back to the original number, so that the final feature map has the same size and channel count as the original feature map.
4. The multi-focus image fusion method combining depth context and a convolutional conditional random field according to claim 1, wherein the specific process of optimizing the binary probability decision map with the convolutional conditional random field in step 3 is as follows:
the coarse binary probability decision map obtained in step 2, denoted O, is taken as input; processing with the convolutional conditional random field solves for the refined probability decision map x̂ through the Gibbs distribution

P(X = x̂ | O) = exp(-E(x̂ | O)) / Z(O)

where K = {K_1, ..., K_n} denotes the random field, x̂ is the optimized probability decision map when the input image is O, Z(O) = Σ_{O'} exp(-E(O' | O)) is the partition function, and O' is a random labeling over O;

the energy function E(x̂ | O) is expressed as

E(x̂ | O) = Σ_i ψ_u(x̂_i) + Σ_{i≠j} ψ_p(x̂_i, x̂_j)

where N is the number of random-field variables, i indexes variables with i < N, and j indexes variables with j ≠ i and j < N; the unary potential ψ_u(x̂_i) measures, given the observed value x̂_i of the current pixel i, the probability that the pixel belongs to its class label in O, and is output by the back end of the overall network; the binary potential ψ_p(x̂_i, x̂_j) measures the probability of two events occurring simultaneously and is computed as

ψ_p(x̂_i, x̂_j) = μ(x̂_i, x̂_j) Σ_m ω_m k_m(f_i, f_j)

where μ(x̂_i, x̂_j) is the label compatibility term, which constrains conduction between pixels so that energy is conducted only under the same-label condition, i.e. μ(x̂_i, x̂_j) = 1 when x̂_i and x̂_j carry the same label and μ(x̂_i, x̂_j) = 0 otherwise; in the summation, ω_m is a weight parameter and k_m(f_i, f_j) is a feature function of the feature vectors f_i and f_j of pixels i and j in an arbitrary feature space:

k(f_i, f_j) = W^(1) · exp(-|p_i - p_j|² / (2θ_α²) - |I_i - I_j|² / (2θ_β²)) + W^(2) · exp(-|p_i - p_j|² / (2θ_γ²))

this expression measures the "affinity" between different pixels in terms of their features; the first term is called the surface kernel and the second the smoothing kernel, W^(1), θ_α, θ_β being the surface kernel parameters and W^(2), θ_γ the smoothing kernel parameters, all of which are model parameters obtained through training; p and I denote the actual position and the colour value of a pixel respectively, the colour value being the pixel value.
5. The multi-focus image fusion method combining depth context and a convolutional conditional random field according to claim 1, wherein the fusion calculation rule in step 3 is designed as follows:
let W_A be the binary image matrix refined from the probability decision map and W_B the complementary binary image, i.e. W_B = 1 - W_A; with source images I_A and I_B, the final fused image F is computed as:
F = W_A · I_A + I_B · (1 - W_A).
6. The multi-focus image fusion method combining depth context and a convolutional conditional random field according to claim 1, wherein step 4 is implemented as follows:
a) data set preparation: the VOC2012 data set is used, which is divided into 4 broad categories (vehicles, household objects, animals, people) and 20 classes, 21 including the background, and contains 17,125 images in total; to simulate multi-focus images, the image generation method used for multi-focus image fusion by image matting in dynamic scenes is adopted, Gaussian blur simulates the real multi-focus condition, and the synthetic multi-focus images are obtained in five steps, namely Gaussian blur, image transformation, image inversion, pixel-by-pixel multiplication and pixel-by-pixel addition;
b) training parameter adjustment: the multi-focus images I_A and I_B are combined into a 6-channel input and fed into the overall network for training; the image size in the training stage is 256 × 256, Adam is used as the gradient optimizer with a learning rate of 0.001 and a regularization term of 0.9, 1 image is fed into the network at a time, the total number of training epochs is n, and the loss function of the overall network is the binary cross entropy; in the testing stage, the input images keep their original size; for a single sample the binary cross entropy is

L = -[y_i · log(ŷ_i) + (1 - y_i) · log(1 - ŷ_i)]

where ŷ_i is the actual output of the sample and y_i is the desired output.
CN202111047787.3A 2021-09-08 2021-09-08 Multi-focusing image fusion method combining depth context and convolution conditional random field Active CN113763300B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111047787.3A CN113763300B (en) 2021-09-08 2021-09-08 Multi-focusing image fusion method combining depth context and convolution conditional random field

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111047787.3A CN113763300B (en) 2021-09-08 2021-09-08 Multi-focusing image fusion method combining depth context and convolution conditional random field

Publications (2)

Publication Number Publication Date
CN113763300A true CN113763300A (en) 2021-12-07
CN113763300B CN113763300B (en) 2023-06-06

Family

ID=78793683

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111047787.3A Active CN113763300B (en) 2021-09-08 2021-09-08 Multi-focusing image fusion method combining depth context and convolution conditional random field

Country Status (1)

Country Link
CN (1) CN113763300B (en)



Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017190574A1 (en) * 2016-05-04 2017-11-09 北京大学深圳研究生院 Fast pedestrian detection method based on aggregation channel features
US20190236411A1 (en) * 2016-09-14 2019-08-01 Konica Minolta Laboratory U.S.A., Inc. Method and system for multi-scale cell image segmentation using multiple parallel convolutional neural networks
CN107369148A (en) * 2017-09-20 2017-11-21 湖北工业大学 Based on the multi-focus image fusing method for improving SML and Steerable filter
US20200273192A1 (en) * 2019-02-26 2020-08-27 Baidu Usa Llc Systems and methods for depth estimation using convolutional spatial propagation networks
US20200327679A1 (en) * 2019-04-12 2020-10-15 Beijing Moviebook Science and Technology Co., Ltd. Visual target tracking method and apparatus based on deeply and densely connected neural network
CN110334779A (en) * 2019-07-16 2019-10-15 大连海事大学 A kind of multi-focus image fusing method based on PSPNet detail extraction
CN110533623A (en) * 2019-09-06 2019-12-03 兰州交通大学 A kind of full convolutional neural networks multi-focus image fusing method based on supervised learning
CN111368707A (en) * 2020-03-02 2020-07-03 佛山科学技术学院 Face detection method, system, device and medium based on feature pyramid and dense block
CN111429393A (en) * 2020-04-15 2020-07-17 四川警察学院 Multi-focus image fusion method based on convolution elastic network
CN112949579A (en) * 2021-03-30 2021-06-11 上海交通大学 Target fusion detection system and method based on dense convolution block neural network
CN113159236A (en) * 2021-05-26 2021-07-23 中国工商银行股份有限公司 Multi-focus image fusion method and device based on multi-scale transformation

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Yi Li et al.: "Pyramid Pooling Dense Convolutional Neural Network for Multi-focus Image Fusion", 2019 IEEE 6th International Conference on Cloud Computing and Intelligence Systems (CCIS) *
任坤; 黄泷; 范春奇; 高学金: "Real-time small traffic sign detection algorithm based on multi-scale pixel feature fusion", 信号处理 (Signal Processing), no. 09
李恒 et al.: "Fully convolutional neural network multi-focus image fusion algorithm based on supervised learning", 激光与光电子学进展 (Laser & Optoelectronics Progress), vol. 57, no. 08

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114707427A (en) * 2022-05-25 2022-07-05 青岛科技大学 Personalized modeling method of graph neural network based on effective neighbor sampling maximization
CN114707427B (en) * 2022-05-25 2022-09-06 青岛科技大学 Personalized modeling method of graph neural network based on effective neighbor sampling maximization
CN115984104A (en) * 2022-12-05 2023-04-18 南京大学 Multi-focus image fusion method and device based on self-supervision learning
CN115984104B (en) * 2022-12-05 2023-09-22 南京大学 Multi-focus image fusion method and device based on self-supervision learning

Also Published As

Publication number Publication date
CN113763300B (en) 2023-06-06

Similar Documents

Publication Publication Date Title
Wang et al. An experimental-based review of image enhancement and image restoration methods for underwater imaging
CN108986050B (en) Image and video enhancement method based on multi-branch convolutional neural network
CN107909560A (en) A kind of multi-focus image fusing method and system based on SiR
CN111784620B (en) Light field camera full-focusing image fusion algorithm for guiding angle information by space information
CN113763300B (en) Multi-focusing image fusion method combining depth context and convolution conditional random field
Ding et al. U2D2Net: Unsupervised unified image dehazing and denoising network for single hazy image enhancement
Cong et al. Discrete haze level dehazing network
Chang Single underwater image restoration based on adaptive transmission fusion
Zhang et al. Photo-realistic dehazing via contextual generative adversarial networks
Qiao et al. Layered input GradiNet for image denoising
Guan et al. NCDCN: multi-focus image fusion via nest connection and dilated convolution network
Guo et al. Low-light image enhancement with joint illumination and noise data distribution transformation
Yu et al. Multi-focus image fusion based on L1 image transform
Liu et al. Underwater optical image enhancement based on super-resolution convolutional neural network and perceptual fusion
Hou et al. Joint learning of image deblurring and depth estimation through adversarial multi-task network
CN117495882A (en) Liver tumor CT image segmentation method based on AGCH-Net and multi-scale fusion
Zhang et al. MFFE: multi-scale feature fusion enhanced net for image dehazing
Pang et al. Underwater image enhancement via variable contrast and saturation enhancement model
Wang et al. New insights into multi-focus image fusion: A fusion method based on multi-dictionary linear sparse representation and region fusion model
Moghimi et al. A joint adaptive evolutionary model towards optical image contrast enhancement and geometrical reconstruction approach in underwater remote sensing
CN112508828A (en) Multi-focus image fusion method based on sparse representation and guided filtering
CN116757979A (en) Embryo image fusion method, device, electronic equipment and storage medium
Kumar et al. Underwater image enhancement using deep learning
Lee et al. An image-guided network for depth edge enhancement
Xu et al. Interactive algorithms in complex image processing systems based on big data

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant