CN113763300A - Multi-focus image fusion method combining depth context and convolution condition random field - Google Patents
- Publication number: CN113763300A
- Application number: CN202111047787.3A
- Authority: CN (China)
- Legal status: Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4007—Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention provides a multi-focus image fusion method combining depth context and a convolutional conditional random field, addressing the problem that traditional methods cannot fully mine image focus correlation information, which causes distortion of fused details. The method exploits the feature-reuse advantage of dense convolutional neural networks, integrating the multi-focus source images to realize cooperative focus-feature detection. A multi-scale pyramid pooling strategy aggregates global context information from the different focus areas, strengthening the discrimination between focused and defocused regions and yielding a coarse fusion probability decision map. A convolutional conditional random field then refines this map, and the refined probability decision map produces a detail-preserving fused image. The fusion method is evaluated subjectively and objectively on public data sets; experimental results show that the method achieves a good fusion effect, fully mines focus correlation information, and retains sufficient image detail.
Description
Technical Field
The invention relates to the technical field of deep-learning image processing, and in particular to a multi-focus image fusion method combining depth context and a convolutional conditional random field.
Background
In optical imaging, the limited depth of field of a lens means that only a local area of the scene can be in focus, and a single image that is sharp over the whole scene is difficult to obtain. Multi-focus image fusion extracts complementary information from several partially focused images and fuses them into one all-in-focus image, enhancing image quality, aiding visual understanding, and improving the utilization of image information. Multi-focus image fusion is now widely applied in medical microscopic imaging, machine-vision measurement, machine recognition, military security, and other fields.
Multi-focus image fusion methods generally fall into three categories: transform-domain methods, spatial-domain methods, and deep-learning methods. Image fusion based on Multi-Scale Transform (MST) includes algorithms built on the Laplacian pyramid, on wavelet transforms, on the Non-Subsampled Contourlet Transform (NSCT), and so on. The fusion process has three main steps: a. decompose each source image into high- and low-frequency components according to its multi-scale characteristics; b. apply separate fusion rules to obtain the fused high- and low-frequency maps; c. apply the inverse MST to obtain the final fused image. However, MST-based methods suffer from spatial inconsistency during the transform-and-fuse process, which readily causes distortion of varying degrees. Spatial-domain methods fuse images mainly through linear combination and can generally be divided into three classes: pixel-based, block-based, and target-region-based. Because they rely on gradient-related information of pixels or image blocks, they tend to introduce artifact blocks into the fusion result, degrading quality. Typical spatial-domain methods include Guided-Filtering image Fusion (GFF) and Image-Matting-based Fusion (IFM); although these perform well in feature extraction and detail expression, it is difficult to hand-design an ideal fusion rule. In recent years, deep-learning-based multi-focus image fusion methods have emerged, which fully exploit strong learning ability, strong generalization ability, and good portability.
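The three MST steps (a–c) above can be sketched with a minimal two-band decomposition. This is an illustrative sketch, not the patent's method: a box blur stands in for the multi-scale transform, and the function names are invented for the example.

```python
import numpy as np

def box_blur(img, radius=1):
    # Crude low-pass filter standing in for a multi-scale decomposition level.
    p = np.pad(img, radius, mode='edge')
    k = 2 * radius + 1
    h, w = img.shape
    return sum(p[i:i + h, j:j + w] for i in range(k) for j in range(k)) / (k * k)

def mst_style_fusion(a, b):
    # Step a: decompose each source into low- and high-frequency components.
    low_a, low_b = box_blur(a), box_blur(b)
    high_a, high_b = a - low_a, b - low_b
    # Step b: fusion rules -- average the low bands; for the high bands,
    # keep the coefficient with the larger absolute value (the sharper source).
    low_f = 0.5 * (low_a + low_b)
    high_f = np.where(np.abs(high_a) >= np.abs(high_b), high_a, high_b)
    # Step c: "inverse transform" -- recombine the fused bands.
    return low_f + high_f
```

Fusing an image with itself returns the image unchanged, which is a quick sanity check of the decompose/recombine round trip.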
For example, the Convolutional Neural Network (CNN) based method of Liu et al. (Yu Liu et al., 2017) and the spatial-pyramid-pooling method of Mei et al. (Lihua Mei et al., 2017) fuse by image blocks, which makes the operation complex and causes blocking artifacts at image edges. Furthermore, Guo et al. proposed a method based on a fully convolutional neural network (Xiaoping Guo et al., 2018); although it largely solves the blocking problem, it does not fully consider the correlation between context information, so features suited to the global scale are neglected.
To address the distortion of fused details caused by insufficient mining of focus correlation information in traditional methods, multi-focus image fusion is treated here as a context-constrained binary segmentation problem, i.e., distinguishing focused from defocused regions, and a multi-focus image fusion method combining depth context and convolutional conditional random fields is proposed. The invention uses a deep dense convolutional neural network to fuse deep features and mine focus information, and learns context information through multi-scale spatial pyramid pooling. Furthermore, Convolutional Conditional Random Fields (ConvCRFs) are introduced into the process of separating defocused from focused areas, optimizing the accuracy of the network's probability prediction map and further enhancing the fusion effect. Finally, the method is compared experimentally with 7 mainstream fusion methods, and its efficiency and superiority are verified through both subjective visual evaluation and objective comparative evaluation.
Disclosure of Invention
In order to solve the above problems, the present invention provides a multi-focus image fusion method combining depth context and a convolutional conditional random field, characterized by comprising the following steps:
step 1, integrating two registered multi-focus source images I_A and I_B into one multi-channel image, and inputting the multi-channel image into a deep dense convolutional neural network for focus detection to obtain a multi-dimensional feature map;
step 2, extracting feature information of the multi-dimensional feature map obtained in the step 1 by using a pyramid pooling model to obtain a plurality of feature maps, and then merging the feature maps to obtain a rough two-classification probability decision map;
step 3, refining the two-classification probability decision map obtained in step 2 with a convolutional conditional random field, and performing fusion according to the refined probability decision map and the fusion calculation rule to obtain the final multi-focus image fusion result;
step 4, training the overall network formed by the deep dense convolutional neural network of step 1 and the pyramid pooling model of step 2;
and step 5, fusing the multi-focus images with the trained overall network.
Further, the deep dense convolutional neural network comprises a plurality of dense blocks and transition layers. Each dense block contains a number of 1 × 1 and 3 × 3 convolutions, where the 1 × 1 convolutions reduce the number of feature maps and the 3 × 3 convolutions extract features; each transition layer sits between two dense blocks, serves as a connection, and consists of a convolution layer and a pooling layer.
Further, the pyramid pooling model processes the feature map as follows:
First, the feature map is pooled to each target size, and each pooled result is passed through a 1 × 1 convolution that reduces its channels to 1/N of the original, where N is the number of levels of the pyramid pooling model. Each of these feature maps is then up-sampled by bilinear interpolation to the size of the original feature map, and the original feature map is concatenated with the up-sampled feature maps along the channel dimension, giving twice the channels of the original feature map. Finally, a 1 × 1 convolution reduces the channels back to the original number, so the final feature map has the same size and channel count as the original feature map.
Further, the specific process of optimizing the two-classification probability decision map with the convolutional conditional random field in step 3 is as follows:
The coarse two-classification probability decision map obtained in step 2 is taken as input and denoted O. Processing with the convolutional conditional random field amounts to solving for the optimized probability decision map x̂ = argmax_x P(X = x | O), with the distribution

P(X = x | O) = (1/Z(O)) · exp(−E(x | O))

where X = {X_1, …, X_N} denotes the random field, x̂ is the optimized probability decision map when the input image is O, Z(O) is the distribution (partition) function, and the energy is

E(x | O) = Σ_i ψ_u(x_i) + Σ_{i≠j} ψ_p(x_i, x_j)

In the above formula, N is the number of random-field variables, i is an index smaller than N, and j is an index not equal to i and smaller than N. The unary potential ψ_u(x_i) measures, when the observed value of the current pixel i is x_i, the probability that the pixel belongs to its class label in O; it is output by the back end of the overall network. The binary potential ψ_p(x_i, x_j) measures the probability of two events occurring simultaneously and is calculated as:

ψ_p(x_i, x_j) = μ(x_i, x_j) · Σ_m ω_m · k_m(f_i, f_j)

μ(x_i, x_j) is the label-compatibility term; it constrains conduction between pixels, so that energy is conducted only under the same-label condition, i.e., μ(x_i, x_j) = 0 when the labels of x_i and x_j are the same and μ(x_i, x_j) = 1 otherwise. In the summation term, ω_m is a weight parameter and k_m is a feature function of f_i and f_j, the feature vectors of pixels i and j in an arbitrary feature space, as in the following formula:

k(f_i, f_j) = W^(1) · exp(−‖p_i − p_j‖² / (2θ_α²) − ‖I_i − I_j‖² / (2θ_β²)) + W^(2) · exp(−‖p_i − p_j‖² / (2θ_γ²))

This formula expresses the "affinity" between different pixels in feature space. The first term is called the surface kernel and the second the smoothing kernel; W^(1), θ_α, θ_β are the surface-kernel parameters and W^(2), θ_γ the smoothing-kernel parameters. These are model parameters obtained through training, and p and I denote the actual position and color of a pixel, respectively.
Further, the fusion calculation rule in step 3 is designed as follows:
Assume the binary image matrix refined from the probability decision map is W_A and the complementary binary image is W_B, i.e., W_B = 1 − W_A. With source images I_A and I_B, the calculation rule for the final fused image F is:
F = W_A · I_A + (1 − W_A) · I_B.
Further, the specific implementation of step 4 is as follows:
a) Data set preparation: the data set used is the VOC2012 data set, which covers 4 general categories: vehicles, household objects, animals, and people, subdivided into 20 classes; with the background added there are 21 classes, containing 17125 images in total. To simulate multi-focus images, the image generation method from multi-focus image fusion by image matting in dynamic scenes is adopted, with Gaussian blur simulating real multi-focus conditions; the synthesized multi-focus images are obtained in five steps: Gaussian blur, image transformation, image inversion, pixel-by-pixel multiplication, and pixel-by-pixel addition;
b) Adjusting training parameters: the multi-focus images I_A and I_B are combined into a 6-channel input and fed into the overall network for training. The training image size is 256 × 256; Adam is used as the gradient optimizer with a learning rate of 0.001 and a regularization term of 0.9; 1 image is fed to the network at a time; the total number of training iterations is n; the loss function of the overall network is the binary cross entropy. In the testing stage, the input image keeps its original size. The binary cross entropy is

L = −(1/N) · Σ_{i=1}^{N} [ y_i · log(ŷ_i) + (1 − y_i) · log(1 − ŷ_i) ]

over the samples, where ŷ_i is the actual output for sample i and y_i is the desired output.
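The binary cross entropy above can be computed directly. This is a minimal numpy sketch; the clipping constant and the sample values are illustrative additions, not from the patent.

```python
import numpy as np

def binary_cross_entropy(y_hat, y, eps=1e-12):
    # Mean of -[y*log(y_hat) + (1-y)*log(1-y_hat)] over all samples.
    # Clipping keeps log() finite when the network outputs exactly 0 or 1.
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))

y     = np.array([1.0, 0.0, 1.0, 0.0])   # desired outputs y_i
y_hat = np.array([0.9, 0.1, 0.8, 0.2])   # actual network outputs
print(round(binary_cross_entropy(y_hat, y), 4))  # 0.1643
```

Confident predictions close to the targets give a loss near zero, which is why this loss suits the focused/defocused two-class decision map.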
The invention provides a multi-focus image fusion method combining depth context and a convolutional conditional random field. The method exploits feature reuse in a dense convolutional neural network and integrates the multi-focus source images for cooperative focus-feature detection; multi-scale pyramid pooling then aggregates global context information of the different focus areas, sharpening the discrimination between focused and defocused regions and producing a coarse fusion probability decision map. A convolutional conditional random field further optimizes this coarse map into an accurate probability decision map, from which a fusion result with well-preserved details is generated. The final experimental results show that the proposed method achieves better visual results and the best scores on four quantitative indexes, fully proving its effectiveness; it can be applied effectively to automatic imaging vision tasks.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
FIG. 2 is a pyramid pooling multi-scale information extraction diagram of the present invention.
FIG. 3 is a probability decision diagram for multi-scale pyramid pooling cooperative detection, showing the source image, and the effect of the coarse segmentation diagram and the fine segmentation diagram thereof.
FIG. 4 shows the Lytro-3 image fusion results.
FIG. 5 compares the Lytro-17 experimental results.
FIG. 6 is a Lytro-17 image residual pseudo-color comparison diagram.
Detailed Description
The technical scheme of the invention can be implemented as an automatic process by computer software. The technical scheme is explained in detail below with reference to the drawings and an embodiment. As shown in fig. 1, the flow of the embodiment comprises the following steps:
step 1, detecting an image by utilizing a dense network to focus cooperatively:
firstly for the input source image IAAnd IBConsider two registered multi-focus source images IAAnd IBThe method integrates the correlation information into a multi-channel image, and performs the cooperative detection operation based on the dense convolutional network on the basis.
The core of dense-network cooperative focus detection is a deep dense convolutional neural network, which builds on the convolutional neural network and introduces the dense-connection concept. Its most important components are the transition layers and the dense blocks; the dense connections between layers reduce gradient loss and better exploit the network's feature information. Inside each dense block, the input of each layer is the splice of the outputs of all previous layers, where splicing refers to concatenation at the channel level. For example, splicing a 56 × 56 × 64 tensor with a 56 × 56 × 32 tensor yields 56 × 56 × 96, where 96 is the sum of 64 and 32. The fixed number of channels each layer contributes is called the growth rate. Each dense block contains a number of substructures; taking the first dense block as an example, each substructure is first a bottleneck layer, a 1 × 1 convolution whose purpose is to reduce the number of feature maps, followed by a 3 × 3 convolution layer for feature extraction. The network contains 4 dense blocks, with the structure shown in the following table:
TABLE 1 dense block Structure Table
The transition layer is arranged between two dense blocks, serves as a connection, and consists of a convolution layer and a pooling layer. As shown in fig. 1, the module has four dense blocks. In general, the deeper a deep-learning network, the more severe the vanishing-gradient problem, and dense blocks are introduced to mitigate it. Compared with a residual network, the deep dense convolutional neural network is structurally richer: with dense connections added, every layer has access to the features of all previous layers, so feature reuse is realized effectively and the transmission of multi-focus image feature information across layers is optimized. After the multi-channel image passes through dense-network cooperative focus detection, a multi-dimensional feature map is obtained.
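The channel-level splicing and growth-rate bookkeeping described above can be sketched in a few lines. Numpy stands in for the framework's tensor operations; the shapes follow the 56 × 56 example in the text, and the helper name is illustrative.

```python
import numpy as np

def dense_block_channels(c_in, num_layers, growth_rate):
    # Each layer receives the concatenation of all previous outputs and
    # contributes `growth_rate` new channels, so channels grow linearly.
    return c_in + num_layers * growth_rate

# Channel-level splice from the text: 56x56x64 + 56x56x32 -> 56x56x96.
x = np.zeros((56, 56, 64))
y = np.zeros((56, 56, 32))
z = np.concatenate([x, y], axis=-1)
print(z.shape)  # (56, 56, 96)
```

With, say, 4 layers and growth rate 32 on a 64-channel input, the block's output would carry 64 + 4 × 32 = 192 channels, which is why the 1 × 1 bottleneck and the transition layers are needed to keep channel counts manageable.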
Step 2, extracting pyramid pooling multi-scale information, which specifically comprises the following contents:
and (3) performing multi-scale feature information extraction on the multi-dimensional feature map obtained after the detection in the step (1) by utilizing pyramid pooling. Considering that the most difficult detection points of a multi-focus image are a focus area and a non-focus area, important global prior knowledge is not sufficiently obtained in a high-level convolutional neural network, and high-level features contain more semantics and less position information, in order to further reduce the loss of context information between different sub-areas, as shown in fig. 1, a pyramid pooling model is introduced, which is very common in the traditional machine learning feature extraction, and the main idea is to divide a sub-image into blocks of a plurality of scales, for example, one image is divided into 1 part, 4 parts, 8 parts and the like. The features are then extracted for each block and then fused together so that features of multiple scales are compatible.
As shown in FIG. 2, the method adopts 4 different pyramid scales. The number of pyramid pooling levels and the size of each level can be modified; in the invention the pyramid pooling module has 4 levels, with sizes 1 × 1, 2 × 2, 3 × 3 and 6 × 6. First, the feature map is pooled to each target size by treating it as N × N blocks and pooling each block (in the first figure, the red block is the feature map after 1 × 1 pooling); each pooled result then passes through a 1 × 1 convolution that reduces its channels to 1/N of the original, where N = 4. Each of these feature maps is then up-sampled by bilinear interpolation to the size of the original feature map and concatenated with the original feature map along the channel dimension. The result has twice the channels of the original feature map, and a final 1 × 1 convolution reduces the channels back to the original number, so the final feature map has the same size and channel count as the original.
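The pool–reduce–upsample–concatenate pipeline can be sketched end to end in numpy. This is a shape-level illustration under stated simplifications: random matrices stand in for trained 1 × 1 convolution weights, nearest-neighbour repetition stands in for the bilinear interpolation the patent uses, and the input size (12 × 12, divisible by every scale) is chosen for convenience.

```python
import numpy as np

def adaptive_avg_pool(x, n):
    # x: (C, H, W) with H, W divisible by n -> average-pooled (C, n, n).
    c, h, w = x.shape
    return x.reshape(c, n, h // n, n, w // n).mean(axis=(2, 4))

def conv1x1(x, w):
    # A 1x1 convolution is a per-pixel matmul over the channel dimension.
    return np.einsum('oc,chw->ohw', w, x)

def pyramid_pooling(x, scales=(1, 2, 3, 6), rng=np.random.default_rng(0)):
    c, h, w = x.shape
    n = len(scales)
    branches = []
    for s in scales:
        p = adaptive_avg_pool(x, s)                        # pool to s x s
        p = conv1x1(p, rng.standard_normal((c // n, c)))   # reduce to C/N channels
        p = p.repeat(h // s, axis=1).repeat(w // s, axis=2)  # upsample (nearest, in lieu of bilinear)
        branches.append(p)
    y = np.concatenate([x] + branches, axis=0)             # 2C channels in total
    return conv1x1(y, rng.standard_normal((c, 2 * c)))     # back to C channels

x = np.random.default_rng(1).standard_normal((8, 12, 12))
print(pyramid_pooling(x).shape)  # (8, 12, 12)
```

The output keeps the input's size and channel count, matching the text: four C/4-channel branches plus the original C channels give 2C before the final 1 × 1 reduction.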
The pyramid pooling model is an effective global-context prior: it contains information among sub-regions of different scales, effectively improves the network's ability to use global context information, lets the network fully mine the boundary information of focused and unfocused regions, embeds the context features of scenes that are difficult to fuse, and improves the fusion effect. Finally, the feature maps are merged to obtain the coarse two-classification probability decision map.
Step 3, refining the convolution conditional random field probability decision diagram, which specifically comprises the following steps:
although the depth-dense convolutional neural network in step 1 and the pyramid pooling model in step 2 have a good effect on the extraction of the global context information of the source image, there are pixels that are misclassified in the probability map, as shown in fig. 3. Therefore, in order to obtain more accurate and excellent segmentation capability, the probabilistic decision graph is optimized by using the convolution conditional random fields (ConvCRFs). The convolution Conditional Random Field is optimized based on the Fully connected Conditional Random Field (fullrfs). The coarse binary probability decision graph O input by step 2 can be passed throughTo perform the solution of the problem to be solved,in order to optimize the probability decision graph, the specific analytical formula is as follows:
where K is { K ═ K1,…,KnThe symbol represents the random field and the symbol represents the random field,is a random field ofAn optimized probability decision graph when the input image is O, Z (O) is a distribution function,
in the above formula, N is the number of random fields, i is a random number smaller than N, and j is a random number not equal to i but smaller than N. Wherein the function of unitary potentialThe observed value for measuring the current pixel point i isAnd (3) the probability that the pixel point belongs to the class label in the O is output from the rear end of the whole network formed by the deep dense convolutional neural network in the step (1) and the pyramid pooling model in the step (2). Binary potential functionThen, for measuring the probability of two events occurring simultaneously, the calculation formula is as follows:
for label-compatible terms, it constrains the conditions of conduction between pixels, energy being conducted to each other only under the same label conditions, i.e. whenAndwhen the labels are the same, the label is the same,otherwiseIn the latter addition term, ωmIs a weight value parameter that is a function of,is a characteristic function, fi,fjIs the feature vector of pixels i and j in an arbitrary feature space, as shown in the following formula:
the above formula represents the "intimacy" between different pixels in terms of the travel of the feature, the first term of the formula being referred to as the surface kernel and the second term being referred to as the smoothing kernel. W(1),θα,θβAs surface nuclear parameter, W(2),θγFor smoothing the kernel parameters, these parameters are model parameters, and are obtained by segment training. p and I respectively represent the actual position and color value of the pixel point, and the color value is the pixel value.
ConvCRFs add a conditional-independence assumption to the FullCRFs framework, which lets the GPU perform inference efficiently through convolution operations. This effectively combines the feature-extraction ability of a CNN with the modeling ability of a random field, so the convolutional conditional random field can propagate information effectively.
To achieve accurate saliency detection of the target region, the global, local and boundary information of the probability map is integrated through the convolutional conditional random field, effectively yielding a refined probability decision map. During optimization, multiple kinds of feature information of the probability map are computed via convolution operations and fused by the CRFs into a refined map, and the accurate map optimized by ConvCRFs is finally obtained.
After refinement, assume that the binary image matrix after probability decision graph refinement is WAThe other half of the binary image is WBI.e. WB=1-WA. Source image is IAAnd IBTherefore, the calculation rule of the final fusion image F is:
$$F = W_A \cdot I_A + (1 - W_A) \cdot I_B$$
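The fusion rule is a per-pixel weighted combination and can be sketched directly (the broadcasting of the single-channel map over color images is an implementation detail assumed here):

```python
import numpy as np

def fuse(I_A, I_B, W_A):
    """Final fusion rule F = W_A * I_A + (1 - W_A) * I_B, applied
    element-wise; W_A is the refined binary decision map (1 where I_A
    is in focus, 0 where I_B is)."""
    W_A = W_A.astype(float)
    if I_A.ndim == 3:  # broadcast a single-channel map over color channels
        W_A = W_A[..., None]
    return W_A * I_A + (1.0 - W_A) * I_B
```

Because $W_A$ is binary after refinement, every output pixel is copied verbatim from whichever source image is in focus there, so no new pixel values are invented by the fusion step.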
step 4, performing network training on the deep dense neural network and the pyramid network, specifically comprising the following steps:
a) Data set preparation: the data set used here is the VOC2012 data set, which is divided into 4 general categories (vehicles, household objects, animals, humans) and further into 20 classes (21 including the background), containing 17125 images in total. To simulate multi-focus images, the invention adopts the image generation method of Li et al. (Shutao Li et al., 2013) for image-matting-based multi-focus image fusion in dynamic scenes, and uses Gaussian blur to simulate real multi-focus conditions. A synthesized multi-focus image is obtained in five steps: Gaussian blur, image transformation, image inversion, pixel-by-pixel multiplication and pixel-by-pixel addition.
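The five synthesis steps can be sketched as follows. This is a simplified stand-in, not the cited method: the box blur approximates the Gaussian blur step, and the binary mask plays the role of the transformed focus map:

```python
import numpy as np

def box_blur(img, k=5):
    """Cheap separable box blur standing in for the Gaussian blur step
    (repeated box blurs approximate a Gaussian)."""
    pad = k // 2
    out = img.astype(float)
    kernel = np.ones(k) / k
    for axis in (0, 1):
        p = np.pad(out, [(pad, pad) if a == axis else (0, 0) for a in (0, 1)],
                   mode="edge")
        out = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"),
                                  axis, p)
    return out

def make_multifocus_pair(img, mask):
    """Synthesize a complementary multi-focus pair from an all-in-focus
    image: blur, mask transformation, inversion, pixel-by-pixel
    multiplication and pixel-by-pixel addition.
    img: (H, W) all-in-focus image, mask: (H, W) binary focus map."""
    blurred = box_blur(img)            # 1. blur
    m = mask.astype(float)             # 2. mask as a weight map
    inv = 1.0 - m                      # 3. inversion
    I_A = m * img + inv * blurred      # 4+5. multiply, then add
    I_B = inv * img + m * blurred      # complementary focus
    return I_A, I_B
```

With an all-ones mask, $I_A$ reproduces the sharp image and $I_B$ is fully blurred, matching the intent that the pair covers complementary focus regions.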
b) Adjusting training parameters: the multi-focus images $I_A$ and $I_B$ are combined into a 6-channel input and fed into the whole network for training. The image size in the training stage is 256 × 256; Adam is used as the gradient optimizer, with a learning rate of 0.001 and a regularization term of 0.9. One image is fed into the network at a time, the total number of training epochs is 30, and the loss function adopts binary cross entropy. In the testing stage, the input images keep their original size. The binary cross entropy formula is as follows:
$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log \hat{y}_i + (1-y_i)\log(1-\hat{y}_i)\right]$$

for a single sample, where $\hat{y}_i$ is the actual output of the sample and $y_i$ is the desired output.
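The binary cross entropy above can be computed directly; the `eps` clipping is a standard numerical safeguard assumed here, not part of the formula:

```python
import numpy as np

def binary_cross_entropy(y_hat, y, eps=1e-12):
    """Binary cross entropy between actual outputs y_hat and desired
    outputs y, averaged over all elements; eps guards against log(0)."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))
```

For an uninformative prediction of 0.5 on every pixel the loss is $\ln 2 \approx 0.693$, and it approaches 0 as the predictions match the binary targets.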
Step 5, comparing the method of the invention with mainstream methods, testing and analyzing, specifically comprising the following steps:
a) Comparison methods: to demonstrate the superiority and effectiveness of the proposed fusion method, the Lytro multi-focus color image data set is selected for the experiments. Seven mainstream image fusion methods are selected for comparison: the non-subsampled contourlet transform fusion method (NSCT), the guided-filter-based fusion method (GFF), the image-matting-based multi-focus image fusion method (IFM), the bilateral-filter-based fusion method (CBF), the discrete cosine harmonic wavelet transform fusion method (DCHWT), the convolutional-neural-network-based fusion method (CNN), and the pyramid-pooling-network-based fusion method (PSPF).
b) Qualitative analysis: Fig. 4 shows the visual fusion results of Lytro-3 for the different fusion methods. It can be seen that the edge of the boy's ear becomes more blurred in most of the comparison methods, while the edge structure in the fusion result of the proposed method remains relatively clear, which demonstrates its better edge information extraction capability. To further verify the effect of the method, the fusion results of Lytro-17 and the pseudo-color difference maps between the fused images and source image A are shown in Figs. 5 and 6. The fewer traces of the focused area left in the difference map, the more image information of the focused part has been extracted into the fused image, which means better fusion performance. As can be seen from Figs. 5 and 6, the proposed algorithm leaves less residual information, whereas the CNN and IFM methods are both deficient in the boundary regions; this indicates that the proposed fusion method handles edge regions better and makes full use of image context information to detect the focus boundary. Compared with the comparison methods, the proposed method achieves a better fusion effect in subjective visual evaluation.
c) Quantitative comparison: to objectively demonstrate the effectiveness of the method, four mainstream multi-focus image evaluation indexes are adopted as quantitative metrics, namely mutual information ($Q_{MI}$), nonlinear correlation information entropy ($Q_{NCIE}$), edge retention ($Q_{AB/F}$) and visual information fidelity ($Q_{VIF}$), and the method is compared with the 7 mainstream methods. The experimental results are shown in Tables 2 and 3, where Table 2 shows the objective evaluation of the fusion results of 5 images and Table 3 shows the average objective evaluation over 20 images. Bold values indicate the optimal results and underlined values the suboptimal results.
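As a sketch of the mutual-information family of metrics, the shared information between a fused image and each source can be estimated from a joint histogram. This is a simplified sum form for illustration; the published $Q_{MI}$ additionally normalizes by the marginal entropies:

```python
import numpy as np

def mutual_information(a, b, bins=64):
    """Mutual information between two images, estimated from their
    joint intensity histogram, in bits."""
    h, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p = h / h.sum()
    px = p.sum(axis=1, keepdims=True)   # marginal of a
    py = p.sum(axis=0, keepdims=True)   # marginal of b
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / (px @ py)[nz])))

def q_mi(I_A, I_B, F, bins=64):
    """Q_MI-style score: information the fused image F shares with each
    source image (unnormalized sum form, assumed for illustration)."""
    return mutual_information(I_A, F, bins) + mutual_information(I_B, F, bins)
```

A fused image that copies its pixels from the sources (as the binary decision rule does) shares more histogram structure with them and therefore scores higher.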
TABLE 2 Objective evaluation comparison of fusion results
TABLE 3 average objective evaluation comparison of fusion results
As can be seen from Table 2, the proposed method achieves good results on all four objective evaluation indexes. It is optimal in most of the scores for mutual information, nonlinear correlation information entropy, edge retention and visual information fidelity, which means that the proposed dense convolutional network cooperative detection method can mine the depth features of multi-focus images well for focus estimation, and that the adopted spatial pyramid pooling module constrains context information so that edge focus detection improves; the overall fusion quality of the algorithm is therefore higher, and more focus information of the source images is retained. As can be seen from the average evaluation results in Table 3, the method obtains a suboptimal result, second only to the CNN method, on the visual information fidelity index. This also indicates that, taken together, the method of the present invention has a better fusion effect than the other methods.
d) Summary analysis: the invention provides a multi-focus image fusion method combining a depth context and a convolutional conditional random field. The fusion method exploits the feature-reuse advantage of a dense convolutional neural network, integrating the multi-focus source images for cooperative focus feature detection; it then pools global information through a multi-scale pyramid, aggregates the context information of different focus areas, improves the discrimination of scattered focus regions and obtains a rough fusion probability decision map. A convolutional conditional random field then optimizes further, obtaining an accurate probability decision map from the rough one, and finally a fusion result image with good detail is generated. The final experimental results show that the proposed method obtains better visual results and the best results on the four quantitative indexes, which fully proves its effectiveness; the method can be effectively applied to automatic imaging vision tasks.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.
Claims (6)
1. A multi-focus image fusion method combining depth context and convolution conditional random fields is characterized by comprising the following steps:
step 1, integrating two registered multi-focus source images $I_A$ and $I_B$ into a multi-channel image, and inputting the multi-channel image into a deep dense convolutional neural network for focus detection to obtain a multi-dimensional feature map;
step 2, extracting feature information of the multi-dimensional feature map obtained in the step 1 by using a pyramid pooling model to obtain a plurality of feature maps, and then merging the feature maps to obtain a rough two-classification probability decision map;
step 3, for the two-classification probability decision diagrams obtained in the step 2, a convolution conditional random field is used for realizing probability decision diagram refinement, and fusion is carried out according to the obtained refined probability decision diagrams and fusion calculation rules to obtain a final multi-focus image fusion result;
step 4, training the whole network formed by the deep dense convolutional neural network of step 1 and the pyramid pooling model of step 2;
and 5, fusing the multi-focus images by using the trained integral network.
2. The method of claim 1 for multi-focus image fusion combining depth context with convolutional conditional random fields, comprising: the deep dense convolutional neural network comprises a plurality of dense blocks and a transition layer, wherein the dense blocks comprise a plurality of convolutions of 1 × 1 and 3 × 3, the 1 × 1 convolution is used for reducing the number of feature maps, and the 3 × 3 convolution is used for extracting features; the transition layer is arranged between the two dense blocks, plays a role of connection and consists of a convolution layer and a pooling layer.
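As an illustrative, non-limiting sketch of the dense-block feature reuse described in this claim: each layer sees the concatenation of all previous feature maps, a 1 × 1-style mixing reduces channels, and a new set of features is appended. Per-pixel random linear maps stand in for the learned 1 × 1 and 3 × 3 convolutions:

```python
import numpy as np

def dense_block(x, num_layers=4, growth_rate=12, rng=None):
    """Channel-level sketch of a dense block. x: (H, W, C0) feature map.
    Uses random linear maps per pixel instead of learned convolutions,
    so it only demonstrates the connectivity pattern, not a trained net."""
    rng = np.random.default_rng(0) if rng is None else rng
    feats = x
    for _ in range(num_layers):
        c_in = feats.shape[-1]
        # 1x1-style bottleneck: reduce the concatenated channels
        w1 = rng.standard_normal((c_in, 4 * growth_rate)) * 0.1
        # stand-in for the 3x3 feature-extraction convolution
        w3 = rng.standard_normal((4 * growth_rate, growth_rate)) * 0.1
        new = np.maximum(feats @ w1, 0) @ w3     # ReLU non-linearity
        feats = np.concatenate([feats, new], axis=-1)  # feature reuse
    return feats
```

The output carries $C_0 + L \cdot g$ channels for $L$ layers of growth rate $g$, and the original input channels pass through unchanged, which is the feature-multiplexing property the description relies on.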
3. The method of claim 1 for multi-focus image fusion combining depth context with convolutional conditional random fields, comprising: the pyramid pooling model is processed as follows;
firstly, the feature map is pooled to each target size, and a 1 × 1 convolution is applied to each pooled result to reduce its channels to 1/N of the original, where N is the number of levels of the pyramid pooling model; then each of these feature maps is up-sampled by bilinear interpolation to the size of the original feature map; the original feature map and the up-sampled feature maps are then concatenated along the channel dimension, giving twice the channels of the original feature map; finally a 1 × 1 convolution reduces the channels back to the original number, so the final feature map has the same size and channels as the original feature map.
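A minimal single-channel sketch of the pooling-and-upsampling stage described in this claim, under simplifying assumptions: block averaging stands in for adaptive average pooling, nearest-neighbour upsampling stands in for bilinear interpolation, the 1 × 1 convolutions are omitted, and H and W are assumed divisible by every pyramid level:

```python
import numpy as np

def pyramid_pooling(feat, levels=(1, 2, 4)):
    """Pool a (H, W) map to each n-by-n grid, upsample back, and stack
    with the original map along a leading "channel" axis."""
    H, W = feat.shape
    outs = [feat]
    for n in levels:
        bh, bw = H // n, W // n
        # adaptive average pooling to an n-by-n grid via block means
        pooled = feat.reshape(n, bh, n, bw).mean(axis=(1, 3))
        # nearest-neighbour upsampling back to (H, W)
        up = np.kron(pooled, np.ones((bh, bw)))
        outs.append(up)
    return np.stack(outs, axis=0)  # (1 + len(levels), H, W)
```

The coarsest level (n = 1) carries the global average of the map, while finer levels keep increasingly local context, which is how the pyramid aggregates context at multiple scales before the channel-reducing convolution.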
4. The method of claim 1 for multi-focus image fusion combining depth context with convolutional conditional random fields, comprising: the specific process of optimizing the binary probability decision diagram by using the convolution conditional random field in the step 3 is as follows;
inputting the rough two-classification probability decision map obtained in step 2, denoted $O$; the processing by the convolutional conditional random field can be solved through $\hat{O}=\arg\max_{\hat{k}} P(K=\hat{k}\mid O)$, where $\hat{O}$ is the optimized probability decision map; the specific analytical formula is as follows:

$$P(K=\hat{k}\mid O)=\frac{1}{Z(O)}\exp\!\left(-E(\hat{k}\mid O)\right)$$
where $K=\{K_1,\dots,K_n\}$ represents the random field, $P(K=\hat{k}\mid O)$ represents the probability of the optimized probability decision map of the random field when the input image is $O$, $Z(O)=\sum_{O'}\exp\!\left(-E(O'\mid O)\right)$ is the partition function, and $O'$ is an image of a random class on $O$;
the energy function is

$$E(\hat{k}\mid O)=\sum_{i\le N}\psi_u(\hat{k}_i)+\sum_{i\ne j\le N}\psi_p(\hat{k}_i,\hat{k}_j)$$

in the above formula, $N$ is the number of random fields, $i$ is a random number smaller than $N$, and $j$ is a random number not equal to $i$ and smaller than $N$; the unary potential function $\psi_u(\hat{k}_i)$ measures, when the observed value of the current pixel point $i$ is $\hat{k}_i$, the probability that the pixel point belongs to the class label in $O$, as output by the back end of the whole network; the binary potential function $\psi_p(\hat{k}_i,\hat{k}_j)$ measures the probability of two events occurring simultaneously, and its calculation formula is as follows:

$$\psi_p(\hat{k}_i,\hat{k}_j)=\mu(\hat{k}_i,\hat{k}_j)\sum_{m}\omega_m k_m(f_i,f_j)$$
$\mu(\hat{k}_i,\hat{k}_j)$ is the label-compatibility term: it constrains the conduction of energy between pixels, so that energy is conducted between two pixels only when $\hat{k}_i$ and $\hat{k}_j$ carry the same label, i.e., $\mu=1$ when the labels are the same and $\mu=0$ otherwise; in the summation term, $\omega_m$ is a weight parameter, $k_m(\cdot,\cdot)$ is a characteristic function, and $f_i$, $f_j$ are the feature vectors of pixels $i$ and $j$ in an arbitrary feature space, as shown in the following formula:

$$k(f_i,f_j)=W^{(1)}\exp\!\left(-\frac{|p_i-p_j|^2}{2\theta_\alpha^2}-\frac{|I_i-I_j|^2}{2\theta_\beta^2}\right)+W^{(2)}\exp\!\left(-\frac{|p_i-p_j|^2}{2\theta_\gamma^2}\right)$$
the above formula expresses the "affinity" between different pixels in the feature space: the first term is called the surface kernel and the second term the smoothing kernel; $W^{(1)}$, $\theta_\alpha$, $\theta_\beta$ are the surface kernel parameters and $W^{(2)}$, $\theta_\gamma$ the smoothing kernel parameters, all model parameters obtained through training; $p$ and $I$ respectively represent the actual position and the color value of a pixel point, the color value being the pixel value.
5. The method of claim 1 for multi-focus image fusion combining depth context with convolutional conditional random fields, comprising: step 3, fusing calculation rules and designing as follows;
assuming that the binary image matrix refined from the probability decision map is $W_A$ and the complementary binary image is $W_B$, i.e., $W_B = 1 - W_A$, and the source images are $I_A$ and $I_B$, the calculation rule of the final fused image $F$ is:
$$F = W_A \cdot I_A + (1 - W_A) \cdot I_B.$$
6. the method of claim 1 for multi-focus image fusion combining depth context with convolutional conditional random fields, comprising: the specific implementation manner in the step 4 is as follows;
a) data set preparation: the data set used is the VOC2012 data set, which is divided into 4 general categories (vehicles, household objects, animals, humans) and further classified into 20 classes, 21 including the background, containing 17125 images in total; to simulate multi-focus images, an image generation method from image-matting-based multi-focus image fusion in dynamic scenes is adopted, Gaussian blur is used to simulate real multi-focus conditions, and a synthesized multi-focus image is obtained in five steps: Gaussian blur, image transformation, image inversion, pixel-by-pixel multiplication and pixel-by-pixel addition;
b) adjusting training parameters: the multi-focus images $I_A$ and $I_B$ are combined into a 6-channel input and fed into the overall network for training; the image size in the training stage is 256 × 256, Adam is used as the gradient optimizer with a learning rate of 0.001 and a regularization term of 0.9; one image is fed into the network at a time, the total number of training epochs is n, the loss function of the overall network adopts binary cross entropy, and in the testing stage the input images keep their original size; the binary cross entropy formula is as follows:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log \hat{y}_i + (1-y_i)\log(1-\hat{y}_i)\right]$$
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111047787.3A CN113763300B (en) | 2021-09-08 | 2021-09-08 | Multi-focusing image fusion method combining depth context and convolution conditional random field |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113763300A true CN113763300A (en) | 2021-12-07 |
CN113763300B CN113763300B (en) | 2023-06-06 |
Family
ID=78793683
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111047787.3A Active CN113763300B (en) | 2021-09-08 | 2021-09-08 | Multi-focusing image fusion method combining depth context and convolution conditional random field |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113763300B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017190574A1 (en) * | 2016-05-04 | 2017-11-09 | 北京大学深圳研究生院 | Fast pedestrian detection method based on aggregation channel features |
CN107369148A (en) * | 2017-09-20 | 2017-11-21 | 湖北工业大学 | Based on the multi-focus image fusing method for improving SML and Steerable filter |
US20190236411A1 (en) * | 2016-09-14 | 2019-08-01 | Konica Minolta Laboratory U.S.A., Inc. | Method and system for multi-scale cell image segmentation using multiple parallel convolutional neural networks |
CN110334779A (en) * | 2019-07-16 | 2019-10-15 | 大连海事大学 | A kind of multi-focus image fusing method based on PSPNet detail extraction |
CN110533623A (en) * | 2019-09-06 | 2019-12-03 | 兰州交通大学 | A kind of full convolutional neural networks multi-focus image fusing method based on supervised learning |
CN111368707A (en) * | 2020-03-02 | 2020-07-03 | 佛山科学技术学院 | Face detection method, system, device and medium based on feature pyramid and dense block |
CN111429393A (en) * | 2020-04-15 | 2020-07-17 | 四川警察学院 | Multi-focus image fusion method based on convolution elastic network |
US20200273192A1 (en) * | 2019-02-26 | 2020-08-27 | Baidu Usa Llc | Systems and methods for depth estimation using convolutional spatial propagation networks |
US20200327679A1 (en) * | 2019-04-12 | 2020-10-15 | Beijing Moviebook Science and Technology Co., Ltd. | Visual target tracking method and apparatus based on deeply and densely connected neural network |
CN112949579A (en) * | 2021-03-30 | 2021-06-11 | 上海交通大学 | Target fusion detection system and method based on dense convolution block neural network |
CN113159236A (en) * | 2021-05-26 | 2021-07-23 | 中国工商银行股份有限公司 | Multi-focus image fusion method and device based on multi-scale transformation |
Non-Patent Citations (3)
Title |
---|
YI LI ET AL.: "Pyramid Pooling Dense Convolutional Neural Network for Multi-focus Image Fusion", 2019 IEEE 6th International Conference on Cloud Computing and Intelligence Systems (CCIS) * |
REN Kun; HUANG Long; FAN Chunqi; GAO Xuejin: "Real-time small traffic sign detection algorithm based on multi-scale pixel feature fusion", Signal Processing, no. 09 * |
LI Heng et al.: "Multi-focus image fusion algorithm based on a supervised-learning fully convolutional neural network", Laser & Optoelectronics Progress, vol. 57, no. 08 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114707427A (en) * | 2022-05-25 | 2022-07-05 | 青岛科技大学 | Personalized modeling method of graph neural network based on effective neighbor sampling maximization |
CN114707427B (en) * | 2022-05-25 | 2022-09-06 | 青岛科技大学 | Personalized modeling method of graph neural network based on effective neighbor sampling maximization |
CN115984104A (en) * | 2022-12-05 | 2023-04-18 | 南京大学 | Multi-focus image fusion method and device based on self-supervision learning |
CN115984104B (en) * | 2022-12-05 | 2023-09-22 | 南京大学 | Multi-focus image fusion method and device based on self-supervision learning |
Also Published As
Publication number | Publication date |
---|---|
CN113763300B (en) | 2023-06-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wang et al. | An experimental-based review of image enhancement and image restoration methods for underwater imaging | |
CN108986050B (en) | Image and video enhancement method based on multi-branch convolutional neural network | |
CN107909560A (en) | A kind of multi-focus image fusing method and system based on SiR | |
CN111784620B (en) | Light field camera full-focusing image fusion algorithm for guiding angle information by space information | |
CN113763300B (en) | Multi-focusing image fusion method combining depth context and convolution conditional random field | |
Ding et al. | U 2 D 2 Net: Unsupervised unified image dehazing and denoising network for single hazy image enhancement | |
Cong et al. | Discrete haze level dehazing network | |
Chang | Single underwater image restoration based on adaptive transmission fusion | |
Zhang et al. | Photo-realistic dehazing via contextual generative adversarial networks | |
Qiao et al. | Layered input GradiNet for image denoising | |
Guan et al. | NCDCN: multi-focus image fusion via nest connection and dilated convolution network | |
Guo et al. | Low-light image enhancement with joint illumination and noise data distribution transformation | |
Yu et al. | Multi-focus image fusion based on L1 image transform | |
Liu et al. | Underwater optical image enhancement based on super-resolution convolutional neural network and perceptual fusion | |
Hou et al. | Joint learning of image deblurring and depth estimation through adversarial multi-task network | |
CN117495882A (en) | Liver tumor CT image segmentation method based on AGCH-Net and multi-scale fusion | |
Zhang et al. | MFFE: multi-scale feature fusion enhanced net for image dehazing | |
Pang et al. | Underwater image enhancement via variable contrast and saturation enhancement model | |
Wang et al. | New insights into multi-focus image fusion: A fusion method based on multi-dictionary linear sparse representation and region fusion model | |
Moghimi et al. | A joint adaptive evolutionary model towards optical image contrast enhancement and geometrical reconstruction approach in underwater remote sensing | |
CN112508828A (en) | Multi-focus image fusion method based on sparse representation and guided filtering | |
CN116757979A (en) | Embryo image fusion method, device, electronic equipment and storage medium | |
Kumar et al. | Underwater image enhancement using deep learning | |
Lee et al. | An image-guided network for depth edge enhancement | |
Xu et al. | Interactive algorithms in complex image processing systems based on big data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||