CN111507970B - Image fusion quality detection method and device - Google Patents

Image fusion quality detection method and device

Info

Publication number
CN111507970B
Authority
CN
China
Prior art keywords: image, fusion, fused, wavelet, coefficient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010311554.9A
Other languages
Chinese (zh)
Other versions
CN111507970A (en)
Inventor
苑贵全
骞一凡
朱冬
杨易
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Seven Teng Robot Co ltd
Original Assignee
Chongqing Qiteng Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Qiteng Technology Co Ltd filed Critical Chongqing Qiteng Technology Co Ltd
Priority to CN202010311554.9A priority Critical patent/CN111507970B/en
Publication of CN111507970A publication Critical patent/CN111507970A/en
Application granted granted Critical
Publication of CN111507970B publication Critical patent/CN111507970B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T 7/0002 Image analysis: inspection of images, e.g. flaw detection
    • G06N 3/045 Neural networks: combinations of networks
    • G06N 3/047 Neural networks: probabilistic or stochastic networks
    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T 7/11 Region-based segmentation
    • G06T 7/136 Segmentation; edge detection involving thresholding
    • G06T 2207/10016 Video; image sequence
    • G06T 2207/20064 Wavelet transform [DWT]
    • G06T 2207/20076 Probabilistic image processing
    • G06T 2207/20221 Image fusion; image merging
    • G06T 2207/30168 Image quality inspection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an image fusion quality detection method and device. The method comprises the steps of searching each frame of a video image for the image frames containing the tracked person; performing image fusion on a plurality of tracked person images according to a wavelet transform image fusion method, a contour wavelet fusion method and a scale invariant feature transform image fusion method respectively; and calculating the average gradient of each fusion result and judging the image fusion quality according to the average gradients. With the image fusion quality detection method provided by the application, the quality of image fusion is detected by comparing the average gradients of three different image fusion modes, so that the tracked person can be positioned more accurately.

Description

Image fusion quality detection method and device
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method and an apparatus for detecting image fusion quality.
Background
In video security monitoring, tracking and positioning people is one of the most important problems. However, when identifying a person, the person is often blocked by objects, which makes it difficult to recognize the person's position.
In the prior art, person tracking is generally realized by searching the video for one or more pictures that best show the person's face and manually inferring the facial features of the photographed person. However, manual inference often fails to accurately capture the situation of the photographed person, causing person tracking to fail. The manual identification mode is therefore increasingly abandoned, and various automatic image identification methods have been proposed; however, the quality of the detected images remains a key condition for person tracking, so a method capable of detecting image quality is urgently needed to inform the tracker of the accuracy of the person tracking.
Disclosure of Invention
The application provides an image fusion quality detection method, which comprises the following steps:
searching the image frame of the tracked person from each frame of the video image;
respectively carrying out image fusion on a plurality of tracked person images according to a wavelet transform image fusion method, a contour wavelet fusion method and a scale invariant feature transform image fusion method;
and respectively calculating average gradients according to the fusion results, and judging the image fusion quality according to the average gradients.
The image fusion quality detection method as described above, wherein the image frame of the tracked object is searched from each frame of the video image, specifically:
constructing a deep convolutional neural network model;
from the input layer, sequentially passing through a first convolution layer, a first depth convolution layer, a second convolution layer, a second depth convolution layer, a third convolution layer and a third depth convolution layer;
and inputting the output image into the global average pooling layer and the full connection layer to reach a softmax layer, outputting the occurrence probability of the tracked person by the softmax layer, and if the output probability is 1, determining the image frame as the image frame of the tracked person.
The image fusion quality detection method as described above, further includes preprocessing the tracked object image after the tracked object image is found, so as to eliminate or reduce noise in the image.
The image fusion quality detection method described above, wherein the image fusion is performed on the preprocessed multiple tracked person images according to a wavelet transform image fusion method, specifically includes the following sub-steps:
decomposing each tracked person image by using a discrete wavelet transform function to obtain a source image;
fusing wavelet coefficients corresponding to the source images based on a modulus maximum fusion algorithm to obtain fused images;
and performing wavelet inverse transformation on the fused image to obtain an image fusion result based on wavelet transformation.
The image fusion quality detection method described above, wherein image fusion is performed on a plurality of preprocessed tracked person images according to a contour wavelet fusion method, specifically includes the following sub-steps:
decomposing each tracked person image by using an edge contour transformation function to obtain a source image, and decomposing the source image to obtain a contour wavelet coefficient;
comparing the high-frequency coefficient in the contour wavelet coefficient obtained by decomposition, and taking the maximum value of the high-frequency coefficient as the high-frequency coefficient of the fused image;
calculating the mean value of low-frequency coefficients in the contour wavelet coefficients obtained by decomposition, and taking the mean value of the low-frequency coefficients as the low-frequency coefficients of the fused image;
and forming the low-frequency coefficient and the high-frequency coefficient of the fused image into a coefficient of the fused image, and performing contour wavelet fusion inverse transformation on the coefficient of the fused image to obtain an image fusion result based on a contour wavelet fusion method.
The method for detecting image fusion quality as described above, wherein the image fusion of the preprocessed images of the plurality of tracked objects is performed according to a scale-invariant feature transform image fusion method, and the method specifically includes the following sub-steps:
carrying out linear filtering on the two tracked person images to obtain contrast, direction and brightness characteristic saliency maps of the two tracked person images, and solving intersection of the contrast, direction and brightness characteristic saliency maps to obtain a visual saliency area, a unique saliency area and a public saliency area;
determining a fusion coefficient of the fusion image according to the low-frequency components of the visual salient region, the unique salient region and the common salient region;
and performing multi-scale inverse transformation on the fusion coefficient by using a multi-scale fusion algorithm to reconstruct a fusion image.
The application also provides an image fusion quality detection device, including:
the tracked person image searching module is used for searching the tracked person image frame from each frame of the video image;
the image fusion module is used for respectively carrying out image fusion on a plurality of tracked person images according to a wavelet transform image fusion method, a contour wavelet fusion method and a scale invariant feature transform image fusion method;
and the fusion image quality detection module based on the average gradient is used for respectively calculating the average gradient according to the fusion result and judging the image fusion quality according to the average gradient.
The image fusion quality detection device as described above, wherein the tracked person image searching module is specifically configured to construct a deep convolutional neural network model; from the input layer, sequentially passing through a first convolution layer, a first depth convolution layer, a second convolution layer, a second depth convolution layer, a third convolution layer and a third depth convolution layer; and inputting the output image into the global average pooling layer and the full connection layer to reach a softmax layer, outputting the occurrence probability of the tracked person by the softmax layer, and if the output probability is 1, determining the image frame as the image frame of the tracked person.
The image fusion quality detection device comprises an image fusion module, a tracking module and a processing module, wherein the image fusion module comprises a wavelet transform image fusion submodule, and the wavelet transform image fusion submodule is specifically used for decomposing each tracked image by using a discrete wavelet transform function to obtain a source image; fusing wavelet coefficients corresponding to the source images based on a modulus maximum fusion algorithm to obtain fused images; and performing wavelet inverse transformation on the fused image to obtain an image fusion result based on wavelet transformation.
The image fusion quality detection device as described above, wherein the image fusion module performs image fusion on a plurality of tracked person images according to a contour wavelet fusion method, and is specifically configured to decompose each tracked person image with an edge contour transformation function to obtain a source image, and decompose the source image to obtain a contour wavelet coefficient; comparing the high-frequency coefficient in the contour wavelet coefficient obtained by decomposition, and taking the maximum value of the high-frequency coefficient as the high-frequency coefficient of the fused image; calculating the mean value of low-frequency coefficients in the contour wavelet coefficients obtained by decomposition, and taking the mean value of the low-frequency coefficients as the low-frequency coefficients of the fused image; and forming the low-frequency coefficient and the high-frequency coefficient of the fused image into a coefficient of the fused image, and performing contour wavelet fusion inverse transformation on the coefficient of the fused image to obtain an image fusion result based on a contour wavelet fusion method.
The image fusion quality detection device described above, wherein in the image fusion module, image fusion is performed on a plurality of tracked person images according to a scale-invariant feature transform image fusion method, and is specifically configured to perform linear filtering on two tracked person images to obtain contrast, direction, and brightness feature saliency maps thereof, and solve an intersection of the contrast, direction, and brightness feature saliency maps to obtain a visual saliency region, a unique saliency region, and a common saliency region; determining a fusion coefficient of the fusion image according to the low-frequency components of the visual salient region, the unique salient region and the common salient region; and performing multi-scale inverse transformation on the fusion coefficient by using a multi-scale fusion algorithm to reconstruct a fusion image.
The beneficial effect that this application realized is as follows: by adopting the image fusion quality detection method provided by the application, the quality of image fusion is detected by calculating the average gradient of three different image fusion modes, and the tracked person is better positioned.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below; it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art from these drawings without creative effort.
Fig. 1 is a flowchart of an image fusion quality detection method according to an embodiment of the present application;
FIG. 2 is a flowchart of an image fusion method for pre-processing a plurality of images of a tracked person according to a wavelet transform image fusion method;
FIG. 3 is a flow chart of an image fusion method for pre-processing a plurality of tracked images according to a contour wavelet fusion method;
FIG. 4 is a flowchart of an image fusion method for pre-processing a plurality of images of a tracked person according to a scale-invariant feature transform image fusion method;
fig. 5 is a schematic diagram of an image fusion quality detection apparatus according to the second embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
An embodiment of the present application provides an image fusion quality detection method, as shown in fig. 1, including:
step 110, searching the image frame of the tracked person from each frame of the video image;
because the real-time requirement on tracking the person is high, a deep convolutional neural network model with a high calculation speed is adopted for the image judgment;

the method for searching the image frame of the tracked person from each frame of the video image specifically comprises: constructing a deep convolutional neural network model; starting from the input layer, the image sequentially passes through convolution layer C1 (output image size [256, 256, 8]), depth convolution layer D1 (output image size [128, 128, 16]), convolution layer C2 (output image size [64, 64, 32]), depth convolution layer D2 (output image size [32, 32, 64]), convolution layer C3 (output image size [16, 16, 128]) and depth convolution layer D3 (output image size [8, 8, 256]); the output then passes through a global average pooling layer and a fully connected layer and finally reaches a softmax layer, and whether the tracked person appears in the image is judged according to the output result of the softmax layer;

the softmax layer outputs the probability that the tracked person appears: if the softmax output value is 1, the tracked person appears in the frame image, and if the softmax output value is 0, the tracked person does not appear in the frame image.
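For illustration only, the following is a minimal sketch of such a network in Python with Keras; the framework, kernel sizes, strides and input resolution are assumptions of this sketch, and only the C1/D1/C2/D2/C3/D3 ordering, the approximate output sizes and the global-average-pooling, fully-connected and softmax head follow the description above:

from tensorflow.keras import layers, models

def build_tracked_person_detector(input_shape=(512, 512, 3)):
    # Hypothetical sketch: layer order and output sizes follow the text,
    # everything else (kernels, strides, activations, input size) is assumed.
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(8, 3, strides=2, padding="same", activation="relu")(inputs)   # C1 -> 256x256x8
    x = layers.DepthwiseConv2D(3, strides=2, padding="same",
                               depth_multiplier=2, activation="relu")(x)            # D1 -> 128x128x16
    x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(x)        # C2 -> 64x64x32
    x = layers.DepthwiseConv2D(3, strides=2, padding="same",
                               depth_multiplier=2, activation="relu")(x)            # D2 -> 32x32x64
    x = layers.Conv2D(128, 3, strides=2, padding="same", activation="relu")(x)       # C3 -> 16x16x128
    x = layers.DepthwiseConv2D(3, strides=2, padding="same",
                               depth_multiplier=2, activation="relu")(x)            # D3 -> 8x8x256
    x = layers.GlobalAveragePooling2D()(x)                                           # global average pooling
    x = layers.Dense(64, activation="relu")(x)                                       # fully connected layer
    outputs = layers.Dense(2, activation="softmax")(x)                               # softmax: [absent, present]
    return models.Model(inputs, outputs)

# A frame is treated as a tracked-person frame when the softmax probability
# of the "present" class is (close to) 1.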
Step 120, preprocessing the image frame of the tracked person;
specifically, due to the influence of the acquisition conditions, illumination and other factors, the images carry noise, and therefore, the images of the tracked persons need to be preprocessed to eliminate or reduce the noise in the images.
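As an illustration, one possible preprocessing step is a simple smoothing filter; the use of OpenCV and of a Gaussian filter here is an assumption of this sketch, since the application does not fix a particular denoising method:

import cv2

def preprocess(frame_bgr):
    # Convert to grayscale and suppress acquisition/illumination noise.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.GaussianBlur(gray, (3, 3), 0)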
Step 130, respectively carrying out image fusion on the preprocessed images of the plurality of tracked persons according to a wavelet transform image fusion method, a contour wavelet fusion method and a scale invariant feature transform image fusion method;
in the embodiment of the application, image fusion is respectively carried out based on a wavelet transform image fusion method, a scale invariant feature transform image fusion method and a contour wavelet fusion method to obtain image fusion results under different methods, and the quality of image fusion is judged according to the comparison of the image fusion results under different methods;
the image fusion of the preprocessed multiple tracked person images according to the wavelet transform image fusion method, as shown in fig. 2, specifically includes the following sub-steps:
step 210, decomposing each tracked person image by using a db1 discrete wavelet transform function to obtain a source image;
step 220, fusing wavelet coefficients corresponding to the source image based on a modulus maximum fusion algorithm to obtain a fused image;
specifically, the wavelet coefficients are fused by the following formula:

i_wave(x, y) = MAX{w1(x, y), w2(x, y), w3(x, y), …, wi(x, y)}

wherein x represents the row index and y the column index of the wavelet coefficients w1, w2, …, wi; wi(x, y) denotes the value of the wavelet coefficient wi at row x and column y; i_wave(x, y) represents the value of the fused wavelet coefficient i_wave at row x and column y; and i is the total number of source images.
And step 230, performing wavelet inverse transformation on the fused image to obtain an image fusion result based on wavelet transformation.
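For illustration, a minimal sketch of this wavelet-transform fusion in Python with PyWavelets follows; the library, the decomposition level and the interpretation of the modulus-maximum rule as "take the coefficient with the largest magnitude at each position" are assumptions of this sketch, while the db1 wavelet comes from step 210:

import numpy as np
import pywt

def _modulus_max(arrays):
    # At every position, keep the coefficient with the largest magnitude.
    stack = np.stack(arrays)                          # shape: (num_images, H, W)
    idx = np.argmax(np.abs(stack), axis=0)            # winning image per position
    return np.take_along_axis(stack, idx[None], axis=0)[0]

def fuse_wavelet(images, wavelet="db1", level=2):
    # images: equally sized grayscale source images of the tracked person
    decomps = [pywt.wavedec2(img.astype(np.float64), wavelet, level=level)
               for img in images]
    fused = []
    for band in zip(*decomps):                        # same sub-band of every decomposition
        if isinstance(band[0], tuple):                # detail bands: (cH, cV, cD)
            fused.append(tuple(_modulus_max(c) for c in zip(*band)))
        else:                                         # approximation band
            fused.append(_modulus_max(band))
    return pywt.waverec2(fused, wavelet)              # inverse wavelet transform -> fused image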
The image fusion of the preprocessed images of the plurality of tracked persons is performed according to a contour wavelet fusion method, as shown in fig. 3, and specifically includes the following sub-steps:
step 310, decomposing each tracked person image by using an edge contour transformation function to obtain a source image, and decomposing the source image to obtain a contour wavelet coefficient;
optionally, each source image is decomposed into three layers, with the highest layer decomposed in eight directions and the next-highest layer in four directions, so as to obtain the decomposed contour wavelet coefficients Li = {low-frequency coefficient cli, high-frequency coefficient chi}, wherein i is the total number of source images.
Step 320, comparing high-frequency coefficients in the contour wavelet coefficients obtained by decomposition, and taking the maximum value of the high-frequency coefficients as the high-frequency coefficients of the fused image;
specifically, the high-frequency coefficient maximum value is obtained by the following formula:
freH(x, y) = MAX{ch1(x, y), ch2(x, y), …, chi(x, y)};

wherein x represents the row index and y the column index of the high-frequency coefficients ch1, ch2, …, chi; chi(x, y) represents the value of the high-frequency coefficient chi at row x and column y; freH(x, y) represents the value of the fused high-frequency coefficient freH at row x and column y; and i is the total number of source images.
Step 330, calculating the mean value of the low-frequency coefficients in the contour wavelet coefficients obtained by decomposition, and taking the mean value of the low-frequency coefficients as the low-frequency coefficients of the fused image;
specifically, the mean value of the low-frequency coefficients is calculated by:

freL(x, y) = ( cl1(x, y) + cl2(x, y) + … + cli(x, y) ) / i

wherein x represents the row index and y the column index of the low-frequency coefficients cl1, cl2, …, cli; cli(x, y) represents the value of the low-frequency coefficient cli at row x and column y; freL(x, y) represents the value of the fused low-frequency coefficient freL at row x and column y; and i is the total number of source images.
And 340, forming the low-frequency coefficient and the high-frequency coefficient of the fused image into a coefficient of the fused image, and performing contour wavelet fusion inverse transformation on the coefficient of the fused image to obtain an image fusion result based on a contour wavelet fusion method.
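The contour wavelet (contourlet) transform itself is not available in common Python libraries, so the following sketch only illustrates the coefficient fusion rules of steps 320 to 340 (maximum for the high-frequency coefficients, mean for the low-frequency coefficients); the transform and its inverse are left out, and all names are hypothetical:

import numpy as np

def fuse_contour_coefficients(low_coeffs, high_coeffs):
    # low_coeffs:  list of low-frequency coefficient arrays cl1 ... cli
    # high_coeffs: list of high-frequency coefficient arrays ch1 ... chi
    #              (one array per source image for a given sub-band)
    low = [np.asarray(c, dtype=np.float64) for c in low_coeffs]
    high = [np.asarray(c, dtype=np.float64) for c in high_coeffs]
    freH = np.maximum.reduce(high)   # step 320: freH(x, y) = MAX{ch1(x, y), ..., chi(x, y)}
    freL = np.mean(low, axis=0)      # step 330: freL(x, y) = mean of cl1(x, y) ... cli(x, y)
    # step 340: (freL, freH) would then be passed to the inverse contour wavelet transform.
    return freL, freH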
The image fusion of the preprocessed multiple tracked person images is performed according to a scale-invariant feature transformation image fusion method, as shown in fig. 4, the method specifically includes the following sub-steps:
step 410, performing linear filtering on the two tracked person images to obtain contrast, direction and brightness characteristic saliency maps of the two tracked person images, and solving intersection of the contrast, direction and brightness characteristic saliency maps to obtain a visual saliency area, a unique saliency area and a public saliency area;
the contrast characteristic saliency map is obtained by filtering a source image by using a Gaussian pyramid, then performing a layer-by-layer difference solving method on a filtering result to obtain contrast characteristic saliency point distribution, and applying an entropy threshold segmentation method to the characteristic saliency point distribution; the direction characteristic saliency map is specifically that a filter is utilized to filter a source image in multiple directions, filtering results are added to obtain direction characteristic point distribution of the source image, and then an entropy threshold segmentation method is applied to the direction characteristic point distribution to generate the direction characteristic saliency map; the brightness characteristic saliency map is specifically a brightness characteristic saliency map of a source image generated by smoothing the source image by using an average filter to eliminate noise and gray level abrupt change influence and then applying an entropy threshold segmentation method to the smoothed image.
Step 420, determining a fusion coefficient of the fusion image according to the low-frequency components of the visual salient region, the unique salient region and the public salient region;
specifically, for each point in the low-frequency components obtained from the two source images: if the unique salient region of a certain source image is 1 at that point, the low-frequency coefficient of that source image is taken as the low-frequency coefficient of the fused image; if the point corresponds to the public salient region, the mean value of the low-frequency coefficients of the two source images is taken as the low-frequency coefficient of the fused image; and if the point does not belong to any salient region, the neighborhood variance of the two images is calculated, and since a larger variance indicates that the region of the source image containing the point has richer detail, the low-frequency coefficient of the source image with the larger variance at that point is taken as the low-frequency coefficient of the fused image.
And 430, performing multi-scale inverse transformation on the fusion coefficient by using a multi-scale fusion algorithm to reconstruct a fusion image.
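A minimal sketch of the low-frequency fusion rule of step 420 follows; it assumes the unique and public salient regions are already available as binary masks and that the neighborhood variance is computed over a small window, and the window size, the SciPy dependency and the function names are assumptions of this sketch:

import numpy as np
from scipy.ndimage import uniform_filter

def _local_variance(img, size=3):
    # Variance of each pixel's size x size neighborhood: E[x^2] - (E[x])^2.
    img = img.astype(np.float64)
    mean = uniform_filter(img, size)
    mean_sq = uniform_filter(img * img, size)
    return mean_sq - mean * mean

def fuse_low_frequency(low_a, low_b, unique_a, unique_b, common):
    # low_a, low_b:        low-frequency components of the two source images
    # unique_a, unique_b:  binary masks of each image's unique salient region
    # common:              binary mask of the public (common) salient region
    var_a, var_b = _local_variance(low_a), _local_variance(low_b)
    fused = np.where(var_a >= var_b, low_a, low_b)               # no salient region: richer neighborhood wins
    fused = np.where(common == 1, (low_a + low_b) / 2.0, fused)  # public salient region: mean
    fused = np.where(unique_a == 1, low_a, fused)                # unique to image A: take A
    fused = np.where(unique_b == 1, low_b, fused)                # unique to image B: take B
    return fused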
Referring back to fig. 1, step 140, respectively calculating an average gradient according to the fusion result, and determining the image fusion quality according to the average gradient;
specifically, the average gradient of the fused image results after fusion by the three methods is calculated by the following formula:

average gradient = (1 / (M × N)) × Σ_{i=1}^{M} Σ_{j=1}^{N} sqrt( [ (∂f(m_i, n_j)/∂m_i)^2 + (∂f(m_i, n_j)/∂n_j)^2 ] / 2 )

wherein M and N are the total numbers of rows and columns of the image, f is the fused image function, f(m_i, n_j) is the image point of the ith row and the jth column, ∂f(m_i, n_j)/∂m_i is the derivative of that image point in the row direction, and ∂f(m_i, n_j)/∂n_j is the derivative of that image point in the column direction.
the larger the calculated average gradient value is, the more image layers are, the clearer the image is, and the fused image with the largest average gradient value is correspondingly used as the fused image with the optimal quality, namely, the fusion method corresponding to the fused image is determined to have the optimal person tracking effect in the video image.
Example two
An embodiment of the present application provides an image fusion quality detection apparatus, as shown in fig. 5, including:
a tracked person image searching module 510, configured to search the tracked person image frame from each frame of the video image;
the image fusion module 520 is used for respectively carrying out image fusion on a plurality of tracked person images according to a wavelet transform image fusion method, a contour wavelet fusion method and a scale invariant feature transform image fusion method;
and an average gradient-based fusion image quality detection module 530, configured to calculate average gradients according to the fusion results, and determine image fusion quality according to the average gradients.
The tracked person image searching module 510 is specifically configured to construct a deep convolutional neural network model; from the input layer, sequentially passing through a first convolution layer, a first depth convolution layer, a second convolution layer, a second depth convolution layer, a third convolution layer and a third depth convolution layer; and inputting the output image into the global average pooling layer and the full connection layer to reach a softmax layer, outputting the occurrence probability of the tracked person by the softmax layer, and if the output probability is 1, determining the image frame as the image frame of the tracked person.
Further, the image fusion module 520 includes a wavelet transform image fusion sub-module 521, a contour wavelet fusion sub-module 522 and a scale-invariant feature transform image fusion sub-module 523;
the wavelet transform image fusion submodule 521 is specifically configured to decompose each tracked person image by using a discrete wavelet transform function to obtain a source image; fusing wavelet coefficients corresponding to the source images based on a modulus maximum fusion algorithm to obtain fused images; and performing wavelet inverse transformation on the fused image to obtain an image fusion result based on wavelet transformation.
The contour wavelet fusion submodule 522 is specifically configured to decompose each tracked person image by using an edge contour transformation function to obtain a source image, and decompose the source image to obtain a contour wavelet coefficient; comparing the high-frequency coefficient in the contour wavelet coefficient obtained by decomposition, and taking the maximum value of the high-frequency coefficient as the high-frequency coefficient of the fused image; calculating the mean value of low-frequency coefficients in the contour wavelet coefficients obtained by decomposition, and taking the mean value of the low-frequency coefficients as the low-frequency coefficients of the fused image; and forming the low-frequency coefficient and the high-frequency coefficient of the fused image into a coefficient of the fused image, and performing contour wavelet fusion inverse transformation on the coefficient of the fused image to obtain an image fusion result based on a contour wavelet fusion method.
The scale-invariant feature transformation image fusion submodule 523 is specifically configured to perform linear filtering on two images of a tracked person to obtain a contrast feature saliency map, a direction feature saliency map and a brightness feature saliency map of the tracked person, and solve an intersection of the contrast feature saliency map, the direction feature saliency map and the brightness feature saliency map to obtain a visual saliency region, a unique saliency region and a common saliency region; determining a fusion coefficient of the fusion image according to the low-frequency components of the visual salient region, the unique salient region and the common salient region; and performing multi-scale inverse transformation on the fusion coefficient by using a multi-scale fusion algorithm to reconstruct a fusion image.
The above-mentioned embodiments are only specific embodiments of the present application, used to illustrate the technical solutions of the present application rather than to limit them, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art can, within the technical scope disclosed in the present application, still modify the technical solutions described in the foregoing embodiments or easily conceive of changes, or make equivalent substitutions of some technical features therein; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are all intended to be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (8)

1. An image fusion quality detection method is characterized by comprising the following steps:
searching a tracked image frame from each frame of the video image;
respectively carrying out image fusion on a plurality of tracked person images according to a wavelet transform image fusion method, a contour wavelet fusion method and a scale invariant feature transform image fusion method;
respectively calculating average gradients according to the fusion results, and judging the image fusion quality according to the average gradients;
the average gradient of the fused image results after the three methods are fused is calculated by the following formula:
Figure FDA0003093665080000011
wherein M and N are the total number of rows and columns of the image, f (M and N) is a fused image function, and f (M and N) isi,nj) Is the image point of the ith row and the jth column,
Figure FDA0003093665080000012
is the derivative of the image point in the ith row and jth column in the ith row direction,
Figure FDA0003093665080000013
the derivative of the image point of the ith row and the jth column in the jth row direction;
the larger the calculated average gradient value, the more levels of detail the image contains and the clearer the image is; the fused image with the largest average gradient value is correspondingly used as the fused image with the optimal quality, namely the fusion method corresponding to that fused image is determined to have the optimal person tracking effect in the video image;
the method for fusing the images of the plurality of the tracked persons after the preprocessing is performed according to the scale-invariant feature transformation image fusion method specifically comprises the following substeps:
carrying out linear filtering on the two tracked person images to obtain contrast, direction and brightness characteristic saliency maps of the two tracked person images, and solving intersection of the contrast, direction and brightness characteristic saliency maps to obtain a visual saliency area, a unique saliency area and a public saliency area; the contrast characteristic saliency map is obtained by filtering a source image by using a Gaussian pyramid, then performing a layer-by-layer difference solving method on a filtering result to obtain contrast characteristic saliency point distribution, and applying an entropy threshold segmentation method to the characteristic saliency point distribution; the direction characteristic saliency map is specifically that a filter is utilized to filter a source image in multiple directions, filtering results are added to obtain direction characteristic point distribution of the source image, and then an entropy threshold segmentation method is applied to the direction characteristic point distribution to generate the direction characteristic saliency map; the brightness characteristic saliency map is specifically a brightness characteristic saliency map of a source image generated by smoothing the source image by using an average filter to eliminate noise and gray level abrupt change influence and then applying an entropy threshold segmentation method to the smoothed image;
determining a fusion coefficient of the fusion image according to the low-frequency components of the visual salient region, the unique salient region and the common salient region; specifically, for each point in a low-frequency component obtained by dividing two source images, if a unique salient region of a certain source image corresponding to the point is 1, determining that the fused image is a low-frequency coefficient corresponding to the source image, if the point corresponds to a public salient region, taking the mean value of the low-frequency coefficients of the two source images as the low-frequency coefficient of the fused image, if the point does not belong to any salient region, calculating the neighborhood variance of the two images, wherein the larger the variance is, the richer the source image belongs to the region of the point is, and taking the low-frequency coefficient of the source image corresponding to the point as the low-frequency coefficient of the fused image;
and performing multi-scale inverse transformation on the fusion coefficient by using a multi-scale fusion algorithm to reconstruct a fusion image.
2. The method according to claim 1, wherein the image frame of the tracked object is searched from each frame of the video image, and the method comprises the following steps:
constructing a deep convolutional neural network model;
from the input layer, sequentially passing through a first convolution layer, a first depth convolution layer, a second convolution layer, a second depth convolution layer, a third convolution layer and a third depth convolution layer;
and inputting the output image into the global average pooling layer and the full connection layer to reach a softmax layer, outputting the occurrence probability of the tracked person by the softmax layer, and if the output probability is 1, determining the image frame as the image frame of the tracked person.
3. The image fusion quality detection method according to claim 1, wherein the image fusion is performed on the preprocessed images of the plurality of tracked persons according to a wavelet transform image fusion method, and the image fusion quality detection method specifically comprises the following sub-steps:
decomposing each tracked person image by using a discrete wavelet transform function to obtain a source image;
fusing wavelet coefficients corresponding to the source images based on a modulus maximum fusion algorithm to obtain fused images;
and performing wavelet inverse transformation on the fused image to obtain an image fusion result based on wavelet transformation.
4. The image fusion quality detection method according to claim 1, wherein the image fusion is performed on the preprocessed images of the plurality of tracked persons according to a contour wavelet fusion method, and the image fusion quality detection method specifically comprises the following sub-steps:
decomposing each tracked person image by using an edge contour transformation function to obtain a source image, and decomposing the source image to obtain a contour wavelet coefficient;
comparing the high-frequency coefficient in the contour wavelet coefficient obtained by decomposition, and taking the maximum value of the high-frequency coefficient as the high-frequency coefficient of the fused image;
calculating the mean value of low-frequency coefficients in the contour wavelet coefficients obtained by decomposition, and taking the mean value of the low-frequency coefficients as the low-frequency coefficients of the fused image;
and forming the low-frequency coefficient and the high-frequency coefficient of the fused image into a coefficient of the fused image, and performing contour wavelet fusion inverse transformation on the coefficient of the fused image to obtain an image fusion result based on a contour wavelet fusion method.
5. An image fusion quality detection apparatus, comprising:
the tracked person image searching module is used for searching the tracked person image frame from each frame of the video image;
the image fusion module is used for respectively carrying out image fusion on a plurality of tracked person images according to a wavelet transform image fusion method, a contour wavelet fusion method and a scale invariant feature transform image fusion method;
the fusion image quality detection module based on the average gradient is used for respectively calculating the average gradient according to the fusion result and judging the image fusion quality according to the average gradient;
the average gradient of the fused image results after fusion by the three methods is calculated by the following formula:

average gradient = (1 / (M × N)) × Σ_{i=1}^{M} Σ_{j=1}^{N} sqrt( [ (∂f(m_i, n_j)/∂m_i)^2 + (∂f(m_i, n_j)/∂n_j)^2 ] / 2 )

wherein M and N are the total numbers of rows and columns of the image, f is the fused image function, f(m_i, n_j) is the image point of the ith row and the jth column, ∂f(m_i, n_j)/∂m_i is the derivative of that image point in the row direction, and ∂f(m_i, n_j)/∂n_j is the derivative of that image point in the column direction;
the larger the calculated average gradient value, the more levels of detail the image contains and the clearer the image is; the fused image with the largest average gradient value is correspondingly used as the fused image with the optimal quality, namely the fusion method corresponding to that fused image is determined to have the optimal person tracking effect in the video image;
the method for fusing the images of the plurality of the tracked persons after the preprocessing is performed according to the scale-invariant feature transformation image fusion method specifically comprises the following substeps:
carrying out linear filtering on the two tracked person images to obtain contrast, direction and brightness characteristic saliency maps of the two tracked person images, and solving intersection of the contrast, direction and brightness characteristic saliency maps to obtain a visual saliency area, a unique saliency area and a public saliency area; the contrast characteristic saliency map is obtained by filtering a source image by using a Gaussian pyramid, then performing a layer-by-layer difference solving method on a filtering result to obtain contrast characteristic saliency point distribution, and applying an entropy threshold segmentation method to the characteristic saliency point distribution; the direction characteristic saliency map is specifically that a filter is utilized to filter a source image in multiple directions, filtering results are added to obtain direction characteristic point distribution of the source image, and then an entropy threshold segmentation method is applied to the direction characteristic point distribution to generate the direction characteristic saliency map; the brightness characteristic saliency map is specifically a brightness characteristic saliency map of a source image generated by smoothing the source image by using an average filter to eliminate noise and gray level abrupt change influence and then applying an entropy threshold segmentation method to the smoothed image;
determining a fusion coefficient of the fusion image according to the low-frequency components of the visual salient region, the unique salient region and the common salient region; specifically, for each point in a low-frequency component obtained by dividing two source images, if a unique salient region of a certain source image corresponding to the point is 1, determining that the fused image is a low-frequency coefficient corresponding to the source image, if the point corresponds to a public salient region, taking the mean value of the low-frequency coefficients of the two source images as the low-frequency coefficient of the fused image, if the point does not belong to any salient region, calculating the neighborhood variance of the two images, wherein the larger the variance is, the richer the source image belongs to the region of the point is, and taking the low-frequency coefficient of the source image corresponding to the point as the low-frequency coefficient of the fused image;
and performing multi-scale inverse transformation on the fusion coefficient by using a multi-scale fusion algorithm to reconstruct a fusion image.
6. The image fusion quality detection apparatus of claim 5, wherein the tracked person image searching module is specifically configured to construct a deep convolutional neural network model; from the input layer, sequentially passing through a first convolution layer, a first depth convolution layer, a second convolution layer, a second depth convolution layer, a third convolution layer and a third depth convolution layer; and inputting the output image into the global average pooling layer and the full connection layer to reach a softmax layer, outputting the occurrence probability of the tracked person by the softmax layer, and if the output probability is 1, determining the image frame as the image frame of the tracked person.
7. The image fusion quality detection apparatus of claim 5, wherein the image fusion module comprises a wavelet transform image fusion sub-module, and the wavelet transform image fusion sub-module is specifically configured to decompose each tracked image by a discrete wavelet transform function to obtain a source image; fusing wavelet coefficients corresponding to the source images based on a modulus maximum fusion algorithm to obtain fused images; and performing wavelet inverse transformation on the fused image to obtain an image fusion result based on wavelet transformation.
8. The image fusion quality detection device of claim 5, wherein the image fusion module comprises a contour wavelet fusion sub-module, and the contour wavelet fusion sub-module is specifically configured to decompose each tracked person image with an edge contour transformation function to obtain a source image, and decompose the source image to obtain a contour wavelet coefficient; comparing the high-frequency coefficient in the contour wavelet coefficient obtained by decomposition, and taking the maximum value of the high-frequency coefficient as the high-frequency coefficient of the fused image; calculating the mean value of low-frequency coefficients in the contour wavelet coefficients obtained by decomposition, and taking the mean value of the low-frequency coefficients as the low-frequency coefficients of the fused image; and forming the low-frequency coefficient and the high-frequency coefficient of the fused image into a coefficient of the fused image, and performing contour wavelet fusion inverse transformation on the coefficient of the fused image to obtain an image fusion result based on a contour wavelet fusion method.
CN202010311554.9A 2020-04-20 2020-04-20 Image fusion quality detection method and device Active CN111507970B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010311554.9A CN111507970B (en) 2020-04-20 2020-04-20 Image fusion quality detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010311554.9A CN111507970B (en) 2020-04-20 2020-04-20 Image fusion quality detection method and device

Publications (2)

Publication Number Publication Date
CN111507970A CN111507970A (en) 2020-08-07
CN111507970B true CN111507970B (en) 2022-01-11

Family

ID=71869729

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010311554.9A Active CN111507970B (en) 2020-04-20 2020-04-20 Image fusion quality detection method and device

Country Status (1)

Country Link
CN (1) CN111507970B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102208103A (en) * 2011-04-08 2011-10-05 东南大学 Method of image rapid fusion and evaluation
CN103745203A (en) * 2014-01-15 2014-04-23 南京理工大学 Visual attention and mean shift-based target detection and tracking method
CN105761214A (en) * 2016-01-14 2016-07-13 西安电子科技大学 Remote sensing image fusion method based on contourlet transform and guided filter
CN106649487A (en) * 2016-10-09 2017-05-10 苏州大学 Image retrieval method based on interest target
CN106897999A (en) * 2017-02-27 2017-06-27 江南大学 Apple image fusion method based on Scale invariant features transform
CN107330854A (en) * 2017-06-15 2017-11-07 武汉大学 A kind of image super-resolution Enhancement Method based on new type formwork
CN107680054A (en) * 2017-09-26 2018-02-09 长春理工大学 Multisource image anastomosing method under haze environment
CN108564088A (en) * 2018-04-17 2018-09-21 广东工业大学 Licence plate recognition method, device, equipment and readable storage medium storing program for executing

Also Published As

Publication number Publication date
CN111507970A (en) 2020-08-07

Similar Documents

Publication Publication Date Title
CN104680508B (en) Convolutional neural networks and the target object detection method based on convolutional neural networks
CN103886589B (en) Object-oriented automated high-precision edge extracting method
CN112819772B (en) High-precision rapid pattern detection and recognition method
Kamencay et al. Improved Depth Map Estimation from Stereo Images Based on Hybrid Method.
CN101794439B (en) Image splicing method based on edge classification information
CN110929593A (en) Real-time significance pedestrian detection method based on detail distinguishing and distinguishing
CN111160291B (en) Human eye detection method based on depth information and CNN
CN104504395A (en) Method and system for achieving classification of pedestrians and vehicles based on neural network
CN112396036A (en) Method for re-identifying blocked pedestrians by combining space transformation network and multi-scale feature extraction
CN111507968B (en) Image fusion quality detection method and device
Choudhary et al. Enhancement in morphological mean filter for image denoising using glcm algorithm
CN116311212B (en) Ship number identification method and device based on high-speed camera and in motion state
Favorskaya et al. Intelligent inpainting system for texture reconstruction in videos with text removal
CN111507970B (en) Image fusion quality detection method and device
Widynski et al. A contrario edge detection with edgelets
CN111507969B (en) Image fusion quality detection method and device
Scharfenberger et al. Image saliency detection via multi-scale statistical non-redundancy modeling
Varkonyi-Koczy Fuzzy logic supported corner detection
Zeng et al. A new texture feature based on PCA pattern maps and its application to image retrieval
Swarnalatha et al. A centroid model for the depth assessment of images using rough fuzzy set techniques
Lyasheva et al. Application of image weight models to increase canny contour detector resilience to interference
Liu Restoration method of motion blurred image based on feature fusion and particle swarm optimization algorithm
Punia et al. Automatic detection of liver in CT images using optimal feature based neural network
Lyasheva et al. Image Borders Detection Based on the Weight Model Analysis and the Morphological Gradient Operation
CN110992285B (en) Image defogging method based on hierarchical neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Yuan Guiquan

Inventor after: Qian Yifan

Inventor after: Zhu Dong

Inventor after: Yang Yi

Inventor before: Yuan Guiquan

Inventor before: Qian Yifan

TA01 Transfer of patent application right

Effective date of registration: 20211223

Address after: 401122 No. 21-1, building 7, No. 2, Huizhu Road, Yubei District, Chongqing

Applicant after: Chongqing QiTeng Technology Co.,Ltd.

Address before: 102400 no.18-d11961, Jianshe Road, Kaixuan street, Liangxiang, Fangshan District, Beijing

Applicant before: Beijing yingmaiqi Technology Co.,Ltd.

GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 401122 No. 21-1, building 7, No. 2, Huizhu Road, Yubei District, Chongqing

Patentee after: Seven Teng Robot Co.,Ltd.

Address before: 401122 No. 21-1, building 7, No. 2, Huizhu Road, Yubei District, Chongqing

Patentee before: Chongqing QiTeng Technology Co.,Ltd.

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A Method and Device for Detecting Image Fusion Quality

Effective date of registration: 20230810

Granted publication date: 20220111

Pledgee: Chongqing Yuzhong Sub branch of China Construction Bank Corp.

Pledgor: Seven Teng Robot Co.,Ltd.

Registration number: Y2023980051686
