CN111886625A - Image fusion method, image acquisition equipment and movable platform - Google Patents

Image fusion method, image acquisition equipment and movable platform

Info

Publication number
CN111886625A
Authority
CN
China
Prior art keywords: image, frame, weight, value, feature vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201980009284.1A
Other languages
Chinese (zh)
Inventor
李辉
曹子晟
胡攀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SZ DJI Technology Co Ltd
Original Assignee
SZ DJI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SZ DJI Technology Co Ltd
Publication of CN111886625A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T 5/90 Dynamic range modification of images or parts thereof
    • G06T 5/92 Dynamic range modification of images or parts thereof based on global image properties
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; image sequence
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/20172 Image enhancement details
    • G06T 2207/20208 High dynamic range [HDR] image processing
    • G06T 2207/20212 Image combination
    • G06T 2207/20221 Image fusion; image merging

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Studio Devices (AREA)
  • Image Analysis (AREA)

Abstract

An image fusion method, an image acquisition device and a movable platform are provided. The image fusion method comprises: acquiring a feature vector of each frame of image captured at different shooting brightness in an image sequence; acquiring image features of each frame of image based on the feature vectors; acquiring a weight of each frame of image based on the image features; and fusing the image sequence based on the weight of each frame of image to obtain a fused image with a high dynamic range effect. In the embodiments, the image features of each frame of image are obtained from the feature vectors of that image, so the user does not need to manually specify image features, which improves the objectivity and accuracy of the obtained image features. Weights are then assigned to each frame of image based on the image features, and the images in the image sequence are fused based on those weights, so that a fused image with a high dynamic range effect can be obtained.

Description

Image fusion method, image acquisition equipment and movable platform
Technical Field
The embodiment of the invention relates to the technical field of image processing, in particular to an image fusion method, image acquisition equipment and a movable platform.
Background
The dynamic range refers to the ratio of the highest brightness value to the lowest brightness value in a natural scene. The dynamic range of a natural scene is far higher than the dynamic range that a typical camera can capture, so underexposure or overexposure often occurs and detail and information are lost in the captured picture. In the related art, multi-exposure image fusion is therefore used: a high-quality image with a high dynamic range effect is generated by directly fusing a plurality of images with different exposure degrees. This approach is also referred to below as high dynamic range imaging.
Currently, high dynamic range imaging generally adopts one of two implementations. In the first, a camera response curve is estimated from a sequence of images to produce high-dynamic-range luminance values, which are then displayed on a conventional display device through tone mapping. In the second, the sequence of images is fused directly in the image domain, resulting in an image with a high dynamic range effect.
Taking the second implementation as an example, in the related art the weights used for fusion may be calculated from image features such as contrast, color saturation and well-exposedness; the Laplacian pyramid of each original image is multiplied by the Gaussian pyramid of its weight map, and the fused result is obtained by reconstructing the combined Laplacian pyramid. However, in the related art the operator used to obtain the weights is a second-order Laplacian operator and the image features used are defined manually, so the representation capability is limited and the quality of the fused image is limited accordingly.
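For illustration only, the related-art pyramid-based exposure fusion described above could be sketched roughly as follows. This is a minimal sketch assuming OpenCV and NumPy are available; the function names, the number of pyramid levels, and the way the per-frame weight maps are supplied are illustrative assumptions, not the exact related-art implementation.

```python
import cv2
import numpy as np

def gaussian_pyramid(img, levels):
    """Repeatedly downsample with cv2.pyrDown to build a Gaussian pyramid."""
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(cv2.pyrDown(pyr[-1]))
    return pyr

def laplacian_pyramid(img, levels):
    """Laplacian pyramid: each level is the Gaussian level minus the upsampled next level."""
    gp = gaussian_pyramid(img, levels)
    lp = []
    for k in range(levels - 1):
        h, w = gp[k].shape[:2]
        lp.append(gp[k] - cv2.pyrUp(gp[k + 1], dstsize=(w, h)))
    lp.append(gp[-1])  # coarsest level is kept as-is
    return lp

def pyramid_exposure_fusion(images, weight_maps, levels=5):
    """Blend the Laplacian pyramids of the inputs using the Gaussian pyramids of the weights.
    images: list of H x W x 3 float32 arrays; weight_maps: list of H x W float32 arrays."""
    # Normalize the per-pixel weights so they sum to one across frames.
    wsum = np.sum(weight_maps, axis=0) + 1e-12
    weight_maps = [w / wsum for w in weight_maps]

    fused = None
    for img, w in zip(images, weight_maps):
        lp = laplacian_pyramid(img.astype(np.float32), levels)
        gw = gaussian_pyramid(w.astype(np.float32), levels)
        contrib = [l * g[..., None] for l, g in zip(lp, gw)]
        fused = contrib if fused is None else [f + c for f, c in zip(fused, contrib)]

    # Collapse the fused Laplacian pyramid back into a single image.
    out = fused[-1]
    for k in range(levels - 2, -1, -1):
        h, w = fused[k].shape[:2]
        out = cv2.pyrUp(out, dstsize=(w, h)) + fused[k]
    return out
```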
Disclosure of Invention
The embodiment of the invention provides an image fusion method, image acquisition equipment and a movable platform.
In a first aspect, an embodiment of the present invention provides an image fusion method, including:
acquiring a feature vector of each frame of image with different shooting brightness in an image sequence;
acquiring image features of the frames of images based on the feature vectors;
acquiring the weight of each frame of image based on the image characteristics;
and fusing the image sequence based on the weight of each frame of image to obtain a fused image with a high dynamic range effect.
In a second aspect, an embodiment of the present invention provides an image capturing device, including a processor, an image sensor, and a memory; the image sensor is used for shooting images under different shooting brightness and storing the images into the memory to form an image sequence; the processor is configured to:
acquiring a feature vector of each frame of image with different shooting brightness in an image sequence;
acquiring image features of the frames of images based on the feature vectors;
acquiring the weight of each frame of image based on the image characteristics;
and fusing the image sequence based on the weight of each frame of image to obtain a fused image with a high dynamic range effect.
In a third aspect, an embodiment of the present invention provides a movable platform, including the image capturing apparatus according to the second aspect.
In a fourth aspect, an embodiment of the present invention provides a machine-readable storage medium, on which computer instructions are stored, and when executed, the computer instructions implement the steps of the method of the first aspect.
According to the above technical solutions, the feature vectors of the frame images captured at different shooting brightness in the image sequence are obtained, the image features of each frame of image are then obtained based on its feature vector, the weight of each frame of image is obtained based on the image features, and finally the image sequence is fused based on the weights of the frame images to obtain a fused image with a high dynamic range effect. Because the image features of each frame of image are obtained from its feature vector, the user does not need to manually specify image features, which improves the objectivity and accuracy of the obtained image features. Moreover, weights are assigned to each frame of image based on the image features, and the images in the image sequence are fused based on these weights, so that an effective fused image with a high dynamic range effect can be obtained.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive labor.
Fig. 1 is a schematic flowchart of an image fusion method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a process for obtaining feature vectors according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of acquiring image features according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart of another method for obtaining image features according to an embodiment of the present invention;
fig. 5 is a schematic flowchart illustrating a process of obtaining similar weight values according to an embodiment of the present invention;
fig. 6 is a schematic flowchart of another process for obtaining similar weight values according to an embodiment of the present invention;
FIG. 7 is a schematic flowchart of an embodiment of obtaining image weights;
FIG. 8 is a schematic flow chart of a fused image according to an embodiment of the present invention;
FIG. 9 is a schematic flow chart diagram of another image fusion method provided in the embodiment of the present invention;
FIG. 10 is a flowchart illustrating another image fusion method according to an embodiment of the present invention;
fig. 11 is a block diagram of an image capturing apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. In addition, the features in the embodiments and the examples described below may be combined with each other without conflict.
In order to solve the problem in the related art that the image fusion effect is limited because manually defined image features and manually designed weight operators have limited characterization capability, the embodiments of the present invention provide an image fusion method. In this method, the feature vector of each frame of image is acquired first; the image features and the weight of each frame of image are then determined based on the feature vectors; and the multiple frames captured at different shooting brightness are fused based on the weights, so that a fused image with a high dynamic range effect can be effectively obtained.
Fig. 1 is a schematic flowchart of an image fusion method provided in an embodiment of the present invention. Referring to fig. 1, the image fusion method, which can be applied to image acquisition devices such as cameras, smart devices and tablet computers, includes steps 101 to 104, where:
in step 101, feature vectors of respective frame images of different shooting luminances in an image sequence are acquired.
In this embodiment, after receiving a control operation from a user or after an image fusion condition is triggered, the image acquisition device may capture a plurality of frames of images at different shooting brightness. For example, the image acquisition device may control the exposure time of the image sensor to capture images at different shooting brightness. In some embodiments, the number of frames with different shooting brightness may be 3 or 5; taking 3 frames as an example, these may be one underexposed image, one normally exposed image and one overexposed image.
Then, the image acquisition device may store the acquired multi-frame image to a designated location, where the designated location may be a location such as a local storage, a memory, a cache, or a cloud, to form an image sequence.
Then, the image acquisition device may obtain the image sequence directly or by reading it from the designated location, and acquire the feature vectors of the frame images captured at different shooting brightness in the image sequence.
In an embodiment, referring to fig. 2, the image acquisition device may obtain a preset feature recognition model (corresponding to step 201). The feature recognition model may include a convolutional neural network model, which may be implemented using a network model from the related art. Of course, any deep learning model that can obtain the feature vector of each frame of image also falls within the scope of the present application. The feature recognition model is trained with a sufficient number of samples so that its recognition accuracy reaches a preset accuracy threshold. Then, the image acquisition device may input each frame of image into the feature recognition model, and the feature vector of each frame of image is obtained by the feature recognition model (corresponding to step 202). The working principle of the feature recognition model is that of the selected deep learning model and is not described here again.
Taking the feature recognition model as a convolutional neural network model CNN(·) as an example, the feature vector of each frame of image in the image sequence is expressed as follows:

F_i(x, y) = CNN(I_i)(x, y)    (1)

In formula (1), CNN(·) represents the preset convolutional neural network model; each pixel (x, y) in each frame image I_i (i = 1, 2, 3, ..., K) corresponds to a feature vector F_i(x, y) whose dimension equals the number of filters in the convolutional neural network model.
In one embodiment, the convolutional neural network model may include the classification network VGG-16. In practical applications, considering that a full convolutional neural network model requires more computing resources and takes longer to obtain the feature vectors, the convolutional neural network model in this embodiment adopts only the first layer of the classification network VGG-16, which reduces the amount of computation, shortens the running time and improves the efficiency of the subsequent image fusion.
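As an illustration only, extracting per-pixel feature vectors with the first layer of VGG-16 could look roughly like the following sketch. It assumes PyTorch and torchvision are available; the choice of pretrained ImageNet weights and the absence of any input preprocessing are assumptions, since the text does not specify them.

```python
import torch
import torchvision

def extract_feature_vectors(image):
    """image: H x W x 3 float32 NumPy array in [0, 1].
    Returns an H x W x 64 array: one 64-dimensional feature vector F_i(x, y)
    per pixel, one dimension per filter of the first VGG-16 convolution."""
    vgg = torchvision.models.vgg16(weights=torchvision.models.VGG16_Weights.DEFAULT)
    first_layer = vgg.features[0]  # Conv2d(3, 64, kernel_size=3, padding=1)
    first_layer.eval()
    with torch.no_grad():
        x = torch.from_numpy(image).permute(2, 0, 1).unsqueeze(0)  # 1 x 3 x H x W
        feats = first_layer(x)                                     # 1 x 64 x H x W
    return feats.squeeze(0).permute(1, 2, 0).numpy()               # H x W x 64
```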
In step 102, image features of the frame images are obtained based on the feature vectors.
In this embodiment, the image acquisition device may obtain the image features of each frame of image based on the feature vector of each frame of image. In an embodiment, the image features may include at least one of: the vector strength of the feature vector and the similarity weight value of the feature vector. Of course, a skilled person may also select other image features according to the specific scene; provided that the selected features can accurately express the characteristics of the corresponding image, the corresponding scheme falls within the scope of protection of the present application.
For example, taking the image feature as the vector strength of the feature vector as an example, referring to fig. 3, the image capturing apparatus in this embodiment may obtain the vector strength of the feature vector based on the feature vector (corresponding to step 301). In one embodiment, the vector strength may include the L1 norm value of the feature vector, expressed by the following formula:
V_i(x, y) = ||F_i(x, y)||_1    (2)

In formula (2), V_i(x, y) represents the vector strength of the feature vector, and ||F_i(x, y)||_1 represents the L1 norm value of the feature vector F_i(x, y), i.e. the sum of the absolute values of the elements of the feature vector. In this embodiment, the larger the vector strength of a feature vector is, the larger the weight subsequently assigned to the image corresponding to that feature vector.
For another example, taking the image features as the similar weight values of the feature vectors as an example, referring to fig. 4, in this embodiment, the image acquisition device may obtain the similar weight values of the feature vectors based on the feature vectors of each frame of image; the similarity weight value is an image feature of each frame of image (corresponding to step 401).
Referring to fig. 5, in an embodiment, obtaining the similarity weight value by the image acquisition device may include the following. The image acquisition device may normalize the feature vector of each frame of image to obtain a normalized feature vector (corresponding to step 501). For example, the similarity value may be the L2 norm value of the difference between two normalized feature vectors, expressed by the following formula:

s_ij(x, y)^2 = || F̂_i(x, y) - F̂_j(x, y) ||_2^2    (3)

In formula (3), s_ij(x, y)^2 represents the similarity value of two normalized feature vectors; F̂_i(x, y) and F̂_j(x, y) represent the normalized feature vectors of any two frames of images in the image sequence, and the right-hand side is the L2 norm value of their difference. The L2 norm is the square root of the sum of the squares of the absolute values of the elements of a vector; in formula (3), both sides of the equation are squared for convenience of calculation.
In this embodiment, the image acquisition device may detect whether a moving object exists in any two frames of images in the image sequence by calculating the similarity value, so as to avoid a ghost effect caused by the moving object, and thus the image fusion method provided in this embodiment is not only suitable for a static scene, but also suitable for a dynamic scene, such as a scene of moving photography, video recording, and the like. It can be understood that, in the embodiment, normalization is performed on the feature vectors of each frame of image, and then the similarity value of the two feature vectors is calculated, so that illumination change caused by different shooting brightness of the two frames of images can be eliminated or reduced, and the calculation accuracy can be improved.
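A rough sketch of the normalization and pairwise similarity computation, under the same assumptions as the previous sketches (H x W x C feature maps, NumPy available):

```python
import numpy as np

def normalize_features(feats, eps=1e-8):
    """L2-normalize the feature vector at every pixel of an H x W x C feature map."""
    norm = np.linalg.norm(feats, axis=-1, keepdims=True)
    return feats / (norm + eps)

def similarity_sq(feats_i, feats_j):
    """s_ij(x, y)^2: squared L2 distance between the normalized feature vectors
    of frame i and frame j at every pixel."""
    diff = normalize_features(feats_i) - normalize_features(feats_j)
    return np.sum(diff ** 2, axis=-1)   # H x W
```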
Then, in this embodiment, the image capturing apparatus may obtain a similarity value between each frame of image and any other frame of image based on the normalized feature vector (corresponding to step 502). Thereafter, the image capturing apparatus may obtain a similarity weight value of each frame image based on the similarity value (corresponding to step 503).
Then, referring to fig. 6, the image capturing device may obtain the similarity weight value of each frame of image according to the similarity value, including: the image pickup apparatus may acquire a preset similarity value calculation model (corresponding to step 601). Wherein the similarity value calculation model may include a gaussian function. Then, the image capturing apparatus may input the similarity value to the similarity value calculation model, and calculate a similarity weight value of each frame image by the similarity value calculation model (corresponding to step 602).
Taking the case where the similarity value calculation model includes a Gaussian function, i.e. a Gaussian kernel, as an example, the image acquisition device calculates the similarity weight value of each frame of image with formula (4):

[Formula (4): the Gaussian function is applied to the similarity values s_ij(x, y)^2 between the ith frame image and the jth frame image, and the results are combined into S_i(x, y).]

In formula (4), the Gaussian function maps the similarity value of the ith frame image and the jth frame image, and S_i(x, y) represents the similarity weight value of the ith frame image.
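The exact form of formula (4) is given only as an image in the source. A plausible sketch, assuming the per-pair Gaussian responses are multiplied over the other frames and assuming a bandwidth parameter sigma (neither of which is stated in the text), is:

```python
import numpy as np

def similarity_weight(feature_maps, i, sigma=0.2):
    """S_i(x, y): combines Gaussian-kernel responses to the pairwise similarity values.
    The product over the other frames and the value of sigma are assumptions."""
    weight = np.ones(feature_maps[i].shape[:2])
    for j, feats_j in enumerate(feature_maps):
        if j == i:
            continue
        s_sq = similarity_sq(feature_maps[i], feats_j)   # from the previous sketch
        weight *= np.exp(-s_sq / (2.0 * sigma ** 2))
    return weight
```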
In step 103, the weight of each frame image is obtained based on the image features.
In this embodiment, referring to fig. 7, the image capturing device may perform normalization processing on the product of the image features of each frame of image to obtain the weight of each frame of image (corresponding to step 701). The weight of each frame image can be represented by the following formula:
W_i(x, y) = V_i(x, y) · S_i(x, y) / ( Σ_j V_j(x, y) · S_j(x, y) + α )    (5)

In formula (5), W_i(x, y) represents the weight of the ith frame image, V_i(x, y) represents the vector strength of the ith frame image, S_i(x, y) represents the similarity weight value of the ith frame image, the sum in the denominator runs over all frames in the image sequence, and α is a small preset number that prevents the denominator from being zero.
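A minimal NumPy sketch of this per-pixel normalization, with alpha as a small assumed constant:

```python
import numpy as np

def fusion_weights(strengths, sim_weights, alpha=1e-12):
    """W_i(x, y): normalized product of vector strength and similarity weight.
    strengths and sim_weights are lists of H x W maps, one per frame."""
    products = [v * s for v, s in zip(strengths, sim_weights)]
    denom = np.sum(products, axis=0) + alpha   # alpha keeps the denominator nonzero
    return [p / denom for p in products]
```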
In step 104, the image sequence is fused based on the weight of each frame image, and a fused image with a high dynamic range effect is obtained.
In this embodiment, referring to fig. 8, the image capturing device may obtain a pixel value and a weight of each pixel in each frame of image; the weight of each frame image includes the weight of each pixel (corresponding to step 801). Then, the image capturing apparatus may obtain one fused pixel value of each pixel based on the pixel value and the weight of each pixel, resulting in a fused image (corresponding to step 802). The fused image can be represented by the following formula:
I_f(x, y) = Σ_i W_i(x, y) · I_i(x, y)    (6)

In formula (6), I_f(x, y) represents the fused image, I_i(x, y) represents the ith frame image, and W_i(x, y) represents the weight of the ith frame image.
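A short sketch of the weighted per-pixel fusion, assuming color images and the normalized weights from the previous step:

```python
import numpy as np

def fuse(images, weights):
    """I_f(x, y): per-pixel weighted combination of the input frames.
    images: list of H x W x 3 arrays; weights: list of H x W arrays."""
    fused = np.zeros_like(images[0], dtype=np.float64)
    for img, w in zip(images, weights):
        fused += w[..., None] * img    # broadcast the weight map over the color channels
    return fused
```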
Therefore, in the embodiment, the image features of each frame of image are obtained through the feature vectors of each frame of image, and the user does not need to manually specify the image features of the image, so that the objectivity and the accuracy of the obtained image features can be improved. In addition, in this embodiment, weights are assigned to each frame of image based on image characteristics, and images in the image sequence are fused based on the weights, so that a fused image with an effective high dynamic range effect can be obtained. In addition, the image fusion method provided by the embodiment can be suitable for static scenes and dynamic scenes, and the application range is widened.
Fig. 9 is a schematic flowchart of an image fusion method provided in an embodiment of the present invention, fig. 10 is a schematic flowchart of another image fusion method provided in an embodiment of the present invention, and referring to fig. 9 and fig. 10, an image fusion method that can be applied to image capturing devices such as a camera, an intelligent device, and a tablet pc includes steps 901 to step 905, where:
in step 901, feature vectors of respective frame images of different shooting luminances in an image sequence are acquired.
The specific method and principle of step 901 and step 101 are the same, and please refer to fig. 1 and related contents of step 101 for detailed description, which is not described herein again.
In step 902, image features of the frames of images are obtained based on the feature vectors.
The specific method and principle of step 902 and step 102 are the same, please refer to fig. 1 and the related contents of step 102 for detailed description, which is not repeated herein.
In step 903, an exposure mask is applied to each frame image to obtain an image after exposure mask.
Considering that the captured images may contain excessively dark or excessively bright areas, in this embodiment the image acquisition device further applies an exposure mask to each frame of image, so that an exposure-masked image is obtained. The exposure mask can be expressed by the following formula:

[Formula (7): defines the exposure mask value M_i(x, y) of each pixel of the ith frame image in terms of its pixel intensity and the filtering threshold β.]

In formula (7), M_i(x, y) represents the value of each pixel in the exposure-masked image, and β represents a filtering threshold that characterizes the degree to which poorly exposed pixels are removed; the value of β may range from 0.1 to 0.5.
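The exact definition of formula (7) appears only as an image in the source. One plausible reading, offered purely as an assumed sketch, keeps pixels whose intensity lies inside [β, 1 - β] and discards the rest:

```python
import numpy as np

def exposure_mask(image, beta=0.2):
    """M_i(x, y): 1 where the pixel is neither too dark nor too bright, 0 otherwise.
    The interval [beta, 1 - beta] is an assumption; image values are expected in [0, 1]."""
    luma = image.mean(axis=-1) if image.ndim == 3 else image
    return ((luma >= beta) & (luma <= 1.0 - beta)).astype(np.float64)
```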
In step 904, weights for the respective frame images are obtained based on the image features and the post-exposure mask image.
In this embodiment, the image acquisition device may perform normalization processing on a product of the image feature of each frame of image and the image after exposure masking, to obtain the weight of each frame of image. The weight of each frame image is expressed by the following formula:
W_i(x, y) = V_i(x, y) · S_i(x, y) · M_i(x, y) / ( Σ_j V_j(x, y) · S_j(x, y) · M_j(x, y) + α )    (8)

In formula (8), W_i(x, y) represents the weight of the ith frame image, V_i(x, y) represents the vector strength of the ith frame image, S_i(x, y) represents the similarity weight value of the ith frame image, M_i(x, y) represents the exposure mask of the ith frame image, and α is a small preset number that prevents the denominator from being zero.
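Extending the earlier weight sketch with the exposure mask, again with alpha as an assumed small constant:

```python
import numpy as np

def fusion_weights_masked(strengths, sim_weights, masks, alpha=1e-12):
    """W_i(x, y): normalized product of vector strength, similarity weight and exposure mask."""
    products = [v * s * m for v, s, m in zip(strengths, sim_weights, masks)]
    denom = np.sum(products, axis=0) + alpha
    return [p / denom for p in products]
```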
In step 905, the image sequence is fused based on the weight of each frame of image, so as to obtain a fused image with a high dynamic range effect.
The specific method and principle of step 905 are the same as those of step 104; for details, please refer to fig. 1 and the related description of step 104, which is not repeated here.
Thus, the image fusion method provided by the embodiment has the advantages of the image fusion method shown in fig. 1 to 8, and the image exposure mask is added, so that the weight of each frame of image determined by the image exposure mask and the image features is more accurate, which is beneficial to obtaining a more effective fused image with a high dynamic range effect.
Fig. 11 is a block diagram of an image capturing apparatus according to an embodiment of the present invention, referring to fig. 11, an image capturing apparatus 1100 includes a processor 1101, a memory 1102 storing instructions executable by the processor 1101, and an image sensor 1104; the image sensor 1104 is used for shooting images at different shooting brightness and storing the images into the memory 1102 to form an image sequence; the processor 1101 communicates with the memory 1102 through a communication bus 1103, for reading executable instructions from within the memory 1102 to implement:
acquiring a feature vector of each frame of image with different shooting brightness in an image sequence;
acquiring image features of the frames of images based on the feature vectors;
acquiring the weight of each frame of image based on the image characteristics;
and fusing the image sequence based on the weight of each frame of image to obtain a fused image with a high dynamic range effect.
In some embodiments, the processor 1101 configured to obtain the feature vector of each frame image with different shooting brightness in the image sequence comprises:
acquiring a preset feature recognition model;
and inputting each image into the feature recognition model, and acquiring the feature vector of each frame of image by the feature recognition model.
In some embodiments, the feature recognition model is a convolutional neural network model.
In some embodiments, the feature vector of each frame image is expressed by the following formula:
F_i(x, y) = CNN(I_i)(x, y)

In the formula, CNN(·) represents the preset convolutional neural network model; each pixel (x, y) in each frame image I_i (i = 1, 2, 3, ..., K) corresponds to a feature vector whose dimension equals the number of filters in the convolutional neural network model.
In some embodiments, the convolutional neural network model is the first layer of the classification network VGG-16.
In some embodiments, the processor 1101 configured to obtain the image feature of each frame image based on the feature vector comprises:
acquiring the vector strength of the feature vector based on the feature vector; the vector intensity is the image feature of each frame image.
In some embodiments, the vector strength comprises the L1 norm value of the feature vector.
In some embodiments, the vector strength is expressed using the following formula:
V_i(x, y) = ||F_i(x, y)||_1

In the formula, V_i(x, y) represents the vector strength of the feature vector, and ||F_i(x, y)||_1 represents the L1 norm value of the feature vector F_i(x, y).
In some embodiments, in the process of obtaining the weight of each frame image based on the image feature, the greater the vector strength of the feature vector, the greater the weight of the image corresponding to the feature vector.
In some embodiments, the processor 1101 configured to obtain the image feature of each frame image based on the feature vector comprises:
acquiring a similar weight value of the feature vector based on the feature vector; the similar weight value is the image characteristic of each frame of image.
In some embodiments, the processor 1101 configured to obtain the similarity weight value of the feature vector based on the feature vector comprises:
normalizing the feature vector of each frame of image to obtain a normalized feature vector;
acquiring a similarity value of each frame of image and any other frame of image based on the normalized feature vector;
and acquiring the similarity weight value of each frame of image based on the similarity value.
In some embodiments, the similarity value comprises an L2 norm value of the normalized feature vector.
In some embodiments, the similarity value is expressed using the following formula:
s_ij(x, y)^2 = || F̂_i(x, y) - F̂_j(x, y) ||_2^2

In the formula, s_ij(x, y)^2 represents the similarity value of two normalized feature vectors; F̂_i(x, y) and F̂_j(x, y) represent the normalized feature vectors of any two frames of images in the image sequence, and the right-hand side is the L2 norm value of their difference.
In some embodiments, the processor 1101 configured to obtain the similarity weight value of each frame image based on the similarity value includes:
acquiring a preset similarity value calculation model;
and inputting the similarity value into a similarity value calculation model, and calculating the similarity weight value of each frame of image by the similarity value calculation model.
In some embodiments, the similarity value calculation model comprises a gaussian function.
In some embodiments, the similarity value is expressed using the following formula:
[Formula: the Gaussian function is applied to the similarity values s_ij(x, y)^2 between the ith frame image and the jth frame image, and the results are combined into S_i(x, y).]

In the formula, the Gaussian function maps the similarity value of the ith frame image and the jth frame image, and S_i(x, y) represents the similarity weight value of the ith frame image.
In some embodiments, the processor 1101 configured to obtain the weight of each frame image based on the image feature comprises:
and carrying out normalization processing on the product of the image characteristics of each frame of image to obtain the weight of each frame of image.
In some embodiments, the weight of each frame image is expressed by the following formula:
W_i(x, y) = V_i(x, y) · S_i(x, y) / ( Σ_j V_j(x, y) · S_j(x, y) + α )

In the formula, W_i(x, y) represents the weight of the ith frame image, V_i(x, y) represents the vector strength of the ith frame image, S_i(x, y) represents the similarity weight value of the ith frame image, the sum in the denominator runs over all frames, and α is a small preset number that prevents the denominator from being zero.
In some embodiments, before the processor 1101 is configured to obtain the weights of the frame images based on the image features, it is further configured to:
carrying out exposure masking on each frame of image to obtain an image after exposure masking;
the acquiring the weight of each frame of image based on the image features comprises:
and acquiring the weight of each frame of image based on the image features and the image after exposure mask.
In some embodiments, the exposure mask is represented using the following formula:
[Formula: defines the exposure mask value M_i(x, y) of each pixel of the ith frame image in terms of its pixel intensity and the filtering threshold β.]

In the formula, M_i(x, y) represents the value of each pixel in the exposure-masked image, and β represents a filtering threshold that characterizes the degree to which poorly exposed pixels are removed.
In some embodiments, the filtering threshold value ranges from 0.1 to 0.5.
In some embodiments, the processor 1101 is configured to obtain the weight of each frame image based on the image feature and the post-exposure-mask image, and includes:
and carrying out normalization processing on the product of the image characteristics of each frame of image and the image after exposure masking to obtain the weight of each frame of image.
In some embodiments, the weight of each frame image is expressed by the following formula:
W_i(x, y) = V_i(x, y) · S_i(x, y) · M_i(x, y) / ( Σ_j V_j(x, y) · S_j(x, y) · M_j(x, y) + α )

In the formula, W_i(x, y) represents the weight of the ith frame image, V_i(x, y) represents the vector strength of the ith frame image, S_i(x, y) represents the similarity weight value of the ith frame image, M_i(x, y) represents the exposure mask of the ith frame image, and α is a small preset number that prevents the denominator from being zero.
In some embodiments, the processor 1101 is configured to fuse the image sequence based on the weights of the frame images, and obtaining a fused image with high dynamic range effect includes:
acquiring the pixel value and the weight of each pixel in each frame of image; the weight of each frame image comprises the weight of each pixel;
and acquiring a fused pixel value of each pixel based on the pixel value and the weight of each pixel to obtain a fused image.
In some embodiments, the fused image is represented using the following formula:
I_f(x, y) = Σ_i W_i(x, y) · I_i(x, y)

In the formula, I_f(x, y) represents the fused image, I_i(x, y) represents the ith frame image, and W_i(x, y) represents the weight of the ith frame image.
Therefore, in the embodiment, the image features of each frame of image are obtained through the feature vectors of each frame of image, and the user does not need to manually specify the image features of the image, so that the objectivity and the accuracy of the obtained image features can be improved. In addition, in this embodiment, weights are assigned to each frame of image based on image characteristics, and images in the image sequence are fused based on the weights, so that a fused image with an effective high dynamic range effect can be obtained. In addition, the image fusion method provided by the embodiment can be suitable for static scenes and dynamic scenes, and the application range is widened. In addition, due to the addition of the image exposure mask, the weight of each frame of image determined by the image exposure mask and the image features is more accurate, which is beneficial to obtaining a more effective fused image with a high dynamic range effect.
An embodiment of the present invention further provides a movable platform, including the image capturing apparatus as claimed in any one of claims 26 to 50.
The embodiment of the present invention also provides a movable platform, which at least comprises a body, and the image acquisition device shown in fig. 11, an intelligent battery, a power system and a flight controller arranged on the body, wherein the intelligent battery supplies power to the power system, and the power system provides power for the movable platform.
Embodiments of the present invention also provide a machine-readable storage medium, on which computer instructions are stored, and when executed, implement the steps of the method embodiments shown in fig. 1 to 10.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The image fusion method, the image acquisition device and the movable platform provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the above description of the embodiments is only intended to help understand the method and its core idea. At the same time, a person of ordinary skill in the art may make changes to the specific implementation and the application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (52)

1. An image fusion method, comprising:
acquiring a feature vector of each frame of image with different shooting brightness in an image sequence;
acquiring image features of the frames of images based on the feature vectors;
acquiring the weight of each frame of image based on the image characteristics;
and fusing the image sequence based on the weight of each frame of image to obtain a fused image with a high dynamic range effect.
2. The image fusion method according to claim 1, wherein obtaining the feature vector of each frame image of different shooting brightness in the image sequence comprises:
acquiring a preset feature recognition model;
and inputting each image into the feature recognition model, and acquiring the feature vector of each frame of image by the feature recognition model.
3. The image fusion method of claim 2, wherein the feature recognition model is a convolutional neural network model.
4. The image fusion method according to claim 3, wherein the feature vector of each frame image is expressed by the following formula:
F_i(x, y) = CNN(I_i)(x, y);

in the formula, CNN(·) represents a preset convolutional neural network model; each pixel (x, y) in each frame image I_i (i = 1, 2, 3, ..., K) corresponds to a feature vector whose dimension equals the number of filters in the convolutional neural network model.
5. The image fusion method of claim 3 or 4, characterized in that the convolutional neural network model is the first layer of a classification network VGG-16.
6. The image fusion method according to claim 1, wherein obtaining the image features of the respective frame images based on the feature vectors comprises:
acquiring the vector strength of the feature vector based on the feature vector; the vector intensity is the image feature of each frame image.
7. The image fusion method of claim 6, wherein the vector strength comprises an L1 norm value of the feature vector.
8. The image fusion method of claim 7, wherein the vector intensity is expressed by the following formula:
V_i(x, y) = ||F_i(x, y)||_1;

in the formula, V_i(x, y) represents the vector strength of the feature vector, and ||F_i(x, y)||_1 represents the L1 norm value of the feature vector F_i(x, y).
9. The image fusion method according to claim 7, wherein in the process of obtaining the weight of each frame of image based on the image features, the larger the vector intensity of the feature vector is, the larger the weight of the image corresponding to the feature vector is.
10. The image fusion method according to claim 1, wherein obtaining the image features of the respective frame images based on the feature vectors comprises:
acquiring a similar weight value of the feature vector based on the feature vector; the similar weight value is the image characteristic of each frame of image.
11. The image fusion method of claim 10, wherein obtaining the similarity weight value of the feature vector based on the feature vector comprises:
normalizing the feature vector of each frame of image to obtain a normalized feature vector;
acquiring a similarity value of each frame of image and any other frame of image based on the normalized feature vector;
and acquiring the similarity weight value of each frame of image based on the similarity value.
12. The image fusion method of claim 11, wherein the similarity value comprises an L2 norm value of the normalized feature vector.
13. The image fusion method according to claim 12, wherein the similarity value is expressed by the following formula:
s_ij(x, y)^2 = || F̂_i(x, y) - F̂_j(x, y) ||_2^2;

in the formula, s_ij(x, y)^2 represents the similarity value of the two normalized feature vectors; F̂_i(x, y) and F̂_j(x, y) represent the normalized feature vectors of any two frames of images in the image sequence, and the right-hand side is the L2 norm value of their difference.
14. The image fusion method according to claim 13, wherein obtaining the similarity weight value of each frame image based on the similarity value comprises:
acquiring a preset similarity value calculation model;
and inputting the similarity value into a similarity value calculation model, and calculating the similarity weight value of each frame of image by the similarity value calculation model.
15. The image fusion method of claim 14, wherein the similarity value calculation model comprises a gaussian function.
16. The image fusion method according to claim 14 or 15, characterized in that the similarity value is expressed by the following formula:
[Formula: the Gaussian function is applied to the similarity values s_ij(x, y)^2 between the ith frame image and the jth frame image, and the results are combined into S_i(x, y);]

in the formula, the Gaussian function maps the similarity value of the ith frame image and the jth frame image, and S_i(x, y) represents the similarity weight value of the ith frame image.
17. The image fusion method according to claim 1, wherein obtaining the weight of each frame image based on the image features comprises:
and carrying out normalization processing on the product of the image characteristics of each frame of image to obtain the weight of each frame of image.
18. The image fusion method according to claim 17, wherein the weight of each frame image is expressed by the following formula:
W_i(x, y) = V_i(x, y) · S_i(x, y) / ( Σ_j V_j(x, y) · S_j(x, y) + α );

in the formula, W_i(x, y) represents the weight of the ith frame image, V_i(x, y) represents the vector strength of the ith frame image, S_i(x, y) represents the similarity weight value of the ith frame image, and α is a small preset number that prevents the denominator from being zero.
19. The image fusion method according to claim 1, wherein before the obtaining of the weight of each frame image based on the image feature, the method further comprises:
carrying out exposure masking on each frame of image to obtain an image after exposure masking;
the acquiring the weight of each frame of image based on the image features comprises:
and acquiring the weight of each frame of image based on the image features and the image after exposure mask.
20. The image fusion method of claim 19, wherein the exposure mask is represented by the following formula:
[Formula: defines the exposure mask value M_i(x, y) of each pixel of the ith frame image in terms of its pixel intensity and the filtering threshold β;]

in the formula, M_i(x, y) represents the value of each pixel in the exposure-masked image, and β represents a filtering threshold that characterizes the degree to which poorly exposed pixels are removed.
21. The image fusion method of claim 20, wherein the filtering threshold value ranges from 0.1 to 0.5.
22. The image fusion method of claim 19, wherein obtaining the weight of each frame image based on the image feature and the post-exposure-mask image comprises:
and carrying out normalization processing on the product of the image characteristics of each frame of image and the image after exposure masking to obtain the weight of each frame of image.
23. The image fusion method according to claim 22, wherein the weight of each frame image is expressed by the following formula:
W_i(x, y) = V_i(x, y) · S_i(x, y) · M_i(x, y) / ( Σ_j V_j(x, y) · S_j(x, y) · M_j(x, y) + α );

in the formula, W_i(x, y) represents the weight of the ith frame image, V_i(x, y) represents the vector strength of the ith frame image, S_i(x, y) represents the similarity weight value of the ith frame image, M_i(x, y) represents the exposure mask of the ith frame image, and α is a small preset number that prevents the denominator from being zero.
24. The image fusion method according to claim 1, wherein fusing the image sequence based on the weight of each frame image to obtain a fused image with a high dynamic range effect comprises:
acquiring the pixel value and the weight of each pixel in each frame of image; the weight of each frame image comprises the weight of each pixel;
and acquiring a fused pixel value of each pixel based on the pixel value and the weight of each pixel to obtain a fused image.
25. The image fusion method of claim 24, wherein the fused image is represented by the following formula:
I_f(x, y) = Σ_i W_i(x, y) · I_i(x, y);

in the formula, I_f(x, y) represents the fused image, I_i(x, y) represents the ith frame image, and W_i(x, y) represents the weight of the ith frame image.
26. An image acquisition device comprising a processor, an image sensor and a memory; the image sensor is used for shooting images under different shooting brightness and storing the images into the memory to form an image sequence; the processor is configured to:
acquiring a feature vector of each frame of image with different shooting brightness in an image sequence;
acquiring image features of the frames of images based on the feature vectors;
acquiring the weight of each frame of image based on the image characteristics;
and fusing the image sequence based on the weight of each frame of image to obtain a fused image with a high dynamic range effect.
27. The image capturing device of claim 26, wherein the processor obtains feature vectors for frames of images of different capture intensities in the sequence of images comprises:
acquiring a preset feature recognition model;
and inputting each image into the feature recognition model, and acquiring the feature vector of each frame of image by the feature recognition model.
28. The image capturing device of claim 27, wherein the feature recognition model is a convolutional neural network model.
29. The apparatus according to claim 28, wherein the feature vector of each frame image is expressed by the following formula:
F_i(x, y) = CNN(I_i)(x, y);

in the formula, CNN(·) represents a preset convolutional neural network model; each pixel (x, y) in each frame image I_i (i = 1, 2, 3, ..., K) corresponds to a feature vector whose dimension equals the number of filters in the convolutional neural network model.
30. The image acquisition device of claim 28 or 29, wherein the convolutional neural network model is a first layer of a classification network VGG-16.
31. The apparatus according to claim 26, wherein acquiring the image features of the respective frame images based on the feature vectors comprises:
acquiring the vector strength of the feature vector based on the feature vector; the vector intensity is the image feature of each frame image.
32. The image-capturing device of claim 31, wherein the vector strength comprises an L1 norm value of the feature vector.
33. The image capturing device of claim 32, wherein the vector strength is expressed using the following formula:
V_i(x, y) = ||F_i(x, y)||_1;

in the formula, V_i(x, y) represents the vector strength of the feature vector, and ||F_i(x, y)||_1 represents the L1 norm value of the feature vector F_i(x, y).
34. The apparatus according to claim 32, wherein in the process of obtaining the weight of each frame image based on the image feature, the greater the vector strength of the feature vector, the greater the weight of the image corresponding to the feature vector.
35. The apparatus according to claim 26, wherein acquiring the image features of the respective frame images based on the feature vectors comprises:
acquiring a similar weight value of the feature vector based on the feature vector; the similar weight value is the image characteristic of each frame of image.
36. The apparatus according to claim 35, wherein obtaining the similarity weight value of the feature vector based on the feature vector comprises:
normalizing the feature vector of each frame of image to obtain a normalized feature vector;
acquiring a similarity value of each frame of image and any other frame of image based on the normalized feature vector;
and acquiring the similarity weight value of each frame of image based on the similarity value.
37. The image-capturing device of claim 36, wherein the similarity value comprises an L2 norm value of the normalized feature vector.
38. The image capturing device of claim 37, wherein the similarity value is expressed by the following formula:
s_ij(x, y)^2 = || F̂_i(x, y) - F̂_j(x, y) ||_2^2;

in the formula, s_ij(x, y)^2 represents the similarity value of the two normalized feature vectors; F̂_i(x, y) and F̂_j(x, y) represent the normalized feature vectors of any two frames of images in the image sequence, and the right-hand side is the L2 norm value of their difference.
39. The apparatus according to claim 38, wherein obtaining the similarity weight value of each frame image based on the similarity value comprises:
acquiring a preset similarity value calculation model;
and inputting the similarity value into a similarity value calculation model, and calculating the similarity weight value of each frame of image by the similarity value calculation model.
40. The image capturing device of claim 39, wherein the similarity value calculation model includes a Gaussian function.
41. The image capturing device of claim 39 or 40, wherein the similarity value is expressed by the following formula:
[Formula: the Gaussian function is applied to the similarity values s_ij(x, y)^2 between the ith frame image and the jth frame image, and the results are combined into S_i(x, y);]

in the formula, the Gaussian function maps the similarity value of the ith frame image and the jth frame image, and S_i(x, y) represents the similarity weight value of the ith frame image.
42. The apparatus according to claim 26, wherein obtaining the weight of each frame image based on the image feature comprises:
and carrying out normalization processing on the product of the image characteristics of each frame of image to obtain the weight of each frame of image.
43. The image capturing device as claimed in claim 42, wherein the weight of each frame image is expressed by the following formula:
W_i(x, y) = V_i(x, y) · S_i(x, y) / ( Σ_j V_j(x, y) · S_j(x, y) + α );

in the formula, W_i(x, y) represents the weight of the ith frame image, V_i(x, y) represents the vector strength of the ith frame image, S_i(x, y) represents the similarity weight value of the ith frame image, and α is a small preset number that prevents the denominator from being zero.
44. The apparatus according to claim 26, wherein before the obtaining of the weight of each frame image based on the image feature, the method further comprises:
carrying out exposure masking on each frame of image to obtain an image after exposure masking;
the acquiring the weight of each frame of image based on the image features comprises:
and acquiring the weight of each frame of image based on the image features and the image after exposure mask.
45. The image capturing device of claim 44, wherein the exposure mask is expressed by the following formula:
[Formula: defines the exposure mask value M_i(x, y) of each pixel of the ith frame image in terms of its pixel intensity and the filtering threshold β;]

in the formula, M_i(x, y) represents the value of each pixel in the exposure-masked image, and β represents a filtering threshold that characterizes the degree to which poorly exposed pixels are removed.
46. The image acquisition device of claim 45, wherein the filtering threshold value ranges from 0.1 to 0.5.
47. The image capturing device of claim 44, wherein obtaining the weight for each frame of image based on the image feature and the post-exposure-mask image comprises:
and carrying out normalization processing on the product of the image characteristics of each frame of image and the image after exposure masking to obtain the weight of each frame of image.
48. The image capturing device as claimed in claim 47, wherein the weight of each frame image is expressed by the following formula:
W_i(x, y) = V_i(x, y) · S_i(x, y) · M_i(x, y) / ( Σ_j V_j(x, y) · S_j(x, y) · M_j(x, y) + α );

in the formula, W_i(x, y) represents the weight of the ith frame image, V_i(x, y) represents the vector strength of the ith frame image, S_i(x, y) represents the similarity weight value of the ith frame image, M_i(x, y) represents the exposure mask of the ith frame image, and α is a small preset number that prevents the denominator from being zero.
49. The image capturing device according to claim 26, wherein fusing the image sequence based on the weight of each frame image to obtain a fused image with high dynamic range effect comprises:
acquiring the pixel value and the weight of each pixel in each frame of image; the weight of each frame image comprises the weight of each pixel;
and acquiring a fused pixel value of each pixel based on the pixel value and the weight of each pixel to obtain a fused image.
50. The image capturing device of claim 49, wherein the fused image is represented by the following formula:
I_f(x, y) = Σ_i W_i(x, y) · I_i(x, y);

in the formula, I_f(x, y) represents the fused image, I_i(x, y) represents the ith frame image, and W_i(x, y) represents the weight of the ith frame image.
51. A movable platform comprising an image capture device according to any one of claims 26 to 50.
52. A machine-readable storage medium having stored thereon computer instructions which, when executed, implement the steps of the method of any one of claims 1 to 25.
CN201980009284.1A 2019-05-13 2019-05-13 Image fusion method, image acquisition equipment and movable platform Pending CN111886625A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/086687 WO2020227898A1 (en) 2019-05-13 2019-05-13 Image fusion method, image acquisition device and movable platform

Publications (1)

Publication Number Publication Date
CN111886625A 2020-11-03

Family

ID=73154369

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980009284.1A Pending CN111886625A (en) 2019-05-13 2019-05-13 Image fusion method, image acquisition equipment and movable platform

Country Status (2)

Country Link
CN (1) CN111886625A (en)
WO (1) WO2020227898A1 (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160267695A1 (en) * 2015-03-13 2016-09-15 Trimble Navigation Limited Acceleration of exposure fusion with pixel shaders
CN106408547A (en) * 2015-07-28 2017-02-15 展讯通信(上海)有限公司 Image fusion method and apparatus, and terminal device
US20170140253A1 (en) * 2015-11-12 2017-05-18 Xerox Corporation Multi-layer fusion in a convolutional neural network for image classification
CN108305252A (en) * 2018-02-26 2018-07-20 中国人民解放军总医院 Image interfusion method for portable electronic scope
CN109447907A (en) * 2018-09-20 2019-03-08 宁波大学 A kind of single image Enhancement Method based on full convolutional neural networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHANG SHUFANG et al.: "A traffic sign detection and recognition method based on HDR technology", 《激光与光电子学进展》 *
LI YAFEI et al.: "Research on remote sensing image classification based on convolutional neural network", 《智能***学报》 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528944A (en) * 2020-12-23 2021-03-19 杭州海康汽车软件有限公司 Image identification method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2020227898A1 (en) 2020-11-19

Similar Documents

Publication Publication Date Title
US11882357B2 (en) Image display method and device
JP6159298B2 (en) Method for detecting and removing ghost artifacts in HDR image processing using multi-scale normalized cross-correlation
CN113992861B (en) Image processing method and image processing device
CN108197546B (en) Illumination processing method and device in face recognition, computer equipment and storage medium
CN109862282B (en) Method and device for processing person image
Li et al. Detail-preserving multi-exposure fusion with edge-preserving structural patch decomposition
CN110852982A (en) Self-adaptive exposure adjustment multi-scale entropy fusion underwater image enhancement method
CN115063331B (en) Multi-scale block LBP operator-based ghost-free multi-exposure image fusion method
CN112927162A (en) Low-illumination image oriented enhancement method and system
CN105556957B (en) A kind of image processing method, computer storage media, device and terminal
Florea et al. Directed color transfer for low-light image enhancement
Xue Blind image deblurring: a review
CN111886625A (en) Image fusion method, image acquisition equipment and movable platform
Wang et al. Detail preserving multi-scale exposure fusion
Wang et al. An exposure fusion approach without ghost for dynamic scenes
Teutsch et al. An evaluation of objective image quality assessment for thermal infrared video tone mapping
CN109146966B (en) Visual SLAM front-end processing method, system, storage medium and computer equipment
CN116128707A (en) Image processing method and device, electronic equipment and computer readable storage medium
Florea et al. Avoiding the deconvolution: Framework oriented color transfer for enhancing low-light images
Messias et al. Attention-based neural network for ill-exposed image correction
CN112714246A (en) Continuous shooting photo obtaining method, intelligent terminal and storage medium
Hristova et al. High-dynamic-range image recovery from flash and non-flash image pairs
Wang et al. A novel deghosting method for exposure fusion
Patel et al. Image Contrast Enhancement using Block based CNN Learner
CN115908150A (en) Image enhancement method and device, computer equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20201103