CN111402306A - Low-light-level/infrared image color fusion method and system based on deep learning - Google Patents

Low-light-level/infrared image color fusion method and system based on deep learning

Info

Publication number
CN111402306A
CN111402306A CN202010175703.3A CN202010175703A CN111402306A CN 111402306 A CN111402306 A CN 111402306A CN 202010175703 A CN202010175703 A CN 202010175703A CN 111402306 A CN111402306 A CN 111402306A
Authority
CN
China
Prior art keywords
convolution
network
infrared image
image
light
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010175703.3A
Other languages
Chinese (zh)
Inventor
刘超
胡清平
姚远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chinese People's Liberation Army 32801
Original Assignee
Chinese People's Liberation Army 32801
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chinese People's Liberation Army 32801 filed Critical Chinese People's Liberation Army 32801
Priority to CN202010175703.3A priority Critical patent/CN111402306A/en
Publication of CN111402306A publication Critical patent/CN111402306A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10048Infrared image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a low-light/infrared image color fusion method and system based on deep learning, wherein the fusion method comprises the following steps: acquiring training data; preprocessing the training data; constructing a low-light/infrared image chromaticity prediction network; constructing a loss function; training the low-light/infrared image chromaticity prediction network based on the preprocessed training data and the loss function to obtain a trained low-light/infrared image chromaticity prediction network; extracting the chromaticity information of a target image by using the trained network; extracting the brightness information of the target image; and finally synthesizing a color image based on the brightness information and the chrominance information. The method can predict natural and stable colors and improves the observability of the color fusion image.

Description

Low-light-level/infrared image color fusion method and system based on deep learning
Technical Field
The invention relates to the field of night vision, in particular to a low-light/infrared image color fusion method and system based on deep learning.
Background
The night vision technology is a photoelectric technology for realizing night observation by means of a photoelectric imaging device, and the existing night vision equipment mainly comprises a low-light level night vision device and a thermal infrared imager. The low-light level night vision device detects reflected light of a target on moonlight and night sky light in a passive working mode, the obtained image is strong in layering sense, scene and target details are clear, the human eye observation habit is met, and the low-light level night vision device has the defects of poor contrast, limited gray level, large weather influence, easiness in interference of external environment light, limited detection distance and the like; the infrared thermal imager generates a scene thermal image according to the temperature and radiation difference between a target and a background, can work day and night, and has the advantages of high temperature resolution, good image contrast, large dynamic range, smoke penetration and haze penetration, and the like. In order to effectively solve the above problems of night vision equipment, color fusion of a low-light image and an infrared image has become an important development direction of night vision technology.
The existing color-transfer-based low-light/infrared color fusion technology selects different modes such as ocean, desert and snowfield for color transfer. However, because it uses only a small number of simple statistical characteristics, such as the pixel neighborhood gray-level mean and standard deviation, as matching parameters, good pixel matching and color transfer effects are difficult to obtain, and the resulting low-light/infrared color fusion images still suffer from unnatural and unstable colors.
Disclosure of Invention
The invention aims to provide a low-light/infrared image color fusion method and system based on deep learning, which can realize the prediction of natural and stable colors and improve the observability of a color fusion image.
In order to achieve the purpose, the invention provides the following scheme:
a low-light/infrared image color fusion method based on deep learning, the fusion method comprises the following steps:
acquiring training data;
preprocessing the training data;
constructing a low-light/infrared image chromaticity prediction network;
constructing a loss function;
training the dim light/infrared image chromaticity prediction network based on the preprocessed training data and the loss function to obtain a trained dim light/infrared image chromaticity prediction network;
extracting the chromaticity information of the target image by adopting the trained dim light/infrared image chromaticity prediction network;
extracting brightness information of a target image;
and finally synthesizing the color image based on the brightness information and the chrominance information.
Optionally, the preprocessing the training data specifically includes:
and continuously preprocessing the training data by adopting a multimode image calibration technology.
Optionally, the dim light/infrared image chromaticity prediction network includes: a feature extraction sub-network and a chroma prediction sub-network.
Optionally, the loss function specifically adopts the following formula:
L(\hat{Z}, Z) = -\sum_{h,w} v(Z_{h,w}) \sum_{q} Z_{h,w,q} \log \hat{Z}_{h,w,q}
where h denotes the height coordinate of the training image, w denotes the width coordinate of the training image, v(Z_{h,w}) denotes the weight reflecting the sparsity of the chrominance information in the training set, Z_{h,w} denotes the chrominance value of maximum probability for the pixel at coordinates (h, w), q denotes a quantized value of the chrominance space, Z_{h,w,q} denotes the probability that the chrominance of the pixel at coordinates (h, w) is q, and \hat{Z}_{h,w,q} denotes the predicted value of that probability.
Optionally, the network type of the feature extraction sub-network is an encoder-decoder network type, and includes an encoding network unit and a decoding network unit; the coding network unit adopts a conventional convolution unit and a hole convolution unit; the decoding network unit adopts a transposition convolution unit;
the chrominance extraction sub-network adopts a transposed convolution unit.
Optionally, the coding network units in the feature extraction sub-network specifically include a first conventional convolution unit, a second conventional convolution unit, a third conventional convolution unit, and a fourth hole convolution unit;
the decoding network unit in the feature extraction sub-network specifically includes: a fifth transposed convolution unit and a sixth transposed convolution unit;
the transposed convolution unit in the chrominance extraction sub-network specifically includes a seventh transposed convolution unit.
Optionally, the first conventional convolution unit Conv1 includes a first conventional convolution layer Conv1_1 and a second conventional convolution layer Conv1_2, where the Conv1_1 has a convolution kernel size of 2, a convolution kernel number of 64, a step size of 1, and an input channel of 3, and the Conv1_2 has a convolution kernel size of 3, a convolution kernel number of 64, a step size of 2, and an input channel of 64;
the second conventional convolution unit Conv2 has a convolution kernel size of 3, a convolution kernel number of 128, a step size of 2, and an input channel of 64;
the convolution kernel size of the third conventional convolution unit Conv3 is 2, the number of convolution kernels is 64, the step is 1, and the input channel is 3;
the fourth hole convolution unit DilaConv4 includes a first hole convolution layer DilaConv4_1 and a second hole convolution layer DilaConv4_2, the convolution kernel size of the first hole convolution layer DilaConv4_1 is 3, the number of convolution kernels is 256, the step is 1, and the input channel is 256, the convolution kernel size of the second hole convolution layer DilaConv4_2 is 3, the number of convolution kernels is 256, the step is 1, and the input channel is 256;
the size of a convolution kernel of the fifth transpose convolution unit TransConv5 is 3, the number of the convolution kernels is 128, the stride is 1, and the input channel is 128;
the size of a convolution kernel of the sixth transpose convolution unit TransConv6 is 3, the number of the convolution kernels is 64, the stride is 1, and the input channel is 256;
the seventh transpose convolution unit transcconv 7 has a convolution kernel size of 3, a number of convolution kernels of 313, a stride of 1, and an input channel of 128.
Optionally, the method further includes, after constructing the low-light/infrared image chromaticity prediction network: normalizing the output features of each level in the low-light/infrared image chromaticity prediction network using the BN (Batch Normalization) technique.
The invention additionally provides a dim light/infrared image color fusion system based on deep learning, which comprises:
the training data acquisition module is used for acquiring training data;
the preprocessing module is used for preprocessing the training data;
the prediction network construction module is used for constructing a low-light/infrared image chromaticity prediction network;
the loss function constructing module is used for constructing a loss function;
the training module is used for training the dim light/infrared image chromaticity prediction network based on the preprocessed training data and the loss function to obtain the trained dim light/infrared image chromaticity prediction network;
the chrominance information extraction module is used for extracting chrominance information of the target image by adopting the trained dim light/infrared image chrominance prediction network;
the brightness information extraction module is used for extracting the brightness information of the target image;
and the color image synthesis module is used for finally synthesizing a color image based on the brightness information and the chrominance information.
Optionally, the preprocessing module specifically includes continuously preprocessing the training data by using a multi-mode image calibration technique.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
according to the method, training data are obtained, and the training data are preprocessed; constructing a low-light/infrared image chromaticity prediction network; constructing a loss function; training the dim light/infrared image chromaticity prediction network based on the preprocessed training data and the loss function to obtain a trained dim light/infrared image chromaticity prediction network; the trained dim light/infrared image chromaticity prediction network is adopted to extract chromaticity information of a target image, luminance information of the target image is extracted, a color image is finally synthesized based on the luminance information and the chromaticity information, a large number of features in an original image and a training set image can be automatically obtained to serve as matching parameters, therefore, good pixel matching and color transfer effects can be obtained, color information is obtained by learning the mapping relation between a dim light image, an infrared image and a daytime color image, a color fusion image with natural and stable colors is obtained, the understanding of an observer to a scene can be improved, the identification degree of the target is improved, and the scene perception capability of the observer is effectively enhanced.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
FIG. 1 is a flow chart of a low-light/infrared image color fusion method based on deep learning according to an embodiment of the present invention;
FIG. 2 is a training set for low light/infrared color fusion according to an embodiment of the present invention;
FIG. 3 is a low-light/infrared color fusion test set according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a chrominance prediction network according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a deep learning-based dim light/infrared image color fusion system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a low-light/infrared image color fusion method and system based on deep learning, which can realize the prediction of natural and stable colors and improve the observability of a color fusion image.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Fig. 1 is a flowchart of a low-light/infrared image color fusion method based on deep learning according to an embodiment of the present invention, as shown in fig. 1, the method includes:
step 101: training data is acquired.
Step 102: and preprocessing the training data.
Step 103: and constructing a low-light/infrared image chromaticity prediction network.
Step 104: a loss function is constructed.
Step 105: and training the dim light/infrared image chromaticity prediction network based on the preprocessed training data and the loss function to obtain the trained dim light/infrared image chromaticity prediction network.
Step 106: and extracting the chromaticity information of the target image by adopting the trained glimmer/infrared image chromaticity prediction network.
Step 107: luminance information of the target image is extracted.
Step 108: and finally synthesizing the color image based on the brightness information and the chrominance information.
The following steps are described in detail:
step 101: training data is acquired.
In this step, low-light, infrared and color images of different scenes in a given area are acquired. To ensure that the data set has good adaptability and quality, the images are acquired at early dawn or at dusk, and a sufficiently large number of images is collected. This step yields the raw low-light/infrared color fusion data for the same scenes, comprising a low-light image set L, an infrared image set I and a color image set C, as shown in fig. 2.
Step 102: and preprocessing the training data.
Data quality is important for a deep-learning-based image processing method, so the acquired raw low-light/infrared color fusion data are preprocessed. Because the field of view and the optical axis position of the low-light, infrared and color image acquisition devices differ, the data of the low-light image set L, the infrared image set I and the color image set C need to be calibrated using a multimodal image calibration technique.
The acquired low-light, infrared and color images are registered in a pairwise manner: taking the infrared image as the reference, the color image and the low-light image are mapped onto the infrared reference so that the three types of images are registered, and the same registration processing is finally applied to all images in the data set. One possible implementation of this registration step is sketched below.
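The following minimal sketch illustrates such pairwise registration onto the infrared reference. It assumes OpenCV is available and uses ORB features with a RANSAC homography; this is only an illustrative choice, not necessarily the exact multimodal calibration procedure used by the invention.

```python
import cv2
import numpy as np

def register_to_infrared(moving, infrared):
    """Warp a low-light or color image onto the infrared reference frame."""
    to_gray = lambda im: cv2.cvtColor(im, cv2.COLOR_BGR2GRAY) if im.ndim == 3 else im
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(to_gray(moving), None)
    kp2, des2 = orb.detectAndCompute(to_gray(infrared), None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    matches = sorted(matches, key=lambda m: m.distance)[:200]      # keep the best matches
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = infrared.shape[:2]
    return cv2.warpPerspective(moving, H, (w, h))
```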
Step 103: and constructing a low-light/infrared image chromaticity prediction network.
The invention designs a low-light/infrared image chrominance prediction network (CP-Net), whose structure is shown in fig. 4 (each convolution arrow in the figure represents one or more convolution operations). The whole network consists of a feature extraction sub-network and a chroma prediction sub-network. The feature extraction sub-network adopts an encoder-decoder style network: the coding network units use conventional convolution and hole (dilated) convolution to extract the multi-scale spatial features of the low-light and infrared images layer by layer, with deeper convolutions expressing larger-scale, higher-level complex features; the decoding network units expand the coding layers through transposed convolution, increasing the spatial resolution, and integrate the multi-scale feature information of each level through residual connections, providing more accurate position information for the subsequent chroma prediction sub-network. The chroma prediction sub-network follows the decoding network units and performs color information prediction using a conventional convolution with 313 convolution kernels (313 is the number of colors after color space quantization). Since pooling layers cause partial information loss, the network does not use pooling; the reduction in spatial resolution is achieved with conventional convolutions whose stride is greater than 1. Because feature values at different scales differ, the output features of each level are normalized with the BN technique.
For ease of understanding, a conventional convolution layer is denoted as Conv(f_i, n_i, s_i, c_i), a hole (dilated) convolution layer as DilaConv(f_i, n_i, s_i, c_i), and a transposed convolution layer as TransConv(f_i, n_i, s_i, c_i), where f_i, n_i, s_i and c_i denote the kernel size, the number of kernels, the stride and the number of input channels, respectively (the number of input channels of the current layer equals the number of convolution kernels of the previous layer, i.e. c_i = n_{i-1}, where i is the layer index). The spatial resolution of the network input data is 128 × 128.
The whole network in the feature extraction sub-network adopts convolution with a size of 3 × 3, the coding network units in the feature extraction sub-network specifically comprise a first conventional convolution unit, a second conventional convolution unit, a third conventional convolution unit and a fourth hole convolution unit, the decoding network units in the feature extraction sub-network specifically comprise a fifth transposed convolution unit and a sixth transposed convolution unit, and the transposed convolution unit in the chrominance extraction sub-network specifically comprises a seventh transposed convolution unit.
The first conventional convolution unit is mainly used to extract low-level features of the multiband night vision image and to enlarge the receptive field by reducing the spatial resolution of the output features. It comprises two convolution layers, a first conventional convolution layer Conv1_1 and a second conventional convolution layer Conv1_2. Conv1_1 mainly extracts low-level features; following most networks, the number of convolution kernels is set to n1_1 = 64, and to make full use of the low-level feature information the stride of this layer is set to s1_1 = 1. If the data set contains data in two wavebands, such as low-light and infrared, then c1_1 = 2, so the first layer of the first conventional convolution unit can be represented as Conv1_1(3, 64, 1, 2) and the output feature size is unchanged. A convolutional neural network has good adaptability: it can accept input with any number of channels without changing its structure and produce output of any dimension, so CP-Net can also be applied to finding the mapping relationship between multiband data sets, such as three-band fusion.
Conv1_2 enlarges the receptive field by increasing the stride to reduce the output feature size, so s1_2 = 2; the number of convolution kernels is kept the same as in the previous layer, n1_2 = 64. This layer can therefore be denoted as Conv1_2(3, 64, 2, 64), and the spatial size of the output features is halved to 64 × 64. The second convolution unit uses a conventional convolution with a stride of 2, so the output feature size is halved again while the number of kernels is doubled to n2 = 128; the second convolution unit can thus be denoted as Conv2(3, 128, 2, 64), with the output features reduced to 32 × 32. The third convolution unit can be denoted as Conv3(3, 256, 2, 128), with the output features reduced to 16 × 16.
As a newer convolution method, hole (dilated) convolution can enlarge the receptive field of the output features without reducing the spatial resolution, and the number of hole convolution layers determines the maximum receptive field that can finally be obtained. Hole convolution with a dilation factor of 2 is therefore used, represented as DilaConv4_1(3, 256, 1, 256) and DilaConv4_2(3, 256, 1, 256); the output feature map size remains 16 × 16. The fifth and sixth convolution units are arranged symmetrically with the coding network units: the preceding features are upsampled and aggregated by transposed convolution, which increases the spatial resolution of the feature map and halves the number of feature channels. To synthesize feature information of different scales, the outputs of the first and second convolution units are connected to the sixth and fifth convolution units, respectively, through residual connections, so the fifth and sixth transposed convolution units can be denoted as TransConv5(3, 128, 1, 128) and TransConv6(3, 64, 1, 256), and the multi-scale feature information is thus extracted. A Keras sketch of this sub-network is given after Table 1.
Table 1 Detailed configuration information of the feature extraction sub-network (the table is provided as an image in the original publication)
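To make the layer configuration concrete, a minimal Keras sketch of the feature extraction sub-network follows. It uses the layer parameters described above; the padding mode, the placement of activations and the use of stride-2 transposed convolutions for upsampling (the text describes upsampling, while the listed stride is 1) are assumptions of this sketch rather than details confirmed by the patent.

```python
from tensorflow.keras import Input, Model, layers

def conv_bn_relu(x, filters, kernel=3, stride=1, dilation=1, transposed=False, name=None):
    """Convolution (regular, dilated or transposed) followed by BN and ReLU."""
    if transposed:
        x = layers.Conv2DTranspose(filters, kernel, strides=stride, padding="same", name=name)(x)
    else:
        x = layers.Conv2D(filters, kernel, strides=stride, padding="same",
                          dilation_rate=dilation, name=name)(x)
    x = layers.BatchNormalization()(x)          # BN on every level, as described in the text
    return layers.ReLU()(x)

def build_feature_extractor(input_shape=(128, 128, 2)):
    inputs = Input(shape=input_shape)                          # low-light + infrared channels
    c1 = conv_bn_relu(inputs, 64, stride=1, name="Conv1_1")
    c1 = conv_bn_relu(c1, 64, stride=2, name="Conv1_2")        # 128 -> 64
    c2 = conv_bn_relu(c1, 128, stride=2, name="Conv2")         # 64 -> 32
    c3 = conv_bn_relu(c2, 256, stride=2, name="Conv3")         # 32 -> 16
    d4 = conv_bn_relu(c3, 256, dilation=2, name="DilaConv4_1")
    d4 = conv_bn_relu(d4, 256, dilation=2, name="DilaConv4_2")
    t5 = conv_bn_relu(d4, 128, stride=2, transposed=True, name="TransConv5")   # 16 -> 32
    t5 = layers.Concatenate()([t5, c2])                        # skip connection from Conv2
    t6 = conv_bn_relu(t5, 64, stride=2, transposed=True, name="TransConv6")    # 32 -> 64
    t6 = layers.Concatenate()([t6, c1])                        # skip connection from Conv1
    return Model(inputs, t6, name="cp_net_features")
```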
The chroma prediction sub-network generates a color probability distribution for each pixel using a transposed convolution module with 313 feature channels and increases the spatial resolution back to the original input size; this layer can be denoted as TransConv7(3, 313, 1, 128).
The probability distribution over the colors is predicted with a softmax output layer and converted back to the original image dimensions by a Reshape layer. Let p_i(k) denote the probability that the ith pixel belongs to the kth quantized value; the softmax output layer is defined as
p_i(k) = \frac{\exp(o_i(k))}{\sum_{j=1}^{Q} \exp(o_i(j))}
where o_i(k) is the output of the preceding layer for the ith pixel and the kth quantized value, and Q = 313.
table 2 gives detailed configuration information of the chroma prediction sub-network. It can be seen that the core of the chroma prediction sub-network is to perform some kind of color prediction through each layer of features, and to realize the probability distribution calculation of colors through the Softmax layer.
Table 2 Detailed configuration information of the chroma prediction sub-network (the table is provided as an image in the original publication)
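Continuing the sketch above, the chroma prediction sub-network can be attached as follows; the stride-2 transposed convolution that restores the 128 × 128 resolution is again an assumption, and the softmax corresponds to the output layer defined earlier.

```python
from tensorflow.keras import Model, layers

def build_cp_net(feature_extractor, num_colors=313):
    x = layers.Conv2DTranspose(num_colors, 3, strides=2, padding="same",
                               name="TransConv7")(feature_extractor.output)   # 64 -> 128
    probs = layers.Softmax(axis=-1, name="color_distribution")(x)  # per-pixel distribution over 313 bins
    return Model(feature_extractor.input, probs, name="CP_Net")

cp_net = build_cp_net(build_feature_extractor())
```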
Step 104: a loss function is constructed.
Color information is inherently a multi-modal problem: many objects can plausibly take different colors, and the MSE loss function is not robust to this multi-modality. If a target can take a set of different chroma values, the optimal solution under MSE is the average of that set, which leads to grayish, unsaturated colorization results; if the set of plausible colors is non-convex, the true color may even lie outside the set, giving unreliable results. The chrominance problem is therefore converted into a multi-classification problem that predicts a color probability distribution for each pixel; a class-balancing idea is introduced during training, adjusting the loss weights to emphasize colors that occur with low frequency, which ensures that the designed network has color diversity and authenticity.
Let the multi-band night vision image be X ∈ R^{H×W×2} (the two channels represent the low-light and infrared bands, respectively). The goal is then to learn a mapping \hat{Y} = F(X) to the corresponding chroma channels Y ∈ R^{H×W×2} (quantities marked with a hat denote predicted values; quantities without a hat denote true values).
To convert the chrominance prediction problem into a multi-classification problem, the ab chroma space is first quantized into bins with a grid size of 10, giving a total of Q = 313 quantized values over the whole gamut; the task then becomes learning a mapping from X to a probability distribution over the possible colors, \hat{Z} ∈ [0,1]^{H×W×Q}.
Since the true value Y_{h,w} is the chrominance of the actual image, its value is generally not one of the quantized values and cannot be compared directly with the predicted distribution. A soft-encoding scheme Z = H^{-1}(Y) is therefore defined to convert the true chroma into a vector Z: the 5 quantized values nearest to Y_{h,w} in the output space (313 quantized values) are found, the distance d(k) from each of these neighbors to the true value is computed (k = 1, 2, 3, …, 313), and each distance is weighted proportionally with a Gaussian kernel of σ = 5. The probability that the ith pixel belongs to the kth quantized value can then be expressed as
Z_i(k) = \frac{\exp(-d(k)^2 / (2\sigma^2))}{\sum_{k' \in N_5} \exp(-d(k')^2 / (2\sigma^2))} for k in the set N_5 of the 5 nearest quantized values, and Z_i(k) = 0 otherwise.
The probability distribution of all colors over an image, Z ∈ [0,1]^{H×W×Q}, is obtained from the above formula. The multi-class cross-entropy loss function can then be designed as
L(\hat{Z}, Z) = -\sum_{h,w} v(Z_{h,w}) \sum_{q} Z_{h,w,q} \log \hat{Z}_{h,w,q}
where v(·) is a weighting term used to balance the loss function, which can be determined from the sparsity of the color classes. Finally, a function \hat{Y} = H(\hat{Z}) maps the predicted color probability distribution \hat{Z} to color values \hat{Y}.
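A NumPy sketch of the soft encoding and of the class-rebalanced cross-entropy described above follows. The array of 313 quantized ab bin centers and the per-bin rarity weights v are assumed to be precomputed from the training set; σ = 5 and 5 nearest neighbors follow the text.

```python
import numpy as np

def soft_encode(ab, bin_centers, sigma=5.0, k=5):
    """Convert true ab chroma (H, W, 2) to a soft distribution over Q quantized bins."""
    h, w, _ = ab.shape
    q = bin_centers.shape[0]                                      # Q = 313
    flat = ab.reshape(-1, 2)
    d = np.linalg.norm(flat[:, None, :] - bin_centers[None, :, :], axis=2)   # (HW, Q)
    nn = np.argsort(d, axis=1)[:, :k]                             # indices of the 5 nearest bins
    z = np.zeros((flat.shape[0], q), dtype=np.float32)
    rows = np.arange(flat.shape[0])[:, None]
    weights = np.exp(-d[rows, nn] ** 2 / (2 * sigma ** 2))        # Gaussian-weighted distances
    z[rows, nn] = weights / weights.sum(axis=1, keepdims=True)
    return z.reshape(h, w, q)

def rebalanced_cross_entropy(z_true, z_pred, v):
    """Multi-class cross entropy weighted per pixel; v holds the per-bin rarity weights (Q,)."""
    eps = 1e-8
    # v(Z_{h,w}) approximated here as the soft-encoding-weighted average of the bin weights
    pixel_weight = (z_true * v[None, None, :]).sum(axis=-1)
    ce = -(z_true * np.log(z_pred + eps)).sum(axis=-1)            # per-pixel cross entropy
    return (pixel_weight * ce).sum()
```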
Step 105: and training the dim light/infrared image chromaticity prediction network based on the preprocessed training data and the loss function to obtain the trained dim light/infrared image chromaticity prediction network.
On the basis of the network model, hyper-parameters of the low-light/infrared image natural-color fusion network such as the optimization function, learning rate, batch size and number of filters are set: the optimization function uses the Adam algorithm, the learning rate is set to l = 0.001, the network weights are initialized with the MSRA method, the biases are all initialized to 0, the activation function is ReLU, the batch size is set to 16 and the training period is set to 80 epochs. Finally, the network is trained with the Keras open-source framework on an NVIDIA GTX850 graphics card; the trained network can predict the chrominance information for a given pair of low-light and infrared input images.
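The training setup can be summarized in a few Keras lines matching the reported hyper-parameters; x_train (two-channel low-light/infrared patches) and z_train (soft-encoded chroma targets) are assumed to have been prepared as described, and the model builders come from the sketches above.

```python
from tensorflow.keras.optimizers import Adam

# MSRA weight initialization corresponds to kernel_initializer="he_normal" in Keras layers.
model = build_cp_net(build_feature_extractor())
model.compile(optimizer=Adam(learning_rate=0.001),       # Adam, l = 0.001
              loss="categorical_crossentropy")           # class rebalancing can be added via sample weights
model.fit(x_train, z_train, batch_size=16, epochs=80)    # batch size 16, 80 training epochs
```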
Step 106: and extracting the chromaticity information of the target image by adopting the trained glimmer/infrared image chromaticity prediction network.
Step 107: luminance information of the target image is extracted.
The invention uses the luminance information of the NRL pseudo-color fusion image as the source of luminance information for the final color fusion.
Step 108: and finally synthesizing the color image based on the brightness information and the chrominance information.
The final natural-color fusion image is obtained by combining this luminance information with the predicted chrominance information.
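A hedged sketch of this final synthesis step follows: the per-pixel color distribution is collapsed to an ab value (here simply its expectation over the bin centers, one common choice), combined with the luminance channel, and converted from Lab to RGB; skimage is assumed to be available for the conversion.

```python
import numpy as np
from skimage.color import lab2rgb

def synthesize_color(luminance, z_pred, bin_centers):
    """luminance: (H, W) in [0, 100]; z_pred: (H, W, Q) probabilities; bin_centers: (Q, 2)."""
    ab = z_pred @ bin_centers                        # expected ab value per pixel, (H, W, 2)
    lab = np.concatenate([luminance[..., None], ab], axis=-1)
    return lab2rgb(lab)                              # (H, W, 3) RGB image in [0, 1]
```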
The method selects an image fusion data set [10,11] provided by TNO Human Factors (the Netherlands Organisation for Applied Scientific Research). The data set contains registered low-light and infrared images together with color images of the corresponding scenes, but the color images are not registered with the low-light and infrared images. The multimodal image calibration technique described above is therefore first used to register the color images with the low-light/infrared images of the corresponding scenes, and finally 5 groups of images with similar scenes are selected as the training set. A typical group of training images comprises three parts (a), (b) and (c), which are, from left to right, the low-light image, the infrared image and the color image of the same scene.
During training, 5000 sets of low-light, infrared and color image patches of size 128 × 128 are randomly extracted from the training sample set, and data augmentation is applied with methods such as rotation and translation. The low-light and infrared patch pairs are then combined into a two-channel image as the network input, the chrominance information ab of the color patch is extracted as the output, and the network parameters are set as described above; this patch preparation is sketched below.
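The following illustrative sketch shows one way to prepare such patches; function and variable names are assumptions. Random 128 × 128 crops are taken from registered low-light, infrared and color images, the low-light and infrared crops are stacked into the two-channel network input, and the ab channels of the color crop form the target.

```python
import numpy as np
from skimage.color import rgb2lab

def sample_patch(lowlight, infrared, color_rgb, size=128, rng=np.random.default_rng()):
    """Return a (128, 128, 2) low-light/infrared input and its (128, 128, 2) ab chroma target."""
    h, w = lowlight.shape[:2]
    y = rng.integers(0, h - size + 1)
    x = rng.integers(0, w - size + 1)
    x_in = np.stack([lowlight[y:y+size, x:x+size],
                     infrared[y:y+size, x:x+size]], axis=-1)      # two-channel network input
    ab = rgb2lab(color_rgb[y:y+size, x:x+size])[..., 1:]          # ab chroma of the color patch
    return x_in, ab
```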
Finally, the trained model is tested on the test set to obtain natural-color fusion images, as shown in fig. 3, which comprises three parts (a), (b) and (c): part (a) is the low-light image, part (b) is the infrared image and part (c) is the color image. The color fusion image in fig. 3 exhibits the color characteristics of the reference images in the training set: houses are red, the sky is light blue, roads are gray and trees and grass are green; the colors are natural and rich and accord with human observation habits.
Fig. 5 is a schematic structural diagram of a deep learning-based dim light/infrared image color fusion system according to an embodiment of the present invention, where the system shown in fig. 5 includes: a training data acquisition module 201, a preprocessing module 202, a prediction network construction module 203, a loss function construction module 204, a training module 205, a chrominance information extraction module 206, a luminance information extraction module 207, and a color image synthesis module 208.
The training data obtaining module 201 is configured to obtain training data.
The preprocessing module 202 is configured to preprocess the training data.
The prediction network construction module 203 is used for constructing a low light/infrared image chromaticity prediction network.
The loss function building block 204 is used to build a loss function.
The training module 205 is configured to train the dim light/infrared image chromaticity prediction network based on the preprocessed training data and the loss function, so as to obtain a trained dim light/infrared image chromaticity prediction network.
The chrominance information extraction module 206 is configured to extract chrominance information of the target image by using the trained dim light/infrared image chrominance prediction network;
the brightness information extraction module 207 is used for extracting the brightness information of the target image;
the color image synthesis module 208 is configured to finally synthesize a color image based on the luminance information and the chrominance information.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (10)

1. A dim light/infrared image color fusion method based on deep learning is characterized in that the fusion method comprises the following steps:
acquiring training data;
preprocessing the training data;
constructing a low-light/infrared image chromaticity prediction network;
constructing a loss function;
training the dim light/infrared image chromaticity prediction network based on the preprocessed training data and the loss function to obtain a trained dim light/infrared image chromaticity prediction network;
extracting the chromaticity information of the target image by adopting the trained dim light/infrared image chromaticity prediction network;
extracting brightness information of a target image;
and finally synthesizing the color image based on the brightness information and the chrominance information.
2. The deep learning-based micro/infrared image color fusion method according to claim 1, wherein preprocessing the training data specifically comprises:
and continuously preprocessing the training data by adopting a multimode image calibration technology.
3. The deep learning-based micro light/infrared image color fusion method according to claim 1, wherein the micro light/infrared image chromaticity prediction network comprises: a feature extraction sub-network and a chroma prediction sub-network.
4. The deep learning-based micro/infrared image color fusion method according to claim 1, wherein the loss function specifically adopts the following formula:
L(\hat{Z}, Z) = -\sum_{h,w} v(Z_{h,w}) \sum_{q} Z_{h,w,q} \log \hat{Z}_{h,w,q}
where h denotes the height coordinate of the training image, w denotes the width coordinate of the training image, v(Z_{h,w}) denotes the weight reflecting the sparsity of the chrominance information in the training set, Z_{h,w} denotes the chrominance value of maximum probability for the pixel at coordinates (h, w), q denotes a quantized value of the chrominance space, Z_{h,w,q} denotes the probability that the chrominance of the pixel at coordinates (h, w) is q, and \hat{Z}_{h,w,q} denotes the predicted value of that probability.
5. The deep learning based micro-optic/infrared image color fusion method as claimed in claim 3, wherein the network type of the feature extraction sub-network is an encoder-decoder network type, comprising an encoding network unit and a decoding network unit; the coding network unit adopts a conventional convolution unit and a hole convolution unit; the decoding network unit adopts a transposition convolution unit;
the chrominance extraction sub-network adopts a transposed convolution unit.
6. The deep learning-based micro/infrared image color fusion method as claimed in claim 5, wherein the coding network units in the feature extraction sub-network specifically comprise a first conventional convolution unit, a second conventional convolution unit, a third conventional convolution unit and a fourth hole convolution unit;
the decoding network unit in the feature extraction sub-network specifically includes: a fifth transposed convolution unit and a sixth transposed convolution unit;
the transposed convolution unit in the chrominance extraction sub-network specifically includes a seventh transposed convolution unit.
7. The deep learning based micro/infrared image color fusion method according to claim 6, wherein the first conventional convolution unit Conv1 comprises a first conventional convolution layer Conv1_1 and a second conventional convolution layer Conv1_2, the Conv1_1 has a convolution kernel size of 2, a convolution kernel number of 64, a step size of 1, and an input channel of 3, the Conv1_2 has a convolution kernel size of 3, a convolution kernel number of 64, a step size of 2, and an input channel of 64;
the second conventional convolution unit Conv2 has a convolution kernel size of 3, a convolution kernel number of 128, a step size of 2, and an input channel of 64;
the convolution kernel size of the third conventional convolution unit Conv3 is 2, the number of convolution kernels is 64, the step is 1, and the input channel is 3;
the fourth hole convolution unit DilaConv4 includes a first hole convolution layer DilaConv4_1 and a second hole convolution layer DilaConv4_2, the convolution kernel size of the first hole convolution layer DilaConv4_1 is 3, the number of convolution kernels is 256, the step is 1, and the input channel is 256, the convolution kernel size of the second hole convolution layer DilaConv4_2 is 3, the number of convolution kernels is 256, the step is 1, and the input channel is 256;
the size of a convolution kernel of the fifth transpose convolution unit TransConv5 is 3, the number of the convolution kernels is 128, the stride is 1, and the input channel is 128;
the size of a convolution kernel of the sixth transpose convolution unit TransConv6 is 3, the number of the convolution kernels is 64, the stride is 1, and the input channel is 256;
the seventh transpose convolution unit transcconv 7 has a convolution kernel size of 3, a number of convolution kernels of 313, a stride of 1, and an input channel of 128.
8. The deep learning based micro light/infrared image color fusion method according to claim 1, further comprising, after constructing a micro light/infrared image chroma prediction network: and adopting BN technology to carry out normalization processing on the output characteristics of each level in the dim light/infrared image chromaticity prediction network.
9. A low-light/infrared image color fusion system based on deep learning, the system comprising:
the training data acquisition module is used for acquiring training data;
the preprocessing module is used for preprocessing the training data;
the prediction network construction module is used for constructing a low-light/infrared image chromaticity prediction network;
the loss function constructing module is used for constructing a loss function;
the training module is used for training the dim light/infrared image chromaticity prediction network based on the preprocessed training data and the loss function to obtain the trained dim light/infrared image chromaticity prediction network;
the chrominance information extraction module is used for extracting chrominance information of the target image by adopting the trained dim light/infrared image chrominance prediction network;
the brightness information extraction module is used for extracting the brightness information of the target image;
and the color image synthesis module is used for finally synthesizing a color image based on the brightness information and the chrominance information.
10. The deep learning based micro-optic/infrared image color fusion system of claim 9 wherein the pre-processing module specifically includes continuing pre-processing the training data using a multi-modal image calibration technique.
CN202010175703.3A 2020-03-13 2020-03-13 Low-light-level/infrared image color fusion method and system based on deep learning Pending CN111402306A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010175703.3A CN111402306A (en) 2020-03-13 2020-03-13 Low-light-level/infrared image color fusion method and system based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010175703.3A CN111402306A (en) 2020-03-13 2020-03-13 Low-light-level/infrared image color fusion method and system based on deep learning

Publications (1)

Publication Number Publication Date
CN111402306A true CN111402306A (en) 2020-07-10

Family

ID=71432416

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010175703.3A Pending CN111402306A (en) 2020-03-13 2020-03-13 Low-light-level/infrared image color fusion method and system based on deep learning

Country Status (1)

Country Link
CN (1) CN111402306A (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102298769A (en) * 2011-06-11 2011-12-28 浙江理工大学 Colored fusion method of night vision low-light image and infrared image based on color transmission
US20150326863A1 (en) * 2012-06-29 2015-11-12 Canon Kabushiki Kaisha Method and device for encoding or decoding and image
US20190378258A1 (en) * 2017-02-10 2019-12-12 Hangzhou Hikvision Digital Technology Co., Ltd. Image Fusion Apparatus and Image Fusion Method
CN108133470A (en) * 2017-12-11 2018-06-08 深圳先进技术研究院 Infrared image and low-light coloured image emerging system and method
CN110120028A (en) * 2018-11-13 2019-08-13 中国科学院深圳先进技术研究院 A kind of bionical rattle snake is infrared and twilight image Color Fusion and device
CN109712105A (en) * 2018-12-24 2019-05-03 浙江大学 A kind of image well-marked target detection method of combination colour and depth information
CN110276731A (en) * 2019-06-17 2019-09-24 艾瑞迈迪科技石家庄有限公司 Endoscopic image color restoring method and device

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
RICHARD ZHANG, PHILLIP ISOLA, ALEXEI A. EFROS: "Colorful Image Colorization", vol. 9907, pages 649 - 666 *
YUE LI, LI LI, ZHU LI, JIANCHAO YANG, NING XU, DONG LIU, HOUQIANG LI: "A HYBRID NEURAL NETWORK FOR CHROMA INTRA PREDICTION", pages 1797 - 1801 *
何芳州: "Color face recognition algorithm fusing chrominance and luminance features", vol. 42, no. 07, pages 74 - 78 *
徐中辉, 吕维帅: "Image colorization based on convolutional neural networks", vol. 44, no. 10, pages 19 - 22 *
潘豪亮, 闫青, 徐奕, 梁龙飞, 杨小康: "Image gradient extraction method fusing luminance and chrominance information", vol. 38, no. 09, pages 18 - 24 *
苗启广 et al.: "Multi-Sensor Image Fusion Technology and Applications", vol. 1, Xidian University, pages 29 - 31 *
黄应清, 齐鸥, 蒋晓瑜, 刘中喧: "New objective evaluation index for pseudo-color image fusion quality", vol. 47, no. 8, pages 368 - 394 *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111898671B (en) * 2020-07-27 2022-05-24 中国船舶工业综合技术经济研究院 Target identification method and system based on fusion of laser imager and color camera codes
CN111898671A (en) * 2020-07-27 2020-11-06 中国船舶工业综合技术经济研究院 Target identification method and system based on fusion of laser imager and color camera codes
CN111967530A (en) * 2020-08-28 2020-11-20 南京诺源医疗器械有限公司 Fluorescence area identification method of medical fluorescence imaging system
CN112164017A (en) * 2020-09-27 2021-01-01 中国兵器工业集团第二一四研究所苏州研发中心 Deep learning-based polarization colorization method
CN112164017B (en) * 2020-09-27 2023-11-17 中国兵器工业集团第二一四研究所苏州研发中心 Polarization colorization method based on deep learning
TWI773526B (en) * 2020-09-29 2022-08-01 大陸商北京靈汐科技有限公司 Image processing method, device, computer equipment and storage medium
CN113012087A (en) * 2021-03-31 2021-06-22 中南大学 Image fusion method based on convolutional neural network
CN113012087B (en) * 2021-03-31 2022-11-04 中南大学 Image fusion method based on convolutional neural network
CN113191970A (en) * 2021-04-24 2021-07-30 北京理工大学 Orthogonal color transfer network and method
CN113191970B (en) * 2021-04-24 2022-10-21 北京理工大学 Orthogonal color transfer network and method
WO2022257184A1 (en) * 2021-06-09 2022-12-15 烟台艾睿光电科技有限公司 Method for acquiring image generation apparatus, and image generation apparatus
CN113609893A (en) * 2021-06-18 2021-11-05 大连民族大学 Low-illuminance indoor human body target visible light feature reconstruction method and network based on infrared camera
CN113609893B (en) * 2021-06-18 2024-04-16 大连民族大学 Low-illuminance indoor human body target visible light characteristic reconstruction method and network based on infrared camera
CN113643202A (en) * 2021-07-29 2021-11-12 西安理工大学 Low-light-level image enhancement method based on noise attention map guidance
CN113920172A (en) * 2021-12-14 2022-01-11 成都睿沿芯创科技有限公司 Target tracking method, device, equipment and storage medium
CN116309216A (en) * 2023-02-27 2023-06-23 南京博视医疗科技有限公司 Pseudo-color image fusion method and image fusion system based on multiple wave bands
CN116309216B (en) * 2023-02-27 2024-01-09 南京博视医疗科技有限公司 Pseudo-color image fusion method and image fusion system based on multiple wave bands

Similar Documents

Publication Publication Date Title
CN111402306A (en) Low-light-level/infrared image color fusion method and system based on deep learning
CN111709902B (en) Infrared and visible light image fusion method based on self-attention mechanism
CN112507793B (en) Ultra-short term photovoltaic power prediction method
CN107392130B (en) Multispectral image classification method based on threshold value self-adaption and convolutional neural network
CN113283444B (en) Heterogeneous image migration method based on generation countermeasure network
CN109472837A (en) The photoelectric image conversion method of confrontation network is generated based on condition
CN109949353A (en) A kind of low-light (level) image natural sense colorization method
CN110930308B (en) Structure searching method of image super-resolution generation network
CN117726550B (en) Multi-scale gating attention remote sensing image defogging method and system
CN113506275B (en) Urban image processing method based on panorama
CN113112441B (en) Multi-band low-resolution image synchronous fusion method based on dense network and local brightness traversal operator
CN114067018A (en) Infrared image colorization method for generating countermeasure network based on expansion residual error
Qian et al. Fast color contrast enhancement method for color night vision
CN114119356A (en) Method for converting thermal infrared image into visible light color image based on cycleGAN
CN116189021B (en) Multi-branch intercrossing attention-enhanced unmanned aerial vehicle multispectral target detection method
CN116258653B (en) Low-light level image enhancement method and system based on deep learning
CN114638764B (en) Multi-exposure image fusion method and system based on artificial intelligence
Richart et al. Image colorization with neural networks
CN115587961A (en) Cell imaging method based on multi-exposure image fusion technology
CN111881924B (en) Dark-light vehicle illumination identification method combining illumination invariance and short-exposure illumination enhancement
CN114529488A (en) Image fusion method, device and equipment and storage medium
CN112203072A (en) Aerial image water body extraction method and system based on deep learning
CN102667853A (en) Filter setup learning for binary sensor
CN111402223A (en) Transformer substation defect problem detection method using transformer substation video image
Makwana et al. Efficient color transfer method based on colormap clustering for night vision applications

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination