CN113920455A - Night video coloring method based on deep neural network - Google Patents

Night video coloring method based on deep neural network

Info

Publication number
CN113920455A
CN113920455A (Application CN202111009898.5A)
Authority
CN
China
Prior art keywords
image
coloring
network
full
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111009898.5A
Other languages
Chinese (zh)
Inventor
李志颖
戚自华
陈唯彬
陈力彦
赵容
黄斐然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan University
Original Assignee
Jinan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan University
Priority to CN202111009898.5A
Publication of CN113920455A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a night video coloring method based on a deep neural network, which comprises the following steps: S1, establishing a target detection neural network model, inputting the video image to be processed, detecting target instances with a target detection algorithm, and generating cropped target images; S2, establishing a coloring network, performing instance coloring and full-image coloring by constructing two end-to-end trained backbone networks, namely an instance coloring network and a full-image coloring network, whose corresponding levels form a full convolution neural network trained end to end; S3, establishing a fusion module that selectively fuses the features extracted from the instance coloring network and the full-image coloring network to finally obtain the colored night video image. By inputting the video image to be processed and passing it through the target detection network, the full convolution networks for instance and full-image coloring, and the fusion module, the invention obtains the colored night video image.

Description

Night video coloring method based on deep neural network
Technical Field
The invention relates to the technical field of image analysis in computer vision, and in particular to a night video coloring method based on a deep neural network.
Background
In recent years, with the continuous development of computer vision technologies, target detection network models have received increasing attention, and combining a target detection network model with a coloring network model has become a research hot spot. However, most such combinations are applied to coloring still pictures, for example in old photo restoration; a gap remains in video coloring technology, especially in the field of surveillance video coloring.
The automatic conversion of grayscale images into realistic color images is a subject of intense research in computer vision and graphics. However, predicting the two missing color channels from a given single-channel grayscale image is inherently an ill-posed problem. Furthermore, the coloring task is multi-modal: there are many plausible options for coloring an object; for example, an automobile may be white, black, red, and so on. Thus, image coloring remains a challenging research problem.
At present, many night surveillance videos or black-and-white surveillance videos cannot present color well, so the targets in them cannot be processed effectively and the footage becomes invalid data, which brings much inconvenience to computer vision applications. Moreover, coloring black-and-white video places very high demands on prediction accuracy.
Disclosure of Invention
In order to solve the technical problems in the prior art, the invention provides a night video coloring method based on a deep neural network.
The invention is realized by adopting the following technical scheme: a night video coloring method based on a deep neural network comprises the following steps:
S1, establishing a target detection neural network model: a video image to be processed is input, target instances are detected with a target detection algorithm and cropped target images are generated; features are extracted from the target images and fed into a fully connected network for classification and regression, the classification and regression results are sent into a full convolution network, and the category of each pixel is recovered from the target image features;
S2, establishing a coloring network: instance coloring and full-image coloring are performed by constructing two end-to-end trained backbone networks, namely an instance coloring network and a full-image coloring network; the corresponding levels of the two coloring networks are constructed as a full convolution neural network trained end to end;
S3, establishing a fusion module: three convolution layers are applied to the features extracted from the instance coloring network and the full-image coloring network to obtain a full-image weight map and instance weight maps; the instance image features and instance weight maps are resized to the size of the full-image features according to their coordinates; finally, the full-image features and each group of instance image features are weight-fused according to the corresponding weight maps to obtain the colored night video image.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention obtains the colored night video image by inputting the video image to be processed and processing it with a target detection network, the full convolution networks for instance coloring and full-image coloring, and a fusion module.
2. The method can color night surveillance videos in various scenes with good coloring effect and high accuracy, and can well restore the original colors of a scene; black-and-white surveillance videos are thus no longer invalid, meaningless training data, which expands the scale of usable data sets. The method has good extensibility and plasticity and can be applied to many scenes and fields.
3. The invention realizes instance-aware coloring without having to handle complex background noise interference.
4. The invention uses the located objects as input, allowing the instance coloring network to learn object-level representations for accurate coloring and to avoid confusion between object colors and the background.
Drawings
FIG. 1 is an overall framework of the present invention;
FIG. 2 is a diagram of an object detection framework of the present invention;
FIG. 3 is a target detection subject network architecture of the present invention;
FIG. 4 is a diagram of a colored network framework of the present invention;
FIG. 5 is a fusion module framework diagram of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Examples
As shown in fig. 1, the night video coloring method based on the deep neural network of the present embodiment mainly includes the following steps:
S1, establishing a target detection neural network model: a video image to be processed is input, target instances are detected with a target detection algorithm, and cropped target images are generated; features are extracted from each target image and fed into a fully connected network of 1024 neurons for classification and regression, the classification and regression results are sent into a full convolution network, and the category of each pixel is recovered from the abstract features. The bottom layers of the pre-trained convolutional neural network detect low-level features such as edges and corners, while higher layers detect higher-level features such as cars, people and sky. After the forward propagation of the convolutional neural network, the FPN method selects high-level features and passes them down to the bottom layers, so that each level combines high-level and low-level features. After the bounding box of each object is detected, the corresponding grayscale instance image and color instance image are cropped, and the cropped images are converted to 256x256 resolution;
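By way of illustration only, the detection-and-cropping part of step S1 can be sketched as follows, with a pre-trained torchvision Mask R-CNN standing in for the embodiment's detector; the score threshold and the helper name detect_and_crop are assumptions of this sketch, not details of the patent:

```python
# Sketch of step S1: detect instances in a frame and crop each to 256x256.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor, resize

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def detect_and_crop(frame, score_thresh=0.7, out_size=[256, 256]):
    """frame: H x W x 3 uint8 RGB array -> list of (256x256 crop, box) pairs."""
    x = to_tensor(frame)                       # C x H x W float in [0, 1]
    with torch.no_grad():
        det = model([x])[0]                    # dict: boxes, labels, scores, masks
    crops = []
    for box, score in zip(det["boxes"], det["scores"]):
        if score < score_thresh:
            continue
        x1, y1, x2, y2 = box.int().tolist()
        crop = resize(x[:, y1:y2, x1:x2], out_size)   # bilinear resize to 256x256
        crops.append((crop, (x1, y1, x2, y2)))        # keep coords for the fusion step
    return crops
```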
S2, establishing a coloring network: the functions of instance coloring and full-image coloring are realized by constructing two end-to-end trained backbone networks, namely an instance coloring network and a full-image coloring network; the corresponding levels of the two coloring networks are constructed as a full convolution neural network trained end to end;
S3, establishing a fusion module: three convolution layers are applied to the features extracted from the instance coloring network and the full-image coloring network to obtain a full-image weight map and instance weight maps; the instance image features and instance weight maps are resized to the size of the full-image features according to their coordinates; finally, the full-image features and each group of instance image features are weight-fused according to the corresponding weight maps to finally obtain the colored night video image.
As shown in fig. 2, in this embodiment, the specific steps of establishing the target detection neural network model in step S1 are as follows:
S11, setting up the backbone network layer: as shown in fig. 3, a training or prediction original picture is used as the input unit; the picture undergoes multi-scale detection through the three steps of bottom-up connection, top-down connection and lateral connection, and the features of each level of the picture are fused so that the picture simultaneously carries strong semantic information and strong spatial information, achieving a stronger feature learning effect;
and S12, setting up the head network layer: the feature map obtained in step S11 is divided into a 7x7 grid, bilinear interpolation is performed within each cell to obtain four sampling points, max pooling after interpolation yields the final 7x7 feature map, and this map is fed as input into a full convolution neural network to obtain a fixed size and complete the Mask prediction.
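The head-layer input of step S12 corresponds closely to RoIAlign. The sketch below uses torchvision.ops.roi_align to show the 7x7 grid with four bilinearly interpolated sampling points per cell; note that roi_align averages the sampled points, whereas the embodiment specifies max pooling, and the stride-16 spatial scale and tensor shapes are assumptions of this sketch:

```python
# Sketch of the S12 head input: pool each RoI to a fixed 7x7 feature map.
import torch
from torchvision.ops import roi_align

features = torch.randn(1, 256, 50, 68)                 # N,C,H,W feature map from the FPN
boxes = [torch.tensor([[64.0, 32.0, 192.0, 160.0]])]   # one RoI in image coordinates

pooled = roi_align(
    features, boxes,
    output_size=(7, 7),      # the 7x7 grid from step S12
    spatial_scale=1 / 16,    # image coords -> feature-map coords (assumed stride)
    sampling_ratio=2,        # 2x2 = four interpolated points per cell
)
print(pooled.shape)          # torch.Size([1, 256, 7, 7])
```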
In this embodiment, the specific processes of the bottom-up connection, the top-down connection, and the lateral connection in step S11 are as follows:
Bottom-up connection: this realizes the conventional feature extraction process; the complete picture is divided into 5 stages according to feature map size, and the outputs of the last layers of stages 2, 3, 4 and 5 are defined as C2, C3, C4 and C5 respectively;
Top-down connection: upsampling starts from the highest layer, and nearest-neighbor upsampling is used directly, which reduces the number of training parameters;
Lateral connection: the upsampling result of the top-down connection is fused with the feature map of the same size generated by the bottom-up connection; each of C2, C3, C4 and C5 undergoes a 1x1 convolution without an activation function to reduce the number of channels, with the output set to 256 channels; the result is then added element-wise to the upsampled feature map, and after fusion the fused feature map is processed with a 3x3 convolution kernel to eliminate the aliasing effect of upsampling.
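A minimal sketch of the three connections of step S11 follows, under the assumption of standard ResNet-style channel counts for C2 to C5: 1x1 lateral convolutions without activation reduce each stage to 256 channels, nearest-neighbor upsampling implements the top-down path, element-wise addition fuses the maps, and a 3x3 convolution removes the upsampling aliasing:

```python
# Minimal FPN sketch of S11's bottom-up / top-down / lateral connections.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPNNeck(nn.Module):
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1x1 lateral convs (no activation) reducing C2..C5 to 256 channels
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)
        # 3x3 convs applied after fusion to eliminate upsampling aliasing
        self.smooth = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
            for _ in in_channels)

    def forward(self, c2, c3, c4, c5):
        laterals = [l(c) for l, c in zip(self.lateral, (c2, c3, c4, c5))]
        # top-down: nearest-neighbor upsample the coarser map and add it in
        for i in range(len(laterals) - 2, -1, -1):
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest")
        return [s(p) for s, p in zip(self.smooth, laterals)]  # P2..P5
```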
As shown in fig. 4, in this embodiment, the specific steps of establishing the coloring network in step S2 are as follows:
S21, establishing an instance coloring network for coloring instance images: the targets and coordinates obtained by using the pre-trained detection network Mask RCNN as the target detector are cropped into instance images and input into the instance coloring network; a convolutional neural network with multiple intermediate layers, including convolution layers, pooling layers, ReLU layers and batch normalization, is built, the deep semantic information of the images is learned through this convolutional neural network, and the predicted color instance images are finally obtained;
S22, establishing a full-image coloring network for coloring the complete image: its structure is similar to that of the instance coloring network, and a consistent network structure is adopted so that the corresponding levels of the two coloring networks can conveniently be fused later; the original grayscale image is then input into the full-image coloring network, the full-image features are extracted after passing through the multi-layer network structure, and the predicted complete image is finally obtained.
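The layer types named in steps S21 and S22 (convolution, pooling, ReLU, batch normalization) can be sketched as a small encoder-decoder that predicts two chrominance channels from a grayscale input; the depth, channel widths and Lab-space output convention are illustrative assumptions, not the patent's exact architecture:

```python
# Compact sketch of the S21/S22 coloring backbone: conv + batch norm + ReLU
# blocks with pooling, mapping 1 grayscale channel to 2 chrominance channels.
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(
        nn.Conv2d(cin, cout, kernel_size=3, padding=1),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True))

class ColoringNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            conv_block(1, 64), nn.MaxPool2d(2),
            conv_block(64, 128), nn.MaxPool2d(2),
            conv_block(128, 256), conv_block(256, 256))
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"), conv_block(256, 128),
            nn.Upsample(scale_factor=2, mode="nearest"), conv_block(128, 64),
            nn.Conv2d(64, 2, kernel_size=3, padding=1), nn.Tanh())  # ab in [-1, 1]

    def forward(self, gray):                      # gray: N x 1 x H x W
        return self.decoder(self.encoder(gray))   # N x 2 x H x W chrominance

# two networks with a consistent structure, so corresponding levels can be fused
instance_net, full_image_net = ColoringNet(), ColoringNet()
```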
In this embodiment, the specific process of instance image coloring and complete image coloring is as follows: on the basis of the grayscale map, the neural network is guided by point-input color information that is mapped directly into the full convolution neural network, so that a plausible color result is generated quickly; after learning from a large amount of data, the full convolution neural network colors the image by fusing low-level cues with high-level semantic information.
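One common way to realize such point-input guidance, sketched below under assumed conventions, is to rasterize the color points into sparse chrominance-hint channels plus a presence mask and concatenate them with the grayscale channel, so the full convolution network receives 1 + 2 + 1 = 4 input channels; the channel layout and helper name are assumptions of this sketch:

```python
# Sketch of point-input color guidance: hints become extra input channels.
import torch

def make_hint_input(gray, points):
    """gray: 1 x H x W grayscale tensor; points: list of (y, x, a, b) color hints."""
    _, H, W = gray.shape
    hints = torch.zeros(2, H, W)   # sparse ab values at hinted pixels
    mask = torch.zeros(1, H, W)    # 1 where a hint is present, else 0
    for y, x, a, b in points:
        hints[:, y, x] = torch.tensor([a, b])
        mask[0, y, x] = 1.0
    return torch.cat([gray, hints, mask], dim=0)  # 4 x H x W network input
```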
As shown in fig. 5, in this embodiment, the specific steps of establishing the fusion module in step S3 are as follows:
S31, feature extraction and convolution: the full-image feature Fx after full-image coloring and the instance image features Fxi are taken as input, and three convolution layers are applied to obtain a full-image weight map WF and instance weight maps WIi;
S32, adjusting the instance image features: the instance image features Fxi and the instance weight maps WIi are resized to the size of the full-image feature Fx by stretching and zero padding according to their specific coordinates;
S33, weighted fusion: for each pixel, softmax-weighted fusion of the full-image feature Fx and each group of instance image features Fxi is performed according to the corresponding weight maps; in the formula below, \odot denotes element-wise multiplication, the softmax is taken per pixel over the stacked weight maps, and the bar denotes quantities resized to full-image size in step S32:
F_{fused} = \bar{W}_F \odot F_X + \sum_{i=1}^{N} \bar{W}_{I_i} \odot \bar{F}_{X_i}, \qquad (\bar{W}_F, \bar{W}_{I_1}, \ldots, \bar{W}_{I_N}) = \mathrm{softmax}(W_F, W_{I_1}, \ldots, W_{I_N})
where N is the number of instances.
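For illustration, steps S31 to S33 might be sketched as follows; the feature width, the exact three-layer weight head, and the large negative padding that drives out-of-box instance weights to roughly zero after the softmax are assumptions of this sketch:

```python
# Sketch of the S31-S33 fusion module: conv weight heads, paste-back, softmax blend.
import torch
import torch.nn as nn
import torch.nn.functional as F

def weight_head(channels=256):
    """Three convolution layers reducing features to a 1-channel weight map (S31)."""
    return nn.Sequential(
        nn.Conv2d(channels, 128, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(64, 1, 3, padding=1))

def paste(src, box, full_hw, fill=0.0):
    """Stretch src (C x h x w) to its box and pad into a full-size canvas (S32)."""
    x1, y1, x2, y2 = box
    canvas = src.new_full((src.shape[0],) + tuple(full_hw), fill)
    canvas[:, y1:y2, x1:x2] = F.interpolate(
        src[None], size=(y2 - y1, x2 - x1), mode="bilinear", align_corners=False)[0]
    return canvas

def fuse(full_feat, inst_feats, boxes, full_head, inst_head):
    """full_feat: C x H x W; inst_feats: list of C x h x w; boxes: pixel coords (S33)."""
    H, W = full_feat.shape[-2:]
    feats = [full_feat]
    weights = [full_head(full_feat[None])[0]]            # W_F
    for f, b in zip(inst_feats, boxes):
        w = inst_head(f[None])[0]                        # W_Ii
        feats.append(paste(f, b, (H, W)))                # resized instance features
        weights.append(paste(w, b, (H, W), fill=-1e4))   # ~zero weight outside the box
    w = torch.softmax(torch.stack(weights), dim=0)       # per-pixel softmax over N+1 maps
    return (w * torch.stack(feats)).sum(dim=0)           # fused feature, C x H x W
```

With full_head = weight_head() and inst_head = weight_head(), fuse realizes the per-pixel softmax blend of the formula above.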
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited thereto; any change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention shall be regarded as an equivalent replacement and shall fall within the scope of protection of the present invention.

Claims (7)

1. A night video coloring method based on a deep neural network is characterized by comprising the following steps:
S1, establishing a target detection neural network model: inputting a video image to be processed, detecting target instances with a target detection algorithm and generating cropped target images, extracting features from the target images, feeding the extracted target image features into a fully connected network for classification and regression, sending the classification and regression results into a full convolution network, and recovering the category of each pixel from the target image features;
S2, establishing a coloring network: performing instance coloring and full-image coloring by constructing two end-to-end trained backbone networks, namely an instance coloring network and a full-image coloring network, wherein the corresponding levels of the two coloring networks are constructed as a full convolution neural network trained end to end;
S3, establishing a fusion module: applying three convolution layers to the features extracted from the instance coloring network and the full-image coloring network to obtain a full-image weight map and instance weight maps, resizing the instance image features and instance weight maps to the size of the full-image features according to their coordinates, and finally weight-fusing the full-image features with each group of instance image features according to the corresponding weight maps to obtain the colored night video image.
2. The night video coloring method based on the deep neural network as claimed in claim 1, wherein the specific steps of establishing the target detection neural network model in step S1 are as follows:
S11, setting up the backbone network layer, wherein the backbone network framework takes a training or prediction original picture as the input unit, performs multi-scale detection on the picture through the three steps of bottom-up connection, top-down connection and lateral connection, and fuses the features of each level of the picture;
and S12, setting up the head network layer: the feature map obtained in step S11 is divided into a 7x7 grid, bilinear interpolation is performed within each cell to obtain four sampling points, max pooling after interpolation yields the final 7x7 feature map, and this map is fed as input into a full convolution neural network to obtain a fixed size and complete the Mask prediction.
3. The night video coloring method based on the deep neural network as claimed in claim 2, wherein the specific processes of the three steps of bottom-up connection, top-down connection and lateral connection in step S11 are as follows:
Bottom-up connection: the complete picture is divided into 5 stages according to feature map size, and the outputs of the last layers of stages 2, 3, 4 and 5 are defined as C2, C3, C4 and C5 respectively;
Top-down connection: upsampling is performed from the highest layer using nearest-neighbor upsampling;
Lateral connection: the upsampling result of the top-down connection is fused with the feature map of the same size generated by the bottom-up connection; each of C2, C3, C4 and C5 undergoes a 1x1 convolution without an activation function, with the output set to 256 channels, the result is then added element-wise to the upsampled feature map, and after fusion the fused feature map is processed with a 3x3 convolution kernel to eliminate the aliasing effect of upsampling.
4. The night video coloring method based on the deep neural network as claimed in claim 1, wherein the specific process of detecting target instances and generating cropped target images is as follows: after the bounding box of each object is detected, the corresponding grayscale instance image and color instance image are cropped, and the cropped images are converted to 256 x 256 resolution.
5. The night video coloring method based on the deep neural network as claimed in claim 1, wherein the specific steps of establishing the coloring network in step S2 are as follows:
S21, establishing an instance coloring network for instance image coloring: targets and coordinates are acquired by using the pre-trained detection network Mask RCNN as the target detector and cropped into instance images, which are input into the instance coloring network; a convolutional neural network with multiple intermediate layers, including convolution layers, pooling layers, ReLU layers and batch normalization, is built, the deep semantic information of the images is learned through this convolutional neural network, and the predicted color instance images are finally obtained;
and S22, establishing a full-image coloring network for coloring the complete image: the original image is input into the full-image coloring network, the full-image features are extracted after passing through the multi-layer network structure, and the predicted complete image is finally obtained.
6. The night video coloring method based on the deep neural network as claimed in claim 5, wherein the specific process of instance image coloring and complete image coloring is as follows: on the basis of the grayscale map, the neural network is guided by point-input color information mapped directly into the full convolution neural network, and after learning from a large amount of data the full convolution neural network colors the image by fusing low-level cues with high-level semantic information.
7. The night video coloring method based on the deep neural network as claimed in claim 1, wherein the specific steps of establishing the fusion module in step S3 are as follows:
S31, feature extraction and convolution: the full-image feature Fx after full-image coloring and the instance image features Fxi are taken as input, and three convolution layers are applied to obtain a full-image weight map WF and instance weight maps WIi;
S32, adjusting the instance image features: the instance image features Fxi and the instance weight maps WIi are resized to the size of the full-image feature Fx by stretching and zero padding according to their specific coordinates;
S33, weighted fusion: for each pixel, softmax-weighted fusion of the full-image feature Fx and each group of instance image features Fxi is performed according to the corresponding weight maps; in the formula below, \odot denotes element-wise multiplication, the softmax is taken per pixel over the stacked weight maps, and the bar denotes quantities resized to full-image size in step S32:
F_{fused} = \bar{W}_F \odot F_X + \sum_{i=1}^{N} \bar{W}_{I_i} \odot \bar{F}_{X_i}, \qquad (\bar{W}_F, \bar{W}_{I_1}, \ldots, \bar{W}_{I_N}) = \mathrm{softmax}(W_F, W_{I_1}, \ldots, W_{I_N})
where N is the number of instances.
CN202111009898.5A (priority date 2021-08-31, filing date 2021-08-31): Night video coloring method based on deep neural network. Status: Pending. Publication: CN113920455A (en).

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111009898.5A (published as CN113920455A) | 2021-08-31 | 2021-08-31 | Night video coloring method based on deep neural network

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202111009898.5A (published as CN113920455A) | 2021-08-31 | 2021-08-31 | Night video coloring method based on deep neural network

Publications (1)

Publication Number | Publication Date
CN113920455A | 2022-01-11

Family

ID=79233524

Family Applications (1)

Application Number | Title | Priority Date | Filing Date | Status
CN202111009898.5A | Night video coloring method based on deep neural network | 2021-08-31 | 2021-08-31 | Pending (CN113920455A)

Country Status (1)

Country | Document
CN | CN113920455A (en)


Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination