CN113920455A - Night video coloring method based on deep neural network - Google Patents

Night video coloring method based on deep neural network

Info

Publication number
CN113920455A
CN113920455A (Application CN202111009898.5A)
Authority
CN
China
Prior art keywords
image
coloring
network
full
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111009898.5A
Other languages
Chinese (zh)
Inventor
李志颖
戚自华
陈唯彬
陈力彦
赵容
黄斐然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan University
Original Assignee
Jinan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan University
Priority to CN202111009898.5A
Publication of CN113920455A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a night video coloring method based on a deep neural network, which comprises the following steps: S1, establishing a target detection neural network model, inputting the video image to be processed, detecting target instances with a target detection algorithm, and generating cropped target images; S2, establishing a coloring network, performing instance coloring and full-image coloring by constructing two end-to-end trained backbone networks, namely an instance coloring network and a full-image coloring network, whose corresponding levels form a full convolution neural network trained end to end; S3, establishing a fusion module that selectively fuses the features extracted from the instance coloring network and the full-image coloring network to finally obtain the colored night video image. By inputting the video image to be processed and passing it through the target detection network, the full convolution networks for instance and full-image coloring, and the fusion module, the invention obtains the colored night video image.

Description

Night video coloring method based on deep neural network
Technical Field
The invention relates to the technical field of image analysis in computer vision, and in particular to a night video coloring method based on a deep neural network.
Background
In recent years, with the continuous development of computer vision technologies, target detection network models have received increasing attention, and combining a target detection network model with a coloring network model has become a research hot spot. However, most such combinations are applied to coloring still pictures, for example in old photo restoration; a gap remains in video coloring technology, especially in the field of surveillance video coloring.
The automatic conversion of grayscale images into realistic color images is a subject of intense research in computer vision and graphics. However, predicting the two missing color channels from a given single-channel grayscale image is inherently an ill-posed problem. Furthermore, the coloring task is multi-modal: there are many plausible options for coloring an object; for example, an automobile may be white, black, red, and so on. Thus, image coloring remains a challenging research problem.
At present, many night surveillance videos or black-and-white surveillance videos cannot present color well, so the targets in them cannot be processed effectively and the footage becomes invalid data, which brings much inconvenience to computer vision applications. Moreover, coloring black-and-white video places very high demands on prediction accuracy.
Disclosure of Invention
In order to solve the technical problems in the prior art, the invention provides a night video coloring method based on a deep neural network.
The invention is realized by adopting the following technical scheme: a night video coloring method based on a deep neural network comprises the following steps:
S1, establishing a target detection neural network model: a video image to be processed is input, target instances are detected with a target detection algorithm and cropped target images are generated; features are extracted from the target images and fed into a fully connected network for classification and regression, the classification and regression results are sent into a full convolution network, and the category of each pixel is recovered from the target image features;
S2, establishing a coloring network: instance coloring and full-image coloring are performed by constructing two end-to-end trained backbone networks, namely an instance coloring network and a full-image coloring network; the corresponding levels of the two coloring networks are constructed as a full convolution neural network trained end to end;
S3, establishing a fusion module: three convolution layers are applied to the features extracted from the instance coloring network and the full-image coloring network to obtain a full-image weight map and instance weight maps; the instance image features and instance weight maps are resized to the size of the full-image features according to their coordinates; finally, the full-image features and each group of instance image features are weight-fused according to the corresponding weight maps to obtain the colored night video image.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention obtains the colored night video image by inputting the video image to be processed and processing it with a target detection network, the full convolution networks for instance coloring and full-image coloring, and a fusion module.
2. The method can color night surveillance videos in various scenes with good coloring effect and high accuracy, and can well restore the original colors of a scene; black-and-white surveillance videos are thus no longer invalid, meaningless training data, which expands the scale of usable data sets. The method has good extensibility and plasticity and can be applied to many scenes and fields.
3. The invention realizes instance-aware coloring without having to handle complex background noise interference.
4. The invention uses the located objects as input, allowing the instance coloring network to learn object-level representations for accurate coloring and to avoid confusion between object colors and the background.
Drawings
FIG. 1 is an overall framework of the present invention;
FIG. 2 is a diagram of an object detection framework of the present invention;
FIG. 3 is a target detection subject network architecture of the present invention;
FIG. 4 is a diagram of a colored network framework of the present invention;
FIG. 5 is a fusion module framework diagram of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Examples
As shown in fig. 1, the night video coloring method based on the deep neural network of the present embodiment mainly includes the following steps:
S1, establishing a target detection neural network model: a video image to be processed is input, target instances are detected with a target detection algorithm, and cropped target images are generated; features are extracted from each target image and fed into a fully connected network of 1024 neurons for classification and regression, the classification and regression results are sent into a full convolution network, and the category of each pixel is recovered from the abstract features. The bottom layers of the pre-trained convolutional neural network detect low-level features such as edges and corners, while higher layers detect higher-level features such as cars, people and sky. After the forward propagation of the convolutional neural network, the FPN method selects high-level features and passes them down to the bottom layers, so that each level combines high-level and low-level features. After the bounding box of each object is detected, the corresponding grayscale instance image and color instance image are cropped, and the cropped images are converted to 256x256 resolution;
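By way of illustration only, the detection-and-cropping part of step S1 can be sketched as follows, with a pre-trained torchvision Mask R-CNN standing in for the embodiment's detector; the score threshold and the helper name detect_and_crop are assumptions of this sketch, not details of the patent:

```python
# Sketch of step S1: detect instances in a frame and crop each to 256x256.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor, resize

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def detect_and_crop(frame, score_thresh=0.7, out_size=[256, 256]):
    """frame: H x W x 3 uint8 RGB array -> list of (256x256 crop, box) pairs."""
    x = to_tensor(frame)                       # C x H x W float in [0, 1]
    with torch.no_grad():
        det = model([x])[0]                    # dict: boxes, labels, scores, masks
    crops = []
    for box, score in zip(det["boxes"], det["scores"]):
        if score < score_thresh:
            continue
        x1, y1, x2, y2 = box.int().tolist()
        crop = resize(x[:, y1:y2, x1:x2], out_size)   # bilinear resize to 256x256
        crops.append((crop, (x1, y1, x2, y2)))        # keep coords for the fusion step
    return crops
```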
S2, establishing a coloring network: the functions of instance coloring and full-image coloring are realized by constructing two end-to-end trained backbone networks, namely an instance coloring network and a full-image coloring network; the corresponding levels of the two coloring networks are constructed as a full convolution neural network trained end to end;
S3, establishing a fusion module: three convolution layers are applied to the features extracted from the instance coloring network and the full-image coloring network to obtain a full-image weight map and instance weight maps; the instance image features and instance weight maps are resized to the size of the full-image features according to their coordinates; finally, the full-image features and each group of instance image features are weight-fused according to the corresponding weight maps to finally obtain the colored night video image.
As shown in fig. 2, in this embodiment, the specific steps of establishing the target detection neural network model in step S1 are as follows:
S11, setting up the backbone network layer: as shown in fig. 3, a training or prediction original picture is used as the input unit; the picture undergoes multi-scale detection through the three steps of bottom-up connection, top-down connection and lateral connection, and the features of each level of the picture are fused so that the picture simultaneously carries strong semantic information and strong spatial information, achieving a stronger feature learning effect;
and S12, setting up the head network layer: the feature map obtained in step S11 is divided into a 7x7 grid, bilinear interpolation is performed within each cell to obtain four sampling points, max pooling after interpolation yields the final 7x7 feature map, and this map is fed as input into a full convolution neural network to obtain a fixed size and complete the Mask prediction.
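The head-layer input of step S12 corresponds closely to RoIAlign. The sketch below uses torchvision.ops.roi_align to show the 7x7 grid with four bilinearly interpolated sampling points per cell; note that roi_align averages the sampled points, whereas the embodiment specifies max pooling, and the stride-16 spatial scale and tensor shapes are assumptions of this sketch:

```python
# Sketch of the S12 head input: pool each RoI to a fixed 7x7 feature map.
import torch
from torchvision.ops import roi_align

features = torch.randn(1, 256, 50, 68)                 # N,C,H,W feature map from the FPN
boxes = [torch.tensor([[64.0, 32.0, 192.0, 160.0]])]   # one RoI in image coordinates

pooled = roi_align(
    features, boxes,
    output_size=(7, 7),      # the 7x7 grid from step S12
    spatial_scale=1 / 16,    # image coords -> feature-map coords (assumed stride)
    sampling_ratio=2,        # 2x2 = four interpolated points per cell
)
print(pooled.shape)          # torch.Size([1, 256, 7, 7])
```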
In this embodiment, the specific processes of the bottom-up connection, the top-down connection, and the lateral connection in step S11 are as follows:
Bottom-up connection: this realizes the conventional feature extraction process; the complete picture is divided into 5 stages according to feature map size, and the outputs of the last layers of stages 2, 3, 4 and 5 are defined as C2, C3, C4 and C5 respectively;
Top-down connection: upsampling starts from the highest layer, and nearest-neighbor upsampling is used directly, which reduces the number of training parameters;
Lateral connection: the upsampling result of the top-down connection is fused with the feature map of the same size generated by the bottom-up connection; each of C2, C3, C4 and C5 undergoes a 1x1 convolution without an activation function to reduce the number of channels, with the output set to 256 channels; the result is then added element-wise to the upsampled feature map, and after fusion the fused feature map is processed with a 3x3 convolution kernel to eliminate the aliasing effect of upsampling.
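A minimal sketch of the three connections of step S11 follows, under the assumption of standard ResNet-style channel counts for C2 to C5: 1x1 lateral convolutions without activation reduce each stage to 256 channels, nearest-neighbor upsampling implements the top-down path, element-wise addition fuses the maps, and a 3x3 convolution removes the upsampling aliasing:

```python
# Minimal FPN sketch of S11's bottom-up / top-down / lateral connections.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPNNeck(nn.Module):
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1x1 lateral convs (no activation) reducing C2..C5 to 256 channels
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)
        # 3x3 convs applied after fusion to eliminate upsampling aliasing
        self.smooth = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
            for _ in in_channels)

    def forward(self, c2, c3, c4, c5):
        laterals = [l(c) for l, c in zip(self.lateral, (c2, c3, c4, c5))]
        # top-down: nearest-neighbor upsample the coarser map and add it in
        for i in range(len(laterals) - 2, -1, -1):
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest")
        return [s(p) for s, p in zip(self.smooth, laterals)]  # P2..P5
```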
As shown in fig. 4, in this embodiment, the specific steps of establishing the coloring network in step S2 are as follows:
S21, establishing an instance coloring network for coloring instance images: the targets and coordinates obtained by using the pre-trained detection network Mask RCNN as the target detector are cropped into instance images and input into the instance coloring network; a convolutional neural network with multiple intermediate layers, including convolution layers, pooling layers, ReLU layers and batch normalization, is built, the deep semantic information of the images is learned through this convolutional neural network, and the predicted color instance images are finally obtained;
S22, establishing a full-image coloring network for coloring the complete image: its structure is similar to that of the instance coloring network, and a consistent network structure is adopted so that the corresponding levels of the two coloring networks can conveniently be fused later; the original grayscale image is then input into the full-image coloring network, the full-image features are extracted after passing through the multi-layer network structure, and the predicted complete image is finally obtained.
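The layer types named in steps S21 and S22 (convolution, pooling, ReLU, batch normalization) can be sketched as a small encoder-decoder that predicts two chrominance channels from a grayscale input; the depth, channel widths and Lab-space output convention are illustrative assumptions, not the patent's exact architecture:

```python
# Compact sketch of the S21/S22 coloring backbone: conv + batch norm + ReLU
# blocks with pooling, mapping 1 grayscale channel to 2 chrominance channels.
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(
        nn.Conv2d(cin, cout, kernel_size=3, padding=1),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True))

class ColoringNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            conv_block(1, 64), nn.MaxPool2d(2),
            conv_block(64, 128), nn.MaxPool2d(2),
            conv_block(128, 256), conv_block(256, 256))
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"), conv_block(256, 128),
            nn.Upsample(scale_factor=2, mode="nearest"), conv_block(128, 64),
            nn.Conv2d(64, 2, kernel_size=3, padding=1), nn.Tanh())  # ab in [-1, 1]

    def forward(self, gray):                      # gray: N x 1 x H x W
        return self.decoder(self.encoder(gray))   # N x 2 x H x W chrominance

# two networks with a consistent structure, so corresponding levels can be fused
instance_net, full_image_net = ColoringNet(), ColoringNet()
```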
In this embodiment, the specific process of instance image coloring and complete image coloring is as follows: on the basis of the grayscale map, the neural network is guided by point-input color information that is mapped directly into the full convolution neural network, so that a plausible color result is generated quickly; after learning from a large amount of data, the full convolution neural network colors the image by fusing low-level cues with high-level semantic information.
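One common way to realize such point-input guidance, sketched below under assumed conventions, is to rasterize the color points into sparse chrominance-hint channels plus a presence mask and concatenate them with the grayscale channel, so the full convolution network receives 1 + 2 + 1 = 4 input channels; the channel layout and helper name are assumptions of this sketch:

```python
# Sketch of point-input color guidance: hints become extra input channels.
import torch

def make_hint_input(gray, points):
    """gray: 1 x H x W grayscale tensor; points: list of (y, x, a, b) color hints."""
    _, H, W = gray.shape
    hints = torch.zeros(2, H, W)   # sparse ab values at hinted pixels
    mask = torch.zeros(1, H, W)    # 1 where a hint is present, else 0
    for y, x, a, b in points:
        hints[:, y, x] = torch.tensor([a, b])
        mask[0, y, x] = 1.0
    return torch.cat([gray, hints, mask], dim=0)  # 4 x H x W network input
```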
As shown in fig. 5, in this embodiment, the specific steps of establishing the fusion module in step S3 are as follows:
S31, feature extraction and convolution: the full-image feature Fx after full-image coloring and the instance image features Fxi are taken as input, and three convolution layers are applied to obtain a full-image weight map WF and instance weight maps WIi;
S32, adjusting the instance image features: the instance image features Fxi and the instance weight maps WIi are resized to the size of the full-image feature Fx by stretching and zero padding according to their specific coordinates;
S33, weighted fusion: for each pixel, softmax-weighted fusion of the full-image feature Fx and each group of instance image features Fxi is performed according to the corresponding weight maps; in the formula below, \odot denotes element-wise multiplication, the softmax is taken per pixel over the stacked weight maps, and the bar denotes quantities resized to full-image size in step S32:
F_{fused} = \bar{W}_F \odot F_X + \sum_{i=1}^{N} \bar{W}_{I_i} \odot \bar{F}_{X_i}, \qquad (\bar{W}_F, \bar{W}_{I_1}, \ldots, \bar{W}_{I_N}) = \mathrm{softmax}(W_F, W_{I_1}, \ldots, W_{I_N})
where N is the number of instances.
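For illustration, steps S31 to S33 might be sketched as follows; the feature width, the exact three-layer weight head, and the large negative padding that drives out-of-box instance weights to roughly zero after the softmax are assumptions of this sketch:

```python
# Sketch of the S31-S33 fusion module: conv weight heads, paste-back, softmax blend.
import torch
import torch.nn as nn
import torch.nn.functional as F

def weight_head(channels=256):
    """Three convolution layers reducing features to a 1-channel weight map (S31)."""
    return nn.Sequential(
        nn.Conv2d(channels, 128, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(64, 1, 3, padding=1))

def paste(src, box, full_hw, fill=0.0):
    """Stretch src (C x h x w) to its box and pad into a full-size canvas (S32)."""
    x1, y1, x2, y2 = box
    canvas = src.new_full((src.shape[0],) + tuple(full_hw), fill)
    canvas[:, y1:y2, x1:x2] = F.interpolate(
        src[None], size=(y2 - y1, x2 - x1), mode="bilinear", align_corners=False)[0]
    return canvas

def fuse(full_feat, inst_feats, boxes, full_head, inst_head):
    """full_feat: C x H x W; inst_feats: list of C x h x w; boxes: pixel coords (S33)."""
    H, W = full_feat.shape[-2:]
    feats = [full_feat]
    weights = [full_head(full_feat[None])[0]]            # W_F
    for f, b in zip(inst_feats, boxes):
        w = inst_head(f[None])[0]                        # W_Ii
        feats.append(paste(f, b, (H, W)))                # resized instance features
        weights.append(paste(w, b, (H, W), fill=-1e4))   # ~zero weight outside the box
    w = torch.softmax(torch.stack(weights), dim=0)       # per-pixel softmax over N+1 maps
    return (w * torch.stack(feats)).sum(dim=0)           # fused feature, C x H x W
```

With full_head = weight_head() and inst_head = weight_head(), fuse realizes the per-pixel softmax blend of the formula above.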
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited thereto; any change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention shall be regarded as an equivalent replacement and shall fall within the scope of protection of the present invention.

Claims (7)

1. A night video coloring method based on a deep neural network is characterized by comprising the following steps:
S1, establishing a target detection neural network model: inputting a video image to be processed, detecting target instances with a target detection algorithm and generating cropped target images, extracting features from the target images, feeding the extracted target image features into a fully connected network for classification and regression, sending the classification and regression results into a full convolution network, and recovering the category of each pixel from the target image features;
S2, establishing a coloring network: performing instance coloring and full-image coloring by constructing two end-to-end trained backbone networks, namely an instance coloring network and a full-image coloring network, wherein the corresponding levels of the two coloring networks are constructed as a full convolution neural network trained end to end;
S3, establishing a fusion module: applying three convolution layers to the features extracted from the instance coloring network and the full-image coloring network to obtain a full-image weight map and instance weight maps, resizing the instance image features and instance weight maps to the size of the full-image features according to their coordinates, and finally weight-fusing the full-image features with each group of instance image features according to the corresponding weight maps to obtain the colored night video image.
2. The night video coloring method based on the deep neural network as claimed in claim 1, wherein the specific steps of establishing the target detection neural network model in step S1 are as follows:
S11, setting up the backbone network layer, wherein the backbone network framework takes a training or prediction original picture as the input unit, performs multi-scale detection on the picture through the three steps of bottom-up connection, top-down connection and lateral connection, and fuses the features of each level of the picture;
and S12, setting up the head network layer: the feature map obtained in step S11 is divided into a 7x7 grid, bilinear interpolation is performed within each cell to obtain four sampling points, max pooling after interpolation yields the final 7x7 feature map, and this map is fed as input into a full convolution neural network to obtain a fixed size and complete the Mask prediction.
3. The night video coloring method based on the deep neural network as claimed in claim 2, wherein the specific processes of the three steps of bottom-up connection, top-down connection and lateral connection in step S11 are as follows:
Bottom-up connection: the complete picture is divided into 5 stages according to feature map size, and the outputs of the last layers of stages 2, 3, 4 and 5 are defined as C2, C3, C4 and C5 respectively;
Top-down connection: upsampling is performed from the highest layer using nearest-neighbor upsampling;
Lateral connection: the upsampling result of the top-down connection is fused with the feature map of the same size generated by the bottom-up connection; each of C2, C3, C4 and C5 undergoes a 1x1 convolution without an activation function, with the output set to 256 channels, the result is then added element-wise to the upsampled feature map, and after fusion the fused feature map is processed with a 3x3 convolution kernel to eliminate the aliasing effect of upsampling.
4. The night video coloring method based on the deep neural network as claimed in claim 1, wherein the specific process of detecting target instances and generating cropped target images is as follows: after the bounding box of each object is detected, the corresponding grayscale instance image and color instance image are cropped, and the cropped images are converted to 256 x 256 resolution.
5. The night video coloring method based on the deep neural network as claimed in claim 1, wherein the specific steps of establishing the coloring network in step S2 are as follows:
S21, establishing an instance coloring network for instance image coloring: targets and coordinates are acquired by using the pre-trained detection network Mask RCNN as the target detector and cropped into instance images, which are input into the instance coloring network; a convolutional neural network with multiple intermediate layers, including convolution layers, pooling layers, ReLU layers and batch normalization, is built, the deep semantic information of the images is learned through this convolutional neural network, and the predicted color instance images are finally obtained;
and S22, establishing a full-image coloring network for coloring the complete image: the original image is input into the full-image coloring network, the full-image features are extracted after passing through the multi-layer network structure, and the predicted complete image is finally obtained.
6. The night video coloring method based on the deep neural network as claimed in claim 5, wherein the specific process of instance image coloring and complete image coloring is as follows: on the basis of the grayscale map, the neural network is guided by point-input color information mapped directly into the full convolution neural network, and after learning from a large amount of data the full convolution neural network colors the image by fusing low-level cues with high-level semantic information.
7. The night video coloring method based on the deep neural network as claimed in claim 1, wherein the specific steps of establishing the fusion module in step S3 are as follows:
S31, feature extraction and convolution: the full-image feature Fx after full-image coloring and the instance image features Fxi are taken as input, and three convolution layers are applied to obtain a full-image weight map WF and instance weight maps WIi;
S32, adjusting the instance image features: the instance image features Fxi and the instance weight maps WIi are resized to the size of the full-image feature Fx by stretching and zero padding according to their specific coordinates;
S33, weighted fusion: for each pixel, softmax-weighted fusion of the full-image feature Fx and each group of instance image features Fxi is performed according to the corresponding weight maps; in the formula below, \odot denotes element-wise multiplication, the softmax is taken per pixel over the stacked weight maps, and the bar denotes quantities resized to full-image size in step S32:
F_{fused} = \bar{W}_F \odot F_X + \sum_{i=1}^{N} \bar{W}_{I_i} \odot \bar{F}_{X_i}, \qquad (\bar{W}_F, \bar{W}_{I_1}, \ldots, \bar{W}_{I_N}) = \mathrm{softmax}(W_F, W_{I_1}, \ldots, W_{I_N})
where N is the number of instances.
CN202111009898.5A (priority date 2021-08-31, filing date 2021-08-31): Night video coloring method based on deep neural network. Status: Pending. Publication: CN113920455A (en).

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111009898.5A (published as CN113920455A) | 2021-08-31 | 2021-08-31 | Night video coloring method based on deep neural network

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202111009898.5A (published as CN113920455A) | 2021-08-31 | 2021-08-31 | Night video coloring method based on deep neural network

Publications (1)

Publication Number | Publication Date
CN113920455A | 2022-01-11

Family

ID=79233524

Family Applications (1)

Application Number | Title | Priority Date | Filing Date | Status
CN202111009898.5A | Night video coloring method based on deep neural network | 2021-08-31 | 2021-08-31 | Pending (CN113920455A)

Country Status (1)

Country | Document
CN | CN113920455A (en)


Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination