CN113077385A - Video super-resolution method and system based on generative adversarial network and edge enhancement - Google Patents

Video super-resolution method and system based on generative adversarial network and edge enhancement

Info

Publication number
CN113077385A
CN113077385A
Authority
CN
China
Prior art keywords
resolution
super
video
frame
continuous frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110340664.2A
Other languages
Chinese (zh)
Inventor
滕国伟 (Teng Guowei)
王嘉璐 (Wang Jialu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology filed Critical University of Shanghai for Science and Technology
Priority to CN202110340664.2A
Publication of CN113077385A

Classifications

    • G06T 3/4053: Scaling of whole images or parts thereof, e.g. expanding or contracting, based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T 3/4046: Scaling of whole images or parts thereof using neural networks
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06T 5/70: Denoising; Smoothing
    • G06T 5/73: Deblurring; Sharpening
    • G06T 2207/10016: Video; Image sequence
    • G06T 2207/10024: Color image
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a video super-resolution method based on a generative adversarial network (GAN) and edge enhancement, which comprises the following steps: step S1: autonomously constructing a dataset for the generative adversarial network, and acquiring high-resolution consecutive frames and the corresponding low-resolution consecutive frames; step S2: converting the high-resolution consecutive frames and the corresponding low-resolution consecutive frames from the RGB color space to the HSV color space; step S3: establishing a generator network to obtain super-resolution consecutive frames; step S4: performing authenticity discrimination on the super-resolution consecutive frames; step S5: if the discrimination result is true, outputting the super-resolution consecutive frames; if the discrimination result is false, regenerating super-resolution consecutive frames for true/false discrimination. The invention directly uses the original consecutive frames of old films and television works as the network input, with the corresponding high-definition restored frames as the network target. The intermediate degradation process is thus omitted, and the actual degradation of natural images is better fitted.

Description

Video super-resolution method and system based on generative adversarial network and edge enhancement
Technical Field
The invention relates to the technical field of video and image processing, and in particular to a video super-resolution method and system based on a generative adversarial network and edge enhancement.
Background
Super-resolution aims at reconstructing high-resolution (HR) images or video from a low-resolution (LR) version and is a classical problem in computer vision. It pursues not only enlargement of the physical size but also restoration of high-frequency details to ensure clarity. Classical algorithms have existed for decades and can be classified into patch-based, edge-based, sparse-coding-based, and prediction- and statistics-based methods. These methods are computationally cheaper than deep-learning methods, but their restoration performance is also very limited. With the rise of deep learning, convolutional neural networks have been widely adopted and have brought a leap in super-resolution quality.
The field can be divided into two parts: single-image super-resolution (SISR) and video super-resolution (VSR). The former exploits spatial correlation within a single frame, while the latter additionally uses inter-frame temporal correlation. Owing to the shooting conditions and projection equipment of the time, films and television works from around the year 2000 have low resolution. Although their image quality is unsatisfactory by today's standards, many excellent works date from that era, and simply playing the original footage on modern devices does not meet current perceptual expectations, so super-resolution of old films has real market demand. Because temporal correlation is crucial for video super-resolution, information from adjacent low-resolution frames is usually combined; nevertheless, some video reconstruction results remain unsatisfactory.
A search of the prior art found that patent document CN105931189B discloses a video super-resolution method and device based on an improved super-resolution parameterized model. It uses the improved parameterized model as the theoretical guide of the video super-resolution method and uses a common-region mark matrix to exclude erroneous reference information introduced by non-common regions caused by occlusion and boundary overflow, so that the parameterized model can better describe real videos; stable video super-resolution is ensured by jointly estimating multiple unknown parameters. Although this prior art addresses the problem of non-common content between videos, improving parameters alone, without combining the spatial and temporal correlation between frames, cannot solve the technical problem of edge enhancement.
Patent document CN111260560B discloses a multi-frame video super-resolution method integrating an attention mechanism, which includes: acquiring video data and processing it with video-enhancement techniques to generate a training set and a test set; connecting a deformable-convolution feature-alignment module and a feature-reconstruction module to form a multi-frame super-resolution network and training it with the training set; adding a 3D-convolution feature-alignment module to the network for training; adding a feature-fusion module to the network for training; training the network with the training set; fine-tuning the network with the training set to generate a multi-frame super-resolution model; and testing the model with the test set. This prior art mainly improves super-resolution through large-scale data analysis and focuses on the attention mechanism; it does not combine the spatial and temporal correlation between frames and cannot solve the technical problem of edge enhancement.
At present, the prior art has two major problems regarding video reconstruction results. First, super-resolution datasets are usually obtained by down-sampling high-resolution data (HR) to produce low-resolution data (LR) and then forming LR/HR pairs; this sampling process idealizes natural image degradation and does not match reality. Second, GAN-based super-resolution in the prior art has made great progress in subjective perception, but it has not been applied to video.
Therefore, there is a need for a video super-resolution method that combines the spatial and temporal correlation between frames and exploits GAN-based super-resolution to restore old film and television works.
Disclosure of Invention
In view of the defects in the prior art, the invention aims to provide a video super-resolution method and system based on a generative adversarial network and edge enhancement.
The video super-resolution method based on the generative adversarial network and edge enhancement provided by the invention comprises the following steps:
step S1: autonomously constructing a dataset for the generative adversarial network, and acquiring high-resolution consecutive frames and the corresponding low-resolution consecutive frames;
step S2: converting the high-resolution consecutive frames and the corresponding low-resolution consecutive frames from the RGB color space to the HSV color space;
step S3: establishing a generator network to obtain super-resolution consecutive frames;
step S4: performing authenticity discrimination on the super-resolution consecutive frames;
step S5: if the discrimination result is true, outputting the super-resolution consecutive frames; if the discrimination result is false, regenerating super-resolution consecutive frames for true/false discrimination.
Preferably, step S1 comprises the following sub-steps:
step S11: reading in the original video of an old film and the corresponding high-definition restored version;
step S12: converting the original video and the high-definition restored video into consecutive frame sequences;
step S13: rotating and cropping the high-resolution consecutive frames and the low-resolution consecutive frames.
Preferably, step S12 comprises the following sub-steps:
step S121: aligning the time axes of the original video and the corresponding high-definition restored version;
step S122: selecting the low-resolution video $video_{LR}$ within the period from start time t1 to end time t2 of the original video and converting it into consecutive frames;
step S123: selecting the high-resolution video $video_{HR}$ within the same period from start time t1 to end time t2 of the high-definition restored version and converting it into consecutive frames.
Preferably, there is no scene cut within the low-resolution video $video_{LR}$ and the high-resolution video $video_{HR}$ in step S12.
Preferably, the rotation and cropping parameters applied in step S13 to the high-resolution consecutive frames and the low-resolution consecutive frames must be consistent.
Preferably, step S3 comprises the following sub-steps:
step S31: the optical-flow network outputs the motion compensation $v_t$; the motion compensation $v_t$ is upsampled fourfold by linear interpolation to obtain the upsampled motion compensation $V_t$; the upsampled motion compensation $V_t$ and the super-resolution result of the previous frame $I_{t-1}^{SR}$ undergo a non-linear image warping operation to obtain the warped frame $\tilde{I}_{t-1}^{SR}$;
step S32: obtaining a super-resolution intermediate result using the super-resolution reconstruction network;
step S33: applying the Laplacian edge-enhancement network to the super-resolution intermediate result.
Preferably, step S32 comprises the following sub-steps:
step S321: upsampling the motion compensation $v_t$ fourfold by linear interpolation to obtain $V_t$;
step S322: applying a non-linear image warping operation to $V_t$ and the previous frame's super-resolution reconstruction result $I_{t-1}^{SR}$ to obtain the warped frame $\tilde{I}_{t-1}^{SR}$, whose shape is (batchsize, 4w, 4h, channel);
step S323: applying channel recombination to $\tilde{I}_{t-1}^{SR}$ to obtain the size-downsampled frame $\tilde{I}_{t-1}^{SR\downarrow}$, whose shape is (batchsize, w, h, 4×4×channel);
step S324: merging $\tilde{I}_{t-1}^{SR\downarrow}$ and $I_t^{LR}$ on the third (channel) axis, the shape being (batchsize, w, h, 4×4×channel + channel);
step S325: obtaining the super-resolution intermediate result $I_t^{SR'} = \Delta_t^{SR} + \mathrm{Bicubic}(I_t^{LR})$, wherein $\Delta_t^{SR}$ denotes the super-resolution result of the residual between the low-resolution frame and the high-resolution frame, $I_t^{LR}$ denotes the low-resolution frame, and $\mathrm{Bicubic}(\cdot)$ denotes bicubic upsampling.
Preferably, step S33 comprises the following sub-steps:
step S331: convolving the Laplacian operator $L$ with $I_t^{SR'}$ to obtain the image of abrupt pixel-value changes $I_t^{Lap} = L \otimes I_t^{SR'}$, wherein $L$ is the Laplacian mask, $\otimes$ is the convolution operation, and $I_t^{SR'}$ is the intermediate super-resolution result;
step S332: superimposing the edge $I_t^{Lap}$, extracted from the intermediate super-resolution result by the Laplacian operator, onto the intermediate super-resolution result $I_t^{SR'}$ to produce the sharpened image $I_t^{E} = I_t^{SR'} + I_t^{Lap}$;
step S333: performing post-processing denoising to obtain the final super-resolution result $I_t^{SR}$.
Preferably, step S4 comprises:
perceptual loss: a second-order norm (L2) loss between the final super-resolution result $I_t^{SR}$ and the target frame $I_t^{HR}$ at a network layer of VGG19,
$\mathcal{L}_{perc} = \| \phi(I_t^{SR}) - \phi(I_t^{HR}) \|_2^2$,
wherein $\phi(I_t^{SR})$ is the feature map of the final super-resolution result $I_t^{SR}$ on a particular convolutional layer of VGG19, and $\phi(I_t^{HR})$ is the feature map of the target frame $I_t^{HR}$ on the same convolutional layer of VGG19;
content loss: a term on the intermediate result is added in addition to the final result,
$\mathcal{L}_{content} = \| I_t^{SR} - I_t^{HR} \|_2^2 + \| I_t^{SR'} - I_t^{HR} \|_2^2$,
wherein $I_t^{SR}$ is the final super-resolution result, $I_t^{HR}$ is the target frame, and $I_t^{SR'}$ is the intermediate super-resolution result;
sequence loss: the forward-generated final super-resolution result $I_t^{SR,f}$ and the backward-generated final super-resolution result $I_t^{SR,b}$ should in theory be identical,
$\mathcal{L}_{seq} = \| I_t^{SR,f} - I_t^{SR,b} \|_2^2$.
the invention provides a video super-resolution system based on a countermeasure generation network and edge enhancement, which comprises:
module M1: independently establishing a data set based on a countermeasure generation network, and acquiring high-resolution continuous frames and corresponding low-resolution continuous frames;
module M2: converting the RGB color space of the high-resolution continuous frames and the corresponding low-resolution continuous frames into HSV color space;
module M3: establishing a generation network to obtain super-resolution continuous frames;
module M4: performing authenticity identification on the super-resolution continuous frames;
module M5: if the discrimination result is true, outputting a super-resolution continuous frame; and if the discrimination result is false, regenerating the super-resolution continuous frame for true-false discrimination.
Compared with the prior art, the invention has the following beneficial effects:
1. By means of the generative adversarial network, the invention constructs the dataset autonomously, acquiring high-resolution consecutive frames and the corresponding low-resolution consecutive frames. The manual down-sampling step from high-resolution to low-resolution video is omitted: the original video frames of old films serve directly as input and the corresponding high-definition restored frames as the target, avoiding the idealized natural-image degradation process.
2. The invention converts the RGB color space into the HSV color space before non-linear feature mapping; HSV is closer to the visual characteristics of the human eye, which helps obtain a good visual effect.
3. The method fully exploits the fact that movie scenes are accompanied by a large number of subtitles and performs an edge-enhancement operation on the intermediate super-resolution result to obtain the final result.
4. A noise mask is placed in the edge-enhancement module to learn the noise, so that the learned noise can be removed from the extracted edges.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
Fig. 1 is a flow chart of the video super-resolution method based on the generative adversarial network and edge enhancement of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the invention, but do not limit it in any way. It should be noted that various changes and modifications can be made by a person of ordinary skill in the art without departing from the concept of the invention, all of which fall within the scope of the present invention.
The invention provides a video super-resolution method based on a generative adversarial network and edge enhancement, which comprises the following steps:
step S1: autonomously constructing a dataset for the generative adversarial network, and acquiring high-resolution consecutive frames and the corresponding low-resolution consecutive frames.
Step S2: converting the high-resolution consecutive frames and the corresponding low-resolution consecutive frames from the RGB color space to the HSV color space;
step S3: establishing a generator network to obtain super-resolution consecutive frames;
step S4: performing authenticity discrimination on the super-resolution consecutive frames;
step S5: if the discrimination result is true, outputting the super-resolution consecutive frames; if the discrimination result is false, regenerating super-resolution consecutive frames for true/false discrimination (a minimal sketch of this generate-and-discriminate loop follows).
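As a concrete illustration of steps S3 to S5, the following is a minimal adversarial training sketch. It assumes PyTorch; the toy generator and discriminator, the 4x scale, and the BCE adversarial loss are illustrative stand-ins, not the patent's disclosed architecture (the actual generator incorporates the optical-flow, warping, and edge-enhancement stages described below).

```python
# Minimal generate-and-discriminate loop (assumptions: PyTorch; toy networks).
import torch
import torch.nn as nn

upscale = 4
G = nn.Sequential(                        # generator: LR frame -> SR frame
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3 * upscale ** 2, 3, padding=1),
    nn.PixelShuffle(upscale),             # rearrange channels into a 4x larger image
)
D = nn.Sequential(                        # discriminator: frame -> real/fake logit
    nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, 3, stride=2, padding=1),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)

lr_frames = torch.rand(2, 3, 32, 32)      # stand-in for old-film LR frames
hr_frames = torch.rand(2, 3, 128, 128)    # stand-in for restored HR targets

for step in range(2):
    sr_frames = G(lr_frames)              # S3: generate super-resolution frames
    # S4: discriminator scores restored (real) vs generated (fake) frames
    d_loss = (bce(D(hr_frames), torch.ones(2, 1)) +
              bce(D(sr_frames.detach()), torch.zeros(2, 1)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()
    # S5: while frames are judged false, the generator is updated and regenerates
    g_loss = bce(D(sr_frames), torch.ones(2, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```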
Further, step S1 comprises the following sub-steps:
step S11: reading in the original video of an old film and the corresponding high-definition restored version.
Step S12: converting the original video and the high-definition restored video into consecutive frame sequences.
Step S121: aligning the time axes of the original video and the corresponding high-definition restored version.
Step S122: selecting the low-resolution video $video_{LR}$ within the period from start time t1 to end time t2 of the original video and converting it into consecutive frames.
Step S123: selecting the high-resolution video $video_{HR}$ within the same period from start time t1 to end time t2 of the high-definition restored version and converting it into consecutive frames.
Step S13: rotating and cropping the high-resolution consecutive frames and the low-resolution consecutive frames.
Step S3 comprises the following sub-steps:
step S31: the motion compensation $v_t$ from the optical-flow network is upsampled fourfold by linear interpolation to obtain $V_t$; $V_t$ and the super-resolution result of the previous frame $I_{t-1}^{SR}$ undergo a non-linear image warping operation to obtain $\tilde{I}_{t-1}^{SR}$.
Step S32: obtaining a super-resolution intermediate result using the super-resolution reconstruction network.
Step S33: applying the Laplacian edge-enhancement network to the super-resolution intermediate result.
Step S331: convolving the Laplacian operator $L$ with $I_t^{SR'}$ to obtain the image of abrupt pixel-value changes $I_t^{Lap} = L \otimes I_t^{SR'}$, where $L$ is the Laplacian mask, $\otimes$ is the convolution operation, and $I_t^{SR'}$ is the intermediate super-resolution result.
Step S332: superimposing the Laplacian image $I_t^{Lap}$ on the intermediate super-resolution result $I_t^{SR'}$ to produce the sharpened image $I_t^{E} = I_t^{SR'} + I_t^{Lap}$.
Step S333: performing post-processing denoising to obtain the final result $I_t^{SR}$.
As shown in Fig. 1, the invention first creates specific high/low-resolution data pairs for a specific scene. The dataset then undergoes preprocessing, including cropping, rotation, and color-space conversion. A reconstruction result is then obtained through the generator network, in which a corresponding layer is added for edge enhancement; finally, authenticity is identified by the discriminator network.
The invention is further explained clearly and completely below with reference to the drawings.
First, the original video of an old film and the corresponding high-definition restored version are read in and converted into consecutive frames respectively. In current image or video super-resolution algorithms, the low-resolution dataset is essentially derived from the high-resolution dataset. This derivation involves down-sampling and artificial noise addition intended to mimic the degradation of natural images or video, but it cannot reproduce that degradation exactly. For a specific movie scene, the method directly uses the original consecutive frames of the old film as the network input and the corresponding high-definition restored frames as the network target; the intermediate degradation process is thus omitted, and the actual condition of natural images is fitted.
Second, the time axes of the original video and the corresponding high-definition restored version are aligned, and footage from the same period is selected from both, with no scene cut within the selected period; this footage is then converted into consecutive frames. The high-resolution and low-resolution consecutive frames are subsequently rotated and cropped (the operation parameters must correspond one-to-one) to expand the training set.
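A minimal sketch of this dataset construction is given below. It assumes OpenCV and NumPy are available, that the two videos are already time-aligned, and that the restored version is four times the original resolution; the file names, times, and patch size are hypothetical.

```python
# Paired dataset construction sketch (assumptions: OpenCV/NumPy, aligned videos).
import random
import cv2
import numpy as np

def extract_frames(path, t1, t2):
    """Decode the segment [t1, t2] (in seconds) of a video into a frame list."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    frames = []
    for idx in range(int(t1 * fps), int(t2 * fps)):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames

def paired_augment(lr, hr, patch=64, scale=4):
    """Rotate and crop an LR/HR pair with one-to-one corresponding parameters."""
    k = random.randint(0, 3)              # same quarter-turn for both frames
    lr, hr = np.rot90(lr, k), np.rot90(hr, k)
    y = random.randint(0, lr.shape[0] - patch)
    x = random.randint(0, lr.shape[1] - patch)
    lr_crop = lr[y:y + patch, x:x + patch]
    hr_crop = hr[y * scale:(y + patch) * scale, x * scale:(x + patch) * scale]
    return lr_crop, hr_crop

lr_frames = extract_frames("old_film_original.mp4", 10.0, 12.0)  # hypothetical paths
hr_frames = extract_frames("old_film_restored.mp4", 10.0, 12.0)
pairs = [paired_augment(lr, hr) for lr, hr in zip(lr_frames, hr_frames)]
```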
Third, the color space is converted from RGB to HSV. Besides the RGB color space widely used in computers, HSV is another popular color model. HSV is oriented toward the user's perception and emphasizes color representation; it is closer to the human experience of color than RGB and intuitively expresses the hue (H), saturation (S), and value/brightness (V) of a color as in television display. The two color spaces have a well-defined conversion relationship: non-linear mapping and super-resolution reconstruction are performed in HSV space, after which the result is converted back to RGB space.
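The color-space round trip might look as follows. This sketch assumes OpenCV's BGR frame order (as returned by cv2.VideoCapture) and uint8 frames; the super-resolution processing that would run in HSV space is elided.

```python
# Color-space round trip sketch (assumptions: OpenCV, BGR-ordered uint8 frames).
import cv2
import numpy as np

frame_bgr = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)  # stand-in frame

hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)   # hue, saturation, value planes
# ... non-linear feature mapping / super-resolution would operate on `hsv` here ...
restored_bgr = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)

# The conversion relationship is well-defined; only small quantization error remains.
diff = np.abs(restored_bgr.astype(int) - frame_bgr.astype(int)).max()
print("max round-trip error per channel:", diff)
```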
Fourth, the preprocessed low-resolution consecutive frames are input into the optical-flow network for motion compensation. Compared with image super-resolution, video super-resolution has an extra dimension of temporal information available, which also adds a consistency challenge; the invention therefore retains the optical-flow network for motion compensation. The input to the network is two low-resolution sequences: the first is a set of 10 consecutive frames,

$S^{f} = \{I_1^{LR}, I_2^{LR}, \dots, I_{10}^{LR}\}$,

and the second is the reverse sequence of the first,

$S^{b} = \{I_{10}^{LR}, I_9^{LR}, \dots, I_1^{LR}\}$.

The two sequences are then merged for processing. With this design, the network obtains not only the motion compensation $v_t^{f}$ between $I_{t-1}^{LR}$ and $I_t^{LR}$ but also the motion compensation $v_t^{b}$ between $I_{t+1}^{LR}$ and $I_t^{LR}$ for subsequent use. The forward- and backward-generated results are theoretically identical, which facilitates the later design of the loss function.
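A sketch of this bidirectional input construction follows, assuming PyTorch; `flow_net` is a one-layer placeholder for the patent's optical-flow network, whose actual architecture is not specified here.

```python
# Bidirectional sequence construction sketch (assumptions: PyTorch, toy flow net).
import torch
import torch.nn as nn

flow_net = nn.Conv2d(6, 2, 3, padding=1)      # placeholder: frame pair -> 2-ch flow

frames = torch.rand(10, 3, 32, 32)            # I_1^LR ... I_10^LR
forward_seq = frames                          # {I_1, ..., I_10}
backward_seq = torch.flip(frames, dims=[0])   # {I_10, ..., I_1}

def pairwise_flow(seq):
    """Motion compensation between each frame and its predecessor in `seq`."""
    pairs = torch.cat([seq[:-1], seq[1:]], dim=1)   # (9, 6, H, W) frame pairs
    return flow_net(pairs)                          # (9, 2, H, W) flow fields

v_forward = pairwise_flow(forward_seq)    # v_t^f: flow from I_{t-1} to I_t
v_backward = pairwise_flow(backward_seq)  # v_t^b: flow from I_{t+1} to I_t
```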
Fifth, super-resolution reconstruction. To enhance the consistency of the reconstructed video, the invention upsamples the motion compensation $v_t$ fourfold by linear interpolation to obtain $V_t$, then applies a non-linear image warping operation to $V_t$ and the previous frame's super-resolution reconstruction result $I_{t-1}^{SR}$ to obtain $\tilde{I}_{t-1}^{SR}$ of shape (batchsize, 4w, 4h, channel). Channel recombination gives $\tilde{I}_{t-1}^{SR\downarrow}$ of shape (batchsize, w, h, 4×4×channel), which is merged with $I_t^{LR}$ on the third (channel) axis to give shape (batchsize, w, h, 4×4×channel + channel); this is the final input to the super-resolution reconstruction network. The output of the super-resolution network is $\Delta_t^{SR}$. In addition, to stabilize network training, the invention learns only the residual part, so the final result of this stage is

$I_t^{SR'} = \Delta_t^{SR} + \mathrm{Bicubic}(I_t^{LR})$.
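The assembly of the reconstruction input can be sketched as follows, assuming PyTorch: `grid_sample` stands in for the non-linear warping operation, `pixel_unshuffle` performs the channel recombination, and `sr_net` is a toy stand-in for the super-resolution reconstruction network.

```python
# Reconstruction-input assembly sketch (assumptions: PyTorch, toy residual net).
import torch
import torch.nn as nn
import torch.nn.functional as F

b, c, h, w, s = 1, 3, 16, 16, 4               # batch, channels, LR size, scale

def warp(frame, flow):
    """Warp `frame` backward along `flow` (pixel offsets) with bilinear sampling."""
    _, _, H, W = frame.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).float() + flow.permute(0, 2, 3, 1)
    grid[..., 0] = 2 * grid[..., 0] / (W - 1) - 1   # normalize to [-1, 1]
    grid[..., 1] = 2 * grid[..., 1] / (H - 1) - 1
    return F.grid_sample(frame, grid, align_corners=True)

lr_t = torch.rand(b, c, h, w)                 # I_t^LR
prev_sr = torch.rand(b, c, h * s, w * s)      # previous frame's SR result
flow_lr = torch.rand(b, 2, h, w)              # motion compensation v_t

V_t = F.interpolate(flow_lr, scale_factor=s, mode="bilinear") * s  # upsampled flow
warped = warp(prev_sr, V_t)                   # (b, c, 4h, 4w) warped frame
packed = F.pixel_unshuffle(warped, s)         # channel recombination -> (b, 16c, h, w)
net_in = torch.cat([packed, lr_t], dim=1)     # (b, 16c + c, h, w)

sr_net = nn.Sequential(nn.Conv2d(16 * c + c, c * s * s, 3, padding=1),
                       nn.PixelShuffle(s))    # toy network emitting the residual
residual = sr_net(net_in)
bicubic = F.interpolate(lr_t, scale_factor=s, mode="bicubic")
sr_intermediate = residual + bicubic          # I_t^{SR'} = Delta + Bicubic(I_t^LR)
```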
Sixth, edge enhancement. The output of the previous step is the intermediate super-resolution result $I_t^{SR'}$, on which the invention performs edge enhancement. First, the Laplacian operator $L$ is convolved with $I_t^{SR'}$ to obtain the image of abrupt pixel-value changes,

$I_t^{Lap} = L \otimes I_t^{SR'}$.

The Laplacian image $I_t^{Lap}$ is then superimposed on the intermediate super-resolution result $I_t^{SR'}$ to produce the sharpened image

$I_t^{E} = I_t^{SR'} + I_t^{Lap}$.

This simple edge enhancement produces the Laplacian sharpening effect while retaining background information: superimposing the original image on the Laplacian result preserves every pixel value and enhances the contrast at abrupt pixel-value changes, so edges are highlighted while the image background is preserved. Considering that the Laplacian operator amplifies noise while enhancing edges, a post-processing step is added to obtain the final result $I_t^{SR}$. Since movie and television scenes are accompanied by a large number of subtitles, this edge-enhancement effect noticeably improves the visual quality.
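A minimal sketch of the edge-enhancement step follows, assuming NumPy and SciPy; the patent's learned noise mask is replaced here by a simple Gaussian smoothing of the edge map, which is an illustrative simplification, not the disclosed mechanism.

```python
# Laplacian edge-enhancement sketch (assumptions: NumPy/SciPy; simplified denoising).
import numpy as np
from scipy.ndimage import convolve, gaussian_filter

laplace_mask = np.array([[0, -1, 0],
                         [-1, 4, -1],
                         [0, -1, 0]], dtype=float)   # Laplacian mask L

sr_intermediate = np.random.rand(64, 64)             # stand-in for I_t^{SR'}

# I_t^{Lap} = L (*) I_t^{SR'}: responds strongly at abrupt pixel-value changes
edges = convolve(sr_intermediate, laplace_mask)

# Post-processing against noise amplification: smooth the edge map slightly
# before superposition (a stand-in for the patent's learned noise mask).
edges = gaussian_filter(edges, sigma=0.5)

# I_t^E = I_t^{SR'} + I_t^{Lap}: sharpened image that keeps the background
sharpened = np.clip(sr_intermediate + edges, 0.0, 1.0)
```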
Seventh, design of the loss function. The method is based on a generative adversarial network, and additional loss terms are added besides the adversarial loss. (1) Perceptual loss: an L2 loss between the final result $I_t^{SR}$ and the target frame $I_t^{HR}$ at a VGG19 network layer,

$\mathcal{L}_{perc} = \| \phi(I_t^{SR}) - \phi(I_t^{HR}) \|_2^2$,

where $\phi(I_t^{SR})$ and $\phi(I_t^{HR})$ are the feature maps of $I_t^{SR}$ and $I_t^{HR}$ respectively, obtained from a VGG19 convolutional layer. (2) Content loss: in addition to the final result, a term on the intermediate result is added,

$\mathcal{L}_{content} = \| I_t^{SR} - I_t^{HR} \|_2^2 + \| I_t^{SR'} - I_t^{HR} \|_2^2$.

(3) Sequence loss: the forward-generated $I_t^{SR,f}$ and the backward-generated $I_t^{SR,b}$ should in theory be identical,

$\mathcal{L}_{seq} = \| I_t^{SR,f} - I_t^{SR,b} \|_2^2$.
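The three additional loss terms might be implemented as follows, assuming PyTorch and torchvision; the choice of VGG19 layer and the implicit equal weighting of the terms are assumptions, not values from the patent.

```python
# Loss-term sketch (assumptions: PyTorch/torchvision; layer choice is illustrative).
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

features = vgg19(weights=None).features[:9].eval()   # up to an early conv layer
for p in features.parameters():
    p.requires_grad_(False)

def perceptual_loss(sr, hr):
    """L2 distance between VGG19 feature maps of the SR result and target frame."""
    return F.mse_loss(features(sr), features(hr))

def content_loss(sr_final, sr_intermediate, hr):
    """Pixel L2 on the final result plus a term on the intermediate result."""
    return F.mse_loss(sr_final, hr) + F.mse_loss(sr_intermediate, hr)

def sequence_loss(sr_forward, sr_backward):
    """Forward- and backward-generated results should agree."""
    return F.mse_loss(sr_forward, sr_backward)

sr_f = torch.rand(1, 3, 64, 64)       # forward-generated final result
sr_b = torch.rand(1, 3, 64, 64)       # backward-generated final result
hr = torch.rand(1, 3, 64, 64)         # target frame
mid = torch.rand(1, 3, 64, 64)        # intermediate SR result

total = (perceptual_loss(sr_f, hr) + content_loss(sr_f, mid, hr)
         + sequence_loss(sr_f, sr_b))
```

In training, these terms would be combined with the adversarial loss using suitable weights.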
the main framework of the invention is based on a countermeasure generation network, and is an end-to-end video super-resolution method based on deep learning. Unlike prior art techniques that combine information from adjacent low resolution frames for video reconstruction. In addition, the method is used for the autonomous creation of the data set of the network training, and is different from the method which is commonly used at present and directly uses the high-resolution data set to degenerate to obtain the low-resolution data set, and the method directly obtains the high-resolution continuous frames and the low-resolution continuous frames of the training set.
Additionally, the present invention incorporates other image-enhancement techniques: after the intermediate super-resolution result is obtained, an edge-enhancement technique is applied to it. This technique improves on Laplacian edge enhancement; since simple edge enhancement can mistake noise for edges and amplify it, a noise mask is placed in the edge-enhancement module to learn the noise, and the learned noise is removed from the extracted edges.
Those skilled in the art will appreciate that, in addition to being implemented as pure computer-readable program code, the system and its various devices, modules, and units provided by the invention can be implemented entirely by logically programming the method steps in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Therefore, the system and its various devices, modules, and units can be regarded as hardware components; the devices, modules, and units included for realizing various functions can also be regarded as structures within those hardware components; and the means for performing the various functions can be regarded both as software modules implementing the method and as structures within hardware components.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (10)

1. A video super-resolution method based on a generative adversarial network and edge enhancement, characterized by comprising the following steps:
step S1: autonomously constructing a dataset for the generative adversarial network, and acquiring high-resolution consecutive frames and the corresponding low-resolution consecutive frames;
step S2: converting the high-resolution consecutive frames and the corresponding low-resolution consecutive frames from the RGB color space to the HSV color space;
step S3: establishing a generator network to obtain super-resolution consecutive frames;
step S4: performing authenticity discrimination on the super-resolution consecutive frames;
step S5: if the discrimination result is true, outputting the super-resolution consecutive frames; if the discrimination result is false, regenerating super-resolution consecutive frames for true/false discrimination.
2. The video super-resolution method based on the generative adversarial network and edge enhancement according to claim 1, wherein step S1 comprises the following sub-steps:
step S11: reading in the original video of an old film and the corresponding high-definition restored version;
step S12: converting the original video and the high-definition restored video into consecutive frame sequences;
step S13: rotating and cropping the high-resolution consecutive frames and the low-resolution consecutive frames.
3. The video super-resolution method based on the generative adversarial network and edge enhancement according to claim 2, wherein step S12 comprises the following sub-steps:
step S121: aligning the time axes of the original video and the corresponding high-definition restored version;
step S122: selecting the low-resolution video $video_{LR}$ within the period from start time t1 to end time t2 of the original video and converting it into consecutive frames;
step S123: selecting the high-resolution video $video_{HR}$ within the same period from start time t1 to end time t2 of the high-definition restored version and converting it into consecutive frames.
4. The video super-resolution method based on the generative adversarial network and edge enhancement according to claim 3, wherein there is no scene cut within the low-resolution video $video_{LR}$ and the high-resolution video $video_{HR}$ in step S12.
5. The video super-resolution method based on the generative adversarial network and edge enhancement according to claim 2, wherein the rotation and cropping parameters applied in step S13 to the high-resolution consecutive frames and the low-resolution consecutive frames must be consistent.
6. The video super-resolution method based on the generative adversarial network and edge enhancement according to claim 1, wherein step S3 comprises the following sub-steps:
step S31: the optical-flow network outputs the motion compensation $v_t$; the motion compensation $v_t$ is upsampled fourfold by linear interpolation to obtain the upsampled motion compensation $V_t$; the upsampled motion compensation $V_t$ and the super-resolution result of the previous frame $I_{t-1}^{SR}$ undergo a non-linear image warping operation to obtain the warped frame $\tilde{I}_{t-1}^{SR}$;
step S32: obtaining a super-resolution intermediate result using the super-resolution reconstruction network;
step S33: applying the Laplacian edge-enhancement network to the super-resolution intermediate result.
7. The video super-resolution method based on the generative adversarial network and edge enhancement according to claim 6, wherein step S32 comprises the following sub-steps:
step S321: upsampling the motion compensation $v_t$ fourfold by linear interpolation to obtain $V_t$;
step S322: applying a non-linear image warping operation to $V_t$ and the previous frame's super-resolution reconstruction result $I_{t-1}^{SR}$ to obtain the warped frame $\tilde{I}_{t-1}^{SR}$, whose shape is (batchsize, 4w, 4h, channel);
step S323: applying channel recombination to $\tilde{I}_{t-1}^{SR}$ to obtain the size-downsampled frame $\tilde{I}_{t-1}^{SR\downarrow}$, whose shape is (batchsize, w, h, 4×4×channel);
step S324: merging $\tilde{I}_{t-1}^{SR\downarrow}$ and $I_t^{LR}$ on the third (channel) axis, the shape being (batchsize, w, h, 4×4×channel + channel);
step S325: obtaining the super-resolution intermediate result $I_t^{SR'} = \Delta_t^{SR} + \mathrm{Bicubic}(I_t^{LR})$, wherein $\Delta_t^{SR}$ denotes the super-resolution result of the residual between the low-resolution frame and the high-resolution frame, $I_t^{LR}$ denotes the low-resolution frame, and $\mathrm{Bicubic}(\cdot)$ denotes bicubic upsampling.
8. The video super-resolution method based on the generative adversarial network and edge enhancement according to claim 6, wherein step S33 comprises the following sub-steps:
step S331: convolving the Laplacian operator $L$ with $I_t^{SR'}$ to obtain the image of abrupt pixel-value changes $I_t^{Lap} = L \otimes I_t^{SR'}$, wherein $L$ is the Laplacian mask, $\otimes$ is the convolution operation, and $I_t^{SR'}$ is the intermediate super-resolution result;
step S332: superimposing the edge $I_t^{Lap}$, extracted from the intermediate super-resolution result by the Laplacian operator, onto the intermediate super-resolution result $I_t^{SR'}$ to produce the sharpened image $I_t^{E} = I_t^{SR'} + I_t^{Lap}$;
step S333: performing post-processing denoising to obtain the final super-resolution result $I_t^{SR}$.
9. The video super-resolution method based on the generative adversarial network and edge enhancement according to claim 1, wherein step S4 comprises:
perceptual loss: a second-order norm (L2) loss between the final super-resolution result $I_t^{SR}$ and the target frame $I_t^{HR}$ at a network layer of VGG19,
$\mathcal{L}_{perc} = \| \phi(I_t^{SR}) - \phi(I_t^{HR}) \|_2^2$,
wherein $\phi(I_t^{SR})$ is the feature map of the final super-resolution result $I_t^{SR}$ on a particular convolutional layer of VGG19, and $\phi(I_t^{HR})$ is the feature map of the target frame $I_t^{HR}$ on the same convolutional layer of VGG19;
content loss: a term on the intermediate result is added in addition to the final result,
$\mathcal{L}_{content} = \| I_t^{SR} - I_t^{HR} \|_2^2 + \| I_t^{SR'} - I_t^{HR} \|_2^2$,
wherein $I_t^{SR}$ is the final super-resolution result, $I_t^{HR}$ is the target frame, and $I_t^{SR'}$ is the intermediate super-resolution result;
sequence loss: the forward-generated final super-resolution result $I_t^{SR,f}$ and the backward-generated final super-resolution result $I_t^{SR,b}$ should in theory be identical,
$\mathcal{L}_{seq} = \| I_t^{SR,f} - I_t^{SR,b} \|_2^2$.
10. A video super-resolution system based on a generative adversarial network and edge enhancement, characterized by comprising:
module M1: autonomously constructing a dataset for the generative adversarial network, and acquiring high-resolution consecutive frames and the corresponding low-resolution consecutive frames;
module M2: converting the high-resolution consecutive frames and the corresponding low-resolution consecutive frames from the RGB color space to the HSV color space;
module M3: establishing a generator network to obtain super-resolution consecutive frames;
module M4: performing authenticity discrimination on the super-resolution consecutive frames;
module M5: if the discrimination result is true, outputting the super-resolution consecutive frames; if the discrimination result is false, regenerating super-resolution consecutive frames for true/false discrimination.
CN202110340664.2A 2021-03-30 2021-03-30 Video super-resolution method and system based on generative adversarial network and edge enhancement Pending CN113077385A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110340664.2A CN113077385A (en) Video super-resolution method and system based on generative adversarial network and edge enhancement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110340664.2A CN113077385A (en) Video super-resolution method and system based on generative adversarial network and edge enhancement

Publications (1)

Publication Number Publication Date
CN113077385A 2021-07-06

Family

ID=76611866

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110340664.2A Pending 2021-03-30 Video super-resolution method and system based on generative adversarial network and edge enhancement

Country Status (1)

Country Link
CN (1) CN113077385A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109118431A (en) * 2018-09-05 2019-01-01 Wuhan University Video super-resolution reconstruction method based on multiple memories and mixture losses
CN111062867A (en) * 2019-11-21 2020-04-24 Zhejiang Dahua Technology Co., Ltd. Video super-resolution reconstruction method
CN111311490A (en) * 2020-01-20 2020-06-19 Shaanxi Normal University Video super-resolution reconstruction method based on multi-frame fusion optical flow
CN112001847A (en) * 2020-08-28 2020-11-27 Xuzhou Institute of Technology Method for generating high-quality images with a relativistic generative adversarial super-resolution reconstruction model
CN112215140A (en) * 2020-10-12 2021-01-12 Suzhou Tianbiyou Technology Co., Ltd. 3D signal processing method based on spatio-temporal adversarial learning

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116958203A (en) * 2023-08-01 2023-10-27 Beijing Zhicun Technology Co., Ltd. Image processing method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111028150B (en) Rapid space-time residual attention video super-resolution reconstruction method
CN109903228B (en) Image super-resolution reconstruction method based on convolutional neural network
CN114092330B (en) Light-weight multi-scale infrared image super-resolution reconstruction method
CN113139898B (en) Light field image super-resolution reconstruction method based on frequency domain analysis and deep learning
CN112734646B (en) Image super-resolution reconstruction method based on feature channel division
CN110136062B (en) Super-resolution reconstruction method combining semantic segmentation
CN109671023A Secondary reconstruction method for face image super-resolution
CN108269244B (en) Image defogging system based on deep learning and prior constraint
CN109785236B (en) Image super-resolution method based on super-pixel and convolutional neural network
CN110717868B (en) Video high dynamic range inverse tone mapping model construction and mapping method and device
CN112804561A Video frame interpolation method and device, computer equipment and storage medium
CN116152120B (en) Low-light image enhancement method and device integrating high-low frequency characteristic information
CN114331831A (en) Light-weight single-image super-resolution reconstruction method
CN112422870B (en) Deep learning video frame insertion method based on knowledge distillation
CN112200732B (en) Video deblurring method with clear feature fusion
CN113850718A (en) Video synchronization space-time super-resolution method based on inter-frame feature alignment
CN114972036A (en) Blind image super-resolution reconstruction method and system based on fusion degradation prior
CN113128517B (en) Tone mapping image mixed visual feature extraction model establishment and quality evaluation method
CN113077385A (en) Video super-resolution method and system based on countermeasure generation network and edge enhancement
Liu et al. Arbitrary-scale super-resolution via deep learning: A comprehensive survey
CN115496819B (en) Rapid coding spectral imaging method based on energy concentration characteristic
Wang et al. Super resolution for compressed screen content video
CN116668738A (en) Video space-time super-resolution reconstruction method, device and storage medium
CN116228550A (en) Image self-enhancement defogging algorithm based on generation of countermeasure network
CN115841523A (en) Double-branch HDR video reconstruction algorithm based on Raw domain

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 20210706)