CN115396667A - Wide QP range loop filtering method based on deformable convolution - Google Patents
- Publication number
- CN115396667A (application number CN202211005433.7A)
- Authority
- CN
- China
- Prior art keywords
- feature
- module
- convolution
- deformable convolution
- feature map
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/124—Quantisation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/154—Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/70—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/80—Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
- H04N19/82—Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The invention relates to a wide-QP-range loop filtering method based on deformable convolution, belonging to the field of video coding. The method exploits the flexible receptive field and deformation-modeling capability of deformable convolution to learn video features automatically, and reinforces those features through dense residual connections in order to enhance the quality of compressed video. A QP attention module is further proposed to improve the generalization ability of the method, so that a single model can enhance compressed-video quality across different QPs. The method markedly improves both the subjective and objective quality of compressed video.
Description
Technical Field
The invention relates to the field of video coding and decoding, in particular to a wide QP range loop filtering method based on deformable convolution.
Background
Existing video coding standards such as H.264, HEVC and AV1 all use a block-based coding structure. Because correlation between coding blocks is not exploited, compressed video exhibits obvious blocking artifacts, which greatly degrade the viewing experience. A loop filtering module is therefore usually employed during encoding to reduce blocking and other compression artifacts in the video and to improve coding performance.
Traditional loop filters are constrained by computational complexity and provide only limited quality improvement. Deep learning has substantially raised the performance potential of loop filtering in video coding, but existing learned network models are effective only for a single quantization parameter (QP): if the QP of the test video does not match the assumed value, the model's performance drops sharply. Existing learned loop filtering methods therefore train a separate model for each specific quantization parameter (or each narrow QP range), which imposes an extra storage burden and hinders practical deployment of the model. How to design a single model that can process video compressed over a wide QP range while achieving good subjective quality is thus a problem demanding an urgent solution.
Disclosure of Invention
The invention aims to improve the performance of the HEVC loop filter and provides a wide-QP-range loop filtering method based on deformable convolution.
The technical solution adopted by the invention to solve this problem is as follows:
a method of wide QP range loop filtering based on deformable convolution, comprising:
Step 1: data set construction. A large number of videos and pictures are collected and converted into the YUV420 format required by HEVC coding, then encoded in the all-intra (I-frame-only) configuration with the HEVC reference software HM-16.9, with the platform's loop filtering module disabled during encoding. The unfiltered reconstructed videos at 4 quantization levels (QP = 22, 27, 32 and 37) form the training set.
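The encoding setup of step 1 can be sketched as a short script. This is an illustrative command-line fragment, not taken from the patent: it assumes HM-16.9 has been built as `TAppEncoderStatic`, that `cfg/encoder_intra_main.cfg` is HM's stock all-intra configuration, and that the input has already been converted to raw YUV420 (file names and frame geometry are placeholders).

```shell
# Encode one YUV420 clip at the four training QPs with in-loop filtering off.
# --LoopFilterDisable=1 turns off deblocking; --SAO=0 turns off sample
# adaptive offset, so the reconstructions are the unfiltered training inputs.
for QP in 22 27 32 37; do
  ./TAppEncoderStatic -c cfg/encoder_intra_main.cfg \
    -i clip.yuv -wdt 1920 -hgt 1080 -fr 30 -f 32 -q "$QP" \
    --LoopFilterDisable=1 --SAO=0 \
    -b "clip_qp${QP}.bin" -o "clip_qp${QP}_rec.yuv"
done
```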
Step 2: network model construction. The network model consists of 3 main modules: a lightweight feature generation module based on deformable convolution, a QP attention module, and a feature enhancement module based on a channel attention mechanism.
Step 3: feature extraction module processing. The compressed video image is fed into the lightweight feature generation module based on deformable convolution, whose main component is a U-shaped network consisting of a compression module and an expansion module. The compression module contains 3 convolution blocks; each block contains two convolutional layers joined by a PReLU activation. To compress the extracted features, the 2nd convolution of each block uses a stride of 2, halving the spatial size of the feature maps while keeping their number unchanged, so that a feature map of 1/8 the input size is finally obtained. The down-sampled feature map is then fed into the feature expansion module. This path likewise consists of 3 convolution blocks; before each block, deconvolution doubles the feature-map size, after which the result is combined through skip connections with the feature maps of the symmetric compression blocks, and after the 3 expansion blocks a feature map of the same size as the input is output. The feature map produced by the U-shaped network's compression and expansion is then fed into a subsequent deformable convolution layer, which learns two-dimensional coordinate offsets to further extract deformation information from the features.
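The deformable convolution at the heart of step 3 can be made concrete with a small sketch. The NumPy code below is an illustration written for this description, not the patent's network code: it implements a single-channel 3×3 deformable convolution in which every kernel tap samples the input at its regular grid position plus a learned fractional offset, with bilinear interpolation supplying values at non-integer coordinates.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly sample feat (H, W) at a fractional location (y, x), clamped to the border."""
    H, W = feat.shape
    y = min(max(y, 0.0), H - 1.0)
    x = min(max(x, 0.0), W - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0] + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0] + wy * wx * feat[y1, x1])

def deformable_conv2d(feat, weight, offsets):
    """3x3 deformable convolution on a single-channel feature map.

    feat:    (H, W) input feature map
    weight:  (3, 3) kernel
    offsets: (H, W, 9, 2) learned (dy, dx) offset per output pixel and kernel tap
    """
    H, W = feat.shape
    out = np.zeros_like(feat, dtype=float)
    taps = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    for i in range(H):
        for j in range(W):
            acc = 0.0
            for k, (dy, dx) in enumerate(taps):
                oy, ox = offsets[i, j, k]
                # sample at regular grid position + learned offset
                acc += weight[dy + 1, dx + 1] * bilinear_sample(feat, i + dy + oy, j + dx + ox)
            out[i, j] = acc
    return out
```

With all offsets fixed at zero this reduces to an ordinary 3×3 convolution; the learned offsets are what give the layer its flexible, content-adaptive receptive field.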
Step 4: QP attention module processing. The module consists of a generator and a controller, both of which are essentially multi-layer perceptrons (MLPs). The number of layers in the generator and the controller equals the number of convolution blocks in the compression and expansion modules of the U-shaped network, respectively, and is set to 3 in the experiments. The generator takes the quantization parameter QP of the current encoding as input and produces a 64-dimensional QP feature f_QP through a linear layer with 64 nodes. The controller takes the generator's output as input and controls the output feature maps according to f_QP: it learns a set of affine transformation mapping functions M over the feature maps, i.e., each linear layer learns a QP-dependent modulation parameter pair (γ, β) from f_QP and adaptively adjusts the output feature map according to (γ, β).
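The generator/controller mechanism of step 4 amounts to QP-conditioned affine modulation of the feature maps. The sketch below is a minimal single-layer NumPy illustration under our own assumptions (random weights, ReLU activation, QP normalized by 51, HEVC's maximum QP); the patent's generator and controller are 3-layer MLPs.

```python
import numpy as np

rng = np.random.default_rng(0)
C = 64  # feature channels / width of the QP feature f_QP

# Generator: one linear layer turning the scalar QP into a 64-d QP feature f_QP.
W_gen, b_gen = rng.normal(size=(C, 1)) * 0.1, np.zeros(C)

# Controller: two linear heads mapping f_QP to a (gamma, beta) pair per channel.
W_gamma, b_gamma = rng.normal(size=(C, C)) * 0.1, np.ones(C)   # bias 1 -> near-identity scale
W_beta,  b_beta  = rng.normal(size=(C, C)) * 0.1, np.zeros(C)

def qp_modulate(feat, qp):
    """feat: (C, H, W) stage feature map; qp: scalar quantization parameter."""
    f_qp = np.maximum(W_gen @ np.array([qp / 51.0]) + b_gen, 0.0)  # generator + ReLU
    gamma = W_gamma @ f_qp + b_gamma                               # QP-dependent scale
    beta = W_beta @ f_qp + b_beta                                  # QP-dependent shift
    return gamma[:, None, None] * feat + beta[:, None, None]       # affine modulation
```

Because γ and β depend only on the QP, one set of network weights serves every quantization level; the modulation is what adapts the features per QP.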
Step 5: feature enhancement module processing. The module consists of a residual block built from 3 dense blocks (Dense Block) and a squeeze-and-excitation (SE) channel attention module. The dense connections greatly expand the capacity of the network: the output feature maps of earlier convolutional layers are fed into later layers, so that better feature representations are obtained through cross-layer feature sharing, while the vanishing-gradient problem during training is alleviated and the trainability of the network is improved. The SE module first squeezes the input features with max pooling, then excites them through two convolutional layers, and finally applies a Sigmoid gating mechanism to obtain channel-related weights; these weights control the importance of each output feature channel of the residual dense block (RDB) and sharpen the directivity of the features.
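The SE channel attention of step 5 can be sketched in a few lines of NumPy, following the squeeze (max pooling) → two-layer excitation → Sigmoid gate ordering described above; the weight shapes and reduction ratio are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(feat, W1, W2):
    """Squeeze-and-excitation channel attention.

    feat: (C, H, W) input features
    W1:   (C_r, C) first excitation layer (channel reduction)
    W2:   (C, C_r) second excitation layer (channel restoration)
    """
    C = feat.shape[0]
    z = feat.reshape(C, -1).max(axis=1)   # squeeze: global max pooling per channel
    s = np.maximum(W1 @ z, 0.0)           # excitation layer 1 + ReLU
    w = sigmoid(W2 @ s)                   # excitation layer 2 + Sigmoid gate in (0, 1)
    return feat * w[:, None, None]        # reweight each channel by its importance
```

Because the gate is a Sigmoid, each channel is scaled by a factor in (0, 1): uninformative channels are suppressed rather than amplified.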
Step 6: feature fusion processing. The features extracted by the three modules are fused with the originally input compressed video image to obtain the final enhanced video image.
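Step 6's fusion with the original input follows the usual global-residual pattern in learned loop filtering: the network output is treated as a correction added back onto the compressed frame. A minimal sketch (our illustration; the 0–255 clamp assumes 8-bit video):

```python
import numpy as np

def fuse(compressed_frame, learned_residual):
    """Add the network-predicted residual back onto the compressed input
    frame and clamp the result to the legal 8-bit sample range."""
    return np.clip(compressed_frame.astype(np.float64) + learned_residual, 0, 255)
```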
Compared with the prior art, the invention has the following beneficial effects:
1. The invention provides a wide-QP-range loop filtering method based on deformable convolution; the adopted deformable-convolution-based loop filtering model can effectively enhance the subjective and objective quality of compressed video and significantly improve the coding efficiency of the video encoder.
2. The QP attention module adopted by the invention effectively improves the generalization ability of the network model, so that the whole network can process videos compressed at multiple QPs with a single model.
The present invention will be described in further detail with reference to the drawings and embodiments, but the present invention is not limited to the embodiments.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention.
Detailed Description
Referring to FIG. 1, to overcome the limited performance of the existing HEVC loop filter, the invention provides a wide-QP-range loop filtering method based on deformable convolution, which comprises the following steps:
Step 1: data set construction. A large number of videos and pictures are collected and converted into the YUV420 format required by HEVC coding, then encoded in the all-intra (I-frame-only) configuration with the HEVC reference software HM-16.9, with the platform's loop filtering module disabled during encoding. The unfiltered reconstructed videos at 4 quantization levels (QP = 22, 27, 32 and 37) form the training set.
Step 2: network model construction. The network model consists of 3 main modules: a lightweight feature generation module based on deformable convolution, a QP attention module, and a feature enhancement module based on a channel attention mechanism.
Step 3: feature extraction module processing. The compressed video image is fed into the lightweight feature generation module based on deformable convolution, whose main component is a U-shaped network consisting of a compression module and an expansion module. The compression module contains 3 convolution blocks; each block contains two convolutional layers joined by a PReLU activation. To compress the extracted features, the 2nd convolution of each block uses a stride of 2, halving the spatial size of the feature maps while keeping their number unchanged, so that a feature map of 1/8 the input size is finally obtained. The down-sampled feature map is then fed into the feature expansion module. This path likewise consists of 3 convolution blocks; before each block, deconvolution doubles the feature-map size, after which the result is combined through skip connections with the feature maps of the symmetric compression blocks, and after the 3 expansion blocks a feature map of the same size as the input is output. The feature map produced by the U-shaped network's compression and expansion is then fed into a subsequent deformable convolution layer, which learns two-dimensional coordinate offsets to further extract deformation information from the features.
Step 4: QP attention module processing. The module consists of a generator and a controller, both of which are essentially multi-layer perceptrons (MLPs). The number of layers in the generator and the controller equals the number of convolution blocks in the compression and expansion modules of the U-shaped network, respectively, and is set to 3 in the experiments. The generator takes the quantization parameter QP of the current encoding as input and produces a 64-dimensional QP feature f_QP through a linear layer with 64 nodes. The controller takes the generator's output as input and controls the output feature maps according to f_QP: it learns a set of affine transformation mapping functions M over the feature maps, i.e., each linear layer learns a QP-dependent modulation parameter pair (γ, β) from f_QP and adaptively adjusts the output feature map according to (γ, β).
Step 5: feature enhancement module processing. The module consists of a residual block built from 3 dense blocks (Dense Block) and a squeeze-and-excitation (SE) channel attention module. The dense connections greatly expand the capacity of the network: the output feature maps of earlier convolutional layers are fed into later layers, so that better feature representations are obtained through cross-layer feature sharing, while the vanishing-gradient problem during training is alleviated and the trainability of the network is improved. The SE module first squeezes the input features with max pooling, then excites them through two convolutional layers, and finally applies a Sigmoid gating mechanism to obtain channel-related weights; these weights control the importance of each output feature channel of the dense block and sharpen the directivity of the features.
Step 6: feature fusion processing. The features extracted by the three modules are fused with the originally input compressed video image to obtain the final enhanced video image.
The above-described embodiments are merely illustrative of the present invention and are not intended to limit the present invention, and variations, modifications, and the like of the above-described embodiments are possible within the scope of the claims of the present invention as long as they are in accordance with the technical spirit of the present invention.
Claims (1)
1. A wide QP range loop filtering method based on deformable convolution is characterized by comprising the following steps:
step 1: making a data set;
step 2: constructing a network model, which comprises 3 modules: a lightweight feature generation module based on deformable convolution, a QP attention module, and a feature enhancement module based on a channel attention mechanism;
step 3: feature extraction module processing; the compressed video image is input into the lightweight feature generation module based on deformable convolution; this module is composed of a U-shaped network comprising a compression module and a feature expansion module; the compression module comprises 3 convolution blocks, each halving the size of the feature maps while keeping their number unchanged, finally yielding a feature map of 1/8 the input size; the down-sampled feature map is input into the feature expansion module, which likewise consists of 3 convolution blocks; before each block, deconvolution doubles the feature-map size, and after the 3 expansion blocks a feature map of the same size as the input is output;
step 4: QP attention module processing; the QP attention module consists of a generator and a controller, both of which are essentially multi-layer perceptrons; the generator produces a 64-dimensional QP feature f_QP; the controller learns a set of QP-dependent modulation parameter pairs (γ, β) from f_QP and adaptively adjusts the output feature map according to (γ, β);
step 5: feature enhancement module processing; the feature enhancement module consists of 3 dense blocks (Dense Block) and a squeeze-and-excitation (SE) channel attention module;
step 6: feature fusion processing; the features extracted by the deformable-convolution-based lightweight feature generation module, the QP attention module, and the channel-attention-based feature enhancement module are fused with the originally input compressed video image to obtain the final enhanced video image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211005433.7A CN115396667A (en) | 2022-08-22 | 2022-08-22 | Wide QP range loop filtering method based on deformable convolution |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211005433.7A CN115396667A (en) | 2022-08-22 | 2022-08-22 | Wide QP range loop filtering method based on deformable convolution |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115396667A (en) | 2022-11-25
Family
ID=84120342
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211005433.7A Withdrawn CN115396667A (en) | 2022-08-22 | 2022-08-22 | Wide QP range loop filtering method based on deformable convolution |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115396667A (en) |
-
2022
- 2022-08-22: CN application CN202211005433.7A filed, published as CN115396667A (status: withdrawn)
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109903228B (en) | Image super-resolution reconstruction method based on convolutional neural network | |
CN112203093B (en) | Signal processing method based on deep neural network | |
CN111028150B (en) | Rapid space-time residual attention video super-resolution reconstruction method | |
CN114092330B (en) | Light-weight multi-scale infrared image super-resolution reconstruction method | |
CN108921910B (en) | JPEG coding compressed image restoration method based on scalable convolutional neural network | |
CN110351568A (en) | A kind of filtering video loop device based on depth convolutional network | |
CN111464814B (en) | Virtual reference frame generation method based on parallax guide fusion | |
CN112734867A (en) | Multispectral image compression method and system based on space spectrum feature separation and extraction | |
CN111667406B (en) | Video image super-resolution reconstruction method based on time domain correlation | |
CN113344773A (en) | Single picture reconstruction HDR method based on multi-level dual feedback | |
CN110677624B (en) | Monitoring video-oriented foreground and background parallel compression method based on deep learning | |
CN113068031B (en) | Loop filtering method based on deep learning | |
CN110677644B (en) | Video coding and decoding method and video coding intra-frame predictor | |
CN113068041B (en) | Intelligent affine motion compensation coding method | |
CN114022356A (en) | River course flow water level remote sensing image super-resolution method and system based on wavelet domain | |
CN115604485A (en) | Video image decoding method and device | |
CN112396674A (en) | Rapid event image filling method and system based on lightweight generation countermeasure network | |
CN115396667A (en) | Wide QP range loop filtering method based on deformable convolution | |
CN110519606A (en) | Intelligent coding method in a kind of deep video frame | |
CN116347107A (en) | QP self-adaptive loop filtering method based on variable CNN for VVC video coding standard | |
Yang et al. | Imrnet: An iterative motion compensation and residual reconstruction network for video compressed sensing | |
CN112468826B (en) | VVC loop filtering method and system based on multilayer GAN | |
CN115131254A (en) | Constant bit rate compressed video quality enhancement method based on two-domain learning | |
CN105704497A (en) | Fast select algorithm for coding unit size facing 3D-HEVC | |
CN115604465B (en) | Light field microscopic image lossless compression method and device based on phase space continuity |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20221125