CN115396667A - Wide QP range loop filtering method based on deformable convolution - Google Patents
- Publication number
- CN115396667A (application number CN202211005433.7A)
- Authority
- CN
- China
- Prior art keywords
- feature
- module
- convolution
- deformable convolution
- feature map
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/124—Quantisation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/154—Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/70—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/80—Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
- H04N19/82—Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The invention relates to a wide-QP-range loop filtering method based on deformable convolution, belonging to the field of video coding. The method exploits the flexible receptive field and deformation-modeling capability of deformable convolution to learn video features automatically, and reinforces those features through dense residual connections in order to enhance the quality of compressed video. A QP attention module is further proposed to improve the generalization ability of the method, so that a single model can enhance compressed-video quality across different QPs. The method markedly improves both the subjective and objective quality of compressed video.
Description
Technical Field
The invention relates to the field of video coding and decoding, in particular to a wide QP range loop filtering method based on deformable convolution.
Background
Existing video coding standards such as H.264, HEVC and AV1 all use a block-based coding structure. Because correlation between coding blocks is not exploited, compressed video exhibits obvious blocking artifacts, which greatly degrade the viewing experience. A loop filtering module is therefore usually employed during encoding to reduce blocking and other compression artifacts in the video and to improve coding performance.
Traditional loop filters are constrained by computational complexity and provide only limited quality improvement. Deep learning has substantially raised the performance potential of loop filtering in video coding, but existing learned network models are effective only for a single quantization parameter (QP): if the QP of the test video does not match the assumed value, the model's performance drops sharply. Existing learned loop filtering methods therefore train a separate model for each specific quantization parameter (or each narrow QP range), which imposes an extra storage burden and hinders practical deployment of the model. How to design a single model that can process video compressed over a wide QP range while achieving good subjective quality is thus a problem demanding an urgent solution.
Disclosure of Invention
The invention aims to improve the performance of the HEVC loop filter and provides a wide-QP-range loop filtering method based on deformable convolution.
The technical solution adopted by the invention to solve this problem is as follows:
a method of wide QP range loop filtering based on deformable convolution, comprising:
Step 1: data set construction. A large number of videos and pictures are collected and converted into the YUV420 format required by HEVC coding, then encoded in the all-intra (I-frame-only) configuration with the HEVC reference software HM-16.9, with the platform's loop filtering module disabled during encoding. The unfiltered reconstructed videos at 4 quantization levels (QP = 22, 27, 32 and 37) form the training set.
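The encoding setup of step 1 can be sketched as a short script. This is an illustrative command-line fragment, not taken from the patent: it assumes HM-16.9 has been built as `TAppEncoderStatic`, that `cfg/encoder_intra_main.cfg` is HM's stock all-intra configuration, and that the input has already been converted to raw YUV420 (file names and frame geometry are placeholders).

```shell
# Encode one YUV420 clip at the four training QPs with in-loop filtering off.
# --LoopFilterDisable=1 turns off deblocking; --SAO=0 turns off sample
# adaptive offset, so the reconstructions are the unfiltered training inputs.
for QP in 22 27 32 37; do
  ./TAppEncoderStatic -c cfg/encoder_intra_main.cfg \
    -i clip.yuv -wdt 1920 -hgt 1080 -fr 30 -f 32 -q "$QP" \
    --LoopFilterDisable=1 --SAO=0 \
    -b "clip_qp${QP}.bin" -o "clip_qp${QP}_rec.yuv"
done
```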
Step 2: network model construction. The network model consists of 3 main modules: a lightweight feature generation module based on deformable convolution, a QP attention module, and a feature enhancement module based on a channel attention mechanism.
Step 3: feature extraction module processing. The compressed video image is fed into the lightweight feature generation module based on deformable convolution, whose main component is a U-shaped network consisting of a compression module and an expansion module. The compression module contains 3 convolution blocks; each block contains two convolutional layers joined by a PReLU activation. To compress the extracted features, the 2nd convolution of each block uses a stride of 2, halving the spatial size of the feature maps while keeping their number unchanged, so that a feature map of 1/8 the input size is finally obtained. The down-sampled feature map is then fed into the feature expansion module. This path likewise consists of 3 convolution blocks; before each block, deconvolution doubles the feature-map size, after which the result is combined through skip connections with the feature maps of the symmetric compression blocks, and after the 3 expansion blocks a feature map of the same size as the input is output. The feature map produced by the U-shaped network's compression and expansion is then fed into a subsequent deformable convolution layer, which learns two-dimensional coordinate offsets to further extract deformation information from the features.
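The deformable convolution at the heart of step 3 can be made concrete with a small sketch. The NumPy code below is an illustration written for this description, not the patent's network code: it implements a single-channel 3×3 deformable convolution in which every kernel tap samples the input at its regular grid position plus a learned fractional offset, with bilinear interpolation supplying values at non-integer coordinates.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly sample feat (H, W) at a fractional location (y, x), clamped to the border."""
    H, W = feat.shape
    y = min(max(y, 0.0), H - 1.0)
    x = min(max(x, 0.0), W - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0] + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0] + wy * wx * feat[y1, x1])

def deformable_conv2d(feat, weight, offsets):
    """3x3 deformable convolution on a single-channel feature map.

    feat:    (H, W) input feature map
    weight:  (3, 3) kernel
    offsets: (H, W, 9, 2) learned (dy, dx) offset per output pixel and kernel tap
    """
    H, W = feat.shape
    out = np.zeros_like(feat, dtype=float)
    taps = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    for i in range(H):
        for j in range(W):
            acc = 0.0
            for k, (dy, dx) in enumerate(taps):
                oy, ox = offsets[i, j, k]
                # sample at regular grid position + learned offset
                acc += weight[dy + 1, dx + 1] * bilinear_sample(feat, i + dy + oy, j + dx + ox)
            out[i, j] = acc
    return out
```

With all offsets fixed at zero this reduces to an ordinary 3×3 convolution; the learned offsets are what give the layer its flexible, content-adaptive receptive field.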
Step 4: QP attention module processing. The module consists of a generator and a controller, both of which are essentially multi-layer perceptrons (MLPs). The number of layers in the generator and the controller equals the number of convolution blocks in the compression and expansion modules of the U-shaped network, respectively, and is set to 3 in the experiments. The generator takes the quantization parameter QP of the current encoding as input and produces a 64-dimensional QP feature f_QP through a linear layer with 64 nodes. The controller takes the generator's output as input and controls the output feature maps according to f_QP: it learns a set of affine transformation mapping functions M over the feature maps, i.e., each linear layer learns a QP-dependent modulation parameter pair (γ, β) from f_QP and adaptively adjusts the output feature map according to (γ, β).
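The generator/controller mechanism of step 4 amounts to QP-conditioned affine modulation of the feature maps. The sketch below is a minimal single-layer NumPy illustration under our own assumptions (random weights, ReLU activation, QP normalized by 51, HEVC's maximum QP); the patent's generator and controller are 3-layer MLPs.

```python
import numpy as np

rng = np.random.default_rng(0)
C = 64  # feature channels / width of the QP feature f_QP

# Generator: one linear layer turning the scalar QP into a 64-d QP feature f_QP.
W_gen, b_gen = rng.normal(size=(C, 1)) * 0.1, np.zeros(C)

# Controller: two linear heads mapping f_QP to a (gamma, beta) pair per channel.
W_gamma, b_gamma = rng.normal(size=(C, C)) * 0.1, np.ones(C)   # bias 1 -> near-identity scale
W_beta,  b_beta  = rng.normal(size=(C, C)) * 0.1, np.zeros(C)

def qp_modulate(feat, qp):
    """feat: (C, H, W) stage feature map; qp: scalar quantization parameter."""
    f_qp = np.maximum(W_gen @ np.array([qp / 51.0]) + b_gen, 0.0)  # generator + ReLU
    gamma = W_gamma @ f_qp + b_gamma                               # QP-dependent scale
    beta = W_beta @ f_qp + b_beta                                  # QP-dependent shift
    return gamma[:, None, None] * feat + beta[:, None, None]       # affine modulation
```

Because γ and β depend only on the QP, one set of network weights serves every quantization level; the modulation is what adapts the features per QP.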
Step 5: feature enhancement module processing. The module consists of a residual block built from 3 dense blocks (Dense Block) and a squeeze-and-excitation (SE) channel attention module. The dense connections greatly expand the capacity of the network: the output feature maps of earlier convolutional layers are fed into later layers, so that better feature representations are obtained through cross-layer feature sharing, while the vanishing-gradient problem during training is alleviated and the trainability of the network is improved. The SE module first squeezes the input features with max pooling, then excites them through two convolutional layers, and finally applies a Sigmoid gating mechanism to obtain channel-related weights; these weights control the importance of each output feature channel of the residual dense block (RDB) and sharpen the directivity of the features.
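The SE channel attention of step 5 can be sketched in a few lines of NumPy, following the squeeze (max pooling) → two-layer excitation → Sigmoid gate ordering described above; the weight shapes and reduction ratio are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(feat, W1, W2):
    """Squeeze-and-excitation channel attention.

    feat: (C, H, W) input features
    W1:   (C_r, C) first excitation layer (channel reduction)
    W2:   (C, C_r) second excitation layer (channel restoration)
    """
    C = feat.shape[0]
    z = feat.reshape(C, -1).max(axis=1)   # squeeze: global max pooling per channel
    s = np.maximum(W1 @ z, 0.0)           # excitation layer 1 + ReLU
    w = sigmoid(W2 @ s)                   # excitation layer 2 + Sigmoid gate in (0, 1)
    return feat * w[:, None, None]        # reweight each channel by its importance
```

Because the gate is a Sigmoid, each channel is scaled by a factor in (0, 1): uninformative channels are suppressed rather than amplified.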
Step 6: feature fusion processing. The features extracted by the three modules are fused with the originally input compressed video image to obtain the final enhanced video image.
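Step 6's fusion with the original input follows the usual global-residual pattern in learned loop filtering: the network output is treated as a correction added back onto the compressed frame. A minimal sketch (our illustration; the 0–255 clamp assumes 8-bit video):

```python
import numpy as np

def fuse(compressed_frame, learned_residual):
    """Add the network-predicted residual back onto the compressed input
    frame and clamp the result to the legal 8-bit sample range."""
    return np.clip(compressed_frame.astype(np.float64) + learned_residual, 0, 255)
```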
Compared with the prior art, the invention has the following beneficial effects:
1. The invention provides a wide-QP-range loop filtering method based on deformable convolution; the adopted deformable-convolution-based loop filtering model can effectively enhance the subjective and objective quality of compressed video and significantly improve the coding efficiency of the video encoder.
2. The QP attention module adopted by the invention effectively improves the generalization ability of the network model, so that the whole network can process videos compressed at multiple QPs with a single model.
The present invention will be described in further detail with reference to the drawings and embodiments, but the present invention is not limited to the embodiments.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention.
Detailed Description
Referring to FIG. 1, to overcome the limited performance of the existing HEVC loop filter, the invention provides a wide-QP-range loop filtering method based on deformable convolution, which comprises the following steps:
Step 1: data set construction. A large number of videos and pictures are collected and converted into the YUV420 format required by HEVC coding, then encoded in the all-intra (I-frame-only) configuration with the HEVC reference software HM-16.9, with the platform's loop filtering module disabled during encoding. The unfiltered reconstructed videos at 4 quantization levels (QP = 22, 27, 32 and 37) form the training set.
Step 2: network model construction. The network model consists of 3 main modules: a lightweight feature generation module based on deformable convolution, a QP attention module, and a feature enhancement module based on a channel attention mechanism.
Step 3: feature extraction module processing. The compressed video image is fed into the lightweight feature generation module based on deformable convolution, whose main component is a U-shaped network consisting of a compression module and an expansion module. The compression module contains 3 convolution blocks; each block contains two convolutional layers joined by a PReLU activation. To compress the extracted features, the 2nd convolution of each block uses a stride of 2, halving the spatial size of the feature maps while keeping their number unchanged, so that a feature map of 1/8 the input size is finally obtained. The down-sampled feature map is then fed into the feature expansion module. This path likewise consists of 3 convolution blocks; before each block, deconvolution doubles the feature-map size, after which the result is combined through skip connections with the feature maps of the symmetric compression blocks, and after the 3 expansion blocks a feature map of the same size as the input is output. The feature map produced by the U-shaped network's compression and expansion is then fed into a subsequent deformable convolution layer, which learns two-dimensional coordinate offsets to further extract deformation information from the features.
Step 4: QP attention module processing. The module consists of a generator and a controller, both of which are essentially multi-layer perceptrons (MLPs). The number of layers in the generator and the controller equals the number of convolution blocks in the compression and expansion modules of the U-shaped network, respectively, and is set to 3 in the experiments. The generator takes the quantization parameter QP of the current encoding as input and produces a 64-dimensional QP feature f_QP through a linear layer with 64 nodes. The controller takes the generator's output as input and controls the output feature maps according to f_QP: it learns a set of affine transformation mapping functions M over the feature maps, i.e., each linear layer learns a QP-dependent modulation parameter pair (γ, β) from f_QP and adaptively adjusts the output feature map according to (γ, β).
Step 5: feature enhancement module processing. The module consists of a residual block built from 3 dense blocks (Dense Block) and a squeeze-and-excitation (SE) channel attention module. The dense connections greatly expand the capacity of the network: the output feature maps of earlier convolutional layers are fed into later layers, so that better feature representations are obtained through cross-layer feature sharing, while the vanishing-gradient problem during training is alleviated and the trainability of the network is improved. The SE module first squeezes the input features with max pooling, then excites them through two convolutional layers, and finally applies a Sigmoid gating mechanism to obtain channel-related weights; these weights control the importance of each output feature channel of the dense block and sharpen the directivity of the features.
Step 6: feature fusion processing. The features extracted by the three modules are fused with the originally input compressed video image to obtain the final enhanced video image.
The above-described embodiments are merely illustrative of the present invention and are not intended to limit the present invention, and variations, modifications, and the like of the above-described embodiments are possible within the scope of the claims of the present invention as long as they are in accordance with the technical spirit of the present invention.
Claims (1)
1. A wide QP range loop filtering method based on deformable convolution is characterized by comprising the following steps:
step 1: making a data set;
step 2: constructing a network model, which comprises 3 modules: a lightweight feature generation module based on deformable convolution, a QP attention module, and a feature enhancement module based on a channel attention mechanism;
step 3: feature extraction module processing; the compressed video image is input into the lightweight feature generation module based on deformable convolution; this module is composed of a U-shaped network comprising a compression module and a feature expansion module; the compression module comprises 3 convolution blocks, each halving the size of the feature maps while keeping their number unchanged, finally yielding a feature map of 1/8 the input size; the down-sampled feature map is input into the feature expansion module, which likewise consists of 3 convolution blocks; before each block, deconvolution doubles the feature-map size, and after the 3 expansion blocks a feature map of the same size as the input is output;
step 4: QP attention module processing; the QP attention module consists of a generator and a controller, both of which are essentially multi-layer perceptrons; the generator produces a 64-dimensional QP feature f_QP; the controller learns a set of QP-dependent modulation parameter pairs (γ, β) from f_QP and adaptively adjusts the output feature map according to (γ, β);
step 5: feature enhancement module processing; the feature enhancement module consists of 3 dense blocks (Dense Block) and a squeeze-and-excitation (SE) channel attention module;
step 6: feature fusion processing; the features extracted by the deformable-convolution-based lightweight feature generation module, the QP attention module, and the channel-attention-based feature enhancement module are fused with the originally input compressed video image to obtain the final enhanced video image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211005433.7A CN115396667A (en) | 2022-08-22 | 2022-08-22 | Wide QP range loop filtering method based on deformable convolution |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211005433.7A CN115396667A (en) | 2022-08-22 | 2022-08-22 | Wide QP range loop filtering method based on deformable convolution |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115396667A (en) | 2022-11-25
Family
ID=84120342
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211005433.7A Withdrawn CN115396667A (en) | 2022-08-22 | 2022-08-22 | Wide QP range loop filtering method based on deformable convolution |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115396667A (en) |
-
2022
- 2022-08-22: CN application CN202211005433.7A filed, published as CN115396667A (status: withdrawn)
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109903228B (en) | Image super-resolution reconstruction method based on convolutional neural network | |
CN112203093B (en) | Signal processing method based on deep neural network | |
CN111028150B (en) | Rapid space-time residual attention video super-resolution reconstruction method | |
CN114092330B (en) | Light-weight multi-scale infrared image super-resolution reconstruction method | |
CN108921910B (en) | JPEG coding compressed image restoration method based on scalable convolutional neural network | |
CN110351568A (en) | A kind of filtering video loop device based on depth convolutional network | |
CN111464814B (en) | Virtual reference frame generation method based on parallax guide fusion | |
CN112734867A (en) | Multispectral image compression method and system based on space spectrum feature separation and extraction | |
CN111667406B (en) | Video image super-resolution reconstruction method based on time domain correlation | |
CN113344773A (en) | Single picture reconstruction HDR method based on multi-level dual feedback | |
CN110677624B (en) | Monitoring video-oriented foreground and background parallel compression method based on deep learning | |
CN113068031B (en) | Loop filtering method based on deep learning | |
CN110677644B (en) | Video coding and decoding method and video coding intra-frame predictor | |
CN113068041B (en) | Intelligent affine motion compensation coding method | |
CN114022356A (en) | River course flow water level remote sensing image super-resolution method and system based on wavelet domain | |
CN115604485A (en) | Video image decoding method and device | |
CN112396674A (en) | Rapid event image filling method and system based on lightweight generation countermeasure network | |
CN115396667A (en) | Wide QP range loop filtering method based on deformable convolution | |
CN110519606A (en) | Intelligent coding method in a kind of deep video frame | |
CN116347107A (en) | QP self-adaptive loop filtering method based on variable CNN for VVC video coding standard | |
Yang et al. | Imrnet: An iterative motion compensation and residual reconstruction network for video compressed sensing | |
CN112468826B (en) | VVC loop filtering method and system based on multilayer GAN | |
CN115131254A (en) | Constant bit rate compressed video quality enhancement method based on two-domain learning | |
CN105704497A (en) | Fast select algorithm for coding unit size facing 3D-HEVC | |
CN115604465B (en) | Light field microscopic image lossless compression method and device based on phase space continuity |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20221125