CN110519606A - Intelligent intra-frame coding method for depth video - Google Patents

Intelligent intra-frame coding method for depth video (Download PDF)

Info

Publication number
CN110519606A
CN110519606A (application number CN201910780475.XA; granted publication CN110519606B)
Authority
CN
China
Prior art keywords
depth
sampling
coding
depth video
color feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910780475.XA
Other languages
Chinese (zh)
Other versions
CN110519606B (en
Inventor
雷建军
刘晓寰
侯春萍
张凯明
张静
何景逸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201910780475.XA priority Critical patent/CN110519606B/en
Publication of CN110519606A publication Critical patent/CN110519606A/en
Application granted granted Critical
Publication of CN110519606B publication Critical patent/CN110519606B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • H04N19/11Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/132Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/186Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/59Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/593Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses an intelligent intra-frame coding method for depth video, comprising: constructing a variable-resolution predictive coding mode consisting of down-sampling and up-sampling; down-sampling the current depth LCU to reduce its size, obtaining a low-resolution depth block and coding it at low resolution; up-sampling the coded low-resolution depth coding unit with a color-feature-assisted convolutional neural network, extracting depth features and color features with residual coding units; reducing the dimensionality of the extracted features and then fusing the two reduced features to obtain the final fused feature; adding the fused feature to the result of discrete-cosine interpolation filtering so that the data of the training process are always the residual between the predicted value and the true value; and embedding the above steps into 3D-HEVC as a new intra-prediction mode whose rate-distortion cost is compared with those of the other intra-prediction modes to select the optimal prediction mode.

Description

Intelligent intra-frame coding method for depth video
Technical field
The present invention relates to the fields of video coding and deep learning, and in particular to an intelligent intra-frame coding method for depth video.
Background technique
3D video has received significant attention because it provides users with an immersive stereoscopic experience. 3D-HEVC, the 3D extension of the new-generation high-efficiency video coding standard HEVC, encodes not only the color video sequence of each viewpoint but also the corresponding depth video sequence of each viewpoint. Depth video carries the depth and disparity information of a scene and reflects the distance and distribution of the objects in it, so its coding quality directly affects the perceived stereoscopic quality of the scene. Depth video consists of large smooth regions and sharp boundaries: the smooth regions contain substantial spatial redundancy, while the boundaries play an important role in distinguishing different objects and separating foreground from background. Efficient intra-prediction methods for depth video are therefore needed.
Traditional 2D intra prediction is based on image statistics: it studies the linear relationship between already-coded pixels and the pixels to be predicted, and predicts the current pixels from the coded pixel values. Its main flow is to compute the residual between the true values and the predicted values and to write this residual into the bitstream; the decoder adds the residual to the reference pixel values to obtain the corresponding prediction result. Deep-learning-based 2D intra prediction has also made progress. Li et al. proposed an intra-prediction coding method based on fully connected networks, which learns an end-to-end mapping from adjacent reconstructed pixels to the current block through cascaded fully connected layers, fully exploiting the contextual information of the current block to improve coding efficiency. Li et al. also proposed an up-sampling intra-prediction coding method based on convolutional neural networks: intra prediction is first performed on the down-sampled low-resolution block, the prediction is then fed into an up-sampling CNN (convolutional neural network) to restore the coding block size, and rate-distortion optimization against the conventional intra-prediction modes selects the optimal prediction mode, improving intra-prediction performance.
Depth-video intra prediction inherits the traditional 2D intra-frame methods and adds coding tools tailored to the characteristics of depth maps to optimize depth-video coding performance. Researchers at home and abroad have built prediction models on depth-video characteristics and proposed a series of methods. Considering the unique visual characteristics of depth video, Merkle et al. proposed an intra-prediction method based on geometric-primitive residual coding to replace conventional depth-video intra prediction and transform; it models depth coding units with triangular geometric primitives and effectively improves the performance of depth-video intra coding. Lei et al. analyzed the motion and structural similarity between depth video and color video, and proposed an intra-prediction mode based on joint depth-color clustering, a weighted multi-directional prediction method, and a simple and efficient detection and handling mechanism for misaligned depth-color boundaries, effectively improving coding performance.
In implementing the present invention, the inventors found that the prior art has at least the following disadvantages and deficiencies:
Existing depth-video coding research mainly focuses on building prediction models from depth-video characteristics; most of the proposed methods use fixed modes and lack robustness across different scenes. Moreover, because the characteristics of depth video and color video differ, directly applying intelligent color-video coding methods to depth video inevitably degrades coding efficiency.
Summary of the invention
The present invention provides an intelligent intra-frame coding method for depth video. Based on the strong self-learning capability of convolutional neural networks, and combining the content characteristics of depth maps with their similarity to color video, the invention improves depth-video coding performance, as described in detail below:
An intelligent intra-frame coding method for depth video, which exploits the self-learning capability of convolutional neural networks and uses color-feature information to assist depth-video reconstruction, the method comprising the following steps:
1) constructing a variable-resolution predictive coding mode consisting of down-sampling and up-sampling;
2) down-sampling the current depth LCU to reduce its size, obtaining a low-resolution depth block and coding it at low resolution;
3) up-sampling the coded low-resolution depth coding unit with a color-feature-assisted convolutional neural network, extracting depth features and the corresponding color features with residual coding units;
4) reducing the dimensionality of the extracted color features and depth features, then fusing the two reduced features to obtain the final fused feature; and adding the fused feature to the result of discrete-cosine interpolation filtering so that the data of the training process are always the residual between the predicted value and the true value;
embedding the above steps 1)-4) into 3D-HEVC as a new intra-prediction mode, and comparing its rate-distortion cost with those of the other intra-prediction modes to select the optimal prediction mode.
The method further comprises: performing a second up-sampling pass that exploits more neighboring-region information.
The shallow-layer feature is specifically:
X_D^0 = DF_D(I_D)
where DF_D is the deconvolution operation, X_D^0 is the extracted shallow-layer feature, and I_D is the input of the depth stream.
The depth feature is specifically: the n-th RCU is expressed as:
X_D^n = f_RCU^n(X_D^{n-1}) = Sum(C_2^n(C_1^n(X_D^{n-1})), X_D^{n-1})
where n = 1, 2, ..., N; f_RCU^n is the n-th RCU process, X_D^{n-1} is the input of the n-th RCU, Sum is the addition operator, and C_1^n and C_2^n are the two successive convolution operations in the n-th RCU.
The extracted color feature and depth feature are dimension-reduced, and the two reduced features are then fused, specifically:
X_F = Sum(C_d(X_D), C_c(X_C))
where X_D and X_C are the depth feature and the color feature extracted with the residual coding units, and C_d and C_c are the convolution operations that reduce the dimensionality of the depth feature and the color feature, respectively.
The beneficial effects of the technical scheme provided by the present invention are:
1. the present invention adopts intelligent intra-frame coding, which better improves coding efficiency;
2. the present invention extracts depth information stage by stage with residual coding units and uses color information to assist the reconstruction of the depth map, improving the quality of the reconstructed depth map.
Detailed description of the invention
Fig. 1 is a flowchart of the intelligent intra-frame coding method for depth video;
Fig. 2 shows the BDBR (Bjøntegaard delta bit rate) experimental results.
Specific embodiment
To make the objects, technical solutions, and advantages of the present invention clearer, embodiments of the present invention are described in further detail below.
The embodiment of the present invention proposes an intelligent intra-frame coding method for depth video. First, to effectively reduce the spatial redundancy in depth video, a down-sampling/up-sampling variable-resolution predictive coding mode is adopted. Then, to extract depth-feature information efficiently, residual coding units are designed to extract the effective information in the depth video stage by stage. Finally, the texture information of the corresponding color video is extracted to compensate for missing depth features. The specific implementation steps are as follows:
1. Down-sampling the depth coding unit
To compress depth video more efficiently, the embodiment of the present invention performs predictive coding on the down-sampled low-resolution depth video, operating on depth coding units, i.e., depth blocks. In each frame of the depth map, different coding blocks contain different information. Denote a frame of the original depth video as F(x, y), and the largest coding units (LCUs) of each frame of the depth video as f1(x, y), f2(x, y), ..., fN(x, y).
Each LCU is down-sampled with an interpolation filter so that its size is reduced to half; the down-sampled LCU is denoted f'i(x, y), where each down-sampled pixel value is the average of the four surrounding original pixels. Each CTU (coding tree unit) has two coding schemes: one is the full-resolution prediction mode, which codes the original-resolution coding unit fi(x, y) with the original predictive coding method; the other is the CNN-based down-sampling/up-sampling prediction mode, which codes the down-sampled coding unit f'i(x, y), where i = 1, 2, ..., N. The latter scheme first down-samples the current depth LCU to reduce its size, obtaining the low-resolution depth block f'i(x, y); then codes f'i(x, y) at low resolution; and finally performs super-resolution reconstruction with the convolutional neural network to restore the size of the depth block.
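The averaging down-sampling described above (each low-resolution pixel is the mean of the four original pixels it covers) can be sketched as follows; a minimal Python illustration on a synthetic 64 × 64 block (the pixel pattern is hypothetical):

```python
# Sketch of the 2x down-sampling step: a 64x64 depth LCU is reduced to a
# 32x32 low-resolution block, each output pixel being the average of the
# corresponding 2x2 group of input pixels.
def downsample_lcu(block):
    """Average-pool a 2-D depth block (list of rows) by a factor of 2."""
    h, w = len(block), len(block[0])
    assert h % 2 == 0 and w % 2 == 0
    return [
        [(block[2 * y][2 * x] + block[2 * y][2 * x + 1] +
          block[2 * y + 1][2 * x] + block[2 * y + 1][2 * x + 1]) / 4.0
         for x in range(w // 2)]
        for y in range(h // 2)
    ]

lcu = [[(x + y) % 256 for x in range(64)] for y in range(64)]  # toy depth LCU
low_res = downsample_lcu(lcu)  # 32x32 low-resolution depth block
```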
2. Low-resolution LCU coding
The embodiment of the present invention codes the down-sampled depth LCU to save bit rate. This step mainly comprises prediction, transform, quantization, and entropy coding.
For each low-resolution LCU, the 35 traditional intra-prediction modes and the several prediction modes specific to depth maps are first traversed to obtain the optimal prediction; the data are then converted into transform coefficients by the transform; the continuous transform coefficients are mapped to discrete intervals by quantization; and finally the data are written into the bitstream by entropy coding for transmission.
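The mode traversal above amounts to minimizing a rate-distortion cost J = D + λ·R over the candidate modes; a minimal sketch (the mode names, distortions, rates, and λ are all hypothetical):

```python
# Sketch of rate-distortion mode selection: evaluate each candidate
# prediction mode's cost J = D + lambda * R and keep the cheapest.
def best_mode(candidates, lam):
    """candidates: list of (mode_name, distortion, rate_bits)."""
    return min(candidates, key=lambda m: m[1] + lam * m[2])[0]

modes = [
    ("planar", 120.0, 40),      # J = 160 with lam = 1.0
    ("dc", 150.0, 30),          # J = 180
    ("angular_26", 95.0, 50),   # J = 145  -> winner
]
winner = best_mode(modes, lam=1.0)
```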
3. Depth-block feature extraction
The embodiment of the present invention proposes a color-feature-assisted convolutional neural network that up-samples the coded low-resolution depth coding unit to restore its size. The network is a two-stream network; the input of its depth stream is the low-resolution depth unit output by the encoder, whose size is half that of the original LCU, i.e., 32 × 32.
1) Shallow-layer feature extraction
Before extracting the features of the depth CU with residual coding units, a deconvolution layer first maps the input depth CU from the image domain to the feature domain. This deconvolution layer also serves as the up-sampling module of the network, restoring the depth feature maps to the target size. Let I_D denote the input of the depth stream, X_D^0 the extracted shallow-layer feature, and DF_D the deconvolution operation; shallow-layer feature extraction is then expressed as:
X_D^0 = DF_D(I_D)
where the kernel size of the deconvolution layer is 12 × 12 with 48 feature channels. To increase the nonlinear expressive power of the network, a ReLU activation layer follows the deconvolution layer.
2) Depth-feature extraction with residual coding units
To suppress noise in the feature maps and strengthen the effective features that reflect boundary regions, the extracted shallow-layer features of the depth block are further refined by residual coding units (Residual Coding Unit, RCU). The RCU adopts a residual structure to achieve more efficient feature extraction. Each RCU contains two successive convolution layers; because simply stacking convolution layers does not improve network performance, and in order to inherit the shallow-layer feature information and reduce the amount of training data, the feature maps after the successive convolutions are added to the input of the current RCU, forming a short skip connection. Let f_RCU^n denote the n-th RCU process, X_D^{n-1} the input of the n-th RCU, Sum the addition operator, and C_1^n and C_2^n the two successive convolution operations in the n-th RCU; the n-th RCU is then expressed as:
X_D^n = f_RCU^n(X_D^{n-1}) = Sum(C_2^n(C_1^n(X_D^{n-1})), X_D^{n-1})
where n = 1, 2, ..., N; when n = 1, X_D^0 is the shallow-layer feature extracted by the deconvolution layer. The convolution kernels in each RCU are 3 × 3 with 48 feature channels.
The embodiment of the present invention stacks three RCUs in the depth stream to extract depth features stage by stage. Every RCU has the same structure, with identical kernel sizes and feature dimensions. Let X_D denote the finally extracted depth feature; depth-feature extraction with residual coding units is then expressed as:
X_D = f_RCU^3(f_RCU^2(f_RCU^1(X_D^0)))
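The RCU data flow above (two successive convolutions plus a short skip connection) can be sketched in one dimension; this is a toy, untrained illustration with hand-picked kernels, not the actual 3 × 3, 48-channel network:

```python
# Toy sketch of a residual coding unit (RCU): two successive convolutions
# C1, C2 followed by the addition operator Sum, which adds the input back
# (short skip connection). 1-D, single-channel, zero-padded "same" conv.
def conv1d_same(x, kernel):
    k = len(kernel) // 2
    padded = [0.0] * k + list(x) + [0.0] * k
    return [sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
            for i in range(len(x))]

def relu(x):
    return [max(0.0, v) for v in x]

def rcu(x, k1, k2):
    """X_n = Sum(C2(C1(X_{n-1})), X_{n-1})."""
    y = relu(conv1d_same(x, k1))   # first convolution + activation
    y = conv1d_same(y, k2)         # second convolution
    return [a + b for a, b in zip(y, x)]  # residual add (Sum)

feat = [1.0, 2.0, 3.0, 4.0]
# Identity first kernel, zero second kernel: the residual branch contributes
# nothing, so this RCU reduces to the identity mapping on its input.
out = rcu(feat, k1=[0.0, 1.0, 0.0], k2=[0.0, 0.0, 0.0])
```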
4. Color-feature-assisted up-sampling of the depth unit
In the MVD (multiview video plus depth) format, each depth video has a corresponding color video. During intelligent super-resolution reconstruction, color-video features can guide the reconstruction of the depth video and thereby improve reconstruction quality. Considering the content similarity between each depth map and its corresponding color map, the embodiment of the present invention extracts texture features to assist the up-sampling reconstruction of the depth block. The luminance component of the corresponding color block is fed into the color stream of the network for texture-feature extraction. The structure of the color stream is roughly the same as that of the depth stream, except that, because the color block is already at the target size, its shallow-layer feature extraction uses a convolution layer instead of a deconvolution layer.
To let the color features guide the reconstruction of the depth block, the extracted color features and depth features are first dimension-reduced, and the two reduced features are then fused to obtain the final fused feature.
The embodiment of the present invention uses two convolution layers C_c and C_d to reduce the dimensionality of the color-stream and depth-stream feature maps, respectively, and fuses them with the addition operator Sum; the fusion process is expressed as:
X_F = Sum(C_d(X_D), C_c(X_C))
To make the designed network converge quickly during training, the embodiment of the present invention also adopts residual learning to optimize the network structure. The input low-resolution depth block is first restored to the target size by the discrete-cosine interpolation filter, and the fused feature map is then added to this interpolation result, so that the data of the training process are always the residual between the predicted value and the true value, reducing the training burden. Let D_D denote the result of discrete-cosine interpolation filtering, X_F the fused feature map, and R_D the final reconstruction; the residual-learning process is then expressed as:
R_D = Sum(D_D, X_F)
5. Rate-distortion optimization
The intelligent intra-frame coding method set forth above is embedded into 3D-HEVC as a new intra-prediction mode, and its rate-distortion cost is compared with those of the other intra-prediction modes (RDO) to select the optimal prediction mode. View synthesis optimization (VSO) is another, more advanced distortion-cost measure in 3D-HEVC: it takes the weighted sum of the synthesized-view distortion and the reconstructed-depth-map distortion as the final distortion, which effectively improves synthesized-view quality.
The VSO process uses the corresponding color block, but during the low-resolution coding of a depth block the sizes of the depth block and the color block differ, making the proposed mode partially incompatible with 3D-HEVC. Therefore, traditional RDO is used for mode selection during low-resolution coding; after the up-sampling of the depth block is complete, VSO is used again to choose between the proposed variable-resolution prediction and the original full-resolution prediction.
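The VSO distortion described above is a weighted sum of two distortion terms; a one-function sketch (the weight and the distortion values are illustrative placeholders, not the 3D-HEVC defaults):

```python
# Sketch of view-synthesis-optimized (VSO) distortion: a weighted sum of
# the synthesized-view distortion and the reconstructed depth-map distortion.
def vso_distortion(d_synth, d_depth, w):
    """w weights the synthesized-view term; (1 - w) the depth-map term."""
    return w * d_synth + (1.0 - w) * d_depth

d = vso_distortion(d_synth=200.0, d_depth=100.0, w=0.25)  # 50 + 75 = 125.0
```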
6. Second up-sampling pass
To achieve a higher coding gain, the embodiment of the present invention also performs a second up-sampling pass. During the first up-sampling of a depth block, the pixel values to the right of and below the current block cannot be used because those regions have not yet been predicted. After all LCUs of the current frame have completed intra prediction, a second up-sampling pass is therefore performed to exploit more neighboring-region information. Its steps are identical to those of the first up-sampling, except that the input depth-block size becomes 128 × 128 so as to include more pixel information.
Embodiment 2
The experimental results are illustrated below with reference to the figures.
The method was integrated into HTM 16.2 and tested on three standard 3D video coding test sequences: Balloons, Kendo, and Newspaper. The experiments used the all-intra configuration with quantization parameter pairs {25/34, 30/39, 35/42, 40/45}. The original HTM 16.2 platform served as the benchmark algorithm to demonstrate the effectiveness of the method.
The experimental results in Fig. 2 show that, compared with HTM, the method saves bit rate on all three test sequences, with an average BDBR reduction of 6.5%.
Except where otherwise specified, the embodiment of the present invention imposes no restriction on the models of the devices involved, as long as they can perform the functions described above.
Those skilled in the art will understand that the drawings illustrate a preferred embodiment; the serial numbers of the embodiments are for description only and do not indicate their relative merits.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (5)

1. An intelligent intra-frame coding method for depth video, characterized in that it exploits the self-learning capability of convolutional neural networks and uses color-feature information to assist depth-video reconstruction, the method comprising the following steps:
1) constructing a variable-resolution predictive coding mode consisting of down-sampling and up-sampling;
2) down-sampling the current depth LCU to reduce its size, obtaining a low-resolution depth block and coding it at low resolution;
3) up-sampling the coded low-resolution depth coding unit with a color-feature-assisted convolutional neural network, extracting depth features and the corresponding color features with residual coding units;
4) reducing the dimensionality of the extracted color features and depth features, then fusing the two reduced features to obtain the final fused feature; and adding the fused feature to the result of discrete-cosine interpolation filtering so that the data of the training process are always the residual between the predicted value and the true value;
embedding the above steps 1)-4) into 3D-HEVC as a new intra-prediction mode, and comparing its rate-distortion cost with those of the other intra-prediction modes to select the optimal prediction mode.
2. The intelligent intra-frame coding method for depth video according to claim 1, characterized in that the method further comprises: performing a second up-sampling pass that exploits more neighboring-region information.
3. The intelligent intra-frame coding method for depth video according to claim 1, characterized in that the shallow-layer feature is specifically:
X_D^0 = DF_D(I_D)
where DF_D is the deconvolution operation, X_D^0 is the extracted shallow-layer feature, and I_D is the input of the depth stream.
4. The intelligent intra-frame coding method for depth video according to claim 1, characterized in that the depth feature is specifically: the n-th RCU is expressed as:
X_D^n = f_RCU^n(X_D^{n-1}) = Sum(C_2^n(C_1^n(X_D^{n-1})), X_D^{n-1})
where n = 1, 2, ..., N; f_RCU^n is the n-th RCU process, X_D^{n-1} is the input of the n-th RCU, Sum is the addition operator, and C_1^n and C_2^n are the two successive convolution operations in the n-th RCU.
5. The intelligent intra-frame coding method for depth video according to claim 1, characterized in that the extracted color feature and depth feature are dimension-reduced and the two reduced features are then fused, specifically:
X_F = Sum(C_d(X_D), C_c(X_C))
where X_D and X_C are the depth feature and the color feature extracted with the residual coding units, and C_d and C_c are the convolution operations for reducing the dimensionality of the depth feature and the color feature, respectively.
CN201910780475.XA 2019-08-22 2019-08-22 Depth video intra-frame intelligent coding method Active CN110519606B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910780475.XA CN110519606B (en) 2019-08-22 2019-08-22 Depth video intra-frame intelligent coding method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910780475.XA CN110519606B (en) 2019-08-22 2019-08-22 Depth video intra-frame intelligent coding method

Publications (2)

Publication Number Publication Date
CN110519606A true CN110519606A (en) 2019-11-29
CN110519606B CN110519606B (en) 2021-12-07

Family

ID=68626579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910780475.XA Active CN110519606B (en) 2019-08-22 2019-08-22 Depth video intra-frame intelligent coding method

Country Status (1)

Country Link
CN (1) CN110519606B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111064958A (en) * 2019-12-28 2020-04-24 复旦大学 Low-complexity neural network filtering algorithm for B frame and P frame
CN113613000A (en) * 2021-08-20 2021-11-05 天津大学 Intelligent multi-resolution depth video intra-frame prediction method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106791927A (en) * 2016-12-23 2017-05-31 福建帝视信息科技有限公司 A kind of video source modeling and transmission method based on deep learning
CN108012157A (en) * 2017-11-27 2018-05-08 上海交通大学 Construction method for the convolutional neural networks of Video coding fractional pixel interpolation
WO2018218643A1 (en) * 2017-06-02 2018-12-06 Shanghaitech University Method and apparatus for estimating depth of field information
CN109327703A (en) * 2018-11-13 2019-02-12 长春理工大学 A kind of depth map decoding method based on different sampling blocks
CN109714584A (en) * 2019-01-11 2019-05-03 杭州电子科技大学 3D-HEVC depth map encoding unit high-speed decision method based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIA Chuanmin et al.: "Image and Video Coding Based on Neural Networks", Telecommunications Science (《电信科学》) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111064958A (en) * 2019-12-28 2020-04-24 Fudan University Low-complexity neural network filtering algorithm for B frames and P frames
CN113613000A (en) * 2021-08-20 2021-11-05 天津大学 Intelligent multi-resolution depth video intra-frame prediction method
CN113613000B (en) * 2021-08-20 2024-04-26 天津大学 Intelligent multi-resolution depth video intra-frame prediction method

Also Published As

Publication number Publication date
CN110519606B (en) 2021-12-07

Similar Documents

Publication Publication Date Title
CN107396124B (en) Video-frequency compression method based on deep neural network
CN109842799B (en) Intra-frame prediction method and device of color components and computer equipment
CN111355956B (en) Deep learning-based rate distortion optimization rapid decision system and method in HEVC intra-frame coding
CN103338376B (en) A kind of video steganography method based on motion vector
CN108632625A (en) A kind of method for video coding, video encoding/decoding method and relevant device
CN110290387A (en) A kind of method for compressing image based on generation model
CN104853211A (en) Image compression method and apparatus employing various forms of reference pixel storage spaces
CN107005691B (en) Method and apparatus for encoding/decoding video signal using transform derived from picture template
CN107852492A (en) Method and apparatus for being coded and decoded to image
CN110213584A (en) Coding unit classification method and coding unit sorting device based on Texture complication
CN107864380A (en) 3D HEVC fast intra-mode prediction decision-making techniques based on DCT
Fang et al. 3dac: Learning attribute compression for point clouds
CN110519606A (en) 2019-11-29 Depth video intra-frame intelligent coding method
CN113068031A (en) Loop filtering method based on deep learning
TW202404359A (en) Encoding and decoding method, encoder, decoder, and readable storage medium
CN102420990A (en) Multi-view video-oriented fast coding method
CN110677644B (en) Video coding and decoding method and video coding intra-frame predictor
CN104935945A (en) Image compression method of extended reference pixel sample value set
Ma et al. A cross channel context model for latents in deep image compression
CN112929629B (en) Intelligent virtual reference frame generation method
CN100525442C (en) Intelligent water ring scan apparatus and method based on quality factor, and video encoding/decoding apparatus
CN106559668B (en) A kind of low code rate image compression method based on intelligent quantization technology
CN115604485A (en) Video image decoding method and device
CN113068041A (en) Intelligent affine motion compensation coding method
CN110191344B (en) Intelligent coding method for light field image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant