CN105120290A - Fast coding method for depth video - Google Patents

Fast coding method for depth video

Info

Publication number
CN105120290A
Authority
CN
China
Prior art keywords: pixel, present frame, current, frame, depth
Legal status: Granted
Application number
CN201510470699.2A
Other languages
Chinese (zh)
Other versions
CN105120290B (en)
Inventor
彭宗举
韩慧敏
陈芬
李鹏
Current Assignee
Ningbo University
Original Assignee
Ningbo University
Application filed by Ningbo University
Priority to CN201510470699.2A
Publication of CN105120290A
Application granted
Publication of CN105120290B
Status: Active

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a fast coding method for depth video. The method first applies spatial-domain and temporal-domain enhancement to the depth video sequence as preprocessing, and then codes each frame of the preprocessed sequence. During coding, the first frame and the last frame are coded directly with the original 3D-HEVC coding platform; for the remaining frames, the set of HEVC prediction modes that must be traversed is reduced according to the region type each coding unit belongs to, effectively reducing coding complexity while guaranteeing virtual-view rendering quality and coding rate-distortion performance.

Description

Fast coding method for depth video
Technical field
The invention relates to a video signal processing method, and in particular to a fast depth video coding method oriented to 3D-HEVC.
Background technology
With the rapid development of computer, communication, and multimedia technology, users are increasingly interested in three-dimensional video systems such as 3D TV, free-viewpoint TV, and 3D scene rendering. Multi-view video plus depth (MVD), which comprises multi-view color video and the corresponding depth video, is the mainstream representation of three-dimensional scenes. For scene display, virtual views are synthesized with depth-image-based rendering (DIBR); the MVD format meets the needs of three-dimensional video systems and supports wide-viewing-angle 3D and autostereoscopic display. However, MVD involves a huge amount of data, posing great challenges to storage and network transmission; multi-view depth video therefore needs to be compressed as quickly and efficiently as color video.
Color video represents the visual information of a scene, while the corresponding depth video represents its geometric information. Depth information of real scenes is currently acquired in three main ways: Kinect sensors, depth camera systems, and depth estimation software. Owing to the limitations of these acquisition techniques, depth videos are inaccurate and discontinuous; in particular, abrupt pixel changes appear in smooth regions, and depth boundaries are misaligned with the corresponding scene boundaries. As a result, even the latest coding systems cannot achieve optimal coding efficiency. Many depth video processing schemes have been proposed for H.264/AVC, but they are not suitable for the latest multi-view coding scheme, 3D-HEVC.
The coding complexity of depth video is also a problem that research needs to address. At present, the Joint Collaborative Team on 3D Video Coding (JCT-3V), formed by ISO and ITU, is standardizing the 3D extension of High Efficiency Video Coding (3D-HEVC); its overall coding framework is shown in Fig. 1. 3D-HEVC retains the quadtree partition structure of HEVC, coding each color video frame first and then the corresponding depth frame. A depth frame is split into a series of coding tree units (CTUs), and each CTU is recursively subdivided into four equal-sized coding units (CUs); the partition depths 0, 1, 2, 3 correspond to CU sizes 64×64, 32×32, 16×16, and 8×8, and the largest CU is called the largest coding unit (LCU). Each CU is divided into prediction units (PUs) according to the selected prediction mode; the prediction modes comprise the SKIP mode, the Merge mode, the inter modes (AMP, inter_N×2N, inter_2N×N, inter_2N×2N), and the intra modes (intra_N×N, intra_2N×2N), all of which are used in inter prediction and inter-view prediction. After traversing all CU partitions and prediction modes and computing the rate-distortion optimization (RDO) cost, the prediction mode with the minimum rate-distortion cost is selected as the final mode and the CU partition is determined; the computational complexity of this whole process is very high. The rate-distortion cost in 3D-HEVC is computed as J = (w_synth × D_synth + w_depth × D_depth) + λ × R, where J is the rate-distortion cost, D_synth is the distortion of the rendered virtual view (or the distortion of the depth video estimated according to the rendering principle), D_depth is the distortion of the depth video, w_synth and w_depth are the weights of D_synth and D_depth, λ is the Lagrange multiplier, and R is the bit rate under each prediction mode.
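For illustration, the mode decision driven by this cost can be written in a few lines of Python; the function name and numeric values below are assumptions for the sketch, not code from the 3D-HEVC reference software:

```python
def rd_cost(d_synth, d_depth, w_synth, w_depth, lam, rate):
    """Rate-distortion cost of 3D-HEVC mode decision:
    J = (w_synth * D_synth + w_depth * D_depth) + lambda * R."""
    return (w_synth * d_synth + w_depth * d_depth) + lam * rate

# pick the prediction mode with the minimum cost (illustrative values)
costs = {"SKIP": rd_cost(120.0, 80.0, 0.5, 0.5, 10.0, 8),
         "Merge": rd_cost(110.0, 85.0, 0.5, 0.5, 10.0, 12)}
best_mode = min(costs, key=costs.get)   # -> "SKIP" here
```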
Much research has addressed fast depth video coding. For example, Shen et al. exploit the relations between the prediction modes, motion vectors, and other information of the color video and the depth video to reduce coding complexity; More et al. propose a fast algorithm that uses color coding information to control the quadtree partition and prediction mode selection of the depth video. However, these methods do not take the inaccuracy and discontinuity of depth video into account, and this phenomenon indirectly increases coding complexity; it is therefore necessary to study a fast coding method that addresses the complexity increase it causes.
Summary of the invention
The technical problem to be solved by the invention is to provide a fast depth video coding method oriented to 3D-HEVC that effectively reduces coding complexity while guaranteeing virtual-view rendering quality and coding rate-distortion performance.
The technical scheme adopted by the invention to solve the above problem is a fast coding method for depth video, comprising the following steps:
1) Define the high-definition depth video sequence to be processed as the current depth video sequence;
2) Preprocess the current depth video sequence in two parts. Part I: apply spatial-domain enhancement to every depth frame except the first and the last, determining the boundary region and non-boundary region of the corresponding depth frame during the enhancement. Part II: apply temporal-domain enhancement to the spatially enhanced depth video sequence;
3) Define the depth frame currently to be processed in the preprocessed sequence as the current frame;
4) Judge whether the current frame is the first or the last depth frame of the preprocessed sequence; if so, code it directly with the original 3D-HEVC coding platform and go to step 7); otherwise go to step 5);
5) Obtain the moving region and non-moving region of the current frame; and, in the current frame, take the region corresponding to the boundary region of the matching depth frame obtained in step 2) as the boundary region of the current frame, and the region corresponding to its non-boundary region as the non-boundary region of the current frame;
6) Determine whether each coding unit of the current frame belongs to both the boundary region and the moving region. For a coding unit that does, traverse all HEVC prediction modes and select the mode with the minimum rate-distortion cost as its optimal prediction mode; for every other coding unit, traverse only the SKIP and Merge modes and select the one with the lower rate-distortion cost as its optimal prediction mode. Then code each coding unit of the current frame with its optimal prediction mode;
7) Take the next depth frame to be processed in the preprocessed sequence as the current frame and return to step 4) until all depth frames of the preprocessed sequence have been coded. A sketch of this overall control flow is given below.
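To make the control flow of steps 4)–6) concrete, here is a minimal runnable Python sketch of the mode-set selection; all names (choose_modes, frame_mode_plan) are illustrative and not part of the 3D-HEVC test model:

```python
from typing import List

ALL_HEVC_MODES = ["SKIP", "Merge", "inter_2Nx2N", "inter_2NxN",
                  "inter_Nx2N", "AMP", "intra_2Nx2N", "intra_NxN"]

def choose_modes(in_boundary: bool, in_motion: bool) -> List[str]:
    """Step 6): full mode search only for CUs that are both boundary
    and moving; SKIP/Merge for every other CU."""
    return ALL_HEVC_MODES if (in_boundary and in_motion) else ["SKIP", "Merge"]

def frame_mode_plan(cu_boundary: List[bool], cu_motion: List[bool],
                    frame_idx: int, n_frames: int) -> List[List[str]]:
    """Steps 4)-6) for one frame: the first and last frames keep the
    unrestricted search of the original platform."""
    if frame_idx == 0 or frame_idx == n_frames - 1:
        return [ALL_HEVC_MODES for _ in cu_boundary]
    return [choose_modes(b, m) for b, m in zip(cu_boundary, cu_motion)]

# example: 3 CUs, only the second is both boundary and moving
print(frame_mode_plan([False, True, True], [False, True, False], 5, 60))
```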
The detailed process of Part I of step 2) is: 2-1a) define the depth frame currently to be processed in the current depth video sequence as the current frame; 2-1b) if the current frame is the first or the last depth frame of the current depth video sequence, leave it unprocessed; if it is not, filter all fracture pixels belonging to the non-boundary region of the current frame with a Gaussian filter, filter all non-fracture pixels belonging to the non-boundary region with an adaptive window, and leave all fracture pixels and non-fracture pixels belonging to the boundary region unprocessed; 2-1c) take the next depth frame to be processed in the current depth video sequence as the current frame, then return to step 2-1b) until all depth frames of the current depth video sequence have been processed, completing the spatial enhancement of every depth frame except the first and the last.
The detailed process of Part II of step 2) is: 2-2a) let W and H be the width and height of the depth frames in the current depth video sequence, and let T be the total number of depth frames it contains; 2-2b) apply a space-time transform to the spatially enhanced sequence to obtain the space-time-transformed sequence, whose pixel value at coordinate (i,t) in the j-th frame is denoted d_j(i,t), with d_j(i,t) = d_t(i,j), where 1 ≤ j ≤ H, 1 ≤ i ≤ W, 1 ≤ t ≤ T, and d_t(i,j) is the pixel value at coordinate (i,j) in the t-th frame of the spatially enhanced sequence; 2-2c) apply a temporal filtering window of size t_0 to every frame of the space-time-transformed sequence to obtain the temporally enhanced sequence, whose pixel value at coordinate (i,t) in the j-th frame is denoted d'_j(i,t) and computed as the normalized weighted sum d'_j(i,t) = Σ_{t'} w_j(i,t') · d_j(i,t') / Σ_{t'} w_j(i,t') over the window centered at t, where d_j(i,t') is the pixel value at coordinate (i,t') in the j-th frame of the space-time-transformed sequence and w_j(i,t') is the weight of d_j(i,t'); 2-2d) apply the space-time transform to the temporally enhanced sequence to obtain the preprocessed depth video sequence, whose pixel value at coordinate (i,j) in the t-th frame is denoted d'_t(i,j), with d'_t(i,j) = d'_j(i,t).
The fracture pixels and non-fracture pixels in the current frame in step 2-1b) are determined as follows. Let cr(i,j) denote the fracture flag of the pixel at coordinate (i,j) in the current frame. According to the coordinate and pixel value of each pixel in the current frame, set cr(i,j) = 1 if |dp(i,j) − dp(i−1,j)| > Th_0 and cr(i,j) = 0 otherwise. Then expand the range of fracture flags with value 1, as follows: for a pixel at coordinate (i',j') in the current frame whose fracture flag is 1, if the fracture flag of the pixel at (i'−1,j') is 0, change it to 1. Finally, define the pixels whose fracture flag is 0 as non-fracture pixels and the pixels whose fracture flag is 1 as fracture pixels;
where cr(i,j) takes the value 0 or 1, 1 ≤ i ≤ W, 1 ≤ j ≤ H, W and H are the width and height of the depth frames in the current depth video sequence, dp(i−1,j) and dp(i,j) are the pixel values of the pixels at coordinates (i−1,j) and (i,j) in the current frame, |·| denotes absolute value, Th_0 is the set fracture judgment threshold, 2 ≤ i' ≤ W, and 1 ≤ j' ≤ H.
The set fracture judgment threshold Th_0 takes the value 10. A sketch of the fracture marking is given below.
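A minimal numpy sketch of this fracture marking; the threshold test |dp(i,j) − dp(i−1,j)| > Th_0 and the single-pass leftward expansion follow the reconstruction above, and both should be treated as assumptions since the original formula image is not recoverable:

```python
import numpy as np

def fracture_flags(dp: np.ndarray, th0: int = 10) -> np.ndarray:
    """cr(i,j) = 1 where the horizontal depth jump exceeds th0; the flag is
    then propagated one pixel to the left, as in the expansion of step 2-1b)."""
    h, w = dp.shape                      # dp indexed as [j, i]
    cr = np.zeros((h, w), dtype=np.uint8)
    jump = np.abs(dp[:, 1:].astype(int) - dp[:, :-1].astype(int)) > th0
    cr[:, 1:][jump] = 1
    cr[:, :-1][cr[:, 1:] == 1] = 1       # flag the left neighbor as well
    return cr

dp = np.array([[10, 10, 40, 40, 10]], dtype=np.uint8)
print(fracture_flags(dp))                # [[0 1 1 1 1]]
```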
The boundary region and non-boundary region of the current frame in step 2-1b) are obtained by applying Canny boundary detection to the current frame.
The high threshold th_h adopted in the Canny boundary detection takes the value 120, and the low threshold th_l takes the value 40.
The adaptive-window filtering of all non-fracture pixels belonging to the non-boundary region of the current frame in step 2-1b) proceeds as follows: for any non-fracture pixel belonging to the non-boundary region, take it as the central pixel and search upward, downward, left, and right with a search step of n pixels, stopping when a fracture pixel belonging to the non-boundary region, a fracture pixel belonging to the boundary region, a non-fracture pixel belonging to the boundary region, or the image boundary of the current frame is reached, forming a cross window; then, centered at each pixel on the vertical axis of the cross window, search left and right with the same step and the same stopping conditions, forming an adaptive window; finally, assign the mean pixel value of all pixels in the adaptive window to the central pixel to complete the filtering. A sketch of this window construction follows.
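A Python sketch of the adaptive-window construction; the stop rule matches the text, while the scan order and tie-handling details are assumptions:

```python
import numpy as np

def adaptive_window_filter(dp, is_fracture, is_boundary, n=5):
    """Adaptive-window mean filter of step 2-1b). dp is a depth frame [j, i];
    is_fracture / is_boundary are boolean masks. A search stops at any
    fracture pixel, any boundary-region pixel, or the image edge."""
    h, w = dp.shape
    stop = is_fracture | is_boundary
    out = dp.astype(float).copy()

    def extend(j, i, dj, di):
        # walk from (j, i) in steps of n until a stop pixel or the edge
        while 0 <= j + dj * n < h and 0 <= i + di * n < w \
                and not stop[j + dj * n, i + di * n]:
            j += dj * n
            i += di * n
        return j, i

    for j in range(h):
        for i in range(w):
            if stop[j, i]:
                continue                     # filter non-boundary non-fracture only
            top, _ = extend(j, i, -1, 0)     # cross window: vertical arms
            bot, _ = extend(j, i, +1, 0)
            rows = []
            for jj in range(top, bot + 1):   # each pixel on the vertical axis
                _, left = extend(jj, i, 0, -1)
                _, right = extend(jj, i, 0, +1)
                rows.append(dp[jj, left:right + 1])
            out[j, i] = np.concatenate(rows).mean()
    return out
```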
The size t_0 of the temporal filtering window in step 2-2c) takes the value 5. In step 2-2c), the weight w_j(i,t') is derived from the temporal difference dif_j(i,t') = |d_j(i,t'+1) − d_j(i,t')|, where d_j(i,t'+1) is the pixel value at coordinate (i,t'+1) in the j-th frame of the space-time-transformed sequence and |·| denotes absolute value.
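The following numpy sketch illustrates the temporal filtering of step 2-2c). The exact weight function mapping dif_j(i,t') to w_j(i,t') is not given in the text, so an inverse-difference weight relative to the window center is assumed purely for illustration:

```python
import numpy as np

def temporal_enhance(seq, t0=5):
    """seq: spatially enhanced depth frames as an array [T, H, W].
    The space-time transform d_j(i,t) = d_t(i,j) is an axis swap; each
    transformed row is then smoothed over a window of size t0."""
    st = np.transpose(seq, (1, 2, 0)).astype(float)   # [H, W, T]
    out = st.copy()
    half, T = t0 // 2, st.shape[2]
    for t in range(T):
        lo, hi = max(0, t - half), min(T, t + half + 1)
        win = st[:, :, lo:hi]                         # samples d_j(i, t')
        dif = np.abs(win - st[:, :, t:t + 1])
        w = 1.0 / (1.0 + dif)                         # assumed weight form
        out[:, :, t] = (w * win).sum(axis=2) / w.sum(axis=2)
    return np.transpose(out, (2, 0, 1)).round().astype(seq.dtype)

seq = np.random.randint(0, 256, (10, 4, 4), dtype=np.uint8)
smoothed = temporal_enhance(seq)
```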
The detailed process of obtaining the moving region and non-moving region of the current frame in step 5) is:
5-1) Let W and H be the width and height of the depth frames in the current depth video sequence, and suppose W × H is divisible by 4 × 4; divide the color image corresponding to the current frame into (W × H)/(4 × 4) non-overlapping 4 × 4 sub-blocks;
5-2) Define the h-th sub-block to be processed in the color image corresponding to the current frame as the current sub-block, where h is initialized to 1 and 1 ≤ h ≤ (W × H)/(4 × 4);
5-3) Compute the sum of squared differences between the current sub-block and the corresponding region in the previous frame of the color image corresponding to the current frame, denoted SSD_pre,h, SSD_pre,h = Σ_{u=1}^{4} Σ_{v=1}^{4} |C_cur(u,v) − C_pre(u,v)|²; and compute the sum of squared differences between the current sub-block and the corresponding region in the next frame of the color image, denoted SSD_back,h, SSD_back,h = Σ_{u=1}^{4} Σ_{v=1}^{4} |C_cur(u,v) − C_back(u,v)|²; where C_cur(u,v) is the pixel value of the pixel at coordinate (u,v) in the current sub-block, C_pre(u,v) and C_back(u,v) are the pixel values of the pixels at coordinate (u,v) in the regions corresponding to the current sub-block in the previous and next frames of the color image, and |·| denotes absolute value;
5-4) Judge whether min(SSD_pre,h, SSD_back,h) ≥ Th holds, i.e., whether the current sub-block differs markedly from both the previous and the next frame; if it holds, the current sub-block is determined to be a moving sub-block; otherwise it is determined to be a non-moving sub-block; where min() takes the minimum and Th is the set moving sub-block judgment threshold;
5-5) Let h = h + 1, take the next sub-block to be processed in the color image corresponding to the current frame as the current sub-block, and return to step 5-3) until all sub-blocks of the color image have been processed, where the '=' in h = h + 1 is assignment;
5-6) Define the regions of the current frame corresponding to all moving sub-blocks of its color image as the moving region of the current frame, and the regions corresponding to all non-moving sub-blocks as the non-moving region of the current frame. A sketch of this block classification is given after this list.
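A minimal numpy sketch of the block classification of steps 5-1) to 5-6), using the embodiment threshold Th = 1000 given later; the ≥ comparison follows the reconstruction in step 5-4):

```python
import numpy as np

def motion_mask(cur, prev, nxt, th=1000):
    """Per-4x4-block SSD against the previous and next color frames; a block
    is 'moving' when it differs markedly from both (see step 5-4))."""
    h, w = cur.shape
    mask = np.zeros((h, w), dtype=bool)
    for j in range(0, h, 4):
        for i in range(0, w, 4):
            c = cur[j:j+4, i:i+4].astype(int)
            ssd_pre = ((c - prev[j:j+4, i:i+4]) ** 2).sum()
            ssd_back = ((c - nxt[j:j+4, i:i+4]) ** 2).sum()
            mask[j:j+4, i:i+4] = min(ssd_pre, ssd_back) >= th
    return mask

prev = np.zeros((8, 8), np.uint8); cur = prev.copy(); nxt = prev.copy()
cur[0:4, 0:4] = 200                       # this block changed in the current frame
print(motion_mask(cur, prev, nxt)[0, 0])  # True
```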
The detailed process of step 6) is:
6-1) Define the largest coding unit (LCU) currently to be processed in the current frame as the current LCU;
6-2) Define the coding unit (CU) currently to be processed in the current LCU as the current CU, and the depth at which it sits as the current depth;
6-3) If the current CU contains pixels belonging to the boundary region of the current frame and also contains pixels belonging to its moving region, the current CU is deemed to belong to both the boundary region and the moving region: traverse all HEVC prediction modes, compute the rate-distortion cost of the current CU under each mode, select in the rate-distortion optimization the mode with the minimum cost as the optimal prediction mode of the current CU, record the cost under the optimal mode, and go to step 6-4);
If the current CU contains boundary-region pixels but no moving-region pixels, contains moving-region pixels but no boundary-region pixels, or contains neither, traverse only the SKIP and Merge modes among all HEVC prediction modes, compute the rate-distortion cost of the current CU under each of the two, select the mode with the lower cost as the optimal prediction mode of the current CU, record the cost under the optimal mode, and go to step 6-4);
6-4) Compare the optimal partition depth of the CU at the corresponding position in the color image of the current frame with the current depth; if the former is greater, go to step 6-5); if the former is less than or equal to the latter, take the current depth as the optimal partition depth of the current CU and go to step 6-6);
6-5) Following the procedure of step 6-3), obtain in the same manner the optimal prediction mode and the corresponding rate-distortion cost for each of the four equal-sized CUs in the layer below the current CU, then compare the sum of the four costs with the cost of the current CU under its optimal prediction mode. If the former is greater, take the current depth as the optimal partition depth of the current CU, do not split it into the next layer, and go to step 6-6); if the former is less than or equal to the latter, split the current CU into the next layer, take the CU currently to be processed among the four lower-layer CUs as the current CU and its depth as the current depth, and return to step 6-4);
6-6) Code the current CU with its optimal prediction mode. After coding, judge whether all CUs of the current LCU have been processed; if so, the coding of the current LCU is finished, go to step 6-7). Otherwise, judge whether the four equal-sized CUs of the layer of the current CU have all been processed; when they have, judge whether the coding unit in the layer above is an LCU: if it is, take it as the current LCU and go to step 6-7); if not, take the next coding unit to be processed in the layer above as the current CU, take its depth as the current depth, and return to step 6-3);
6-7) Judge whether the current LCU is the last LCU of the current frame; if so, go to step 7); otherwise take the next LCU to be processed in the current frame as the current LCU and return to step 6-2). A sketch of this recursive decision follows.
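The following runnable Python sketch mirrors the recursive decision of steps 6-3) to 6-5) on a toy CU tree with precomputed costs; the CU class and the cost values are illustrative only:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class CU:
    """Toy CU: precomputed per-mode RD costs and optional sub-CUs."""
    costs: Dict[str, float]            # rate-distortion cost per mode
    full_search: bool                  # boundary-and-moving CU?
    children: Optional[List["CU"]] = None

def decide(cu: CU, depth: int, color_depth: int) -> float:
    """Steps 6-3)-6-5): restricted mode search, then split only while the
    color video's optimal partition depth exceeds the current depth and
    splitting lowers the total cost. Returns the best cost."""
    modes = list(cu.costs) if cu.full_search else ["SKIP", "Merge"]
    best = min(cu.costs[m] for m in modes)
    if color_depth <= depth or not cu.children:
        return best
    sub = sum(decide(c, depth + 1, color_depth) for c in cu.children)
    return best if sub > best else sub

leaf = CU({"SKIP": 40.0, "Merge": 42.0}, full_search=False)
root = CU({"SKIP": 200.0, "Merge": 190.0}, True, [leaf] * 4)
print(decide(root, 0, color_depth=1))   # 4 * 40 = 160 < 190 -> split
```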
Compared with the prior art, the invention has the following advantages:
1) The method first preprocesses the depth video sequence with spatial-domain and temporal-domain enhancement, then codes every frame of the preprocessed sequence; during coding, the first frame and the last frame are coded directly with the original 3D-HEVC coding platform, while for the remaining frames the set of HEVC prediction modes to be traversed is reduced according to the region type each coding unit belongs to, significantly reducing coding complexity while guaranteeing virtual-view rendering quality and coding rate-distortion performance.
2) During the spatial enhancement of the depth video sequence, pixels in different regions are processed with different filtering methods, so the edge regions that most affect the rendering process are well protected in the enhanced sequence while the other regions are smoothed to a certain degree, guaranteeing the rendering quality of the final virtual view and the coding rate-distortion performance.
3) During the temporal enhancement of the depth video sequence, the original sequence is space-time transformed and every transformed frame is temporally filtered, so the original sequence is smoothed along the time direction and the correlation of neighboring pixels between frames is strengthened.
Brief description of the drawings
Fig. 1 is the overall coding framework of the multi-view color plus depth coder 3D-HEVC adopted by the method;
Fig. 2 is the overall block diagram of the method;
Fig. 3a is a depth frame of the test sequence Newspaper1; the boxes show enlarged parts of different regions of the original depth frame;
Fig. 3b is the result obtained after preprocessing the depth frame shown in Fig. 3a; the boxes show enlarged parts of the same regions after preprocessing;
Fig. 4 shows the statistics of the bit-rate changes obtained when the preprocessed test sequences are coded on the test model HTM10.0 under different coding quantization parameters QP;
Fig. 5 shows the rendered PSNR obtained when the preprocessed test sequences are coded on HTM10.0 under different QP, compared with the original rendered PSNR;
Fig. 6 shows the statistical distribution of the prediction modes finally selected in the non-moving and non-boundary regions after the preprocessed depth videos are coded on HTM10.0.
Embodiment
The invention is described in further detail below with reference to the drawings and an embodiment.
The invention proposes a fast depth video coding method oriented to 3D-HEVC. Fig. 1 shows the overall coding framework of the multi-view color plus depth coder 3D-HEVC adopted by the method, and Fig. 2 shows the overall block diagram of the method, which comprises the following steps:
1) Define the high-definition depth video sequence to be processed as the current depth video sequence.
2) Preprocess the current depth video sequence in two parts. Part I: apply spatial-domain enhancement to every depth frame except the first and the last, determining the boundary region and non-boundary region of the corresponding depth frame during the enhancement. Part II: apply temporal-domain enhancement to the spatially enhanced depth video sequence.
In this embodiment, the detailed process of Part I of step 2) is: 2-1a) define the depth frame currently to be processed in the current depth video sequence as the current frame. 2-1b) If the current frame is the first or the last depth frame of the sequence, leave it unprocessed; otherwise, filter all fracture pixels belonging to the non-boundary region with a Gaussian filter, filter all non-fracture pixels belonging to the non-boundary region with an adaptive window, and leave all fracture and non-fracture pixels belonging to the boundary region unprocessed. 2-1c) Take the next depth frame to be processed as the current frame and return to step 2-1b) until all depth frames of the current depth video sequence have been processed, completing the spatial enhancement of every depth frame except the first and the last.
Here, the fracture pixels and non-fracture pixels in the current frame in step 2-1b) are determined as follows. Let cr(i,j) denote the fracture flag of the pixel at coordinate (i,j) in the current frame. According to the coordinate and pixel value of each pixel in the current frame, set cr(i,j) = 1 if |dp(i,j) − dp(i−1,j)| > Th_0 and cr(i,j) = 0 otherwise. Then expand the range of fracture flags with value 1: for a pixel at coordinate (i',j') whose fracture flag is 1, if the fracture flag of the pixel at (i'−1,j') is 0, change it to 1; if it is already 1, leave it unchanged. Finally, define the pixels whose fracture flag is 0 as non-fracture pixels and those whose flag is 1 as fracture pixels. Here cr(i,j) takes the value 0 or 1, 1 ≤ i ≤ W, 1 ≤ j ≤ H, W and H are the width and height of the depth frames in the current depth video sequence, dp(i−1,j) and dp(i,j) are the pixel values of the pixels at coordinates (i−1,j) and (i,j) in the current frame, |·| denotes absolute value, Th_0 is the set fracture judgment threshold (Th_0 = 10 in this embodiment), 2 ≤ i' ≤ W, and 1 ≤ j' ≤ H.
Here, the boundary region and non-boundary region of the current frame in step 2-1b) are obtained by applying the existing Canny boundary detection to the current frame, with the high threshold th_h set to 120 and the low threshold th_l set to 40. The detailed process is: apply Gaussian filtering to the current frame; then compute the first-order horizontal and vertical differences of the pixel at coordinate (i,j) of the filtered depth frame, denoted g_x(i,j) and g_y(i,j); then compute its gradient direction angle θ(i,j) = arctan(g_y(i,j)/g_x(i,j)) and gradient magnitude g_t(i,j) = sqrt((g_x(i,j))² + (g_y(i,j))²); then apply non-maximum suppression to the gradient magnitudes: for the pixel at coordinate (i,j) of the filtered frame, compare g_t(i,j) with the gradient magnitudes g1_t(i,j) and g2_t(i,j) of the nearest neighboring pixels on the two sides along θ(i,j), and if g_t(i,j) > g1_t(i,j) and g_t(i,j) > g2_t(i,j), set g_t(i,j) to 1, otherwise set it to 0; finally, perform boundary detection with the set dual thresholds to obtain the boundary region and non-boundary region of the current frame.
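For reference, the same detection can be reproduced with OpenCV's standard Canny implementation and the thresholds given above; this is an illustration, not the implementation used in the patent:

```python
import cv2
import numpy as np

depth = np.random.randint(0, 256, (768, 1024), dtype=np.uint8)  # stand-in depth frame
edges = cv2.Canny(depth, 40, 120)    # low threshold th_l = 40, high threshold th_h = 120
boundary_region = edges > 0          # boolean mask of the boundary region
```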
Here, the adaptive-window filtering of all non-fracture pixels belonging to the non-boundary region in step 2-1b) proceeds as follows: for any non-fracture pixel belonging to the non-boundary region of the current frame, take it as the central pixel and search upward, downward, left, and right with a search step of n pixels, stopping when a fracture pixel belonging to the non-boundary region, a fracture pixel belonging to the boundary region, a non-fracture pixel belonging to the boundary region, or the image boundary of the current frame is reached, forming a cross window; then, centered at each pixel on the vertical axis of the cross window, search left and right with the same step and the same stopping conditions, forming an adaptive window; finally, assign the mean pixel value of all pixels in the adaptive window to the central pixel to complete the filtering. In this embodiment, n = 5.
In this embodiment, the detailed process of Part II of step 2) is: 2-2a) let W and H be the width and height of the depth frames in the current depth video sequence, and let T be the total number of depth frames it contains; T = 60 in this embodiment. 2-2b) Apply a space-time transform to the spatially enhanced sequence to obtain the space-time-transformed sequence, whose pixel value at coordinate (i,t) in the j-th frame is denoted d_j(i,t), with d_j(i,t) = d_t(i,j), where 1 ≤ j ≤ H, 1 ≤ i ≤ W, 1 ≤ t ≤ T, and d_t(i,j) is the pixel value at coordinate (i,j) in the t-th frame of the spatially enhanced sequence. In practice, the spatially enhanced sequence is first regarded as a three-dimensional coordinate system, taken as the original coordinate system, whose x-axis is the width direction of the depth frames, whose y-axis is their height direction, and whose t-axis is their time direction; the space-time transform then yields a new coordinate system whose x-axis is the width direction, whose y-axis is the time direction, and whose t-axis is the height direction. The sequence in the new coordinate system contains H depth frames, each of width W and height T. 2-2c) Apply a temporal filtering window to every frame of the space-time-transformed sequence to obtain the temporally enhanced sequence, whose pixel value at coordinate (i,t) in the j-th frame is denoted d'_j(i,t) and computed as the normalized weighted sum of the samples d_j(i,t') within the window of size t_0 centered at t, with weight w_j(i,t') assigned to d_j(i,t'); in this embodiment t_0 = 5, and the weights are derived from the temporal difference dif_j(i,t') = |d_j(i,t'+1) − d_j(i,t')|, where d_j(i,t'+1) is the pixel value at coordinate (i,t'+1) in the j-th frame of the space-time-transformed sequence and |·| denotes absolute value. 2-2d) Apply the space-time transform to the temporally enhanced sequence to obtain the preprocessed depth video sequence, whose pixel value at coordinate (i,j) in the t-th frame is denoted d'_t(i,j), with d'_t(i,j) = d'_j(i,t).
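In array terms, the space-time transform of steps 2-2b) and 2-2d) is simply an axis swap; a small numpy illustration with toy dimensions (the [T, H, W] layout is an assumption):

```python
import numpy as np

seq = np.random.randint(0, 256, (60, 6, 8), dtype=np.uint8)  # toy [T, H, W] sequence

# d_j(i, t) = d_t(i, j): the new "frames" are indexed by height j,
# each of width W and height T
st = np.transpose(seq, (1, 2, 0))    # [H, W, T]
back = np.transpose(st, (2, 0, 1))   # applying the transform again restores [T, H, W]
assert np.array_equal(seq, back)
```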
3) Define the depth frame currently to be processed in the preprocessed sequence as the current frame.
4) Judge whether the current frame is the first or the last depth frame of the preprocessed sequence; if so, code it directly with the original 3D-HEVC coding platform and go to step 7); otherwise go to step 5).
5) Obtain the moving region and non-moving region of the current frame; and, in the current frame, take the region corresponding to the boundary region of the matching depth frame obtained in step 2) as the boundary region of the current frame, and the region corresponding to its non-boundary region as the non-boundary region of the current frame.
In this embodiment, the detailed process of obtaining the moving region and non-moving region of the current frame in step 5) is:
5-1) Let W and H be the width and height of the depth frames in the current depth video sequence, and suppose W × H is divisible by 4 × 4; divide the color image corresponding to the current frame into (W × H)/(4 × 4) non-overlapping 4 × 4 sub-blocks.
5-2) Define the h-th sub-block to be processed in the color image corresponding to the current frame as the current sub-block, where h is initialized to 1 and 1 ≤ h ≤ (W × H)/(4 × 4).
5-3) Compute the sum of squared differences between the current sub-block and the corresponding region in the previous frame of the color image corresponding to the current frame, denoted SSD_pre,h, SSD_pre,h = Σ_{u=1}^{4} Σ_{v=1}^{4} |C_cur(u,v) − C_pre(u,v)|²; and compute the sum of squared differences between the current sub-block and the corresponding region in the next frame of the color image, denoted SSD_back,h, SSD_back,h = Σ_{u=1}^{4} Σ_{v=1}^{4} |C_cur(u,v) − C_back(u,v)|²; where C_cur(u,v) is the pixel value of the pixel at coordinate (u,v) in the current sub-block, C_pre(u,v) and C_back(u,v) are the pixel values of the pixels at coordinate (u,v) in the regions corresponding to the current sub-block in the previous and next frames of the color image, and |·| denotes absolute value.
5-4) Judge whether min(SSD_pre,h, SSD_back,h) ≥ Th holds, i.e., whether the current sub-block differs markedly from both the previous and the next frame; if it holds, the current sub-block is determined to be a moving sub-block; otherwise it is determined to be a non-moving sub-block; where min() takes the minimum, and Th is the set moving sub-block judgment threshold, Th = 1000 in this embodiment.
5-5) Let h = h + 1, take the next sub-block to be processed in the color image corresponding to the current frame as the current sub-block, and return to step 5-3) until all sub-blocks of the color image have been processed; the '=' in h = h + 1 is assignment.
5-6) Define the regions of the current frame corresponding to all moving sub-blocks of its color image as the moving region of the current frame, and the regions corresponding to all non-moving sub-blocks as the non-moving region of the current frame.
6) Determine whether each coding unit of the current frame belongs to both the boundary region and the moving region. For a coding unit that does, traverse all HEVC prediction modes and select the mode with the minimum rate-distortion cost as its optimal prediction mode; for every other coding unit, traverse only the SKIP and Merge modes and select the one with the lower rate-distortion cost as its optimal prediction mode. Then code each coding unit of the current frame with its optimal prediction mode.
In this embodiment, the detailed process of step 6) is:
6-1) Define the largest coding unit (LCU) currently to be processed in the current frame as the current LCU.
6-2) Define the coding unit (CU) currently to be processed in the current LCU as the current CU, and the depth at which it sits as the current depth.
6-3) If the current CU contains pixels belonging to the boundary region of the current frame and also contains pixels belonging to its moving region, the current CU is deemed to belong to both the boundary region and the moving region: traverse all HEVC prediction modes, compute the rate-distortion cost of the current CU under each mode, select in the rate-distortion optimization the mode with the minimum cost as the optimal prediction mode of the current CU, record the cost under the optimal mode, and go to step 6-4).
If the current CU contains boundary-region pixels but no moving-region pixels, contains moving-region pixels but no boundary-region pixels, or contains neither, traverse only the SKIP and Merge modes among all HEVC prediction modes, compute the rate-distortion cost of the current CU under each of the two, select the mode with the lower cost as the optimal prediction mode of the current CU, record the cost under the optimal mode, and go to step 6-4).
6-4) Compare the optimal partition depth of the CU at the corresponding position in the color image of the current frame with the current depth; if the former is greater, go to step 6-5); if the former is less than or equal to the latter, take the current depth as the optimal partition depth of the current CU and go to step 6-6).
6-5) Following the procedure of step 6-3), obtain in the same manner the optimal prediction mode and the corresponding rate-distortion cost for each of the four equal-sized CUs in the layer below the current CU, then compare the sum of the four costs with the cost of the current CU under its optimal prediction mode. If the former is greater, take the current depth as the optimal partition depth of the current CU, do not split it into the next layer, and go to step 6-6); if the former is less than or equal to the latter, split the current CU into the next layer, take the CU currently to be processed among the four lower-layer CUs as the current CU and its depth as the current depth, and return to step 6-4).
6-6) Code the current CU with its optimal prediction mode. After coding, judge whether all CUs of the current LCU have been processed; if so, the coding of the current LCU is finished, go to step 6-7). Otherwise, judge whether the four equal-sized CUs of the layer of the current CU have all been processed; when they have, judge whether the coding unit in the layer above is an LCU: if it is, take it as the current LCU and go to step 6-7); if not, take the next coding unit to be processed in the layer above as the current CU, take its depth as the current depth, and return to step 6-3).
6-7) Judge whether the current LCU is the last LCU of the current frame; if so, go to step 7); otherwise take the next LCU to be processed in the current frame as the current LCU and return to step 6-2).
7) Take the next depth frame to be processed in the preprocessed sequence as the current frame and return to step 4) until all depth frames of the preprocessed sequence have been coded.
The method is tested below to illustrate its validity and feasibility.
The performance tests are carried out mainly on the 3D-HEVC test model HTM10.0. The test environment is the common test environment of JCT-3V; the basic parameters are listed in Table 1, and the test sequences and views used are listed in Table 2. In Table 2, the Poznan_Street sequence is provided by Poznan University of Technology, the Kendo and Balloons sequences by Nagoya University, and Newspaper1 by the Gwangju Institute of Science and Technology.
Table 1 Test environment settings
Table 2 Test sequences
Test sequence Resolution Coded views Rendered view
Balloons 1024×768 3-1 2
Kendo 1024×768 5-3 4
Newspaper1 1024×768 4-2 3
Poznan_Street 1920×1088 3-5 4
The change in coding complexity is expressed by the reduction ratio ΔT_pro of the total 3D-HEVC coding time (the sum of the color video and depth video coding times), ΔT_pro = (T_ori − T_pro)/T_ori × 100%, where T_pro is the total coding time of the method and T_ori that of the original coding platform. Likewise, the reduction ratio Δt_pro of the depth video coding time is computed in the same way as ΔT_pro. Rate-distortion performance is expressed by BDBR: the BDBR of the preprocessing part is computed from the peak signal-to-noise ratio (PSNR) of the rendered virtual view and the bit rate of the coded depth video, while the BDBR of the fast coding part is computed from the multi-scale structural similarity (MS-SSIM) of the virtual view and the total bit rate of the coded color plus depth videos. The change ratio of the coding bit rate is ΔBR_pro = (BR_ori − BR_pro)/BR_ori × 100%, where BR_pro is the depth video bit rate produced by the method and BR_ori that produced by the original platform. BR_ori and T_ori are measured with the fast algorithms in HTM10.0 disabled, as the baseline data.
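A small sketch of these bookkeeping formulas as reconstructed above; the sign conventions (positive values meaning savings) are assumptions, since the original equation images are not recoverable:

```python
def reduction_ratio(t_ori: float, t_pro: float) -> float:
    """Delta T_pro = (T_ori - T_pro) / T_ori * 100 (percent of time saved)."""
    return (t_ori - t_pro) / t_ori * 100.0

def rate_change(br_ori: float, br_pro: float) -> float:
    """Delta BR_pro = (BR_ori - BR_pro) / BR_ori * 100 (percent of rate saved)."""
    return (br_ori - br_pro) / br_ori * 100.0

# e.g. a 45.53% total-time reduction, as reported in Table 4
print(reduction_ratio(100.0, 54.47))
```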
To illustrate the effect of the depth video preprocessing in the method and its impact on the coding bit rate, a local part of one frame of the original depth video and the corresponding part of the same frame after preprocessing are enlarged and compared, and the original and preprocessed videos are each coded on the original HTM10.0 platform to obtain their bit rates. The statistics of ΔBR_pro for all preprocessed depth videos relative to the original ones are given, together with the statistics of the difference in rendered virtual-view quality (PSNR). BDBR is also adopted to express the bit-rate change of the preprocessed depth video relative to the original under identical rendered virtual-view quality (PSNR), in order to weigh the rate-distortion performance of preprocessed depth video coding.
Fig. 3a shows a depth frame of the Newspaper1 sequence, and Fig. 3b the result obtained after preprocessing it. In Figs. 3a and 3b, the left box encloses an enlarged part of a non-boundary region and the right box an enlarged part of a boundary region. Comparison shows that the boundary-region pixels of the preprocessed depth video are almost unchanged, while the other regions, especially the fracture regions, are filtered to various degrees.
Fig. 4 shows the statistics of the bit-rate changes obtained when the preprocessed test sequences are coded on HTM10.0 under different coding quantization parameters QP. As Fig. 4 shows, the magnitude of the bit-rate reduction decreases as QP increases; at QP 40 the bit rate of the Balloons sequence is almost unchanged and that of the Kendo sequence rises slightly. Overall, the bit rates of all preprocessed test sequences drop to various degrees, with the Poznan_Street sequence showing the largest reduction.
Fig. 5 shows the rendered PSNR obtained under different QP compared with the original rendered PSNR, where the coded depth views and the rendered views are as listed in Table 2 and rendering uses the coded color and depth videos. As Fig. 5 shows, the PSNR rendered from the preprocessed depth videos drops slightly, but always by less than 0.1 dB, and the rendering quality of the Balloons sequence even improves slightly; under different QP, the overall rendering quality remains almost unchanged.
To weigh the rate-distortion effect of the preprocessing comprehensively, Table 3 gives the BDBR changes of the preprocessed depth videos coded with the original HTM10.0 coder. As Table 3 shows, the BDBR after preprocessing drops overall, most markedly for the Poznan_Street sequence, whose BDBR drops by 13.38%. This shows that preprocessed depth video requires a lower coding bit rate for identical rendered virtual-view quality, i.e., the rate-distortion performance is improved.
Table 3 Effect of the preprocessing in the method on the BDBR of the rendered virtual view
Test sequence BDBR
Balloons -7.20%
Kendo -3.60%
Newspaper1 -1.39%
Poznan_Street -13.38%
To verify the soundness of the early prediction-mode decision in the fast coding, Fig. 6 gives the prediction-mode statistics of the depth videos in the non-moving and non-boundary regions. As Fig. 6 shows, the AMP mode selection ratio of all test sequences is below 1%, the SKIP ratio exceeds 90%, and the total ratio of the remaining modes is below 10%; for depth video sequences containing large flat areas, such as the Poznan_Street sequence, the SKIP ratio exceeds 95%. Restricting the mode traversal by region therefore improves coding efficiency without introducing large distortion.
To illustrate the overall coding effect of the method, Table 4 gives the coding-time reduction relative to the original HTM10.0 coder: the method reduces the total coding time by 45.53% and the depth video coding time by 74.13%, and the saving ratio increases slightly as QP increases. Since the coded depth videos are the preprocessed ones, adding the preprocessing time to the coding time gives final savings of 44.24% for the total coding time and 72% for the depth video coding time, showing that coding complexity is effectively reduced. Table 5 gives the changes of the bit rate produced by the method relative to the original HTM10.0 coder: the bit-rate reduction shrinks as QP increases, but the bit rate drops for all test sequences, and the depth video bit rate drops by 24.07% on average. Table 6 gives the PSNR and MS-SSIM of the virtual views rendered after coding the depth videos with the method and with the original HTM10.0, for different test sequences and QP: the method affects both only slightly, the PSNR drop staying below 0.07 dB, while MS-SSIM drops by 0.0006 only for the Poznan_Street sequence at QP 35 and 40 and remains essentially unchanged for the other sequences. To weigh the coding bit rate and the rendered virtual-view quality together, Table 7 gives the final BD-MS-SSIM and BDBR of the method: the average BD-MS-SSIM improves by 0.0002 and the coding BDBR drops by 1.65% on average. In summary, the method significantly reduces coding complexity while guaranteeing the coding rate-distortion performance and the rendered virtual-view quality.
Table 4 Effect of the method on the total coding time and the depth video coding time
Table 5 Effect of the method on the coding bit rate
Table 6 Effect of the method on the rendered virtual-view quality
Table 7 Effect of the method on the coding BD-MS-SSIM and BDBR

Claims (10)

1. A depth video fast coding method, characterized by comprising the following steps:
1. defining the high-definition depth video sequence to be processed as the current depth video sequence;
2. preprocessing the current depth video sequence, the preprocessing comprising two parts; the first part is: performing spatial enhancement on every depth image of the current depth video sequence other than the 1st depth image and the last depth image, and determining the boundary region and non-boundary region of each such depth image during the spatial enhancement; the second part is: performing temporal enhancement on the spatially enhanced depth video sequence;
3. defining the depth image currently to be processed in the preprocessed depth video sequence as the present frame;
4. judging whether the present frame is the 1st depth image or the last depth image of the preprocessed depth video sequence; if so, coding the present frame directly with the original 3D-HEVC coding platform and then performing step 7.; otherwise, performing step 5.;
5. obtaining the moving region and non-moving region of the present frame; and, within the present frame, taking the region corresponding to the boundary region of the matching depth image obtained in step 2. as the boundary region of the present frame, and the region corresponding to the non-boundary region of the matching depth image obtained in step 2. as the non-boundary region of the present frame;
6. determining, for each coding unit in the present frame, whether it belongs to both the boundary region and the moving region; for a coding unit that belongs to both the boundary region and the moving region, traversing all HEVC coding prediction modes and selecting the mode with the minimum rate-distortion cost as the optimal prediction mode of the current coding unit; for every other coding unit, traversing only the SKIP mode and the Merge mode among the HEVC coding prediction modes and selecting the mode with the minimum rate-distortion cost as the optimal prediction mode of the current coding unit; then coding each coding unit in the present frame with its optimal prediction mode;
7. taking the next depth image to be processed in the preprocessed depth video sequence as the present frame and returning to step 4., until all depth images in the preprocessed depth video sequence have been coded.
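By way of illustration (not part of the claims), the flow of claim 1 can be summarized in the following Python sketch; preprocess, detect_motion, boundary_map and the encoder object are hypothetical stand-ins for steps 2., 5. and the 3D-HEVC platform, and candidate_modes is the helper sketched after Fig. 6 above:

```python
def encode_depth_sequence(depth_seq, color_seq, encoder):
    """Top-level flow of the fast depth-video coding method (claim 1 sketch)."""
    seq = preprocess(depth_seq)                # step 2: spatial + temporal enhancement
    for t, frame in enumerate(seq):
        if t == 0 or t == len(seq) - 1:
            encoder.encode_full(frame)         # step 4: original 3D-HEVC coding
            continue
        moving = detect_motion(color_seq, t)   # step 5: per-4x4 motion map
        boundary = boundary_map(frame)         # step 5: from the step-2 edge map
        for cu in frame.coding_units():        # step 6: restricted mode search
            modes = candidate_modes(cu.overlaps(boundary), cu.overlaps(moving))
            best = min(modes, key=lambda m: encoder.rd_cost(cu, m))
            encoder.encode_cu(cu, best)
```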
2. A depth video fast coding method according to claim 1, characterized in that the detailed process of the first part of said step 2. is: 2.-1a, defining the depth image currently to be processed in the current depth video sequence as the present frame; 2.-1b, if the present frame is the 1st depth image or the last depth image of the current depth video sequence, leaving the present frame unprocessed; if it is not, applying a Gaussian filter to all fracture pixels of the present frame belonging to the non-boundary region, applying adaptive-window filtering to all non-fracture pixels of the present frame belonging to the non-boundary region, and leaving all fracture pixels and non-fracture pixels belonging to the boundary region unprocessed; 2.-1c, taking the next depth image to be processed in the current depth video sequence as the present frame and returning to step 2.-1b, until all depth images in the current depth video sequence have been processed, completing the spatial enhancement of every depth image of the current depth video sequence other than the 1st depth image and the last depth image;
The detailed process of the second part of said step 2. is: 2.-2a, letting the width and height of the depth images in the current depth video sequence be W and H respectively, and letting the current depth video sequence contain T depth images; 2.-2b, applying a space-time transposition to the spatially enhanced depth video sequence to obtain the space-time transposed sequence, denoting the pixel value at coordinate position (i, t) in the j-th depth image of the transposed sequence as d_j(i, t), with d_j(i, t) = d_t(i, j), where 1 ≤ j ≤ H, 1 ≤ i ≤ W, 1 ≤ t ≤ T, and d_t(i, j) denotes the pixel value at coordinate position (i, j) in the t-th depth image of the spatially enhanced sequence; 2.-2c, applying a time-domain filtering window to each depth image of the transposed sequence to perform the temporal enhancement, denoting the pixel value at coordinate position (i, t) in the j-th depth image of the temporally enhanced sequence as d'_j(i, t), obtained as a weighted combination of the values d_j(i, t') over the time-domain filtering window, where t_0 denotes the size of the time-domain filtering window, d_j(i, t') denotes the pixel value at coordinate position (i, t') in the j-th depth image of the transposed sequence, and w_j(i, t') denotes the weight of d_j(i, t'); 2.-2d, applying the space-time transposition to the temporally enhanced sequence to obtain the preprocessed depth video sequence, denoting the pixel value at coordinate position (i, j) in the t-th depth image of the preprocessed sequence as d'_t(i, j), with d'_t(i, j) = d'_j(i, t).
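The space-time transposition of claim 2 is just an axis swap, which turns the temporal filter into a 1-D operation. A minimal NumPy sketch, assuming the sequence is stored as a T×H×W array and substituting uniform weights for the claim's unspecified w_j(i, t'):

```python
import numpy as np

def temporal_enhance(depth, t0=5):
    """Time-domain enhancement via space-time transposition (claim 2 sketch).

    depth: (T, H, W) array. After transposition each row j becomes a W x T
    'image' whose second axis is time, so filtering along the last axis of
    the transposed volume is a temporal filter. The uniform window average
    below stands in for the claim's unspecified weights w_j(i, t').
    """
    st = np.transpose(depth, (1, 2, 0)).astype(np.float64)  # (H, W, T): d_j(i, t)
    out = np.empty_like(st)
    T = st.shape[-1]
    half = t0 // 2
    for t in range(T):
        lo, hi = max(0, t - half), min(T, t + half + 1)
        out[..., t] = st[..., lo:hi].mean(axis=-1)  # weighted average, uniform weights
    return np.transpose(out, (2, 0, 1))  # back to (T, H, W): d'_t(i, j)
```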
3. A depth video fast coding method according to claim 2, characterized in that the process of determining the fracture pixels and non-fracture pixels of the present frame in said step 2.-1b is: letting cr(i, j) denote the fracture mark of the pixel at coordinate position (i, j) in the present frame; determining the fracture mark of each pixel of the present frame from its coordinate position and pixel value, with cr(i, j) = 1 if |dp(i, j) − dp(i−1, j)| > Th_0 and cr(i, j) = 0 otherwise; then expanding the extent of the fracture marks with value 1 according to the fracture marks of the pixels in the present frame, the detailed process being: for the pixel at coordinate position (i', j') in the present frame, if the value of its fracture mark is 1 and the value of the fracture mark of the pixel at coordinate position (i'−1, j') is 0, changing the fracture mark of the pixel at (i'−1, j') to 1; finally, defining pixels whose fracture mark has value 0 as non-fracture pixels and pixels whose fracture mark has value 1 as fracture pixels;
wherein cr(i, j) takes the value 0 or 1, 1 ≤ i ≤ W, 1 ≤ j ≤ H, W and H denote the width and height of the depth images in the current depth video sequence, dp(i−1, j) denotes the pixel value of the pixel at coordinate position (i−1, j) in the present frame, dp(i, j) denotes the pixel value of the pixel at coordinate position (i, j) in the present frame, the symbol "||" is the absolute-value symbol, Th_0 is the set fracture judgment threshold, 2 ≤ i' ≤ W, and 1 ≤ j' ≤ H.
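Under this reading, the fracture marking of claim 3 reduces to a thresholded horizontal gradient plus a one-pixel leftward dilation; a sketch (Th_0 = 10 per claim 4):

```python
import numpy as np

def fracture_mask(dp, th0=10):
    """Fracture marks cr(i, j) for one depth frame (claim 3 sketch).

    dp: (H, W) depth frame. A pixel is marked (cr = 1) when the absolute
    horizontal difference to its left neighbour exceeds th0; each mark is
    then expanded one pixel to the left, as in the claim's dilation step.
    """
    cr = np.zeros(dp.shape, dtype=np.uint8)
    diff = np.abs(dp[:, 1:].astype(np.int32) - dp[:, :-1].astype(np.int32))
    cr[:, 1:] = (diff > th0).astype(np.uint8)
    expanded = cr.copy()
    expanded[:, :-1] |= cr[:, 1:]   # a mark at column i also marks column i-1
    return expanded                 # 1 = fracture pixel, 0 = non-fracture pixel
```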
4. A depth video fast coding method according to claim 3, characterized in that the set fracture judgment threshold Th_0 takes the value 10.
5. A depth video fast coding method according to claim 2 or 3, characterized in that the boundary region and non-boundary region of the present frame in said step 2.-1b are obtained by applying Canny boundary detection to the present frame.
6. A depth video fast coding method according to claim 5, characterized in that the high threshold th_H adopted in said Canny boundary detection takes the value 120 and the low threshold th_L adopted takes the value 40.
7. A depth video fast coding method according to claim 5, characterized in that the detailed process of applying adaptive-window filtering to all non-fracture pixels belonging to the non-boundary region of the present frame in said step 2.-1b is: for any non-fracture pixel belonging to the non-boundary region of the present frame, taking this non-fracture pixel as the central pixel and searching upward, downward, leftward and rightward with a search step of n pixels, stopping the search upon reaching a fracture pixel belonging to the non-boundary region, a fracture pixel belonging to the boundary region, a non-fracture pixel belonging to the boundary region, or the image boundary of the present frame, thereby forming a cross-shaped window; then, taking each pixel on the vertical axis of this cross-shaped window as a centre and searching leftward and rightward with a search step of n pixels, stopping under the same conditions, thereby forming an adaptive window; finally, assigning the mean of the pixel values of all pixels in this adaptive window to the central pixel to realize the filtering.
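As an illustration of the claim-7 window construction, the following sketch assumes a search step n = 1 and a precomputed 0/1 stop mask marking every pixel at which the search must halt (fracture pixels and boundary-region pixels); all names are hypothetical:

```python
import numpy as np

def adaptive_window_mean(dp, stop, j, i):
    """Mean of the adaptive cross-shaped window around (j, i) (claim 7 sketch).

    dp:   (H, W) depth frame.
    stop: (H, W) 0/1 mask, 1 where the search must stop, per the claim's
          stopping conditions.
    """
    H, W = dp.shape

    def extent(row, col, dr, dc):
        # Walk from (row, col) in direction (dr, dc) until the next pixel
        # is a stop pixel or outside the image; return the last position.
        r, c = row, col
        while 0 <= r + dr < H and 0 <= c + dc < W and not stop[r + dr, c + dc]:
            r, c = r + dr, c + dc
        return r, c

    top, _ = extent(j, i, -1, 0)
    bot, _ = extent(j, i, 1, 0)
    vals = []
    for r in range(top, bot + 1):           # each pixel on the vertical arm...
        _, left = extent(r, i, 0, -1)
        _, right = extent(r, i, 0, 1)
        vals.extend(dp[r, left:right + 1])  # ...is widened into a horizontal run
    return float(np.mean(vals))             # assigned back to the central pixel
```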
8. A depth video fast coding method according to claim 7, characterized in that the size t_0 of the time-domain filtering window in said step 2.-2c takes the value 5; and in said step 2.-2c, dif_j(i, t') = |d_j(i, t'+1) − d_j(i, t')|, where d_j(i, t'+1) denotes the pixel value at coordinate position (i, t'+1) in the j-th depth image of the space-time transposed sequence and the symbol "||" is the absolute-value symbol.
9. A depth video fast coding method according to claim 1, characterized in that the detailed process of obtaining the moving region and non-moving region of the present frame in said step 5. is:
5.-1, letting the width and height of the depth images in the current depth video sequence be W and H respectively, and assuming that W × H is divisible by 4 × 4, dividing the color image corresponding to the present frame into (W × H)/(4 × 4) non-overlapping sub-blocks of size 4 × 4;
5.-2, defining the h-th sub-block currently to be processed in the color image corresponding to the present frame as the current sub-block, where the initial value of h is 1 and 1 ≤ h ≤ (W × H)/(4 × 4);
5.-3, computing the sum of squared differences between the current sub-block and the corresponding region in the previous color frame of the color image corresponding to the present frame, denoted SSD_pre,h = Σ_(u,v) |C_cur(u, v) − C_pre(u, v)|², and the sum of squared differences between the current sub-block and the corresponding region in the next color frame, denoted SSD_back,h = Σ_(u,v) |C_cur(u, v) − C_back(u, v)|²; where C_cur(u, v) denotes the pixel value at coordinate position (u, v) in the current sub-block, C_pre(u, v) denotes the pixel value at coordinate position (u, v) in the region of the previous color frame corresponding to the current sub-block, C_back(u, v) denotes the pixel value at coordinate position (u, v) in the region of the next color frame corresponding to the current sub-block, and the symbol "||" is the absolute-value symbol;
5.-4, judging whether min(SSD_pre,h, SSD_back,h) < Th holds; if it holds, determining that the current sub-block is a moving sub-block; otherwise, determining that the current sub-block is a non-moving sub-block; where min() is the minimum function and Th is the set moving sub-block judgment threshold;
5.-5, letting h = h + 1, taking the next sub-block to be processed in the color image corresponding to the present frame as the current sub-block, and returning to step 5.-3, until all sub-blocks of the color image corresponding to the present frame have been processed, where the "=" in h = h + 1 is the assignment sign;
5.-6, defining the region of the present frame corresponding to all moving sub-blocks of its color image as the moving region of the present frame, and the region corresponding to all non-moving sub-blocks as the non-moving region of the present frame.
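A compact NumPy sketch of the claim-9 motion decision follows; the threshold value th is illustrative (the claim leaves Th as a set parameter), and the inequality direction follows the claim as written:

```python
import numpy as np

def motion_map(c_prev, c_cur, c_next, th=1000.0):
    """Per-4x4-sub-block motion map (claim 9 sketch).

    c_prev, c_cur, c_next: (H, W) co-located color frames, H and W divisible
    by 4. Returns an (H//4, W//4) boolean map where True marks a moving
    sub-block, i.e. min(SSD_pre, SSD_back) < th per the claim's test.
    """
    def block_ssd(a, b):
        d = (a.astype(np.int64) - b.astype(np.int64)) ** 2
        H, W = d.shape
        # Fold the frame into 4x4 tiles and sum the squared differences per tile.
        return d.reshape(H // 4, 4, W // 4, 4).sum(axis=(1, 3))

    ssd_pre = block_ssd(c_cur, c_prev)
    ssd_back = block_ssd(c_cur, c_next)
    return np.minimum(ssd_pre, ssd_back) < th
```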
10. A depth video fast coding method according to claim 1, characterized in that the detailed process of said step 6. is:
6.-1, defining the largest coding unit currently to be processed in the present frame as the current largest coding unit;
6.-2, defining the coding unit currently to be processed in the current largest coding unit as the current coding unit, and defining the depth at which the current coding unit lies as the current depth;
6.-3, if the current coding unit contains a pixel belonging to the boundary region of the present frame and also contains a pixel belonging to the moving region of the present frame, considering that the current coding unit belongs to both the boundary region and the moving region; traversing all HEVC coding prediction modes, computing the rate-distortion cost of the current coding unit under each HEVC coding prediction mode, selecting in the rate-distortion optimization process the mode with the minimum rate-distortion cost as the optimal prediction mode of the current coding unit, obtaining the rate-distortion cost under the optimal prediction mode, and then performing step 6.-4;
if the current coding unit contains a pixel belonging to the boundary region of the present frame but no pixel belonging to the moving region, or contains a pixel belonging to the moving region but no pixel belonging to the boundary region, or contains neither, traversing only the SKIP mode and the Merge mode among the HEVC coding prediction modes, computing the rate-distortion cost of the current coding unit under each of the SKIP and Merge modes, selecting in the rate-distortion optimization process the mode with the minimum rate-distortion cost as the optimal prediction mode of the current coding unit, obtaining the rate-distortion cost under the optimal prediction mode, and then performing step 6.-4;
6.-4, comparing the optimal split depth of the coding unit of the color image corresponding to the present frame that corresponds to the current coding unit with the current depth; if the former is greater than the latter, performing step 6.-5; if the former is less than or equal to the latter, taking the current depth as the optimal split depth of the current coding unit and then performing step 6.-6;
6.-5, following the process of step 6.-3, obtaining in the same manner the optimal prediction mode and the rate-distortion cost under the optimal prediction mode of each of the four equal-sized coding units in the layer below the layer of the current coding unit; then comparing the sum of the rate-distortion costs of these four coding units with the rate-distortion cost of the current coding unit under its optimal prediction mode; if the former is greater than the latter, taking the current depth as the optimal split depth of the current coding unit, not splitting the current coding unit into the lower layer, and performing step 6.-6; if the former is less than or equal to the latter, splitting the current coding unit into the lower layer, taking the coding unit currently to be processed among the four coding units of the lower layer as the current coding unit, taking the depth at which it lies as the current depth, and returning to step 6.-4;
6.-6, coding the current coding unit with its optimal prediction mode; after the current coding unit is coded, judging whether all coding units in the current largest coding unit have been processed; if so, the coding of the current largest coding unit is finished, and step 6.-7 is performed; otherwise, judging whether the four equal-sized coding units in the layer of the current coding unit have all been processed; when they have all been processed, judging whether the coding unit in the layer above the layer of the current coding unit is a largest coding unit; if it is a largest coding unit, taking the largest coding unit in the layer above as the current largest coding unit and performing step 6.-7; if it is not a largest coding unit, taking the next coding unit to be processed in the layer above as the current coding unit, taking the depth at which this coding unit lies as the current depth, and returning to step 6.-3;
6.-7, judging whether the current largest coding unit is the last largest coding unit in the present frame; if so, performing step 7.; otherwise, taking the next largest coding unit to be processed in the present frame as the current largest coding unit and returning to step 6.-2.
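Finally, the quadtree decision of claim 10 can be condensed into one recursion that collapses the claim's iterative steps; the encoder hooks (encoder.rd_cost, cu.children, cu.overlaps) are hypothetical, and candidate_modes is the restricted-mode helper sketched earlier:

```python
def best_cu(encoder, cu, depth, color_depth, boundary, moving):
    """Recursive CU depth decision (claim 10 sketch); returns (rd_cost, plan).

    color_depth is the optimal split depth of the co-located color-video CU:
    splitting is only considered while it exceeds the current depth, and is
    accepted only if the children's total RD cost does not exceed the parent's.
    """
    modes = candidate_modes(cu.overlaps(boundary), cu.overlaps(moving))
    cost = min(encoder.rd_cost(cu, m) for m in modes)   # steps 6.-3 / claim 6
    if color_depth <= depth:                 # step 6.-4: inherit the color depth
        return cost, (cu, depth)
    child_results = [best_cu(encoder, c, depth + 1, color_depth, boundary, moving)
                     for c in cu.children()]
    child_cost = sum(r[0] for r in child_results)
    if child_cost > cost:                    # step 6.-5: splitting does not pay off
        return cost, (cu, depth)
    return child_cost, [r[1] for r in child_results]
```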
CN201510470699.2A 2015-08-04 2015-08-04 A kind of deep video fast encoding method Active CN105120290B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510470699.2A CN105120290B (en) 2015-08-04 2015-08-04 A kind of deep video fast encoding method

Publications (2)

Publication Number Publication Date
CN105120290A true CN105120290A (en) 2015-12-02
CN105120290B CN105120290B (en) 2017-12-05

Family

ID=54668136

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510470699.2A Active CN105120290B (en) 2015-08-04 2015-08-04 A kind of deep video fast encoding method

Country Status (1)

Country Link
CN (1) CN105120290B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101374243A (en) * 2008-07-29 2009-02-25 宁波大学 Depth map encoding compression method for 3DTV and FTV system
CN101588445A (en) * 2009-06-09 2009-11-25 宁波大学 Video area-of-interest exacting method based on depth
CN103716607A (en) * 2012-09-28 2014-04-09 中兴通讯股份有限公司 Encoding method and apparatus applied to high efficiency video coding-based 3-Dimension video coding (HEVC-based 3DVC)
US20150049821A1 (en) * 2013-08-16 2015-02-19 Qualcomm Incorporated In-loop depth map filtering for 3d video coding

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHOU HAO: "Research on depth video signal preprocessing and its application in coding", China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106412611A (en) * 2016-09-26 2017-02-15 宁波大学 Complexity control method of efficient video encoding
CN110169076A (en) * 2017-01-06 2019-08-23 联发科技股份有限公司 For transmitting the method and apparatus of viewport and area-of-interest
CN110169076B (en) * 2017-01-06 2022-09-09 联发科技股份有限公司 Method and apparatus for encoding/decoding video data
CN108012150A (en) * 2017-12-14 2018-05-08 湖南兴天电子科技有限公司 Video code between frames method and device
CN108012150B (en) * 2017-12-14 2020-05-05 湖南兴天电子科技有限公司 Video interframe coding method and device
CN108322740A (en) * 2018-01-10 2018-07-24 宁波大学 A kind of coding method that encoder complexity is controllable
CN108322740B (en) * 2018-01-10 2020-04-17 宁波大学 Encoding method with controllable encoding complexity
CN108134940B (en) * 2018-01-22 2019-11-08 合肥工业大学 Coding circuit and method applied to depth image intra prediction in Video coding
CN108271025B (en) * 2018-01-22 2019-11-29 合肥工业大学 The coding circuit of depth modelling mode and its coding method in 3D coding and decoding video based on boundary gradient
CN108271025A (en) * 2018-01-22 2018-07-10 合肥工业大学 The coding circuit of depth modelling pattern and its coding method in 3D coding and decoding videos based on boundary gradient
CN108134940A (en) * 2018-01-22 2018-06-08 合肥工业大学 Applied to the coding circuit of depth image intra prediction and its coding method in coding and decoding video
CN110365984A (en) * 2018-03-26 2019-10-22 联发科技(新加坡)私人有限公司 Decoding method and device
CN110365984B (en) * 2018-03-26 2022-01-04 联发科技(新加坡)私人有限公司 Encoding and decoding method and device
WO2020043111A1 (en) * 2018-08-28 2020-03-05 华为技术有限公司 Historic candidate list-based image coding and decoding methods, and codec
CN110868613A (en) * 2018-08-28 2020-03-06 华为技术有限公司 Image coding and decoding method based on history candidate list and codec
CN110868613B (en) * 2018-08-28 2021-10-01 华为技术有限公司 Image encoding method, image decoding method and device based on history candidate list

Also Published As

Publication number Publication date
CN105120290B (en) 2017-12-05

Similar Documents

Publication Publication Date Title
CN105120290A (en) Fast coding method for depth video
CN104539962B (en) It is a kind of merge visually-perceptible feature can scalable video coding method
CN103873861B (en) Coding mode selection method for HEVC (high efficiency video coding)
CN103517069B (en) A kind of HEVC intra-frame prediction quick mode selection method based on texture analysis
CN106507116B (en) A kind of 3D-HEVC coding method predicted based on 3D conspicuousness information and View Synthesis
CN100551075C (en) A kind of method for choosing frame inner forecast mode of low complex degree
CN105430415A (en) Fast intraframe coding method of 3D-HEVC depth videos
CN102420988B (en) Multi-view video coding system utilizing visual characteristics
CN101374243B (en) Depth map encoding compression method for 3DTV and FTV system
CN102801976B (en) Inter-frame module selecting method based on three-dimensional wavelet video code
CN103248895B (en) A kind of quick mode method of estimation for HEVC intraframe coding
CN104243997B (en) Method for quality scalable HEVC (high efficiency video coding)
CN107087200A (en) Coding mode advance decision method is skipped for high efficiency video encoding standard
CN110446052B (en) 3D-HEVC intra-frame depth map rapid CU depth selection method
CN103067705B (en) A kind of multi-view depth video preprocess method
CN105933711B (en) Neighborhood optimum probability video steganalysis method and system based on segmentation
CN101621683A (en) Fast stereo video coding method based on AVS
CN103546758A (en) Rapid depth map sequence interframe mode selection fractal coding method
CN107396102A (en) A kind of inter-frame mode fast selecting method and device based on Merge technological movement vectors
CN103634601A (en) Structural similarity-based efficient video code perceiving code rate control optimizing method
CN104469336A (en) Coding method for multi-view depth video signals
KR100947447B1 (en) Method and its apparatus for fast mode decision in multi-view video coding
CN104702959B (en) A kind of intra-frame prediction method and system of Video coding
CN101557519B (en) Multi-view video coding method
CN105915916A (en) Video steganalysis method based on motion vector rate-distortion performance estimation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant