CN106791864B

CN106791864B - Realization method for improving video transcoding rate based on HEVC standard

Info

Publication number: CN106791864B
Application number: CN201611119061.5A
Authority: CN
Inventors: 伏长虹; 陈则希; 王允
Original assignee: Nanjing Tech University
Current assignee: Nanjing Tech University
Priority date: 2016-12-08
Filing date: 2016-12-08
Publication date: 2019-12-27
Anticipated expiration: 2036-12-08
Also published as: CN106791864A

Abstract

The invention provides a realization method for improving video transcoding rate based on the HEVC standard, which comprises the following steps: step 1, in the process of decoding the original high-bit-rate video stream, coded with quantization parameter QP₁, into a video sequence, extracting the partition information of the relevant coding unit in the corresponding frame of the original video stream; step 2, in the process of encoding the decoded video sequence into a low-bit-rate video stream with quantization parameter QP₂, extracting the stored partition information of the relevant coding unit in the temporally previous frame after that frame has been encoded; step 3, inputting the partition information obtained in step 1 and step 2 as features into the corresponding trained naive Bayes classifier to obtain a classification result; step 4, if the classification result is 1, continuing to split the coding unit of the current frame; if the classification result is 0, stopping the splitting of the coding unit of the current frame.

Description

Realization method for improving video transcoding rate based on HEVC standard

Technical Field

The invention relates to a video transcoding technology, in particular to a realization method for improving video transcoding rate based on HEVC standard.

Background

High Efficiency Video Coding (HEVC) is a new-generation video coding standard jointly established by the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG). As the successor to H.264, HEVC overcomes the limitations of the H.264 macroblock mechanism when processing high-definition video, is expected to remain a mainstream coding standard for a long time, and has broad application prospects in video telephony, video conferencing, and network streaming video on demand. With advances in technology, network video playback is no longer limited to traditional personal computers; mobile terminals such as mobile phones and iPads also account for a large share. At the same time, playback is constrained by the network bandwidth available in different scenarios, which has created an urgent market demand for video transcoding.

Generally, video transcoding mainly includes bit rate transcoding, resolution transcoding, and video coding format conversion. Research on bit rate transcoding began with reducing the bit rate of an already compressed stream, with the aim of lowering the bit rate at low complexity and high output quality without changing the resolution. The most direct approach is the cascaded pixel-domain transcoder (CPDT), which is structurally simple and free of error propagation, but it is computationally expensive and hard to run in real time, and information is lost during motion re-estimation, which lowers video quality.

Current bit rate transcoding structures fall into two main categories: open-loop and closed-loop. Open-loop transcoding needs neither motion estimation nor DCT/IDCT, so it is efficient, but it suffers from error drift; an effective remedy is an intra-block refresh structure, which increases the number of intra-coded macroblocks. Closed-loop structures increase algorithmic complexity and storage but can deliver good image quality.

According to how texture data is processed, transcoding can also be divided into pixel-domain cascaded transcoding and DCT-domain transcoding. The pixel-domain cascaded structure combines partial decoding with partial encoding and is relatively complex, whereas the fast pixel-domain cascaded structure obtains the new prediction residual by applying motion compensation to the prediction residual of the original stream, which can cut the DCT/IDCT operations by as much as half. In the DCT-domain transcoder (DDT), no computation is needed for motion compensation in the DCT domain when the motion vector is zero or a multiple of 8. For slow-moving or still video sequences DDT is therefore computationally light, but other factors limit the application of this structure.

The invention is based on the latest video compression standard, HEVC, and mainly targets speed optimization of the closed-loop structure in bit-rate-reduction transcoding. When an input high-bit-rate video stream is decoded and the reconstructed sequence is re-encoded into a lower-bit-rate stream with a larger quantization parameter (QP), the partition mode of each coding unit (CU) during re-encoding is predicted in advance from the CU partition information of the original input stream and a naive Bayes classifier trained on a large number of samples. This reduces the number of RD-cost calculations and greatly shortens transcoding time while preserving video quality.

Disclosure of Invention

The method applies machine learning within the HEVC standard: using information from the original stream and from the previous frame, it reduces the number of CU partition mode decisions that must be evaluated during re-encoding, thereby reducing computation and effectively improving the speed and efficiency of video transcoding.

A realization method for improving video transcoding rate based on HEVC standard comprises the following steps:

step 1, in the process of decoding the original high-bit-rate video stream, coded with quantization parameter QP₁, into a video sequence, extracting the partition information of the relevant coding unit in the corresponding frame of the original video stream;

step 2, in the process of encoding the decoded video sequence into a low-bit-rate video stream with quantization parameter QP₂, extracting the stored partition information of the relevant coding unit in the temporally previous frame after that frame has been encoded;

step 3, inputting the partition information obtained in step 1 and step 2 as features into the corresponding trained naive Bayes classifier to obtain a classification result;

step 4, if the classification result is 1, continuing to split the coding unit of the current frame; if the classification result is 0, stopping the splitting of the coding unit of the current frame;

the relevant coding units are the coding units with the same position and size in the corresponding frames;

wherein QP₁ < QP₂.

Compared with the prior art, the invention has the following advantages:

(1) compared with other methods that decide the CU partition only at the encoding end, the method of the invention is faster and more effective because it uses the prior information in the original stream;

(2) the invention uses not only the information in the original stream but also the CU partition information of the temporally adjacent frame, so that the classification is better founded and more comprehensive;

(3) the invention uses a naive Bayes classification model, which is intuitive, simple, and easy to apply, gives a marked effect, and can be trained on a large number of samples, making it more accurate than traditional statistical methods.

The invention is further described below with reference to the accompanying drawings.

Drawings

Fig. 1 is a flow diagram of HEVC full-decode, full-encode transcoding.

Fig. 2 is a transcoding flow diagram of the present invention.

Fig. 3 shows the depths corresponding to CU partitioning.

Fig. 4 is a schematic diagram of the use of classifiers T₀, T₁, and T₂.

Fig. 5 is a flow chart of a method of the present invention.

Detailed Description

With reference to fig. 5, a method for improving video transcoding rate based on the HEVC standard includes the following steps:

wherein QP₁ < QP₂.

Step 1 extracts the partition information of the relevant coding unit (CU) in the corresponding frame of the original stream, and step 2 extracts the stored CU partition information of the previous frame after it has been encoded. The two pieces of partition information are input as features into the corresponding trained naive Bayes classifier, which assigns the CU to class A or class B. The CU partition mode is then decided from that class, which reduces the re-encoding computation and increases transcoding speed.

Specifically, let the quantization parameter of the original video stream be QP₁ and the quantization parameter after transcoding be QP₂, with QP₁ < QP₂. The quantization parameter difference is then ΔQP = QP₂ − QP₁. Classifier training is performed for this fixed ΔQP.

The training video samples should cover as many types as possible, including low-speed motion, medium-speed motion, high resolution, low resolution, and so on. Candidate sequences include, for example, ParkScene, BasketballDrill, BQMall, BQSquare, and FourPeople. 50 frames are selected from each video, and only P frames and B frames are used.

The training video samples are run through full-decode, full-encode transcoding from the high-bit-rate video stream with QP₁ to the low-bit-rate video stream with QP₂, as shown in Fig. 1.

The features extracted from the training videos for the naive Bayes classifier are:

ΔDepth_correspondingCU (hereinafter ΔDc)

ΔDepth_timepreviousCU (hereinafter ΔDt)

Here, correspondingCU denotes the CU in the input original video stream that is most strongly correlated with the CU to be processed, and timepreviousCU denotes the CU in the temporally previous frame that is most strongly correlated with the CU to be processed. The most strongly correlated CU is the CU with the same position and size in the corresponding LCU of the corresponding frame (the frame with the same coding sequence number); the position can be determined from uiAbsPartIdx in the program. ΔDepth is the difference between the depth corresponding to the CU size and the deepest partition depth within that CU. Note: a 64 × 64 CU has depth 0, a 32 × 32 CU has depth 1, a 16 × 16 CU has depth 2, and an 8 × 8 CU has depth 3 (see Fig. 3).
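As an illustration of the ΔDepth feature described above, the following minimal Python sketch computes it for one CU. The function name `delta_depth`, the input format (a list of leaf-CU depths covering the co-located region), the sign convention (deepest depth minus CU depth), and the clamping to zero are all assumptions made for this example; the patent text does not specify them.

```python
# Depth convention stated in the text: 64x64 -> 0, 32x32 -> 1, 16x16 -> 2, 8x8 -> 3.
SIZE_TO_DEPTH = {64: 0, 32: 1, 16: 2, 8: 3}


def delta_depth(cu_size, leaf_depths):
    """Return the deepest partition depth inside the co-located CU minus
    the depth that corresponds to the CU size.

    leaf_depths is a hypothetical input: the depths of all leaf CUs that
    cover the co-located region in the reference picture (the original
    stream for delta Dc, the previous encoded frame for delta Dt).
    """
    cu_depth = SIZE_TO_DEPTH[cu_size]
    deepest = max(leaf_depths)
    # Assumption: if the region was never split down to this size,
    # report 0 rather than a negative value.
    return max(deepest - cu_depth, 0)


# A 64x64 CU whose co-located region was split down to 16x16 in places.
d_dc = delta_depth(64, [1, 1, 2, 2, 2, 2, 1])
# A 32x32 CU whose co-located region was left unsplit at 32x32.
d_dt = delta_depth(32, [1])
```

In this toy case `d_dc` is 2 and `d_dt` is 0, so the feature pair (ΔDc, ΔDt) passed to a classifier would be (2, 0).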

Classes output by the classifier:

Class A: the split-flag value of the CU to be processed at the re-encoding end is 1, and the CU continues to be split;

Class B: the split-flag value of the CU to be processed at the re-encoding end is 0, and the CU stops splitting.

The classifier model is a naive Bayes classifier. There are three classifiers in total, T₀, T₁, and T₂, for the three CU sizes 64 × 64, 32 × 32, and 16 × 16, respectively.

Each classifier is trained separately. To train classifier T₀, the features and classes of all 64 × 64 CUs at the encoding end in the training videos are extracted. Likewise, to train classifier T₁, the features and classes of all 32 × 32 CUs at the encoding end are extracted, and to train classifier T₂, the features and classes of all 16 × 16 CUs at the encoding end are extracted.
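The per-size training described above can be sketched with a small categorical naive Bayes model. This is only an illustration under stated assumptions: the class `TinyNaiveBayes`, the add-one smoothing, the assumption that each feature takes values 0 to 3, and the toy samples are all hypothetical and are not taken from the patent.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Categorical naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self, n_feature_values=4):
        # Assumption: each delta-depth feature takes one of the values 0..3.
        self.n_values = n_feature_values
        self.class_counts = Counter()
        # (label, feature index) -> Counter of observed feature values
        self.feature_counts = defaultdict(Counter)

    def fit(self, samples):
        # samples: iterable of ((delta_dc, delta_dt), split_flag)
        for features, label in samples:
            self.class_counts[label] += 1
            for i, value in enumerate(features):
                self.feature_counts[(label, i)][value] += 1
        return self

    def log_posterior(self, features, label):
        total = sum(self.class_counts.values())
        # Smoothed log prior over the two classes (split = 1, stop = 0).
        score = math.log((self.class_counts[label] + 1) / (total + 2))
        for i, value in enumerate(features):
            count = self.feature_counts[(label, i)][value]
            score += math.log((count + 1) / (self.class_counts[label] + self.n_values))
        return score

    def predict(self, features):
        return max((0, 1), key=lambda label: self.log_posterior(features, label))


# Made-up training samples for 64x64 CUs: ((delta Dc, delta Dt), split flag).
toy_samples_64 = [
    ((2, 2), 1), ((3, 2), 1), ((1, 1), 1),
    ((0, 0), 0), ((0, 1), 0), ((0, 0), 0),
]
t0 = TinyNaiveBayes().fit(toy_samples_64)
pred_split = t0.predict((2, 1))
pred_stop = t0.predict((0, 0))
```

Classifiers for 32 × 32 and 16 × 16 CUs would be fitted the same way on samples collected from CUs of those sizes, for example in a dictionary keyed by CU size.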

After the three classifiers are trained, transcoding proceeds according to the flow shown in Fig. 2. During re-encoding, an I frame receives no special processing. For a B frame or P frame, the information from the original stream (ΔDc) and the information from the temporally previous frame (ΔDt) are extracted for each CU size and passed to the corresponding classifier (T₀, T₁, or T₂), which yields a predicted split-flag value (pSF) for the CU. The CU partition is then decided directly, without a large number of RD-cost calculations; the specific decision flow is shown in Fig. 4.
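One plausible reading of this decision flow is sketched below in Python. Fig. 4 is not reproduced here, so the recursion, the fallback to the normal RD search for I frames and 8 × 8 CUs, and the names `predict_partition`, `get_features`, and `classifiers` are assumptions for illustration, not the patented flow itself. The stand-in classifiers use a made-up rule rather than a trained model.

```python
def predict_partition(frame_type, x, y, size, get_features, classifiers):
    """Yield (x, y, size, decided_by) for each CU where the recursion ends.

    classifiers maps a CU size (64, 32, 16) to a callable returning the
    predicted split flag pSF (1 = split, 0 = stop).
    get_features returns (delta_dc, delta_dt) for the CU at (x, y, size).
    """
    psf = None
    if frame_type in ("P", "B") and size in classifiers:
        delta_dc, delta_dt = get_features(x, y, size)
        psf = classifiers[size](delta_dc, delta_dt)

    if psf == 1:
        # Predicted split: descend into the four sub-CUs.
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                yield from predict_partition(
                    frame_type, x + dx, y + dy, half, get_features, classifiers
                )
    elif psf == 0:
        # Predicted stop: keep this CU size without further RD-cost checks.
        yield (x, y, size, "classifier")
    else:
        # I frames and sizes without a classifier: leave it to the encoder.
        yield (x, y, size, "rd_search")


def toy_rule(delta_dc, delta_dt):
    # Stand-in for a trained classifier (hypothetical rule).
    return 1 if delta_dc + delta_dt > 0 else 0


def toy_features(x, y, size):
    # Made-up features: only the 64x64 level looks split in the references.
    return (1, 1) if size == 64 else (0, 0)


toy_classifiers = {64: toy_rule, 32: toy_rule, 16: toy_rule}
p_leaves = list(predict_partition("P", 0, 0, 64, toy_features, toy_classifiers))
i_leaves = list(predict_partition("I", 0, 0, 64, toy_features, toy_classifiers))
```

With these toy inputs, the P-frame LCU ends up as four 32 × 32 CUs decided by the classifiers, while the I-frame LCU is handed back untouched to the encoder's normal search.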

Claims

1. A realization method for improving video transcoding rate based on HEVC standard is characterized by comprising the following steps:

wherein QP₁ < QP₂.

2. The method according to claim 1, wherein there are a plurality of naive Bayes classifiers in step 3, each corresponding to coding units of a different size.