WO2019179523A1 - Block partition coding complexity optimization method and apparatus based on a deep learning method - Google Patents
Block partition coding complexity optimization method and apparatus based on a deep learning method
- Publication number
- WO2019179523A1 (application PCT/CN2019/079312)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- eth
- cnn
- lstm
- segmentation
- training
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/119—Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/12—Selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
- H04N19/122—Selection of transform size, e.g. 8x8 or 2x4x8 DCT; Selection of sub-band transforms of varying structure or type
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/146—Data rate or code amount at the encoder output
- H04N19/149—Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/157—Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
- H04N19/159—Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/18—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a set of transform coefficients
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/90—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
- H04N19/96—Tree coding, e.g. quad-tree coding
Definitions
- the present disclosure relates to the field of video coding technologies, and in particular, to a block division coding complexity optimization method and apparatus based on a deep learning method.
- the High Efficiency Video Coding (HEVC) standard can save about 50% bit rate at the same video quality. This is due to some advanced video coding techniques, such as a coding unit (CU) partition structure based on a quadtree structure. However, these techniques also bring considerable complexity.
- the encoding time of HEVC in the prior art is about 255% longer than that of H.264/AVC, which limits the practical application of the standard. Therefore, it is necessary to significantly reduce HEVC coding complexity while keeping the rate-distortion (RD) performance almost unaffected.
- Early CU segmentation prediction methods are generally heuristic. These methods determine CU segmentation in advance before performing recursive search based on some features in the encoding process.
- brute-force search can be simplified by extracting some intermediate features. For example, one method determines the CU partition at the frame level by skipping, in the current frame, the CU depths that occurred infrequently in the previous frame.
- the industry also proposes a CU segmentation decision method based on pyramidal motion divergence and based on the number of high-frequency key points.
- CU partitioning can also be determined by characterizing the complete and low complexity RD cost.
- the industry has also proposed a variety of heuristic methods to reduce coding complexity at the prediction unit (PU) and transform unit (TU) levels. For example, a fast PU size decision method has been proposed to adaptively integrate smaller PUs into larger PUs.
- the prior art also predicts the maximum probability of PU partitioning based on a coding block flag (CBF) and an RD cost of the encoded CU.
- CBF coding block flag
- the coding coefficients are modeled using a mixed Laplacian distribution and based on this, the RDO quantization process is accelerated.
- coding complexity is also simplified in the prior art at other levels of HEVC, such as intra or inter prediction mode selection, and loop filtering.
- a CU depth decision method based on a support vector machine (SVM) three-level joint classifier is also proposed to predict whether three sizes of CUs need to be segmented.
- SVM support vector machine
- these methods use a large amount of data to learn the coding rules of certain aspects of HEVC, in order to simplify or replace the brute-force search of the original encoding process.
- the prior art utilizes logistic regression and SVM to perform two-class modeling on CU segmentation.
- the trained model can be used to determine in advance whether each CU is split, avoiding the time-consuming recursive brute-force search.
- three early termination mechanisms are proposed to estimate the optimal CTU segmentation result to simplify the CTU segmentation process in the original encoder.
- the industry has studied several intermediate features related to CU segmentation and combined them to determine the CU partition depth and skip the brute-force RDO search, thereby reducing HEVC coding complexity.
- technicians have proposed an SVM method that combines binary and multi-class classification to predict CU partitioning and PU mode selection in advance, further reducing HEVC encoding time.
- the above learning-based approach relies heavily on manual extraction of features. This requires a lot of prior knowledge and may ignore some hidden but valuable features.
- CNN convolutional neural network
- the present disclosure provides a block partition coding complexity optimization method and apparatus based on a deep learning method, which can significantly shorten the time required to determine the CU partition during encoding while maintaining CU partition prediction accuracy, effectively reducing HEVC coding complexity.
- the present disclosure provides a block segmentation coding complexity optimization method based on a deep learning method, including:
- the CU segmentation prediction model is a pre-established and trained model, and the model has early termination capability
- the frame coding mode is an intra mode
- the CU segmentation prediction model is a layered convolutional neural network ETH-CNN capable of early termination
- the frame coding mode is an inter mode
- the CU segmentation prediction model is an ETH-LSTM and the ETH-CNN that can be terminated early.
- before the step of checking the frame coding mode currently used by HEVC, the method further includes:
- the ETH-LSTM is constructed and the ETH-LSTM is trained.
- the step of constructing the ETH-CNN and training the ETH-CNN includes:
- the ETH-CNN corresponding to the intra mode is trained using the positive sample and the negative sample.
- the resolution of each image in the first database is 4928 ⁇ 3264;
- the first database includes: a training set, a validation set, and a test set; each of the training set, the validation set, and the test set includes four subsets;
- the resolution of each image in the first of the four subsets is 4928 × 3264, in the second subset 2880 × 1920, in the third subset 1536 × 1024, and in the fourth subset 768 × 512.
- the step of constructing the ETH-CNN, training the ETH-CNN, constructing the ETH-LSTM, and training the ETH-LSTM includes:
- the pre-processed video in the second database is encoded by using a HEVC standard reference program to obtain a positive sample and a negative sample in the second database;
- the ETH-CNN corresponding to the inter mode and the ETH-LSTM corresponding to the inter mode are trained by using the positive sample and the negative sample.
- the second database comprises one or more of the following resolution videos:
- the second database includes: a training set, a validation set, and a test set.
- the input of the ETH-CNN is a 64 ⁇ 64 matrix, which represents luminance information of the entire CTU, and is represented by U;
- the ETH-CNN structured output consists of three branches that represent the predicted results of the three-level HCPM: ŷ_1(U), ŷ_2(U_i) and ŷ_3(U_{i,j});
- the early termination mechanism of ETH-CNN can end the calculation of the full connection layer on the second and third branches ahead of time;
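The early-termination control flow described above can be sketched as follows. This is a minimal illustration, not the disclosure's implementation: `predict_level1/2/3` are hypothetical stand-ins for the three fully connected branches, and the 0.5 threshold on the binary labels is an assumption.

```python
# Hypothetical sketch of ETH-CNN's early-termination control flow.
# predict_level1/2/3 stand in for the three fully connected branches.

def predict_ctu(ctu, predict_level1, predict_level2, predict_level3):
    """Return a 3-level HCPM; skip deeper branches when no CU is split."""
    hcpm1 = predict_level1(ctu)          # 1 binary label for the 64x64 CU
    if hcpm1 < 0.5:                      # CTU not split: terminate early,
        return hcpm1, None, None         # branches 2 and 3 are never run
    hcpm2 = predict_level2(ctu)          # 4 labels for the 32x32 sub-CUs
    if all(p < 0.5 for p in hcpm2):      # no 32x32 CU split: skip branch 3
        return hcpm1, hcpm2, None
    hcpm3 = predict_level3(ctu)          # 16 labels for the 16x16 sub-CUs
    return hcpm1, hcpm2, hcpm3
```

Skipping a branch here skips its fully connected computation entirely, which is the source of the time saving claimed for the mechanism.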
- the specific structure of ETH-CNN includes two pre-processing layers, three convolutional layers, one merged layer and three fully connected layers.
- the pre-processing layer is configured to perform a pre-processing operation on the matrix
- the pre-processed data is convolved with 16 4 × 4 kernels to obtain 16 different feature maps, extracting low-level features of the image in preparation for determining the CU partition;
- the above feature maps are then convolved sequentially with 24 and 32 2 × 2 kernels to extract higher-level features, finally yielding 32 feature maps in each branch B_l.
- the stride of each convolution equals the side length of its kernel, so the convolutions do not overlap
- the fully connected layers process the merged features in three separate branches B_1, B_2 and B_3, corresponding to the three-level output of the HCPM;
- in each branch B_l, the feature vector passes through three fully connected layers in sequence: two hidden layers and one output layer; the outputs of the two hidden layers are intermediate feature vectors, and the output of the last layer is the final HCPM prediction;
- the number of features in each fully connected layer depends on the branch it belongs to, ensuring that the three branches B_1, B_2 and B_3 output 1, 4 and 16 features respectively, corresponding to the predicted values of the three HCPM levels.
- QP is added as an external feature to the feature vector, enabling ETH-CNN to model the relationship between QP and CU segmentation.
- the predicted CU segmentation result is represented by a structured output manner of the hierarchical CU segmentation graph HCPM;
- the HCPM includes 1 × 1, 2 × 2, and 4 × 4 two-category labels at levels 1, 2, and 3, respectively, corresponding to the true values y_1(U), y_2(U_i), y_3(U_{i,j}) and the predicted values ŷ_1(U), ŷ_2(U_i), ŷ_3(U_{i,j})
- the CU segmentation result includes a first-level two-category label
- the CU segmentation result may include second-level and third-level two-category labels
- when a parent CU is not split, the corresponding second-level or third-level label takes a null value; that is, the second- and third-level labels always exist structurally, but some of their entries may be null.
- the objective function of the ETH-CNN model training is cross entropy
- H(·,·), with l ∈ {1, 2, 3} indexing the HCPM level, represents the cross entropy between the predicted value and the true-value label of one two-class classifier in the HCPM
- r represents the sample index within a batch of training samples
- L_r represents the objective function of the r-th sample
- y_1(U) represents the true value of the first-level label
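The objective described above, cross entropy summed over all valid HCPM labels, can be sketched in plain Python. This is an illustrative reading of the loss, not the disclosure's code; the `eps` clipping and the use of `None` for null labels are implementation assumptions.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """H(y, y_hat) for one two-class label; eps guards against log(0)."""
    y_pred = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))

def hcpm_loss(true_levels, pred_levels):
    """Sum cross entropy over the three HCPM levels, skipping null labels.

    Each level is a flat list: 1, 4 and 16 entries; None marks labels
    that do not exist because the parent CU was not split.
    """
    total = 0.0
    for y_level, p_level in zip(true_levels, pred_levels):
        for y, p in zip(y_level, p_level):
            if y is not None:            # null labels contribute no loss
                total += binary_cross_entropy(y, p)
    return total
```

Skipping the null entries keeps non-existent sub-CU labels from distorting the gradient, matching the HCPM convention that those labels are set to null.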
- the residual CTU obtained by fast precoding is input to the ETH-CNN, and the CU partition label in the second database is used as a true value, and the ETH-CNN of the inter mode is trained;
- the LSTM unit and the fully connected layers of each level in the ETH-LSTM are trained with the CUs of that level: level 1 of the ETH-LSTM is trained with 64 × 64 CUs, level 2 with 32 × 32 CUs, and level 3 with 16 × 16 CUs.
- the cross entropy is used as the loss function
- the loss function of the t-th frame of the r-th sample is L r (t)
- the training is performed by the momentum stochastic gradient descent method
- HCPM is obtained by ETH-LSTM to predict the inter-mode CU partitioning result.
- the embodiment of the present disclosure further provides a block segmentation coding complexity optimization device based on a deep learning method, including:
- a memory, a processor, a bus, and a computer program stored on the memory and runnable on the processor, where the processor, when executing the program, implements the method of any one of the first aspect.
- a computer storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method of any one of the second aspect.
- the present disclosure utilizes the structured output of the HCPM to efficiently represent the CU partitioning process. The trained ETH-CNN/ETH-LSTM model need only be run once to obtain all CU partition results of an entire CTU in the form of a single HCPM, which significantly reduces the running time of the deep neural network itself and helps reduce the overall coding complexity.
- the depth ETH-CNN structure in the present disclosure solves the defect of manually extracting features in the prior art by automatically extracting features related to CU segmentation.
- the depth ETH-CNN structure has more trainable parameters than the CNN structure in the prior art, which significantly improves the prediction accuracy of the CU segmentation.
- the depth ETH-LSTM model proposed in the present disclosure is for learning long-term and short-term dependencies of CU partitioning between different frames of an inter mode. For the first time in the present disclosure, LSTM is used to predict CU partitioning to reduce HEVC coding complexity.
- a CU partition database is established in advance for both the intra mode and the inter mode. Other methods in the prior art rely only on the existing JCT-VC database, which is much smaller than the database of the present disclosure.
- FIG. 1 is a schematic diagram of a rate distortion cost check and comparison in the prior art
- FIG. 2 is a schematic diagram of a CU partition structure
- FIG. 3 is a schematic diagram of an HCPM according to an embodiment of the present disclosure.
- FIG. 4 is a schematic diagram of an ETH-CNN structure according to an embodiment of the present disclosure.
- FIG. 5 is a schematic diagram of an ETH-LSTM structure according to an embodiment of the present disclosure.
- FIG. 6 is a schematic flowchart of a block partition coding complexity optimization method based on a deep learning method according to an embodiment of the present disclosure
- FIG. 7 is a schematic structural diagram of a block partition coding complexity optimization apparatus based on a deep learning method according to an embodiment of the present disclosure
- FIG. 8 is a schematic diagram of using ETH-LSTM according to an embodiment of the present disclosure.
- Deep learning does not require manual extraction of features during the encoding process, but automatically extracts a variety of features associated with the encoded results from large-scale data.
- in-depth research using deep learning to reduce coding complexity is still rare.
- a shallow CNN structure is mainly used for CU partition prediction in the intra mode; this CNN structure includes only two convolutional layers, containing 6 and 16 3 × 3 convolution kernels respectively.
- prior work on simplifying coding complexity has not explored the correlation of CU partitioning between frames at different distances.
- the embodiments of the present disclosure propose a CU partition prediction model based on the ETH-CNN and ETH-LSTM deep network structures, which accurately predicts CU partition results and reduces HEVC intra-frame and inter-frame complexity, i.e., coding complexity.
- unlike conventional methods, which determine separately whether each single CU is split, the HCPM of the embodiment of the present disclosure predicts the CU partitioning of an entire CTU at once through hierarchical structured output.
- the deep CNN structure is improved by introducing an early termination mechanism for reducing the complexity of the HEVC intra mode.
- the core improvements of the present disclosure may include: 1. Constructing a large-scale CU partition database suitable for HEVC intra and inter modes, promoting research on reducing HEVC complexity based on deep learning. 2. Proposing a deep CNN network, ETH-CNN, whose structured HCPM output of CU partitioning reduces the complexity of the HEVC intra mode. 3. Proposing a deep LSTM network, ETH-LSTM, combined with ETH-CNN to learn the spatio-temporal correlation of CU partitioning, which is used to reduce the complexity of the HEVC inter mode.
- the embodiment of the present disclosure proposes a block segmentation coding complexity optimization method based on a deep learning method, which is applicable to both intra-frame and inter-frame modes.
- this method can learn the CU partitioning of an entire coding tree unit (CTU) from the above databases; that is, a hierarchical CU partition map (HCPM) is used to efficiently represent the CU partitioning of the whole CTU.
- HCPM hierarchical CU partition map
- the deep learning network structure can be made "deeper", so that enough parameters are learned to explore the wide variety of CU partitioning modes.
- the deep learning method of the embodiment of the present disclosure introduces an early terminated hierarchical CNN (ETH-CNN), and generates a structured HCPM with a layered idea. This early termination can save the computation time of CNN itself and promote the reduction of intra mode HEVC coding complexity.
- the embodiment of the present disclosure also introduces an early terminated hierarchical LSTM (ETH-LSTM) suitable for inter mode.
- in ETH-LSTM, the temporal correlation of CU partitioning can be learned in the LSTM units.
- after taking the features from the ETH-CNN as input, the ETH-LSTM combines the learned LSTM units with the early termination mechanism to output the HCPM hierarchically. As such, the above method can effectively reduce the coding complexity of the HEVC inter mode.
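The per-frame flow just described can be sketched as control flow. This is heavily simplified: `DummyLSTMCell` is a toy stand-in for a trained LSTM cell (its "state" is just a running mix), and `fc` stands in for the fully connected stage of each level; only the shape of the pipeline, level-wise cells carrying state across frames with early termination, reflects the disclosure.

```python
# Hypothetical sketch of the inter-mode pipeline: ETH-CNN features for each
# level l of frame t pass through that level's LSTM cell (carrying state
# across frames) and a fully connected stage to produce the HCPM labels.

class DummyLSTMCell:
    """Stand-in for a trained LSTM cell: state is a simple running mix."""
    def __init__(self, mix=0.5):
        self.state = 0.0
        self.mix = mix
    def step(self, feature):
        self.state = self.mix * self.state + (1 - self.mix) * feature
        return self.state

def predict_sequence(frame_features, cells, fc=lambda h: 1.0 if h > 0 else 0.0):
    """frame_features: per frame, a list of 3 per-level feature scalars."""
    hcpms = []
    for features in frame_features:          # one frame per time step
        levels = []
        for level, cell in enumerate(cells): # level-wise LSTM cells
            h = cell.step(features[level])   # temporal state carried over
            levels.append(fc(h))             # FC stage -> binary label
            if levels[-1] == 0.0:            # early termination: deeper
                break                        # levels are not evaluated
        hcpms.append(levels)
    return hcpms
```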
- the block partition coding complexity optimization method based on the deep learning method of the present disclosure may include the following steps:
- selecting a CU segmentation prediction model corresponding to the frame coding mode, where the CU segmentation prediction model is a pre-established and trained model with an early termination mechanism.
- the CU segmentation prediction result in the HEVC is predicted according to the selected CU segmentation prediction model, and the entire coding tree unit CTU is segmented according to the predicted CU segmentation result.
- the above method may further include a step 600, not shown in the figures:
- the frame coding mode is an intra mode
- the CU segmentation prediction model is ETH-CNN; in this case only the ETH-CNN needs to be constructed and trained.
- the frame coding mode is an inter mode
- the CU segmentation prediction model is ETH-LSTM together with ETH-CNN; that is, the ETH-CNN is constructed and trained, and the ETH-LSTM is constructed and trained. In other words, a long short-term memory structure is designed to learn the time-domain dependence of inter-mode CU partitioning, and the CNN is then combined with the LSTM to predict the inter-mode CU partition. In this way, the HEVC coding complexity of the inter mode can be significantly reduced.
- the training of the CU segmentation prediction model for the intra mode may include the following steps:
- the training of the CU partition prediction model for the inter mode may include the following steps:
- the ETH-CNN corresponding to the inter mode and the ETH-LSTM corresponding to the inter mode are trained by using the positive sample and the negative sample.
- ETH-LSTM can effectively reduce the complexity of the HEVC inter mode.
- a large-scale CU partition database is established in the embodiment of the present disclosure, covering the intra mode (2000 lossless images, each compressed with 4 quantization parameters (QP)) and the inter mode (111 lossless videos, each compressed with 4 QPs), thereby facilitating research on reducing HEVC complexity based on deep learning.
- QP quantization parameters
- the CTU partition structure with CU partition as the core is one of the main components of the HEVC standard.
- the default size of the CTU is 64 ⁇ 64 pixels.
- a CTU can contain either a single CU or a number of smaller CUs based on the quadtree recursive structure.
- the default minimum size of the CU is 8 ⁇ 8.
- the CTU or CU size can be set before encoding; that is, the maximum and minimum CTU or CU sizes are set manually according to the encoding requirements. Therefore, the CUs in a CTU come in many possible sizes.
- the CU size in each CTU is determined by a recursive search.
- in the standard encoder this is a brute-force search comprising a top-down checking process and a bottom-up comparison process.
- Figure 1 illustrates the RD cost checking and comparison process between the parent CU and its four sub-CUs.
- the encoder checks the RD cost of the entire CTU and then the RD costs of its sub-CUs. For each sub-CU that can itself be split, the RD cost of each of its sub-CUs is checked in turn, recursively, until the minimum-size CU is reached.
- the RD cost of the parent CU is denoted R_pa
- the RD cost of the m-th child CU is denoted R_m, where m ∈ {1, 2, 3, 4} is the index of the sub-CU. By comparing the RD costs of the parent CU and its child CUs, it is determined whether the parent CU needs to be split. As shown in Figure 1-(b), if R_pa > R_1 + R_2 + R_3 + R_4 (plus the RD cost of the split flag itself), the parent CU is split; otherwise it is not. After the complete RDO search, the CU partition result with the lowest RD cost is obtained. Note that this recursive RDO search is extremely time-consuming.
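The top-down check and bottom-up comparison above can be sketched as a recursion. This is an illustrative model of the brute-force RDO search the disclosure aims to avoid, not encoder code: `rd_cost` is a placeholder for the real precoding cost of a CU of a given size, and `split_flag_cost` stands for the RD cost of signalling the split flag.

```python
# Illustrative sketch of the recursive brute-force RDO search. rd_cost is a
# placeholder for the real precoding cost of one CU of the given size.

def rdo_search(size, rd_cost, split_flag_cost=0.0, min_size=8):
    """Return (best RD cost, True if the CU of this size is split)."""
    parent = rd_cost(size)                     # top-down: check parent CU
    if size <= min_size:                       # 8x8 CUs cannot be split
        return parent, False
    children = sum(rdo_search(size // 2, rd_cost,
                              split_flag_cost, min_size)[0]
                   for _ in range(4))          # four sub-CUs, recursively
    children += split_flag_cost                # cost of the split flag
    if children < parent:                      # bottom-up comparison
        return children, True
    return parent, False
```

Every call to `rd_cost` corresponds to precoding one candidate CU, which is why this search dominates the encoding time.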
- for a 64 × 64 CTU it is necessary to check 85 possible CUs, comprising: one 64 × 64 CU, four 32 × 32 CUs, sixteen (4²) 16 × 16 CUs and sixty-four (4³) 8 × 8 CUs.
- the encoder precodes each CU; in this process all possible prediction and transform modes must be encoded.
- all 85 possible CUs must be precoded, which occupies most of the coding time.
- the final CU partition result retains only 1 CU (the CTU is not split) up to 64 CUs (the entire CTU is divided into minimum-size 8 × 8 CUs), far fewer than all 85. Therefore, if a reasonable CU partition can be predicted in advance, the RD cost checks of at most 84 and at least 21 CUs can be skipped, thereby reducing coding complexity.
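The CU counts quoted above follow directly from the quadtree: 4^d CUs at each depth d from 0 to 3.

```python
# Quick check of the CU counts in the text: a 64x64 CTU under quadtree
# splitting down to 8x8 yields 1 + 4 + 16 + 64 = 85 candidate CUs, while
# a final partition keeps between 1 and 64 of them.

cus_per_depth = [4 ** d for d in range(4)]   # depths 0..3
total_candidates = sum(cus_per_depth)        # 85 CUs checked by RDO
print(cus_per_depth, total_candidates)       # [1, 4, 16, 64] 85
```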
- the first database of a large-scale CU partition database (CU partition of HEVC-Intra, CPH-Intra) suitable for HEVC intra mode is described below.
- the first database is the first database for CU partitioning in HEVC.
- first, 2000 images with a resolution of 4928 × 3264 were selected from the Raw Images Dataset (RAISE).
- these 2000 images were randomly divided into a training set (1700 images), a validation set (100 images), and a test set (200 images).
- each set is equally divided into 4 subsets: one of the subsets remains unchanged from the original resolution, and the other three subsets downsample the original image to 2880 ⁇ 1920, 1536 ⁇ 1024, and 768 ⁇ 512, respectively.
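The split just described can be sketched as follows. The function name `split_cph_intra` and the seeded shuffle are illustrative assumptions; the disclosure specifies only the 1700/100/200 split and the four subset resolutions.

```python
# Hypothetical sketch of the CPH-Intra split: 2000 images partitioned into
# training/validation/test sets, each divided over four resolution subsets.

import random

def split_cph_intra(image_ids, seed=0):
    resolutions = [(4928, 3264), (2880, 1920), (1536, 1024), (768, 512)]
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)           # random, non-overlapping split
    sets = {"train": ids[:1700], "valid": ids[1700:1800], "test": ids[1800:]}
    subsets = {}
    for name, members in sets.items():
        quarter = len(members) // 4            # four equal resolution subsets
        subsets[name] = [(resolutions[i], members[i * quarter:(i + 1) * quarter])
                         for i in range(4)]
    return subsets
```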
- the CPH-Intra database contains images of multiple resolutions, ensuring the diversity of training data for CU segmentation.
- the above images are then encoded using the HEVC standard reference software HM16.5.
- 4 different QPs {22, 27, 32, 37} are used for encoding, corresponding to the All-Intra (AI) configuration of the standard encoder (file encoder_intra_main.cfg).
- the CPH-Intra database consists of 110,405,784 samples, divided into 12 sub-databases according to QP value and CU size. The number of split CUs (49.2%) is close to the number of non-split CUs (50.8%), ensuring that the positive and negative samples are relatively balanced.
- the CU partition database established for the inter mode is the second database: the CPH-Inter database.
- to establish the second database, 111 lossless videos were first selected, including 6 1080p (1920 × 1080) videos, 18 standard test videos recommended by the Joint Collaborative Team on Video Coding (JCT-VC), and 87 videos from Xiph.org.
- the second database contains videos of various resolutions: SIF (352 ⁇ 240), CIF (352 ⁇ 288), NTSC (720 ⁇ 486), 4CIF (704 ⁇ 576), 240p (416 ⁇ 240), 480p.
- since HEVC only supports resolutions that are multiples of 8 pixels, videos that do not meet this requirement must be adjusted. Therefore, in the second database, the bottom edge of each NTSC video is cropped so that its resolution becomes 720 × 480. In addition, any video longer than 10 seconds is clipped to 10 seconds.
- the above videos are divided into a training set (83 videos), a validation set (10 videos) and a test set (18 videos) that do not overlap.
- the video in the test set was derived from 18 standard sequences of JCT-VC.
- the CPH-Inter database is also encoded using HM16.5 under four QPs {22, 27, 32, 37}.
- LDP Low Delay P
- LDB Low Delay B
- RA Random Access
- HCPM Hierarchical CU partition map
- the default CU can take four different sizes: 64 ⁇ 64, 32 ⁇ 32, 16 ⁇ 16 and 8 ⁇ 8, corresponding to CU depths: 0, 1, 2 and 3.
- a CU of a non-minimum size (16 ⁇ 16 or more) may or may not be divided.
- the entire CU partitioning process can thus be regarded as a combination of three levels of two-category labels.
- a CU of depth 0 is denoted U
- its sub-CUs of depth 1, 2 and 3 are denoted U_i, U_{i,j} and U_{i,j,k}, respectively
- the subscripts i, j, k ∈ {1, 2, 3, 4} are the indices of the sub-CUs within U, U_i and U_{i,j}, respectively.
- the hierarchical CU split label described above is shown by the downward arrow in FIG.
- the standard HEVC encoder obtains the CU partition labels y_1(U), y_2(U_i) and y_3(U_{i,j}) through a time-consuming RDO process.
- these tags can be predicted by machine learning to replace the traditional RDO process.
- this is difficult to predict in one step by a simple multi-level classifier.
- the CU partition labels should be predicted layer by layer; that is, predictions are made separately for the labels y_1(U), y_2(U_i) and y_3(U_{i,j}) of each layer, and the predictions are recorded as ŷ_1(U), ŷ_2(U_i) and ŷ_3(U_{i,j}).
- the present embodiment utilizes the hierarchical CU partition map (HCPM) to efficiently represent the CU partition result as a structured output. In this way, the trained model needs to be called only once to predict the CU partitioning of the entire CTU, greatly reducing the computation time of the prediction process itself.
- Figure 3 is an example of HCPM that hierarchically represents a CU partitioned label as a structured output.
- the HCPM includes 1 × 1, 2 × 2, and 4 × 4 two-category labels at levels 1, 2, and 3, respectively, corresponding to the true values y_1(U), y_2(U_i), y_3(U_{i,j}) and the predicted values ŷ_1(U), ŷ_2(U_i), ŷ_3(U_{i,j}). Regardless of the CU partitioning, the first-level label always exists; but when U or U_i is not split, the corresponding sub-CU labels y_2(U_i) or y_3(U_{i,j}) do not exist, and in the HCPM these labels are set to null, as shown by "-" in Figure 3.
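A possible in-memory layout for one HCPM follows, with `None` playing the role of the null ("-") entries. The helper `make_hcpm` and its arguments are hypothetical, and the third-level labels are filled with a 0 placeholder rather than real predictions; only the 1 / 2×2 / 4×4 shape and the null convention come from the text.

```python
# Hypothetical in-memory layout for one HCPM: level 1 holds one label,
# level 2 a 2x2 grid, level 3 a 4x4 grid; None marks null ("-") entries
# for sub-CUs that do not exist.

def make_hcpm(split_64, split_32):
    """split_64: bool; split_32: 2x2 grid of bool (ignored if CTU unsplit)."""
    level1 = 1 if split_64 else 0
    if not split_64:                              # whole CTU is one CU:
        level2 = [[None] * 2 for _ in range(2)]   # deeper labels are null
        level3 = [[None] * 4 for _ in range(4)]
        return level1, level2, level3
    level2 = [[1 if split_32[i][j] else 0 for j in range(2)] for i in range(2)]
    level3 = [[None] * 4 for _ in range(4)]
    for i in range(2):
        for j in range(2):
            if level2[i][j]:                      # only split 32x32 CUs get
                for a in range(2):                # real 16x16 labels (0 is a
                    for b in range(2):            # placeholder value here)
                        level3[2 * i + a][2 * j + b] = 0
    return level1, level2, level3
```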
- the main task of the method of the present embodiment is to predict the CU segmentation result from the image information of the CTU. The input is represented as a matrix with significant spatial correlation, so in the present embodiment the HCPM is modeled by a CNN.
- the input of ETH-CNN is a 64×64 matrix, denoted U, representing the luminance information of the entire CTU.
- the structured output of ETH-CNN consists of three branches representing the predicted results of the three HCPM levels: ŷ_1(U), ŷ_2(U_i) and ŷ_3(U_{i,j}). Compared with an ordinary CNN structure, ETH-CNN introduces an early-termination mechanism, which can end the computation of the fully connected layers on the second and third branches ahead of time.
- the specific structure of ETH-CNN consists of two pre-processing layers, three convolutional layers, one merging layer and three fully connected layers.
- the original CTU luminance matrix (64×64) is subjected to pre-processing such as de-averaging and downsampling.
- the input information is processed and transformed in three parallel branches. In the de-averaging operation of each of the three branches, the luminance matrix of the input CTU subtracts the mean luminance over a certain range of the image, reducing the luminance differences between images.
- in B_1, the mean luminance of the entire CTU is subtracted from the luminance matrix, corresponding to the single prediction of the first HCPM level.
- in B_2, the 64×64 luminance matrix is divided into 2×2 non-overlapping 32×32 units, and the mean luminance inside each unit is subtracted from it, corresponding to the 4 labels of the second HCPM level.
- in B_3, the 64×64 luminance matrix is divided into 4×4 non-overlapping 16×16 units, and the mean inside each unit is subtracted, corresponding to the 4×4 labels of the third HCPM level.
- the de-averaged luminance matrices are then downsampled, as shown in the figure.
- reducing the matrix sizes to 16×16 and 32×32 further lowers the subsequent computational complexity. Moreover, this selective downsampling ensures that the output size of the last convolutional layer in each of B_1 to B_3 matches the number of output labels of the corresponding HCPM level (first to third), so the convolutional outputs have a clear physical meaning.
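The two pre-processing steps can be sketched in pure Python on a toy 4×4 "luminance matrix" (illustrative only; the patent gives no reference implementation, and average pooling is an assumption for the downsampling operator):

```python
# Sketch of the de-averaging and downsampling pre-processing steps.

def de_mean_blocks(mat, block):
    """Subtract, inside each non-overlapping block x block unit, its own mean
    (B1 uses one 64x64 unit, B2 four 32x32 units, B3 sixteen 16x16 units)."""
    n = len(mat)
    out = [row[:] for row in mat]
    for bi in range(0, n, block):
        for bj in range(0, n, block):
            vals = [mat[i][j] for i in range(bi, bi + block)
                              for j in range(bj, bj + block)]
            mean = sum(vals) / len(vals)
            for i in range(bi, bi + block):
                for j in range(bj, bj + block):
                    out[i][j] = mat[i][j] - mean
    return out

def downsample_avg(mat, factor):
    """Average-pool by `factor` (the exact pooling operator is an assumption)."""
    n = len(mat)
    return [[sum(mat[i * factor + di][j * factor + dj]
                 for di in range(factor) for dj in range(factor)) / factor ** 2
             for j in range(n // factor)]
            for i in range(n // factor)]

mat = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [1, 2, 3, 4],
       [5, 6, 7, 8]]
b2_like = de_mean_blocks(mat, 2)   # four 2x2 units, each now zero-mean
small = downsample_avg(mat, 2)     # 4x4 -> 2x2
```

After `de_mean_blocks`, every block sums to zero, so only the within-block luminance structure remains, which is what the split decision depends on.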
- in each branch B_l, a three-layer convolution operation is performed on all pre-processed data. At the same layer, the convolution kernels of all three branches have the same size.
- in the first convolutional layer, the pre-processed data are convolved with 16 kernels of size 4×4 to obtain 16 different feature maps, extracting low-level features of the image information in preparation for the CU partition decision.
- in the second and third convolutional layers, the above feature maps are convolved successively with 24 and 32 kernels of size 2×2 to extract higher-level features, finally yielding 32 feature maps in each branch B_l.
- the stride of each convolution operation equals the side length of its kernel, achieving non-overlapping convolutions, and the regions covered by the convolutions are 8×8, 16×16, 32×32, 64×64, etc. (side lengths that are integer powers of 2), which correspond exactly to the positions and sizes of the non-overlapping CUs in HEVC.
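The non-overlapping convolution described above can be illustrated with a minimal sketch: when the stride equals the kernel side length k, each output value covers a distinct k×k region, and an n×n input yields an (n/k)×(n/k) output — one value per CU-sized region.

```python
# Illustrative non-overlapping 2-D convolution (stride = kernel side length).

def conv_nonoverlap(mat, kernel):
    """Valid 2-D correlation with stride equal to the kernel side length."""
    n, k = len(mat), len(kernel)
    return [[sum(mat[i * k + di][j * k + dj] * kernel[di][dj]
                 for di in range(k) for dj in range(k))
             for j in range(n // k)]
            for i in range(n // k)]

# An all-ones 2x2 kernel simply sums each 2x2 unit: an 8x8 input maps to a
# 4x4 output, i.e., one response per non-overlapping CU-sized region.
mat = [[1] * 8 for _ in range(8)]
ones = [[1, 1], [1, 1]]
out = conv_nonoverlap(mat, ones)
```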
- the merging layer gathers all features of the second and third convolutional layers of the three branches and combines them into one vector. As shown in Figure 4, the features of this layer are composed of feature maps from six sources, so as to obtain a variety of global and local features. After merging, the subsequent fully connected layers can use features from the complete CTU to predict the segmentation result of each HCPM level, rather than being limited to the features of a single branch B_1, B_2 or B_3.
- all convolutional layers and the first and second fully connected layers are activated with rectified linear units (ReLU) to introduce appropriate sparsity into the network and improve training efficiency.
- the third fully connected layer (the output layer) of each branch is activated by a sigmoid function, so the output values lie within (0,1), compatible with the binary classification labels in HCPM.
- the specific configuration of ETH-CNN is shown in Table 1; the network has 1,287,189 trainable parameters. Compared with the shallow CNNs of the prior art, ETH-CNN has a higher network capacity and can model the CU partitioning problem more effectively. Thanks to more than 100 million training samples in the CPH-Intra database, the network can keep the risk of overfitting low despite the large number of parameters. In addition, using the same network to predict all three HCPM levels is a major advantage of ETH-CNN: the features in the convolutional and merging layers are shared when predicting y_1(U), y_2(U_i) and y_3(U_{i,j}).
- because ETH-CNN uses the HCPM as its output, the network structure and parameters are shared, which significantly reduces the computational complexity of the network itself while accurately predicting the CU segmentation, further reducing the overall coding complexity.
- H(·,·), with l ∈ {1, 2, 3}, denotes the cross entropy between the predicted value and the ground-truth label of a binary classifier in the HCPM. Since some ground-truth labels do not exist (the null entries, such as in Figure 2), only valid ground-truth and predicted values are counted in the objective function.
- the objective function over a batch of samples is the average of the objective functions of all samples in the batch:
- each batch is randomly selected from the large pool of training samples as network input, and the stochastic gradient descent method with momentum is used for optimization.
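The objective described above can be sketched as follows, with null HCPM labels (represented here as `None`) excluded: per sample, sum the binary cross-entropies of all valid labels; per batch, average the per-sample objectives. The clipping constant `eps` is an assumption for numerical safety, not from the patent:

```python
import math

def sample_loss(y_true, y_pred):
    """Sum of binary cross-entropies over the valid (non-null) labels."""
    eps = 1e-7
    loss = 0.0
    for y, p in zip(y_true, y_pred):
        if y is None:          # null HCPM label: excluded from the objective
            continue
        p = min(max(p, eps), 1 - eps)
        loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return loss

def batch_loss(batch):
    """Average of the per-sample objectives over the batch."""
    return sum(sample_loss(y, p) for y, p in batch) / len(batch)

# Two toy samples; the second has a null label that is ignored.
batch = [
    ([1, 0, 1], [0.9, 0.1, 0.8]),
    ([0, None, 1], [0.2, 0.99, 0.7]),
]
loss = batch_loss(batch)
```

Note that the null label with prediction 0.99 contributes nothing to the loss, which is exactly the masking behavior the text requires.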
- the CU segmentation in HEVC inter mode is temporally correlated: the closer two frames are, the more similar their CU segmentation results; as the frame distance grows, the similarity decreases.
- the present disclosure further proposes an ETH-LSTM network to learn the long and short-term dependencies of inter-frame CU partitioning.
- the overall framework of ETH-LSTM is shown in Figure 5.
- the network takes the residual CTU as input.
- the residual here is obtained by fast pre-encoding the current frame. This process is similar to standard encoding, the only difference being that the CU and PU are forced to the maximum size of 64×64 to save time. Although the extra pre-encoding introduces time overhead, it accounts for only about 3% of the standard encoding time and does not significantly affect the performance of the proposed algorithm.
- the residual CTU is input to the ETH-CNN.
- the parameters in the ETH-CNN are retrained by the residual CTU in the CPH-Inter database and the true value of the CU partition.
- in each frame, the features of the 7th layer (the first fully connected layer) of the ETH-CNN are output and sent to ETH-LSTM for further processing.
- the three-level LSTM structure of ETH-LSTM used to determine the CU depth is shown in Figure 5.
- the feature vector output from the LSTM unit passes through two fully connected layers in turn, and each fully connected layer also contains two external features: the QP value and the frame order of the current frame in the GOP.
- the frame order is represented by a one-hot vector.
- the output characteristics of the LSTM unit and the output characteristics of the first fully connected layer are denoted as f' 1-l (t) and f' 2-l (t), respectively.
- the second fully connected layer then outputs the probability of CU partitioning, which is the result of the two classifications in HCPM.
- the early-termination mechanism is also introduced in ETH-LSTM: if the level-1 LSTM predicts that the CU is not split, the two fully connected layers of the second HCPM level are skipped and the computation terminates early.
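The early-termination control flow shared by ETH-CNN and ETH-LSTM can be sketched as follows (the helper names and the 0.5 decision threshold are hypothetical; the patent does not specify them):

```python
# Sketch of early termination: deeper levels run only if level 1 predicts a split.

def predict_with_early_termination(run_level, threshold=0.5):
    """run_level(l) stands in for the level-l classifier and returns a list
    of split probabilities; levels 2 and 3 are skipped when level 1 says
    the CU is not split."""
    p1 = run_level(1)[0]
    if p1 < threshold:                      # CU not split: stop here
        return {1: [p1], 2: None, 3: None}  # levels 2/3 never computed
    return {1: [p1], 2: run_level(2), 3: run_level(3)}

calls = []
def fake_level(l):
    calls.append(l)                 # record which levels actually ran
    return [0.2] * (1 if l == 1 else 4 if l == 2 else 16)

result = predict_with_early_termination(fake_level)
```

With the toy classifier predicting "no split" at level 1, only one level is ever evaluated, which is where the complexity saving comes from.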
- the output of ETH-LSTM takes the form of an HCPM, i.e., the partition result of the current CTU in the t-th frame.
- by learning the long- and short-term correlation of CU segmentation with LSTM units at different levels, ETH-LSTM can exploit the segmentation results of the co-located CTU in previous frames.
- the LSTM units of each level in ETH-LSTM are trained with the CUs of that level, i.e., level 1 of ETH-LSTM is trained with 64×64 CUs, level 2 with 32×32 CUs, and level 3 with 16×16 CUs.
- taking the LSTM unit of the l-th level at the t-th frame as an example, the learning mechanism of ETH-LSTM is introduced below.
- the LSTM network consists of three types of gates: the input gate i_l(t), the output gate o_l(t), and the forget gate g_l(t).
- given the input feature f_{1-l}(t) of the current-frame LSTM (i.e., the feature of the first fully connected layer of ETH-CNN) and the output feature f'_{1-l}(t-1) of the previous-frame LSTM, the three gates can be expressed as: i_l(t) = σ(W_i · [f_{1-l}(t), f'_{1-l}(t-1)] + b_i), o_l(t) = σ(W_o · [f_{1-l}(t), f'_{1-l}(t-1)] + b_o), g_l(t) = σ(W_f · [f_{1-l}(t), f'_{1-l}(t-1)] + b_f), where [·,·] denotes vector concatenation.
- ⁇ ( ⁇ ) represents a sigmoid function.
- W i , W o and W f are the trainable parameters of the three gates, and b i , b o and b f are the corresponding offsets.
- the LSTM unit updates its state c_l(t) at the t-th frame with the following strategy: c_l(t) = g_l(t) ⊙ c_l(t-1) + i_l(t) ⊙ tanh(W_c · [f_{1-l}(t), f'_{1-l}(t-1)] + b_c), where ⊙ denotes element-wise multiplication and W_c, b_c are the trainable parameters of the cell input.
- the output f'_{1-l}(t) of the LSTM unit can then be expressed as: f'_{1-l}(t) = o_l(t) ⊙ tanh(c_l(t)).
- the lengths of the state vector c l (t) and the output vector f' 1-l (t) are the same as the input vector f 1-l (t).
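The gate and state equations above can be sketched for a length-1 state vector in pure Python. This follows the standard LSTM formulation; the cell-input parameters `w['c']`, `b['c']` are an assumption, since the text lists only the three gate parameter sets:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(f_t, f_prev, c_prev, w, b):
    """One LSTM time step for a length-1 state vector.

    f_t    : input feature f_{1-l}(t) from ETH-CNN's first FC layer
    f_prev : previous output f'_{1-l}(t-1)
    c_prev : previous cell state c_l(t-1)
    w, b   : dicts with keys 'i', 'o', 'f', 'c'; each w[k] holds the two
             weights applied to [f_t, f_prev]
    """
    def affine(k):
        return w[k][0] * f_t + w[k][1] * f_prev + b[k]

    i = sigmoid(affine('i'))                      # input gate  i_l(t)
    o = sigmoid(affine('o'))                      # output gate o_l(t)
    g = sigmoid(affine('f'))                      # forget gate g_l(t)
    c = g * c_prev + i * math.tanh(affine('c'))   # state update c_l(t)
    out = o * math.tanh(c)                        # output f'_{1-l}(t)
    return out, c

w = {k: (0.5, -0.25) for k in 'iofc'}
b = {k: 0.0 for k in 'iofc'}
out, c = lstm_step(f_t=1.0, f_prev=0.0, c_prev=0.0, w=w, b=b)
```

As the text notes, the output and state keep the same length as the input vector; here that length is 1 for readability.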
- the configuration of the ETH-LSTM is shown in Table 2, which includes all the trainable parameters.
- the training is performed using the momentum stochastic gradient descent method.
- finally, given the trained network, the HCPM obtained from ETH-LSTM predicts the inter-mode CU segmentation results.
- Step P1 initializing the current frame.
- Step P2 all CTUs of the current frame:
- Step P3 Extract residuals of all CTUs of the current frame.
- the residual is the residual defined in the HEVC standard, i.e., the difference between the prediction result of each PU and the original image.
- the source of the prediction differs from PU to PU: a PU may be predicted from the previous frame, or from some earlier frame, and so on.
- the residuals of all CTUs eventually form a residual frame; since each PU has its own prediction source, the residual frame is not simply the difference between the current frame and any single other frame.
- Step P4 All CTUs of the current frame:
- Step P5 Perform post-processing on the current frame, such as loop filtering.
- the embodiment of the present disclosure uses the trained deep neural networks for prediction, and can be implemented with a general deep learning framework, such as TensorFlow, Caffe or PyTorch, as long as the ETH-CNN and ETH-LSTM described above can be constructed. For example, TensorFlow can be called from the Python language.
- ETH-LSTM is only used in inter mode because LSTM is used to extract inter-frame dependencies in image features.
- inter mode includes three sub-modes: LDP (Low Delay P), LDB (Low Delay B) and RA (Random Access).
- each of these three sub-modes has multiple configurations, and the performance of the algorithm is tested under the standard configuration of each sub-mode.
- in LDP mode, the frame order is IPPPPPP..., that is, the first frame is an I frame (pure intra prediction) and all subsequent frames are P frames (supporting intra prediction, or inter prediction from a single reference frame).
- the I frame is predicted by ETH-CNN, and only the P frames are input to the LSTM.
- during training, the LSTM time length is set to 20, and to increase the number of training samples, adjacent LSTM sequences overlap by 10 frames. That is, frames 1 to 20, 11 to 30, 21 to 40, and so on (excluding I frames) are fed into the same LSTM for training.
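The training-sequence layout just described can be sketched as a sliding-window slicer (frame indices are illustrative, and I frames are assumed already excluded from the list):

```python
# Windows of 20 P-frames with a 10-frame overlap, as described above.

def training_windows(frames, length=20, overlap=10):
    """Slice `frames` into fixed-length windows whose starts advance by
    (length - overlap); trailing frames too few to fill a window are dropped."""
    step = length - overlap
    return [frames[s:s + length]
            for s in range(0, len(frames) - length + 1, step)]

p_frames = list(range(1, 41))            # 40 P-frames numbered 1..40
windows = training_windows(p_frames)     # frames 1-20, 11-30, 21-40
```

Setting `overlap=0` and `length=32` would give the non-overlapping 32-frame groups used for the RA sub-mode instead.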
- during testing, the LSTM length is set to the number of P frames, that is, all P frames are fed consecutively into the same LSTM until the last frame of the video.
- in LDB mode, the frame order is IBBBBBB..., that is, the first frame is an I frame and all subsequent frames are B frames (supporting intra prediction, or inter prediction from two reference frames).
- the LSTM time length is also the same as the LDP mode.
- the standard RA is slightly more complicated, and the frame encoding order is different from the playback order.
- information is passed in coding order, i.e., the first encoded frame is first input into the LSTM.
- the coding order of the frames is I(BBB...BIBBBBBBB)(BBB...BIBBBBBBB)(BBB...BIBBBBBBB)..., that is, the first frame is an I frame, and thereafter every 32 frames form a group in which the 25th frame is an I frame and the other frames are B frames. Because of this 32-frame period, the LSTM length is set to 32 in both the training and test phases: exactly one group corresponds to one LSTM, and there is no overlap between adjacent LSTMs.
- during testing, I frames and B frames are not treated separately; instead, the 32 frames of each group are input into the LSTM, and the CU partition is determined by the HCPM output of the LSTM at each time step. In this way, each 32-frame group is processed as a whole with no breakpoints, and information can be transmitted continuously.
- the LSTM length setting in this embodiment is flexible and configured according to actual requirements.
- the block-partitioning coding complexity optimization method based on the deep learning method in the above embodiments of the present disclosure may be implemented by a block-partitioning coding complexity optimization apparatus, as shown in Figure 7.
- the block partitioning coding complexity optimization apparatus based on the deep learning method may include a processor 501 and a memory 502 storing computer program instructions.
- the processor 501 may include a central processing unit (CPU), or an application specific integrated circuit (ASIC), or may be configured to implement one or more integrated circuits of the embodiments of the present disclosure.
- CPU central processing unit
- ASIC application specific integrated circuit
- Memory 502 can include mass storage for data or instructions.
- the memory 502 can include a hard disk drive (HDD), a floppy disk drive, a flash memory, an optical disk, a magneto-optical disk, a magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these.
- Memory 502 may include removable or non-removable (or fixed) media, where appropriate.
- Memory 502 may be internal or external to the data processing device, where appropriate.
- memory 502 is a non-volatile solid state memory.
- memory 502 includes a read only memory (ROM).
- the ROM may be a mask-programmed ROM, a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), an electrically rewritable ROM (EAROM), a flash memory, or a combination of two or more of these.
- PROM programmable ROM
- EPROM erasable PROM
- EEPROM electrically erasable PROM
- EAROM electrically rewritable ROM
- the processor 501 implements any of the block division coding complexity optimization methods in the above embodiments by reading and executing computer program instructions stored in the memory 502.
- the block-segment coding complexity optimization apparatus based on the deep learning method may further include a communication interface 503 and a bus 510. As shown in FIG. 7, the processor 501, the memory 502, and the communication interface 503 are connected by the bus 510 and complete communication with each other.
- the communication interface 503 is mainly used to implement communication between modules, devices, units and/or devices in the embodiments of the present disclosure.
- Bus 510 includes hardware, software, or both, coupling the components of the above described devices together.
- the bus may include an accelerated graphics port (AGP) or other graphics bus, an enhanced industry standard architecture (EISA) bus, a front side bus (FSB), a HyperTransport (HT) interconnect, an industry standard architecture (ISA) bus, an InfiniBand interconnect, a low pin count (LPC) bus, a micro channel architecture (MCA) bus, a peripheral component interconnect (PCI) bus, a PCI-Express (PCIe) bus, a serial advanced technology attachment (SATA) bus, a Video Electronics Standards Association local (VLB) bus, another suitable bus, or a combination of two or more of these.
- AGP accelerated graphics port
- EISA enhanced industry standard architecture
- FSB front side bus
- HT HyperTransport
- ISA industry standard architecture
- IB InfiniBand interconnect
- LPC Low Pin Count
- MCA Micro Channel Architecture
- PCI Peripheral Component Interconnect
- PCI-X PCI-Express
- SATA Serial Advanced Technology Attachment
- VLB Video Electronics Standards Association Local Bus
- Bus 510 may include one or more buses, where appropriate. Although a particular bus is described and illustrated with respect to embodiments of the present disclosure, this disclosure contemplates any suitable bus or interconnect.
- the embodiment of the present disclosure may be implemented by providing a computer readable storage medium.
- the computer readable storage medium stores computer program instructions; when the computer program instructions are executed by the processor, any one of the above embodiments is implemented.
- the functional blocks shown in the block diagrams described above may be implemented as hardware, software, firmware, or a combination thereof.
- when implemented in hardware, it can be, for example, an electronic circuit, an application-specific integrated circuit (ASIC), suitable firmware, a plug-in, a function card, and the like.
- ASIC application specific integrated circuit
- elements of the present disclosure are programs or code segments that are used to perform the required tasks.
- the program or code segments can be stored in a machine-readable medium, or transmitted over a transmission medium or communication link as a data signal carried in a carrier wave.
- a "machine-readable medium” can include any medium that can store or transfer information.
- machine-readable media examples include electronic circuits, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio frequency (RF) links, and the like.
- the code segments can be downloaded via a computer network such as the Internet, an intranet, and the like.
- the exemplary embodiments mentioned in the present disclosure describe some methods or systems based on a series of steps or devices.
- the present disclosure is not limited to the order of the steps described above: the steps may be performed in the order mentioned in the embodiments, in a different order, or several steps may be performed simultaneously.
- the present disclosure has the following beneficial effects:
- compared with the prior-art three-level CU partition labels, the present disclosure utilizes the structured output of the HCPM to efficiently represent the CU partitioning process: running the trained ETH-CNN/ETH-LSTM model once yields all CU segmentation results of an entire CTU in the form of one HCPM, which significantly reduces the running time of the deep neural network itself and helps reduce the overall coding complexity.
- the deep ETH-CNN structure in the present disclosure overcomes the prior-art drawback of manually extracted features by automatically extracting features related to CU segmentation.
- the deep ETH-CNN structure has more trainable parameters than prior-art CNN structures, which significantly improves the prediction accuracy of the CU segmentation.
- the deep ETH-LSTM model proposed in the present disclosure learns the long- and short-term dependencies of CU partitioning across frames in inter mode. The present disclosure is the first to use LSTM to predict CU partitioning to reduce HEVC coding complexity.
- a CU partition database is established in advance for both the intra mode and the inter mode. In comparison, other prior-art methods rely only on the existing JCT-VC database, which is much smaller than the database of the present disclosure.
- the present disclosure has strong industrial applicability.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Algebra (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Discrete Mathematics (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
- Image Analysis (AREA)
Abstract
Description
Claims (12)
- A method for optimizing block-partitioning coding complexity based on a deep learning method, comprising: in High Efficiency Video Coding (HEVC), checking the frame coding mode currently used by the HEVC; selecting, according to the frame coding mode, a coding unit (CU) partition prediction model corresponding to the frame coding mode, the CU partition prediction model being a pre-established and pre-trained model; and predicting a CU partition result in the HEVC according to the selected CU partition prediction model, and partitioning an entire coding tree unit (CTU) according to the predicted CU partition result.
- The method according to claim 1, wherein if the frame coding mode is intra mode, the CU partition prediction model is an early-terminated hierarchical convolutional neural network ETH-CNN; and if the frame coding mode is inter mode, the CU partition prediction model is an early-terminated ETH-LSTM together with the ETH-CNN.
- The method according to claim 2, wherein before the step of checking the frame coding mode currently used by the HEVC, the method further comprises: constructing the ETH-CNN and training the ETH-CNN; and constructing the ETH-LSTM and training the ETH-LSTM.
- The method according to claim 3, wherein the step of constructing and training the ETH-CNN comprises: constructing a first database for predicting CU partition results in HEVC intra mode; encoding the images in the first database with the HEVC standard reference software to obtain positive samples and negative samples in the first database; and training the intra-mode ETH-CNN with the positive samples and the negative samples.
- The method according to claim 4, wherein each image in the first database has a resolution of 4928×3264; the first database comprises a training set, a validation set and a test set, each of which comprises four subsets; among the four subsets, each image of the first subset has a resolution of 4928×3264, each image of the second subset 2880×1920, each image of the third subset 1536×1024, and each image of the fourth subset 768×512.
- The method according to claim 3, wherein the steps of constructing and training the ETH-CNN and constructing and training the ETH-LSTM comprise: constructing a second database for predicting CU partition results in HEVC inter mode; pre-processing the resolutions of all videos in the second database so that the resolution of each video segment falls within a preset range, and pre-processing the video lengths so that each video is no longer than a preset length; encoding the pre-processed videos in the second database with the HEVC standard reference software to obtain positive samples and negative samples in the second database; and training the inter-mode ETH-CNN and the inter-mode ETH-LSTM with the positive samples and the negative samples.
- The method according to claim 7, wherein the pre-processing layer is configured to pre-process the matrix; first, in the first convolutional layer, the pre-processed data are convolved with 16 kernels of size 4×4 to obtain 16 different feature maps, extracting low-level features of the image information in preparation for the CU partition decision; in the second and third convolutional layers, the feature maps are convolved successively with 24 and 32 kernels of size 2×2 to extract higher-level features, finally yielding 32 feature maps in each branch B_l; in all convolutional layers, the stride of the convolution operation equals the side length of the kernel; and in the first and second fully connected layers of the ETH-CNN, the QP is added to the feature vector as an external feature, enabling the ETH-CNN to model the relationship between QP and CU partitioning.
- The method according to any one of claims 2 to 8, wherein the predicted CU partition result is represented in the structured output form of a hierarchical CU partition map (HCPM); the CU partition result comprises: a level-1 classification label; and/or, when U or U_i is split, a level-2 binary classification label or a level-3 binary classification label; and when U or U_i is not split, a null value among the level-2 or level-3 classification labels; and/or, the objective function for training the ETH-CNN model is the cross entropy; for each sample, the objective function L_r is the sum of the cross entropies of all binary classification labels:
- The method according to claim 6, wherein the residual CTU obtained after fast pre-encoding is input to the ETH-CNN, and the inter-mode ETH-CNN is trained with the CU partition labels in the second database as ground truth; the three vectors output by the first fully connected layer of the ETH-CNN are input to the three levels of the ETH-LSTM, respectively; and the inter-mode ETH-LSTM is trained with the CU partition labels in the second database as ground truth; the LSTM units and fully connected layers of each level of the ETH-LSTM are trained with the CUs of that level, i.e., level 1 of the ETH-LSTM is trained with 64×64 CUs, level 2 with 32×32 CUs, and level 3 with 16×16 CUs; and/or, when training the parameters in the configuration information of the ETH-LSTM, the cross entropy is used as the loss function; assuming a batch of R samples during training, the LSTM time length in each sample being T (i.e., T LSTM units), and the loss function of the t-th frame of the r-th sample being L_r(t), the loss function L of the batch is defined as the average of all L_r(t); thereafter, training is performed with the momentum stochastic gradient descent method; finally, given the trained LSTM, the HCPM is obtained from the ETH-LSTM to predict the inter-mode CU partition result.
- An apparatus for optimizing block-partitioning coding complexity based on a deep learning method, comprising: a memory, a processor, a bus, and a computer program stored in the memory and run on the processor, wherein the processor, when executing the program, implements the method according to any one of claims 1 to 10.
- A computer storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1 to 10.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810240912.4A CN108495129B (zh) | 2018-03-22 | 2018-03-22 | 基于深度学习方法的块分割编码复杂度优化方法及装置 |
CN201810240912.4 | 2018-03-22 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019179523A1 true WO2019179523A1 (zh) | 2019-09-26 |
Family
ID=63319290
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/079312 WO2019179523A1 (zh) | 2018-03-22 | 2019-03-22 | 基于深度学习方法的块分割编码复杂度优化方法及装置 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108495129B (zh) |
WO (1) | WO2019179523A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111654698A (zh) * | 2020-06-12 | 2020-09-11 | 郑州轻工业大学 | 一种针对h.266/vvc的快速cu分区决策方法 |
Families Citing this family (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108495129B (zh) * | 2018-03-22 | 2019-03-08 | 北京航空航天大学 | 基于深度学习方法的块分割编码复杂度优化方法及装置 |
EP3743855A1 (en) * | 2018-09-18 | 2020-12-02 | Google LLC | Receptive-field-conforming convolution models for video coding |
CN111163320A (zh) * | 2018-11-07 | 2020-05-15 | 合肥图鸭信息科技有限公司 | 一种视频压缩方法及*** |
CN110009640B (zh) * | 2018-11-20 | 2023-09-26 | 腾讯科技(深圳)有限公司 | 处理心脏视频的方法、设备和可读介质 |
CN109769119B (zh) * | 2018-12-18 | 2021-01-19 | 中国科学院深圳先进技术研究院 | 一种低复杂度视频信号编码处理方法 |
CN109788296A (zh) * | 2018-12-25 | 2019-05-21 | 中山大学 | 用于hevc的帧间编码单元划分方法、装置和存储介质 |
CN109714584A (zh) * | 2019-01-11 | 2019-05-03 | 杭州电子科技大学 | 基于深度学习的3d-hevc深度图编码单元快速决策方法 |
CN109996084B (zh) * | 2019-04-30 | 2022-11-01 | 华侨大学 | 一种基于多分支卷积神经网络的hevc帧内预测方法 |
CN112087624A (zh) * | 2019-06-13 | 2020-12-15 | 深圳市中兴微电子技术有限公司 | 基于高效率视频编码的编码管理方法 |
CN110675893B (zh) * | 2019-09-19 | 2022-04-05 | 腾讯音乐娱乐科技(深圳)有限公司 | 一种歌曲识别方法、装置、存储介质及电子设备 |
CN110717898A (zh) * | 2019-09-25 | 2020-01-21 | 上海众壹云计算科技有限公司 | 一种运用ai和大数据管理的半导体制造缺陷自动管理方法 |
CN111263145B (zh) * | 2020-01-17 | 2022-03-22 | 福州大学 | 基于深度神经网络的多功能视频快速编码方法 |
CN111405295A (zh) * | 2020-02-24 | 2020-07-10 | 核芯互联科技(青岛)有限公司 | 一种视频编码单元分割方法、***以及硬件实现方法 |
CN111385585B (zh) * | 2020-03-18 | 2022-05-24 | 北京工业大学 | 一种基于机器学习的3d-hevc深度图编码单元划分方法 |
CN111556316B (zh) * | 2020-04-08 | 2022-06-03 | 北京航空航天大学杭州创新研究院 | 一种基于深度神经网络加速的快速块分割编码方法和装置 |
JP2021175126A (ja) * | 2020-04-28 | 2021-11-01 | キヤノン株式会社 | 分割パターン決定装置、分割パターン決定方法、学習装置、学習方法およびプログラム |
CN111583364A (zh) * | 2020-05-07 | 2020-08-25 | 江苏原力数字科技股份有限公司 | 一种基于神经网络的群组动画生成方法 |
CN111596366B (zh) * | 2020-06-24 | 2021-07-30 | 厦门大学 | 一种基于地震信号优化处理的波阻抗反演方法 |
CN112084949B (zh) * | 2020-09-10 | 2022-07-19 | 上海交通大学 | 视频实时识别分割和检测方法及装置 |
CN111931732B (zh) * | 2020-09-24 | 2022-07-15 | 苏州科达科技股份有限公司 | 压缩视频的显著性目标检测方法、***、设备及存储介质 |
CN112465664B (zh) * | 2020-11-12 | 2022-05-03 | 贵州电网有限责任公司 | 一种基于人工神经网络及深度强化学习的avc智能控制方法 |
WO2023198057A1 (en) * | 2022-04-12 | 2023-10-19 | Beijing Bytedance Network Technology Co., Ltd. | Method, apparatus, and medium for video processing |
CN117319679A (zh) * | 2023-07-20 | 2023-12-29 | 南通大学 | 一种基于长短时记忆网络的hevc帧间快速编码方法 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104602000A (zh) * | 2014-12-30 | 2015-05-06 | 北京奇艺世纪科技有限公司 | 一种编码单元的分割方法和装置 |
CN104754357A (zh) * | 2015-03-24 | 2015-07-01 | 清华大学 | 基于卷积神经网络的帧内编码优化方法及装置 |
US20150189270A1 (en) * | 2013-10-08 | 2015-07-02 | Kabushiki Kaisha Toshiba | Image compression device, image compression method, image decompression device, and image decompression method |
JP2016213615A (ja) * | 2015-05-01 | 2016-12-15 | 富士通株式会社 | 動画像符号化装置、動画像符号化方法及び動画像符号化用コンピュータプログラム |
CN108495129A (zh) * | 2018-03-22 | 2018-09-04 | 北京航空航天大学 | 基于深度学习方法的块分割编码复杂度优化方法及装置 |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106162167B (zh) * | 2015-03-26 | 2019-05-17 | 中国科学院深圳先进技术研究院 | 基于学习的高效视频编码方法 |
JP2017034531A (ja) * | 2015-08-04 | 2017-02-09 | 富士通株式会社 | 動画像符号化装置及び動画像符号化方法 |
CN105120295B (zh) * | 2015-08-11 | 2018-05-18 | 北京航空航天大学 | 一种基于四叉树编码分割的hevc复杂度控制方法 |
-
2018
- 2018-03-22 CN CN201810240912.4A patent/CN108495129B/zh active Active
-
2019
- 2019-03-22 WO PCT/CN2019/079312 patent/WO2019179523A1/zh active Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150189270A1 (en) * | 2013-10-08 | 2015-07-02 | Kabushiki Kaisha Toshiba | Image compression device, image compression method, image decompression device, and image decompression method |
CN104602000A (zh) * | 2014-12-30 | 2015-05-06 | 北京奇艺世纪科技有限公司 | 一种编码单元的分割方法和装置 |
CN104754357A (zh) * | 2015-03-24 | 2015-07-01 | 清华大学 | 基于卷积神经网络的帧内编码优化方法及装置 |
JP2016213615A (ja) * | 2015-05-01 | 2016-12-15 | 富士通株式会社 | 動画像符号化装置、動画像符号化方法及び動画像符号化用コンピュータプログラム |
CN108495129A (zh) * | 2018-03-22 | 2018-09-04 | 北京航空航天大学 | 基于深度学习方法的块分割编码复杂度优化方法及装置 |
Non-Patent Citations (1)
Title |
---|
LI, TIANYI ET AL.: "A Deep Convolutional Neural Network Approach For Complexity Reduction On Intra-mode HEVC", 2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME, 14 July 2017 (2017-07-14), pages 1256 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111654698A (zh) * | 2020-06-12 | 2020-09-11 | 郑州轻工业大学 | 一种针对h.266/vvc的快速cu分区决策方法 |
CN111654698B (zh) * | 2020-06-12 | 2022-03-22 | 郑州轻工业大学 | 一种针对h.266/vvc的快速cu分区决策方法 |
Also Published As
Publication number | Publication date |
---|---|
CN108495129A (zh) | 2018-09-04 |
CN108495129B (zh) | 2019-03-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019179523A1 (zh) | 基于深度学习方法的块分割编码复杂度优化方法及装置 | |
Xu et al. | Reducing complexity of HEVC: A deep learning approach | |
Kim et al. | Fast CU depth decision for HEVC using neural networks | |
TWI744827B (zh) | 用以壓縮類神經網路參數之方法與裝置 | |
TWI806199B (zh) | 特徵圖資訊的指示方法,設備以及電腦程式 | |
US20230336758A1 (en) | Encoding with signaling of feature map data | |
CN114286093A (zh) | 一种基于深度神经网络的快速视频编码方法 | |
US11062210B2 (en) | Method and apparatus for training a neural network used for denoising | |
US20230353764A1 (en) | Method and apparatus for decoding with signaling of feature map data | |
CN111800642B (zh) | Hevc帧内角度模式选择方法、装置、设备及可读存储介质 | |
CN115311605B (zh) | 基于近邻一致性和对比学习的半监督视频分类方法及*** | |
CN114710667A (zh) | 针对h.266/vvc屏幕内容帧内cu划分的快速预测方法及装置 | |
CN116508320A (zh) | 基于机器学习的图像译码中的色度子采样格式处理方法 | |
TWI814540B (zh) | 視訊編解碼方法及裝置 | |
WO2023122132A2 (en) | Video and feature coding for multi-task machine learning | |
Bakkouri et al. | Effective CU size decision algorithm based on depth map homogeneity for 3D-HEVC inter-coding | |
US20240185572A1 (en) | Systems and methods for joint optimization training and encoder side downsampling | |
CN114449273B (zh) | 基于hevc增强型块划分搜索方法和装置 | |
WO2023081091A2 (en) | Systems and methods for motion information transfer from visual to feature domain and feature-based decoder-side motion vector refinement control | |
WO2023024115A1 (zh) | 编码方法、解码方法、编码器、解码器和解码*** | |
WO2023122244A1 (en) | Intelligent multi-stream video coding for video surveillance | |
WO2023122149A2 (en) | Systems and methods for video coding of features using subpictures | |
CN116634173A (zh) | 视频的特征提取及切片方法、装置、电子设备及存储介质 | |
WO2023137003A1 (en) | Systems and methods for privacy protection in video communication systems | |
WO2023069337A1 (en) | Systems and methods for optimizing a loss function for video coding for machines |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19772007 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19772007 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 23.03.2021) |
|