CN109120932B - Video significance prediction method of HEVC compressed domain double SVM model - Google Patents


Info

Publication number
CN109120932B
CN109120932B (application no. CN201810766665.1A)
Authority
CN
China
Prior art keywords
video sequence
hevc
significance prediction
significance
training
Prior art date
Legal status
Active
Application number
CN201810766665.1A
Other languages
Chinese (zh)
Other versions
CN109120932A (en
Inventor
张鑫生
刘浩
孙晓帆
吴乐明
况奇刚
魏国林
廖荣生
孙嘉曈
刘洋
Current Assignee
Donghua University
Original Assignee
Donghua University
Priority date
Filing date
Publication date
Application filed by Donghua University
Priority to CN201810766665.1A
Publication of CN109120932A
Application granted
Publication of CN109120932B
Legal status: Active
Anticipated expiration

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 - Incoming video signal characteristics or properties
    • H04N19/137 - Motion inside a coding unit, e.g. average field, frame or block difference
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/182 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being a pixel
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention provides a video saliency prediction method based on a dual-SVM model in the HEVC compressed domain. The method classifies all training video sequences selected from a video data set, then uses the resulting class-A and class-B training video sequences to train the HEVC compressed-domain dual-SVM saliency prediction model, obtaining two different compressed-domain saliency prediction models. A test video sequence selected from the video data set is first pre-classified, and the trained HEVC dual-SVM saliency prediction model corresponding to its class is then used to predict its saliency.

Description

Video saliency prediction method based on a dual-SVM model in the HEVC compressed domain
Technical Field
The invention relates to a method for predicting video saliency in the compressed domain, and belongs to the field of video saliency detection.
Background
With the steadily increasing resolution of video signals and the wide adoption of parallel processing, the HEVC (High Efficiency Video Coding) standard was released in 2013. Compared with the earlier H.264/AVC standard, HEVC defines a more flexible partitioning structure, optimizes and improves the individual coding modules, and adds a large number of new coding tools. Under the same application conditions and at the same video quality, HEVC roughly doubles the compression ratio of H.264/AVC, and feature information can be extracted from its bitstream more effectively, so HEVC compression has increasingly become a common tool for video analysis.
When observing a scene, humans quickly lock onto salient regions that differ from the background and surroundings, so that the most useful information is acquired in a short time. A computational visual saliency model can therefore greatly help solve many challenging computer vision and image processing problems. For example, by detecting the locations of salient objects and ignoring most of the irrelevant background, object recognition becomes more efficient and reliable; by detecting spatio-temporal saliency points, a visual saliency model facilitates target tracking.
Motion saliency is the key feature that distinguishes video saliency prediction from image saliency prediction, and it helps machines better identify the important content in a video. Pixel-domain video saliency prediction algorithms must fully decode the compressed video into the pixel domain before prediction, which increases the computational load on the terminal device. Performing saliency prediction directly in the compressed domain avoids the complex operations introduced by decoding and also improves data-processing efficiency.
Disclosure of Invention
The purpose of the invention is to obtain a video saliency prediction result consistent with the human visual fixation mechanism by exploiting the motion information in the compressed bitstream.
To achieve the above object, the present invention provides a video saliency prediction method based on a dual-SVM model in the HEVC compressed domain, characterized by comprising the following steps:
step 1, acquiring a video data set and dividing its video sequences into training video sequences and test video sequences;
step 2, classifying all training video sequences selected from the video data set, wherein the classification process comprises the following steps:
step 201, performing saliency prediction on a given training video sequence by using an HEVC compressed-domain video saliency prediction method;
step 202, selecting a pixel-domain saliency prediction method and predicting the saliency of the same training video sequence as in step 201;
step 203, evaluating the saliency prediction results obtained in step 201 and step 202 by using saliency prediction evaluation metrics;
step 204, classifying the current training video sequence according to the evaluation result of step 203, wherein if the saliency prediction result obtained in step 202 is better than that obtained in step 201 for the same sequence, the current training video sequence is a class-A training video sequence, and otherwise it is a class-B training video sequence;
step 3, training the HEVC compressed-domain dual-SVM saliency prediction model by using the class-A and class-B training video sequences respectively, to obtain two different compressed-domain saliency prediction models;
step 4, predicting the saliency of a given test video sequence by using the two trained compressed-domain saliency prediction models, comprising the following steps:
step 401, selecting any test video sequence from all test video sequences and pre-classifying it to obtain the class to which the current test video sequence belongs;
step 402, obtaining the HEVC compressed bitstream of the test video sequence and extracting HEVC features from it;
step 403, inputting the extracted HEVC features into the compressed-domain saliency prediction model corresponding to the class of the current test video sequence;
step 404, performing Kalman filtering to obtain the final video saliency map.
Preferably, step 201 comprises the following steps:
step 2011, selecting a training video sequence at random from all training video sequences and obtaining the HEVC compressed bitstream of the current training video sequence;
step 2012, extracting HEVC features from the HEVC compressed bitstream;
step 2013, inputting the extracted HEVC features into an HEVC saliency prediction model;
step 2014, performing forward smoothing filtering;
step 2015, obtaining the final video saliency map.
Preferably, the saliency prediction evaluation metrics used in step 203 are AUC, CC and NSS.
Preferably, step 3 comprises the following steps:
step 301, obtaining the HEVC compressed bitstream of a class-A or class-B training video sequence;
step 302, extracting the relevant HEVC features;
step 303, inputting the extracted HEVC features and the human visual attention map into the HEVC compressed-domain dual-SVM saliency prediction model;
step 304, obtaining the two class-trained compressed-domain saliency prediction models.
The invention provides a new solution for the video saliency prediction task. By combining the high real-time performance of the compressed-domain prediction method and its effective use of video information with the higher accuracy of the pixel-domain prediction method in certain scenes, video sequences can be classified effectively, so that the HEVC compressed-domain SVM saliency models are trained efficiently and accurately on the different classes of video sequences. The method predicts video saliency with high accuracy and provides a solid basis for subsequent applications built on video saliency.
Drawings
FIG. 1 is a flow chart of the main process of the present invention;
FIG. 2 is a flow chart of the video classification of the present invention;
FIG. 3 is a flow chart of the training of the video-classification-based dual-SVM model of the present invention;
FIG. 4 is a flow chart of HEVC compressed-domain video saliency prediction;
FIG. 5 is a flow chart of pixel-domain FES video saliency prediction.
Detailed Description
In order to make the invention more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
The invention provides a video saliency prediction method based on a dual-SVM model in the HEVC compressed domain, comprising the following steps:
step 1, acquiring a video data set and dividing its video sequences into training video sequences and test video sequences;
step 2, classifying all training video sequences selected from the video data set, wherein the classification process comprises the following steps:
step 201, performing saliency prediction on a given training video sequence by using an HEVC compressed-domain video saliency prediction method, comprising the following steps:
step 2011, selecting a training video sequence at random from all training video sequences and obtaining the HEVC compressed bitstream of the current training video sequence;
step 2012, extracting HEVC features from the HEVC compressed bitstream;
step 2013, inputting the extracted HEVC features into an HEVC saliency prediction model;
step 2014, performing forward smoothing filtering;
step 2015, obtaining the final video saliency map;
step 202, selecting a pixel-domain saliency prediction method and predicting the saliency of the same training video sequence as in step 201;
step 203, evaluating the saliency prediction results obtained in step 201 and step 202 by using the saliency prediction evaluation metrics AUC, CC and NSS;
step 204, classifying the current training video sequence according to the evaluation result of step 203, wherein if the saliency prediction result obtained in step 202 is better than that obtained in step 201 for the same sequence, the current training video sequence is a class-A training video sequence, and otherwise it is a class-B training video sequence;
step 3, training the HEVC compressed-domain dual-SVM saliency prediction model by using the class-A and class-B training video sequences respectively to obtain two different compressed-domain saliency prediction models, comprising the following steps:
step 301, obtaining the HEVC compressed bitstream of a class-A or class-B training video sequence;
step 302, extracting the relevant HEVC features;
step 303, inputting the extracted HEVC features and the human visual attention map into the HEVC compressed-domain dual-SVM saliency prediction model;
step 304, obtaining the two class-trained compressed-domain saliency prediction models;
step 4, predicting the saliency of a given test video sequence by using the two trained compressed-domain saliency prediction models, comprising the following steps:
step 401, selecting any test video sequence from all test video sequences and pre-classifying it to obtain the class to which the current test video sequence belongs;
step 402, obtaining the HEVC compressed bitstream of the test video sequence and extracting HEVC features from it;
step 403, inputting the extracted HEVC features into the compressed-domain saliency prediction model corresponding to the class of the current test video sequence;
step 404, performing Kalman filtering to obtain the final video saliency map.
With reference to fig. 1, this embodiment illustrates the video saliency prediction method based on the HEVC compressed-domain dual-SVM model. The method acquires a video data set whose video sequences are divided into two categories, training and test. In this embodiment, the video sequences used for training are Tennis, Kimono, ParkScene, Cactus, BQTerrace, BasketballDrive, Yan, Simo, Male, Female, Lee, Couple, RaceHorsesC, BQMall, PartyScene, BasketballDrill, Keiba, RaceHorsesD, BQSquare and BlowingBubbles; the test video sequences are BasketballPass, FourPeople, Johnny, KristenAndSara, Vidyo1, Vidyo3, Vidyo4, BasketballDrillText, ChinaSpeed, SlideEditing, SlideShow, Traffic and PeopleOnStreet.
Fig. 2 shows the classification of all training video sequences selected from the video data set. In this embodiment, the pixel-domain FES (Fast and Efficient Saliency detection) prediction method is selected, and Tennis is taken as the example:
according to the method, the video sequence Tennis is subjected to significance prediction by using an HEVC compressed domain significance prediction method, as shown in FIG. 3, an FFMPEG tool is used for obtaining an HEVC compressed code stream of the video sequence Tennis; extracting related HEVC features of split-depth, mv and bit-allocation from a compressed code stream; inputting the obtained HEVC features into an HEVC significance prediction model; in order to better predict moving or emerging targets, forward smoothing filtering is carried out; and obtaining a final video saliency map.
Next, the saliency of Tennis is predicted with the FES saliency prediction method, as shown in fig. 5: the image frames of Tennis are obtained; the CIELab color vector of each pixel of each frame in the video sequence Tennis is extracted; the saliency of each pixel is computed with a Bayesian center-surround method and a trained Gaussian kernel density function; the average saliency of each pixel over different scales is computed with a multi-scale method; and the saliency of all 240 frames of the video sequence Tennis is computed to obtain the final video saliency map.
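The FES steps above (Bayesian center-surround scoring with trained Gaussian kernel density functions) are only summarized in the text, so the following is a deliberately simplified stand-in rather than the actual FES algorithm: a pixel is scored by how much it differs from its local surround mean, averaged over several window sizes, with box filters (computed via an integral image) standing in for the Gaussian kernels:

```python
import numpy as np

def center_surround_saliency(frame, scales=(3, 7, 15)):
    """Toy multi-scale center-surround saliency for a grayscale frame.

    A pixel is salient when it differs from the mean of its k x k surround,
    averaged over several odd window sizes k. Not the real FES method.
    """
    frame = frame.astype(float)
    h, w = frame.shape
    sal = np.zeros((h, w))
    for k in scales:
        pad = k // 2
        padded = np.pad(frame, pad, mode="edge")
        # integral image: window sums in O(1) per pixel
        ii = np.cumsum(np.cumsum(np.pad(padded, ((1, 0), (1, 0))), axis=0), axis=1)
        surround = (ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]) / (k * k)
        sal += np.abs(frame - surround)
    sal /= len(scales)
    return sal / sal.max() if sal.max() > 0 else sal
```

In a full pipeline this would run per frame and per CIELab channel, with the channel maps fused into one saliency map.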
The two saliency predictions of the video sequence Tennis are then evaluated with the saliency prediction evaluation metrics, here AUC, CC and NSS, and the videos are classified according to the evaluation result. If the FES result is better than the HEVC result, i.e. FES wins on at least two of the three metrics, the video is assigned to class A; otherwise it is assigned to class B. In this embodiment, class A comprises Kimono, ParkScene, BQTerrace, BasketballDrive, Tennis, RaceHorsesC, BQMall, PartyScene, BasketballDrill, Keiba, RaceHorsesD and BQSquare; class B comprises Cactus, Yan, Simo, Male, Female, Lee, Couple and BlowingBubbles.
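CC and NSS, two of the three metrics named above, are simple enough to state exactly; AUC is omitted here because it additionally requires sweeping a threshold over fixation labels. The two-of-three voting rule that assigns class A when the pixel-domain method wins is also sketched (function and variable names are illustrative, not from the patent):

```python
import numpy as np

def cc(pred, gt):
    """Linear correlation coefficient between a saliency map and ground truth."""
    return float(np.corrcoef(pred.ravel(), gt.ravel())[0, 1])

def nss(pred, fixations):
    """Normalized Scanpath Saliency: mean z-scored saliency at fixated pixels."""
    z = (pred - pred.mean()) / pred.std()
    return float(z[fixations.astype(bool)].mean())

def assign_class(scores_pixel, scores_hevc):
    """Class A when the pixel-domain method wins on at least 2 of 3 metrics
    (e.g. [AUC, CC, NSS]); otherwise class B."""
    wins = sum(p > h for p, h in zip(scores_pixel, scores_hevc))
    return "A" if wins >= 2 else "B"
```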
The classified class-A and class-B training video sequences are used to train the HEVC compressed-domain dual-SVM saliency model respectively, yielding two different compressed-domain saliency prediction models, as shown in fig. 3: first, FFMPEG is used to obtain the HEVC compressed bitstream of each classified video; the related HEVC features of splitting depth, motion vector and bit allocation are extracted from the compressed bitstream; the extracted HEVC features and the human visual attention map (human fixation map) are input into the SVM learning model, which is then trained; and the two class-trained HEVC dual-SVM saliency prediction models are obtained.
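The patent does not disclose the SVM kernel, its parameters, or the exact per-block feature layout, so the sketch below only illustrates the structure of the training step using scikit-learn's SVR: each training sample is one coding unit described by the three extracted features, and the target is the ground-truth attention value at that unit. All names and parameter choices are assumptions:

```python
import numpy as np
from sklearn.svm import SVR

def train_saliency_svm(features, attention):
    """Train one compressed-domain saliency regressor.

    features : (N, 3) array of per-coding-unit HEVC features
               (splitting depth, motion-vector magnitude, bit allocation).
    attention: (N,) ground-truth attention values in [0, 1].
    """
    model = SVR(kernel="rbf", C=1.0)  # kernel and C are illustrative choices
    model.fit(features, attention)
    return model

# One model per class realizes the "dual SVM" of the method, e.g.:
# models = {"A": train_saliency_svm(feat_a, att_a),
#           "B": train_saliency_svm(feat_b, att_b)}
```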
As shown in fig. 1, this embodiment performs saliency prediction on the test video Traffic with the trained HEVC dual-SVM saliency prediction models. The test video Traffic is first classified; FFMPEG is used to obtain the HEVC compressed bitstream of the video sequence Traffic; the three HEVC features of splitting depth, motion vector and bit allocation are extracted from the compressed bitstream; the extracted HEVC features are input into the class-trained HEVC saliency prediction model corresponding to the class of Traffic; to better predict moving or newly appearing targets, Kalman filtering is applied; and the final video saliency map is obtained. The same operations are performed on the remaining test video sequences, likewise yielding good video saliency maps.
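The Kalman filtering step can be illustrated with the simplest possible variant: an independent scalar Kalman filter per pixel with a constant-state model. The noise variances `q` and `r` below are assumed values; the patent does not publish its filter parameters:

```python
import numpy as np

def kalman_smooth(saliency_maps, q=0.01, r=0.1):
    """Per-pixel scalar Kalman filtering of saliency maps over time.

    Each pixel's saliency is treated as a constant-state process with
    process-noise variance q and measurement-noise variance r.
    """
    maps = np.asarray(saliency_maps, dtype=float)
    x = maps[0].copy()      # state estimate per pixel
    p = np.ones_like(x)     # estimate variance per pixel
    out = [x.copy()]
    for z in maps[1:]:
        p_pred = p + q                 # predict: variance grows
        k = p_pred / (p_pred + r)      # Kalman gain
        x = x + k * (z - x)            # update with new measurement z
        p = (1 - k) * p_pred
        out.append(x.copy())
    return np.stack(out)
```

Compared with the fixed-weight forward smoothing used in the classification stage, the gain `k` here adapts over time as the estimate variance settles.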

Claims (3)

1. A video saliency prediction method based on a dual-SVM model in the HEVC compressed domain, characterized by comprising the following steps:
step 1, acquiring a video data set and dividing its video sequences into training video sequences and test video sequences;
step 2, classifying all training video sequences selected from the video data set, wherein the classification process comprises the following steps:
step 201, performing saliency prediction on a given training video sequence by using an HEVC compressed-domain video saliency prediction method, comprising the following steps:
step 2011, selecting a training video sequence at random from all training video sequences and obtaining the HEVC compressed bitstream of the current training video sequence;
step 2012, extracting HEVC features from the HEVC compressed bitstream;
step 2013, inputting the extracted HEVC features into an HEVC saliency prediction model;
step 2014, performing forward smoothing filtering;
step 2015, obtaining the final video saliency map;
step 202, selecting a pixel-domain saliency prediction method and predicting the saliency of the same training video sequence as in step 201;
step 203, evaluating the saliency prediction results obtained in step 201 and step 202 by using saliency prediction evaluation metrics;
step 204, classifying the current training video sequence according to the evaluation result of step 203, wherein if the saliency prediction result obtained in step 202 is better than that obtained in step 201 for the same sequence, the current training video sequence is a class-A training video sequence, and otherwise it is a class-B training video sequence;
step 3, training the HEVC compressed-domain dual-SVM saliency prediction model by using the class-A and class-B training video sequences respectively to obtain two different compressed-domain saliency prediction models;
step 4, predicting the saliency of a given test video sequence by using the two trained compressed-domain saliency prediction models, comprising the following steps:
step 401, selecting any test video sequence from all test video sequences and pre-classifying it to obtain the class to which the current test video sequence belongs;
step 402, obtaining the HEVC compressed bitstream of the test video sequence and extracting HEVC features from it;
step 403, inputting the extracted HEVC features into the compressed-domain saliency prediction model corresponding to the class of the current test video sequence;
step 404, performing Kalman filtering to obtain the final video saliency map.
2. The method of claim 1, wherein the saliency prediction evaluation metrics used in step 203 are AUC, CC and NSS.
3. The method of claim 1, wherein step 3 comprises the following steps:
step 301, obtaining the HEVC compressed bitstream of a class-A or class-B training video sequence;
step 302, extracting the relevant HEVC features;
step 303, inputting the extracted HEVC features and the human visual attention map into the HEVC compressed-domain dual-SVM saliency prediction model;
step 304, obtaining the two class-trained compressed-domain saliency prediction models.
CN201810766665.1A 2018-07-12 2018-07-12 Video significance prediction method of HEVC compressed domain double SVM model Active CN109120932B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810766665.1A CN109120932B (en) 2018-07-12 2018-07-12 Video significance prediction method of HEVC compressed domain double SVM model


Publications (2)

Publication Number Publication Date
CN109120932A CN109120932A (en) 2019-01-01
CN109120932B true CN109120932B (en) 2021-10-26

Family

ID=64862724

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810766665.1A Active CN109120932B (en) 2018-07-12 2018-07-12 Video significance prediction method of HEVC compressed domain double SVM model

Country Status (1)

Country Link
CN (1) CN109120932B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8965115B1 (en) * 2013-03-14 2015-02-24 Hrl Laboratories, Llc Adaptive multi-modal detection and fusion in videos via classification-based-learning
CN105138991A (en) * 2015-08-27 2015-12-09 山东工商学院 Video emotion identification method based on emotion significant feature integration
CN105472380A (en) * 2015-11-19 2016-04-06 国家新闻出版广电总局广播科学研究院 Compression domain significance detection algorithm based on ant colony algorithm
CN105893957A (en) * 2016-03-30 2016-08-24 上海交通大学 Method for recognizing and tracking ships on lake surface on the basis of vision
CN106993188A (en) * 2017-03-07 2017-07-28 北京航空航天大学 A kind of HEVC compaction coding methods based on plurality of human faces saliency
WO2018019126A1 (en) * 2016-07-29 2018-02-01 北京市商汤科技开发有限公司 Video category identification method and device, data processing device and electronic apparatus
CN108134937A (en) * 2017-12-21 2018-06-08 西北工业大学 A kind of compression domain conspicuousness detection method based on HEVC

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Mai Xu et al., "Learning to Detect Video Saliency With HEVC Features", IEEE Transactions on Image Processing, vol. 26, no. 1, 2016-11-14 *
Ran Gao et al., "Visual Saliency Detection Based on Mutual Information in Compressed Domain", 2015 Visual Communications and Image Processing (VCIP), 2015-12-16 *
Shen Xinyu, "Region-of-Interest Coding Based on Saliency Detection", China Master's Theses Full-Text Database, Information Science and Technology, no. 6, 2018-06-15 *

Also Published As

Publication number Publication date
CN109120932A (en) 2019-01-01

Similar Documents

Publication Publication Date Title
Singh et al. Muhavi: A multicamera human action video dataset for the evaluation of action recognition methods
KR101942808B1 (en) Apparatus for CCTV Video Analytics Based on Object-Image Recognition DCNN
Ye et al. Unsupervised feature learning framework for no-reference image quality assessment
CN109241985B (en) Image identification method and device
Wong et al. Patch-based probabilistic image quality assessment for face selection and improved video-based face recognition
CN110929593B (en) Real-time significance pedestrian detection method based on detail discrimination
CN108564066B (en) Character recognition model training method and character recognition method
CN112016500A (en) Group abnormal behavior identification method and system based on multi-scale time information fusion
CN110210335B (en) Training method, system and device for pedestrian re-recognition learning model
CN112464807A (en) Video motion recognition method and device, electronic equipment and storage medium
CN105184818A (en) Video monitoring abnormal behavior detection method and detections system thereof
CN110795595A (en) Video structured storage method, device, equipment and medium based on edge calculation
CN111401308B (en) Fish behavior video identification method based on optical flow effect
CN112488071B (en) Method, device, electronic equipment and storage medium for extracting pedestrian features
Giraldo et al. Graph CNN for moving object detection in complex environments from unseen videos
CN114049581B (en) Weak supervision behavior positioning method and device based on action segment sequencing
CN113313037A (en) Method for detecting video abnormity of generation countermeasure network based on self-attention mechanism
CN112329656B (en) Feature extraction method for human action key frame in video stream
Wang et al. Real-time smoke detection using texture and color features
CN111738218A (en) Human body abnormal behavior recognition system and method
Soon et al. Malaysian car number plate detection and recognition system
CN113688804B (en) Multi-angle video-based action identification method and related equipment
CN111027482B (en) Behavior analysis method and device based on motion vector segmentation analysis
Javed et al. Human movement recognition using euclidean distance: a tricky approach
CN109120932B (en) Video significance prediction method of HEVC compressed domain double SVM model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant