CN102567467A - Method for acquiring hotspot video information based on video tags - Google Patents
Method for acquiring hotspot video information based on video tags
- Publication number: CN102567467A
- Application number: CN2011103965154A
- Authority: CN (China)
- Classification: Two-Way Televisions, Distribution Of Moving Picture Or The Like
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention discloses a method for acquiring hotspot video information based on video tags, which comprises the following steps: acquiring the video tags of a video, wherein each video tag comprises a time point and tag content of the video; carrying out Chinese word segmentation on the tag content of the video tag; filtering the segmentation results so as to keep nouns and verbs as feature description words of the time point at which the video tag is located; calculating the importance values of the feature description words in the video tag; sorting the feature description words according to the importance values and taking the k most important ones as candidate hotspot words of the time point; and carrying out statistical modeling on the candidate hotspot words to generate the hotspot video information. By using the disclosed method, the accuracy of video description can be improved through hotspots, thereby improving hotspot-based advertisement delivery.
Description
Technical field
The present invention relates to Internet video applications, and more particularly to a method for acquiring hotspot video information based on video tags.
Background technology
A video tag is a phrase used to describe the features of a video. Existing video tagging techniques all describe the video as a whole and cannot describe highlight segments. In addition, existing video tags are often insufficiently accurate, so advertisement placement based on them is imprecise: the advertisement content is uncorrelated with the video content and the delivery effect is poor.
Summary of the invention
In view of this, an object of the present invention is to provide a method for acquiring hotspot video information based on video tags, which improves the accuracy of video description through hotspots and thereby improves hotspot-based advertisement delivery.
The present invention is realized through the following technical solution:
A method for acquiring hotspot video information based on video tags comprises the following steps: acquiring the video tags of a video, wherein each video tag comprises a time point and tag content of the video; performing Chinese word segmentation on the tag content of the video tag; filtering the segmentation results to keep nouns and verbs as feature description words of the time point at which the video tag is located; calculating the importance value of each feature description word in the video tag; sorting the feature description words according to their importance values and taking the K most important ones as candidate hotspot words of the time point; and performing statistical modeling on the candidate hotspot words to generate the hotspot video information.
The step of calculating the importance value of a feature description word in a video tag comprises: calculating the term frequency value of the feature description word, with the concrete formula

tf(i, j) = n(i, j) / M

wherein n(i, j) is the number of times feature description word i occurs in video tag j, and M is the total number of feature description words in video tag j; calculating the inverse document frequency value of the feature description word, with the concrete formula

idf(i) = log( |D| / |{ j : t_i ∈ d_j }| )

wherein |D| is the total number of video tags, and |{ j : t_i ∈ d_j }| is the number of video tags that contain feature description word i; and calculating, according to the above formulas, the importance value of the feature description word = term frequency value × inverse document frequency value.
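For illustration only, the TF-IDF computation described above can be sketched in Python. This is not code from the patent: the representation of each video tag as a list of feature words, and all function names and sample data, are assumptions of this sketch.

```python
import math

def tf(term, tag_terms):
    # Term frequency: occurrences of the word in this tag, divided by the
    # tag's total number of feature description words (M in the formula).
    return tag_terms.count(term) / len(tag_terms)

def idf(term, all_tags):
    # Inverse document frequency: log of (total number of tags |D|) over
    # (number of tags containing the word).
    containing = sum(1 for tag in all_tags if term in tag)
    return math.log(len(all_tags) / containing)

def importance(term, tag_terms, all_tags):
    # Importance value = TF value * IDF value.
    return tf(term, tag_terms) * idf(term, all_tags)

# Hypothetical tag collection: each video tag reduced to its feature words.
tags = [["dunk", "jordan"], ["dunk", "pass"], ["jordan", "freethrow", "dunk"]]
scores = {w: importance(w, tags[2], tags) for w in tags[2]}
top_k = sorted(scores, key=scores.get, reverse=True)[:2]  # K most important words
```

A word such as "dunk" that appears in every tag gets an IDF of 0 and is ranked last, which is the intended effect of the IDF weighting.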
The step of performing statistical modeling on the candidate hotspot words to generate the hotspot video information comprises: extracting all time points in the video that have candidate hotspot words; selecting, from all those time points, the N time points with the largest numbers of clicks as provisional hotspots; and merging all candidate hotspot words within 10 seconds before and after each provisional hotspot into that hotspot, to generate the hotspot video information.
The importance value of a feature description word in a video tag is calculated with the TF-IDF algorithm.
The value of K equals 5.
The value of N equals the length of the video in seconds divided by 300.
The invention has the following advantages:
(1) Video-segment description: through the extracted video hotspots, the highlight segments of the video can be obtained, so that advertisements can be pushed for specific segments;
(2) Accurate hotspot feature description: each highlight segment has its own feature description words, so advertisements whose themes are relevant to those words can be pushed, which resonates with users more easily and improves advertisement delivery.
Description of drawings
Fig. 1 is a flow chart of the method for acquiring hotspot video information based on video tags according to the present invention.
Fig. 2 is a detailed flow chart of step (3) of the method.
Fig. 3 is a detailed flow chart of step (5) of the method.
Embodiment
As shown in Fig. 1, the method for acquiring hotspot video information based on video tags according to the present invention comprises the following steps:
(1) Acquire the video tags of a video; each video tag comprises a time point and tag content of the video;
(2) Perform Chinese word segmentation on the tag content of the video tag, and filter the segmentation results to keep nouns and verbs as the feature description words of the time point to which the video tag belongs;
(3) Calculate the importance value of each feature description word in the video tag;
In this step, the TF-IDF algorithm is used to compute the importance value, which comprises the following substeps:
(31) Calculate the term frequency (TF) value of the feature description word; TF represents how often a word occurs in the video tag:

tf(i, j) = n(i, j) / M

wherein n(i, j) is the number of times feature description word i occurs in video tag j, and M is the total number of feature description words in video tag j;
(32) Calculate the inverse document frequency (IDF) value of the feature description word; IDF is a measure of the general importance of a word:

idf(i) = log( |D| / |{ j : t_i ∈ d_j }| )

wherein |D| is the total number of video tags, and |{ j : t_i ∈ d_j }| is the number of video tags containing feature description word i;
(33) According to the above formulas, compute the importance value of the feature description word = TF value × IDF value.
(4) Sort the feature description words according to their importance values, and take the K most important ones as the candidate hotspot words of this time point; in this embodiment, the value of K is 5;
(5) Perform statistical modeling on the candidate hotspot words to generate the hotspot video information, which comprises the following substeps:
(51) Extract all time points in the video that have candidate hotspot words;
(52) Take the N time points with the largest numbers of clicks as provisional hotspots, where N equals the video length in seconds divided by 300;
(53) Merge all candidate hotspot words within 10 seconds before and after each provisional hotspot into that hotspot, to generate the hotspot video information.
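Substeps (51)-(53) can be sketched in Python as follows. This is a reading of the patent, not its implementation: the function name `build_hotspots`, the dictionary-based inputs, and the sample time points and click counts are all illustrative assumptions.

```python
def build_hotspots(tag_points, clicks, video_len_s, window=10):
    # tag_points: {time point in seconds: [candidate hotspot words]}
    # clicks:     {time point in seconds: click count}
    # N = video length in seconds / 300, with a floor of 1 for short videos
    # (the floor is an assumption of this sketch).
    n = max(1, video_len_s // 300)
    # (52) The N most-clicked time points become provisional hotspots.
    provisional = sorted(tag_points, key=lambda t: clicks.get(t, 0), reverse=True)[:n]
    hotspots = {}
    for center in provisional:
        # (53) Merge candidate words within +/- 10 s of the provisional hotspot.
        words = set()
        for t, ws in tag_points.items():
            if abs(t - center) <= window:
                words.update(ws)
        hotspots[center] = sorted(words)
    return hotspots

# Hypothetical example: a 600 s video, so N = 2 provisional hotspots.
tag_points = {100: ["dunk"], 105: ["jordan"], 400: ["pass"]}
clicks = {100: 50, 105: 10, 400: 30}
hotspots = build_hotspots(tag_points, clicks, video_len_s=600)
```

Here the candidate word at 105 s falls within the 10 s window of the more-clicked point at 100 s, so the two are merged into one hotspot.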
Example
Suppose the tag content at a certain time point of the current video is "This free-throw-line takeoff dunk by Jordan is so cool!". Semantic word segmentation yields the words "Jordan", "this", "free throw line", "takeoff", "dunk", and "so cool"; part-of-speech filtering keeps the three words "Jordan", "free throw line", and "dunk". The TF-IDF values of these words are then calculated; suppose here that the TF-IDF value of "Jordan" is 0.4, that of "free throw line" is 0.45, and that of "dunk" is 0.3, so after sorting the three feature description words are "free throw line", "Jordan", "dunk". The number of tags at each time point of the whole video is counted, and the k time points with the most tags (where k equals the video length divided by 300) are taken as provisional hotspots; the feature description words within 10 seconds before and after each provisional hotspot are merged into it, and the merged provisional hotspots are the video hotspots.
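The worked example can be replayed in code. The part-of-speech pairs below and the TF-IDF value assumed for "takeoff" are this sketch's own assumptions (the patent gives values only for the other three words); note that under a strict noun/verb filter the verb "takeoff" would also survive step (2).

```python
# Hypothetical segmenter output for the tag
# "This free-throw-line takeoff dunk by Jordan is so cool!":
# (word, part of speech) with n = noun, v = verb, r = pronoun, a = adjective.
segmented = [("Jordan", "n"), ("this", "r"), ("free throw line", "n"),
             ("takeoff", "v"), ("dunk", "n"), ("so cool", "a")]

# Step (2): keep only nouns and verbs as feature description words.
features = [w for w, pos in segmented if pos in ("n", "v")]

# Steps (3)-(4): rank by TF-IDF. 0.4 / 0.45 / 0.3 are the values assumed in
# the example; 0.2 for "takeoff" is a further assumption of this sketch.
tfidf = {"Jordan": 0.4, "free throw line": 0.45, "dunk": 0.3, "takeoff": 0.2}
ranked = sorted(features, key=tfidf.get, reverse=True)
```

The top of `ranked` reproduces the example's ordering "free throw line", "Jordan", "dunk".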
Claims (6)
1. A method for acquiring hotspot video information based on video tags, comprising the following steps:
acquiring the video tags of a video, wherein each said video tag comprises a time point and tag content of said video;
performing Chinese word segmentation on the tag content of said video tag, and filtering the segmentation results to keep nouns and verbs as feature description words of the time point to which said video tag belongs;
calculating the importance value of said feature description words in said video tag;
sorting said feature description words according to said importance values, and taking the K most important feature description words as candidate hotspot words of said time point; and
performing statistical modeling on said candidate hotspot words to generate hotspot video information.
2. The method according to claim 1, characterized in that the step of calculating the importance value of said feature description words in said video tag comprises:
calculating the term frequency value of said feature description word, with the concrete formula

tf(i, j) = n(i, j) / M

wherein n(i, j) is the number of times feature description word i occurs in video tag j, and M is the total number of feature description words in video tag j;
calculating the inverse document frequency value of said feature description word, with the concrete formula

idf(i) = log( |D| / |{ j : t_i ∈ d_j }| )

wherein |D| is the total number of video tags, and |{ j : t_i ∈ d_j }| is the number of video tags containing feature description word i; and
calculating, according to the above formulas, the importance value of the feature description word = term frequency value × inverse document frequency value.
3. The method according to claim 1, characterized in that the step of performing statistical modeling on said candidate hotspot words to generate hotspot video information comprises:
extracting all time points in said video that have said candidate hotspot words;
selecting, from said time points, the N time points with the largest numbers of clicks as provisional hotspots; and
merging all candidate hotspot words within 10 seconds before and after each said provisional hotspot into that hotspot, to generate said hotspot video information.
4. The method according to claim 1, characterized in that the importance value of said feature description words in said video tag is calculated with the TF-IDF algorithm.
5. The method according to claim 1, characterized in that the value of said K equals 5.
6. The method according to claim 1, characterized in that the value of said N equals the length of said video in seconds divided by 300.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011103965154A CN102567467A (en) | 2011-12-02 | 2011-12-02 | Method for acquiring hotspot video information based on video tags |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102567467A true CN102567467A (en) | 2012-07-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C12 | Rejection of a patent application after its publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20120711 |