CN102163285A

CN102163285A - Cross-domain video semantic concept detection method based on active learning

Info

Publication number: CN102163285A
Application number: CN2011100567757A
Authority: CN
Inventors: 李欢; 李超; 袁晓冬; 熊璋
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2011-03-09
Filing date: 2011-03-09
Publication date: 2011-08-24

Abstract

The invention relates to a cross-domain video semantic concept detection method based on active learning, comprising the following steps of: (1) taking a Gaussian random field as a standard classifier; (2) selecting and marking unmarked samples by adopting an uncertainty query strategy in an active learning method; (3) updating the standard classifier; and (4) sequentially repeating the steps (2) and (3) until a certain cycle index is completed. In the invention, the Gaussian random field is used for constructing the standard classifier for active learning. Compared with the way that only an original domain sample is taken as a training set of the standard classifier, which is frequently used in other active learning algorithms, the standard classifier in the invention has the advantage that the selected marked samples can reflect distribution of data in a target domain to a greater extent. Weight of a newly selected marked sample from the target domain is increased, thus the standard classifier can rapidly adapt to characteristic distribution of the data in the target domain. The invention also provides an algorithm for rapidly updating a standard model, the complexity of the algorithm is effectively reduced, and the applicability of the algorithm is improved.

Description

A cross-domain video semantic concept detection method based on active learning

Technical field

The present invention relates to a cross-domain video semantic concept detection method based on active learning, and belongs to the field of video content analysis and semantic concept detection.

Background technology

Video semantic concept detection automatically detects the semantic concepts that appear in a video, for example "car", "person" and "building". As the volume of video data grows, more and more videos come from different domains, such as news video, Internet video and documentary video. Because the same semantic concept is distributed differently in the feature spaces of videos from different domains, a semantic concept classifier trained on video data from one domain performs poorly when tested on video data from another domain. A simple solution is to train a new semantic concept classifier for each new domain. However, because videos contain a large number of semantic concepts and the volume of video data is huge, both manually labeling the video data and training the classifiers consume a great deal of labor and computing time. Most current research therefore focuses on how to use the fully labeled data of the original domain together with the unlabeled data of the target domain, through semi-supervised machine learning, to obtain a classifier with as high a detection rate as possible on the target-domain data while labeling as few target-domain samples as possible.

Active learning methods in machine learning actively select the samples that are most informative for the classifier and have them labeled, so that the classifier achieves higher classification accuracy with a smaller training set. However, traditional active learning assumes that the test set and the training set share the same data distribution, whereas in cross-domain video semantic concept detection the video data of the new target domain is distributed differently in feature space from the data of the original domain. Samples selected in the target domain as most informative for a classifier trained only on original-domain data therefore do little to improve prediction in the new target domain.

Summary of the invention

The present invention overcomes the deficiencies of the prior art and provides a cross-domain video semantic concept detection method based on active learning. The method uses a Gaussian random field as the benchmark classifier and represents the data of the original domain and the target domain in a single undirected graph. Based on the principle that two samples closer together in feature space should have more similar labels, the labels of the unlabeled target-domain samples are predicted from the labels of the original-domain data and the feature-space distances between samples of the two domains. The uncertainty selection strategy of active learning is then used to choose target-domain samples for labeling, and these are added to the labeled data set. Because the goal is a classifier that performs well on the target domain, after the labels of new target-domain samples are obtained, the weights of all edges connected to these samples are increased so that the benchmark classifier adapts to the target-domain data faster. A fast update algorithm for the benchmark model is also proposed, which reduces the time complexity of updating the model each time a batch of newly labeled samples is added and improves the applicability of the algorithm.

Technical solution of the present invention: a cross-domain video semantic concept detection method based on active learning, comprising the following steps:

(1) Use a Gaussian random field as the benchmark classifier

(A) Take each sample point of the original domain and the target domain as a vertex and use the feature-space distance between sample points to weight the edges, constructing an undirected graph;

(B) Minimize the loss function of the Gaussian random field to obtain estimates of the labels of the unlabeled sample points in the undirected graph, that is, predict the labels of the samples in the target domain. Because video semantic concept detection is a binary classification problem in the present invention, the predicted label values lie in the range 0 to 1.

(2) Use the uncertainty query strategy of active learning to select unlabeled samples for labeling

(A) Select for labeling the samples in the unlabeled data set whose predicted label values are least certain, that is, the samples whose predicted values are closest to 0.5. Such sample points carry the most information for the benchmark classifier. Obtain the true labels of these samples;

(B) Remove the newly labeled samples from the unlabeled data set and add them to the labeled data set.

(3) Update the benchmark classifier

(A) Reset the weights of all edges connected to the newly labeled samples in the target domain, and add the newly obtained labels and the updated weights to the Gaussian random field, that is, update the weights of the newly labeled samples;

(B) Using the updated Gaussian random field, predict the labels of the unlabeled target-domain sample points in the undirected graph, that is, update the Gaussian random field model;

(4) Repeat steps (2) and (3) in turn until a given number of cycles has been completed.

Wherein, step (1), using a Gaussian random field as the benchmark classifier, proceeds as follows:

First, for (A) of step (1), each sample point in the original domain and the target domain is taken as a vertex of the graph: the visual features of each sample point are extracted, comprising a 225-dimensional grid color moment feature, a 48-dimensional Gabor texture feature and a 73-dimensional edge direction histogram feature, and these features are concatenated into a 346-dimensional feature vector;

Second, for (A) of step (1), the undirected graph is constructed with edge weights computed from the feature-space distance between sample points using a Gaussian function, where w_ij is the Gaussian-distance weight between sample points i and j, n is the dimension of the sample visual feature, x_id and x_jd are the values of the d-th feature dimension of samples i and j respectively, and σ is a smoothing parameter.

Third, for (B) of step (1), the labels of the unlabeled sample points in the undirected graph are predicted according to the Gaussian random field principle by minimizing the loss function. The minimizer satisfies Δf = 0 on the unlabeled sample points (written Δf_u = 0), where Δ is the Laplacian matrix, defined as Δ = D - W with D = diag(d_i), and W is the weight matrix. With f_l denoting the label values of the labeled samples and f_u the label values of the unlabeled samples, the solution is f_u = (Δ_uu)^{-1} W_ul f_l.
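Written out under the standard Gaussian random field formulation of semi-supervised learning, the edge weight and the loss function described above could take the following form. This is a sketch consistent with the symbol definitions in the text; the exact scaling by σ and the factor 1/2 are assumptions.

```latex
w_{ij} = \exp\!\left( -\sum_{d=1}^{n} \frac{(x_{id} - x_{jd})^{2}}{\sigma^{2}} \right),
\qquad
E(f) = \frac{1}{2} \sum_{i,j} w_{ij}\,(f_i - f_j)^{2}
```

With this form, a pair of samples that is close in feature space has a weight near 1 and is penalized strongly for having different label values, while a distant pair has a weight near 0 and contributes almost nothing to E(f).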

Wherein, step (2), using the uncertainty query strategy of active learning to select unlabeled samples for labeling, proceeds as follows:

First, for (A) of step (2), the samples whose predicted label values f_u are closest to 0.5 are selected for labeling, and their true label values are obtained;

Second, for (B) of step (2), these samples are deleted from the unlabeled data set U of the target domain and added to the labeled data set L.
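The selection rule above can be sketched in a few lines of Python. This is an illustrative sketch, not the patented implementation: the function name `select_most_uncertain`, the batch size `k` and the vertex ids are hypothetical.

```python
import numpy as np

def select_most_uncertain(f_u, k):
    """Return positions (into f_u) of the k predictions closest to 0.5."""
    f_u = np.asarray(f_u, dtype=float)
    # Smaller |f - 0.5| means the classifier is less certain about the sample.
    order = np.argsort(np.abs(f_u - 0.5), kind="stable")
    return order[:k]

# Hypothetical predicted label values for five unlabeled target-domain samples.
f_u = np.array([0.05, 0.48, 0.91, 0.60, 0.30])
picked = select_most_uncertain(f_u, k=2)

# Move the selected samples from the unlabeled set U to the labeled set L.
unlabeled = [10, 11, 12, 13, 14]   # hypothetical graph vertex ids
labeled = [0, 1, 2]
newly_labeled = [unlabeled[i] for i in picked]
labeled = labeled + newly_labeled
unlabeled = [v for v in unlabeled if v not in newly_labeled]
```

The true labels of `newly_labeled` would be supplied by a human annotator before the next model update.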

Wherein, step (3), updating the benchmark classifier, proceeds as follows:

First, for (A) of step (3), the weights of the newly labeled samples are updated as w_ij = τ × w_ij, where τ is a weighting factor and τ > 1;

Second, for (B) of step (3), a fast model update method is used when updating the Gaussian random field model. After the labels of k new samples are added, the number of unlabeled samples decreases to u - k. The remaining unlabeled samples are predicted as f'_u = (Δ'_uu)^{-1} W'_ul f'_l, where Δ'_uu is Δ_uu with k rows and k columns removed;

The problem thus becomes: given the inverse A^{-1} of a known matrix A, how to efficiently compute the inverse of the new matrix obtained by removing k rows and k columns from A.

Let B = S(A, i_1, ..., i_k) denote the matrix obtained by moving rows i_1, ..., i_k of A to rows 1, ..., k and columns i_1, ..., i_k to columns 1, ..., k;

According to the Woodbury matrix identity, B is converted into a block-diagonal matrix B^{(2)}, and the time complexity of computing (B^{(2)})^{-1} is O(k³). The inverse of the reduced matrix is read from (B^{(2)})^{-1}, so its computation also has time complexity O(k³). In this way, the time complexity of predicting the labels of the unlabeled samples is reduced from O(u³) to O(k³), where u is the number of unlabeled samples, k is the number of samples newly selected in each round, and k ≪ u.
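As a cross-check on the stated problem, the inverse of the reduced matrix can also be obtained from A^{-1} with the block-inverse (Schur complement) identity. Note that this is a swapped-in, equivalent identity and not the two-step Woodbury procedure of the invention; the function name `reduced_inverse` and the test matrix are hypothetical. Only a k × k system is solved.

```python
import numpy as np

def reduced_inverse(A_inv, idx):
    """Inverse of A with rows/columns `idx` removed, computed from A_inv only.

    With A_inv partitioned as [[P, Q], [R, S]] (removed indices first),
    the inverse of the kept block of A equals S - R P^{-1} Q.
    """
    idx = list(idx)
    removed = set(idx)
    keep = [i for i in range(A_inv.shape[0]) if i not in removed]
    P = A_inv[np.ix_(idx, idx)]      # k x k
    Q = A_inv[np.ix_(idx, keep)]     # k x (u - k)
    R = A_inv[np.ix_(keep, idx)]     # (u - k) x k
    S = A_inv[np.ix_(keep, keep)]    # (u - k) x (u - k)
    return S - R @ np.linalg.solve(P, Q)

# Hypothetical symmetric positive definite matrix standing in for Δ_uu.
rng = np.random.default_rng(0)
M = rng.normal(size=(8, 8))
A = M @ M.T + 8.0 * np.eye(8)
A_inv = np.linalg.inv(A)

fast = reduced_inverse(A_inv, idx=[2, 5])
keep = [0, 1, 3, 4, 6, 7]
direct = np.linalg.inv(A[np.ix_(keep, keep)])   # O(u^3) reference result
```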

Wherein, step (4), repeating steps (2) and (3) in turn until a given number of cycles has been completed, proceeds as follows:

Step (2), in which the active learning algorithm selects sample points for labeling, and step (3), in which the benchmark classifier is updated, are repeated in order until a given number of cycles has been completed, that is, until enough target-domain sample points have been selected and labeled and the detection performance of the benchmark classifier in the target domain reaches a certain level.
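The four steps can be put together as the following minimal sketch. It assumes a precomputed symmetric weight matrix `W`, solves the linear system directly instead of using the fast update, and takes an `oracle` callable that stands in for the human annotator; all function and parameter names are hypothetical.

```python
import numpy as np

def predict_unlabeled(W, y, labeled, unlabeled):
    """Harmonic solution f_u = (Δ_uu)^{-1} W_ul f_l on the current graph."""
    laplacian = np.diag(W.sum(axis=1)) - W
    delta_uu = laplacian[np.ix_(unlabeled, unlabeled)]
    w_ul = W[np.ix_(unlabeled, labeled)]
    return np.linalg.solve(delta_uu, w_ul @ y[labeled])

def active_learning_loop(W, y, labeled, unlabeled, oracle, rounds, k, tau):
    W, y = W.copy(), np.asarray(y, dtype=float).copy()
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        f_u = predict_unlabeled(W, y, labeled, unlabeled)      # steps (1B)/(3B)
        order = np.argsort(np.abs(f_u - 0.5))[:k]              # step (2A)
        chosen = [unlabeled[i] for i in order]
        for v in chosen:
            y[v] = oracle(v)                                   # query true label
        labeled += chosen                                      # step (2B)
        unlabeled = [v for v in unlabeled if v not in chosen]
        touched = np.zeros(W.shape[0], dtype=bool)             # step (3A)
        touched[chosen] = True
        # Scale every edge incident to a newly labeled vertex exactly once.
        W[touched[:, None] | touched[None, :]] *= tau
    f_u = predict_unlabeled(W, y, labeled, unlabeled) if unlabeled else np.array([])
    return labeled, unlabeled, f_u

# Toy 1-D data: vertices 0-3 are labeled original-domain samples,
# vertices 4-6 are unlabeled target-domain samples.
X = np.array([[0.0], [0.2], [2.0], [2.2], [0.4], [1.0], [1.9]])
W0 = np.exp(-((X - X.T) ** 2))
np.fill_diagonal(W0, 0.0)
truth = np.array([0, 0, 1, 1, 0, 0, 1])
y0 = np.zeros(7)
y0[:4] = truth[:4]
labeled, unlabeled, f_u = active_learning_loop(
    W0, y0, [0, 1, 2, 3], [4, 5, 6],
    oracle=lambda v: truth[v], rounds=2, k=1, tau=2.0)
```

Scaling each incident edge once through a boolean mask keeps `W` symmetric and avoids applying τ twice to an edge between two samples labeled in the same round.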

Compared with the prior art, the advantages of the present invention are as follows. Active learning selects unlabeled samples for labeling according to a selection strategy and can reach high classification accuracy while selecting as few samples as possible. For training video semantic concept detectors, which requires processing massive video data, active learning can greatly reduce the sample complexity. Considering the characteristics of cross-domain video semantic concept detection, the present invention uses a Gaussian random field to build the benchmark classifier for active learning. Other active learning algorithms commonly use only original-domain samples as the training set of the benchmark classifier. In contrast, this benchmark classifier is built jointly from the labeled samples of the original domain and the unlabeled samples of the target domain. It predicts the labels of the target-domain data from the actual distribution of both domains in feature space, and on that basis selects for labeling the samples about which the benchmark classifier is least certain, so that the selected samples better reflect the distribution of the target-domain data. In addition, the weights of the newly selected and labeled target-domain samples are increased, so that the benchmark classifier adapts faster to the feature distribution of the target-domain data. Finally, to reduce time complexity, the present invention proposes a fast benchmark model update algorithm, which effectively reduces the complexity of the algorithm and improves its applicability.

Description of drawings

Fig. 1 is the flowchart of the cross-domain active learning algorithm of the present invention;

Figs. 2a, 2b and 2c compare the experimental results of the present invention and other models for cross-domain semantic concept detection on a public database. The comparison covers 36 semantic concepts in total, and each of the three subfigures shows the results for 12 semantic concepts.

Embodiment

The present invention proposes a cross-domain video semantic concept detection method based on active learning. A Gaussian random field is used as the benchmark classifier, with the labeled original-domain data and the unlabeled target-domain data used together as training data. Samples are selected for labeling according to the uncertainty principle among active learning query strategies, their new labels are added to the Gaussian random field, the model is updated, and new least-certain samples are then selected for labeling. The method specifically comprises the following steps:

As shown in the flowchart of the cross-domain active learning algorithm in Fig. 1, the present invention is divided into four steps, including using a Gaussian random field as the benchmark classifier, selecting samples for labeling according to the uncertainty principle, and updating the benchmark classifier.

Step 1: Use a Gaussian random field as the benchmark classifier

Each sample point in the original domain and the target domain is taken as a vertex of the graph. To extract the visual features of each sample point, each shot in a video is treated as one video sample, and a key frame is extracted to represent the shot. Three low-level visual features are extracted from each key frame: a 225-dimensional grid color moment feature, a 48-dimensional Gabor texture feature and a 73-dimensional edge direction histogram feature. These features are concatenated into a 346-dimensional feature vector, so each video sample is represented by a 346-dimensional feature.
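The concatenation step can be sketched as follows. The three feature extractors themselves are outside the scope of this sketch, so placeholder arrays stand in for their outputs; the function name `build_feature_vector` is hypothetical.

```python
import numpy as np

EXPECTED_DIMS = (225, 48, 73)   # grid color moments, Gabor texture, edge histogram

def build_feature_vector(color_moments, gabor_texture, edge_histogram):
    """Concatenate the three key-frame features into one 346-dim vector."""
    parts = [np.asarray(p, dtype=float).ravel()
             for p in (color_moments, gabor_texture, edge_histogram)]
    for part, dim in zip(parts, EXPECTED_DIMS):
        if part.size != dim:
            raise ValueError(f"expected {dim} values, got {part.size}")
    return np.concatenate(parts)

# Placeholder feature values for one key frame.
vec = build_feature_vector(np.zeros(225), np.ones(48), np.full(73, 2.0))
```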

The undirected graph is built with the conventional construction method: the data of the original domain and the target domain are put together to build a single undirected graph. Each vertex of the graph represents one sample, and each edge carries the weight between two samples, computed from the Gaussian distance,

where w_ij is the Gaussian-distance weight between two sample points, n is the dimension of the sample visual feature (n = 346 in the present invention), x_id and x_jd are the values of the d-th feature dimension of samples i and j respectively, and σ is a smoothing parameter.
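A minimal sketch of the weight computation, assuming the common Gaussian kernel form w_ij = exp(-Σ_d (x_id - x_jd)² / σ²). The exact scaling by σ and the removal of self-loops are assumptions, and the function name `gaussian_weight_matrix` is hypothetical.

```python
import numpy as np

def gaussian_weight_matrix(X, sigma):
    """X: (N, n) feature matrix. Returns the symmetric (N, N) edge weights."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]      # pairwise differences, (N, N, n)
    sq_dist = (diff ** 2).sum(axis=2)         # squared Euclidean distances
    W = np.exp(-sq_dist / sigma ** 2)
    np.fill_diagonal(W, 0.0)                  # no self-loops (an assumption)
    return W

# Three toy samples with n = 2 features instead of 346.
X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]])
W = gaussian_weight_matrix(X, sigma=2.0)
```

The broadcasted difference array needs N × N × n memory, so for large sample counts the distances would have to be computed in blocks.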

The label values of the unlabeled target-domain samples are predicted according to the Gaussian random field principle, which obtains the label values of the unlabeled samples by minimizing a loss function over the label values f_i and f_j of samples i and j. Minimizing this loss function gives two sample points with a larger edge weight w_ij, that is, a smaller distance in feature space, closer label values, that is, a smaller difference f_i - f_j.

The minimizer of this loss function satisfies Δf = 0 on the unlabeled sample points (written Δf_u = 0), where Δ is the Laplacian matrix, defined as Δ = D - W, with D = diag(d_i) and W the edge weight matrix. The solution of Δf_u = 0 is obtained by matrix manipulation. W is written as the block matrix

W = \begin{bmatrix} W_{ll} & W_{lu} \\ W_{ul} & W_{uu} \end{bmatrix}

and f is written as

f = \begin{bmatrix} f_{l} \\ f_{u} \end{bmatrix}

where f_l denotes the label values of the labeled samples and f_u the label values of the unlabeled samples. The solution is

f_u = (Δ_uu)^{-1} W_ul f_l
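The intermediate step between the harmonic condition and the closed-form solution can be written out as follows. This is a sketch of the standard derivation, with Δ partitioned into labeled and unlabeled blocks in the same way as W; the block names Δ_ll, Δ_lu, Δ_ul and D_uu are notation introduced here for illustration.

```latex
\begin{aligned}
\Delta &= \begin{bmatrix} \Delta_{ll} & \Delta_{lu} \\ \Delta_{ul} & \Delta_{uu} \end{bmatrix},
\qquad \Delta_{ul} = -W_{ul}, \qquad \Delta_{uu} = D_{uu} - W_{uu}, \\
(\Delta f)_u &= \Delta_{ul} f_l + \Delta_{uu} f_u = 0
\;\Longrightarrow\;
f_u = \Delta_{uu}^{-1} W_{ul} f_l = (D_{uu} - W_{uu})^{-1} W_{ul} f_l .
\end{aligned}
```

Because D is diagonal, it contributes nothing to the off-diagonal block, which is why Δ_ul reduces to -W_ul.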

Step 2: Use the uncertainty query strategy of active learning to select unlabeled samples for labeling

The samples about which the benchmark classifier is least certain are selected and their labels are queried. Obtaining the labels of these samples improves the classification performance of the benchmark classifier as much as possible when the number of labels is limited. That is, the samples whose predicted values f_i are closest to 0.5 are selected for labeling, and their true label values are obtained. These samples are then deleted from the unlabeled data set U of the target domain and added to the labeled data set L.

Step 3: Update the benchmark classifier

The weights of the newly labeled samples are updated as w_ij = τ × w_ij, where τ is a weighting factor, i is the index of a newly added sample, and j ranges over 1, ..., N, with N the total number of original-domain and target-domain samples.

After the labeled and unlabeled sample sets have been updated, a fast model update method is used to predict f_u again.

This reduces the time complexity of predicting the labels of the unlabeled samples from O(u³) to O(k³), where u is the number of unlabeled samples, k is the number of samples newly selected in each round, and k ≪ u. After the labels of k new samples are added, the number of unlabeled samples decreases to u - k. The remaining unlabeled samples are predicted as f'_u = (Δ'_uu)^{-1} W'_ul f'_l, where Δ'_uu is Δ_uu with k rows and k columns removed. The problem thus becomes: given the inverse A^{-1} of a known matrix A, how to compute the inverse of the new matrix obtained by removing k rows and k columns from A.

Let B = S(A, i_1, ..., i_k) denote the matrix obtained by moving rows i_1, ..., i_k of A to rows 1, ..., k and columns i_1, ..., i_k to columns 1, ..., k. Removing the first k rows and columns of B then gives the same matrix as removing rows and columns i_1, ..., i_k of A.

According to the matrix inversion theorem, B^{-1} = S(A^{-1}, i_1, ..., i_k), that is, B^{-1} is obtained by moving rows i_1, ..., i_k of A^{-1} to rows 1, ..., k and columns i_1, ..., i_k to columns 1, ..., k. B can be written as the block matrix

B = \begin{bmatrix} B_{1} & B_{2} \\ B_{3} & B_{\neg 1, \ldots, k} \end{bmatrix}

where B_1 is k × k, B_2 is k × (u - k), B_3 is (u - k) × k, and B_{\neg 1, \ldots, k} is B with its first k rows and columns removed.

B can be converted into a block-diagonal matrix in the following two steps:

(1) B^{(1)} = \begin{bmatrix} I_{k \times k} & 0_{k \times (u - k)} \\ B_{3} & B_{\neg 1, \ldots, k} \end{bmatrix} = B + UCV

Here U is a u × k matrix, V is a k × u matrix, and C = I_{k \times k}.

By the Woodbury matrix identity, (B^{(1)})^{-1} = (B + UCV)^{-1} = B^{-1} - B^{-1}U(C^{-1} + VB^{-1}U)^{-1}VB^{-1}. Because B^{-1} is known once A^{-1} is known, the time complexity of computing (C^{-1} + VB^{-1}U)^{-1} is O(k³);

(2) B^{(2)} = \begin{bmatrix} I_{k \times k} & 0_{k \times (u - k)} \\ 0_{(u - k) \times k} & B_{\neg 1, \ldots, k} \end{bmatrix} = B^{(1)} + U'CV'

Here U' is a u × k matrix and V' is a k × u matrix.

By the Woodbury matrix identity, (B^{(2)})^{-1} = (B^{(1)} + U'CV')^{-1} = G - GU'(C^{-1} + V'GU')^{-1}V'G, where G = (B^{(1)})^{-1}. Since (B^{(1)})^{-1} is known from step (1), the time complexity of computing (B^{(2)})^{-1} is that of computing (C^{-1} + V'GU')^{-1}, namely O(k³). Because B^{(2)} is a block-diagonal matrix,

(B^{(2)})^{-1} = \begin{bmatrix} I_{k \times k} & 0_{k \times (u - k)} \\ 0_{(u - k) \times k} & (B_{\neg 1, \ldots, k})^{-1} \end{bmatrix}

From steps (1) and (2), given B^{-1}, the computational complexity of computing (B_{\neg 1, \ldots, k})^{-1} can be reduced to O(k³).
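The two Woodbury steps can be sketched numerically as follows. The text does not spell out U, V, U' and V', so the factors used here are one assumed choice that satisfies B^{(1)} = B + UCV and B^{(2)} = B^{(1)} + U'CV' with C = I; the function names are hypothetical.

```python
import numpy as np

def woodbury_inverse(M_inv, U, V):
    """Inverse of (M + U V) given M_inv, i.e. Woodbury with C = I."""
    k = U.shape[1]
    small = np.eye(k) + V @ M_inv @ U          # k x k, the only system solved
    return M_inv - M_inv @ U @ np.linalg.solve(small, V @ M_inv)

def inverse_after_removal(A, A_inv, idx):
    """Inverse of A with rows/columns `idx` removed, via two Woodbury updates."""
    idx = list(idx)
    u, k = A.shape[0], len(idx)
    removed = set(idx)
    perm = idx + [i for i in range(u) if i not in removed]
    B = A[np.ix_(perm, perm)]                  # B = S(A, i_1, ..., i_k)
    B_inv = A_inv[np.ix_(perm, perm)]          # B^{-1} = S(A^{-1}, i_1, ..., i_k)
    I_k = np.eye(k)
    # Step (1): replace the top block row [B_1, B_2] by [I, 0].
    U1 = np.vstack([I_k, np.zeros((u - k, k))])
    V1 = np.hstack([I_k - B[:k, :k], -B[:k, k:]])
    B1_inv = woodbury_inverse(B_inv, U1, V1)
    # Step (2): zero out the lower-left block B_3.
    U2 = np.vstack([np.zeros((k, k)), -B[k:, :k]])
    V2 = np.hstack([I_k, np.zeros((k, u - k))])
    B2_inv = woodbury_inverse(B1_inv, U2, V2)
    # B^{(2)} is block diagonal, so its lower-right block is the wanted inverse.
    return B2_inv[k:, k:]

# Hypothetical symmetric positive definite matrix standing in for Δ_uu.
rng = np.random.default_rng(0)
M = rng.normal(size=(6, 6))
A = M @ M.T + 6.0 * np.eye(6)
A_inv = np.linalg.inv(A)

fast = inverse_after_removal(A, A_inv, idx=[4, 1])
keep = [0, 2, 3, 5]
direct = np.linalg.inv(A[np.ix_(keep, keep)])  # reference result
```

In this sketch the only linear systems solved are k × k; the surrounding matrix products still scale with u.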

Step 4: Repeat steps (2) and (3) until a given number of cycles has been completed

The present invention proposes a cross-domain semantic concept detection method based on active learning. The design of the active learning algorithm takes into account that the original domain and the target domain are distributed differently in feature space: a Gaussian random field is used as the benchmark classifier, which considers the feature-space distributions of both domains at once. For the selection strategy, the samples about which the benchmark classifier is most uncertain, that is, those with predicted values closest to 0.5, are chosen for labeling. Obtaining their labels gives the classifier the highest possible classification performance while labeling as few samples as possible. Because cross-domain semantic concept detection is concerned mainly with classification performance on target-domain samples, the weights of newly labeled target-domain samples are increased when they are added to the benchmark classifier, which makes the classifier better suited to the distribution of the target-domain data. The present invention also proposes a strategy for speeding up the update of the benchmark classifier, which effectively reduces the time complexity of the update. Semantic concept detection experiments on the public TRECVID 2005 and 2007 data sets compare the present invention (GRF_AL) with two baselines: a Gaussian random field benchmark classifier that queries randomly selected samples (GRF_RAND), and the commonly used support vector machine benchmark classifier with the uncertainty query strategy (SVM_AL). The results show that the present invention markedly improves detection on 36 video semantic concepts; see Fig. 2. Average precision is used as the evaluation criterion. Each model was run for ten cycles in the experiment, and Fig. 2 shows the average precision of each model at the tenth cycle. The mean of the average precision over the 36 semantic concepts in each cycle is also compared across models. The results are given in Table 1, where the GRF_AL_NW method does not change the weights of newly added samples. The experimental results show that GRF_AL achieves better classification performance than GRF_AL_NW. The results of GRF_AL and GRF_AL_NW are closer in the later cycles because by then enough labeled target-domain samples have been added to the Gaussian random field that the classifier has adapted to the feature space of the target-domain samples even without changing the weights of newly added samples.
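For reference, average precision for one semantic concept can be computed from a ranked list as in the following sketch. It uses the common non-interpolated definition; the evaluation protocol actually used in the experiments (for example the ranking depth) may differ, and the function name `average_precision` and the scores are hypothetical.

```python
import numpy as np

def average_precision(scores, relevant):
    """Non-interpolated average precision of a ranking induced by `scores`."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    rel = np.asarray(relevant, dtype=bool)[order]
    if not rel.any():
        return 0.0
    hits = np.cumsum(rel)                      # relevant items seen so far
    ranks = np.arange(1, len(rel) + 1)
    # Mean of precision@rank taken at the rank of each relevant item.
    return float((hits[rel] / ranks[rel]).mean())

# Four shots ranked by predicted label value; shots 0 and 2 contain the concept.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

Here the relevant shots sit at ranks 1 and 3, so the precision values averaged are 1 and 2/3.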

Table 1: Mean of the average precision of the different models over 36 semantic concepts in each cycle

Claims

1. A cross-domain video semantic concept detection method based on active learning, characterized by comprising the following steps:

(1) using a Gaussian random field as the benchmark classifier

(B) minimizing the loss function of the Gaussian random field to obtain estimates of the labels of the unlabeled sample points in the undirected graph, that is, predicting the labels of the samples in the target domain;

(B) removing the newly labeled samples from the unlabeled data set and adding them to the labeled data set;

(3) updating the benchmark classifier

(4) repeating in turn step (2), in which the active learning algorithm selects sample points for labeling, and step (3), in which the benchmark classifier is updated, until a given number of cycles has been completed, that is, until enough target-domain sample points have been selected and labeled and the detection performance of the benchmark classifier in the target domain reaches a certain level.

2. The cross-domain video semantic concept detection method based on active learning according to claim 1, characterized in that the method of using a Gaussian random field as the benchmark classifier in step (1) is as follows:

where w_ij is the Gaussian-distance weight between two sample points, n is the dimension of the sample visual feature, x_id and x_jd are the values of the d-th feature dimension of samples i and j respectively, and σ is a smoothing parameter;

the minimizer satisfies Δf_u = 0, where Δ is the Laplacian matrix, defined as Δ = D - W, with D = diag(d_i) and W the weight matrix; f_l denotes the label values of the labeled samples and f_u the label values of the unlabeled samples; the solution is f_u = (Δ_uu)^{-1} W_ul f_l

3. The cross-domain video semantic concept detection method based on active learning according to claim 1, characterized in that the step in which step (2) uses the uncertainty query strategy of active learning to select unlabeled samples for labeling is as follows:

the true label values of the selected samples are obtained;

4. The cross-domain video semantic concept detection method based on active learning according to claim 1, characterized in that the step of updating the benchmark classifier in step (3) is as follows:

second, for (B) of step (3), a fast model update method is used when updating the Gaussian random field model;

after the labels of k new samples are added, the number of unlabeled samples decreases to u - k; the remaining unlabeled samples are predicted as f'_u = (Δ'_uu)^{-1} W'_ul f'_l, where Δ'_uu is Δ_uu with k rows and k columns removed;

given the inverse A^{-1} of a known matrix A, the inverse of the new matrix obtained by removing k rows and k columns from A is computed;

according to the Woodbury matrix identity, B is converted into a block-diagonal matrix B^{(2)}, where the time complexity of computing (B^{(2)})^{-1} is O(k³); the inverse of the reduced matrix is then obtained from (B^{(2)})^{-1}.