CN104217442B - Aerial video moving object detection method based on multiple model estimation - Google Patents
Abstract
The invention relates to an aerial-video moving-object detection method based on multiple-model estimation. The method comprises the following steps: first, the scene is segmented into color blocks using the mean-shift color segmentation method; then, using dense pyramid optical-flow features, the affine transformation model corresponding to each large-area color block is computed with the RANSAC (Random Sample Consensus) method; the smaller color blocks are then processed by analyzing the motion consistency of the points within each block to compute membership degrees with respect to the multiple models, where blocks with high membership are moving blocks and the rest are false-alarm targets; finally, the moving blocks are merged and denoised, and the detection result is output and displayed. Tested on a public aerial-video database, the invention achieves a 5% detection error rate, 5 percentage points lower than the traditional 10% error rate.
Description
Technical field
The present invention relates to moving-target detection, and more particularly to an aerial-video moving-object detection method based on multiple-model estimation.
Background technology
Moving-object detection in aerial video is an important research topic in the field of computer vision. Existing aerial-video moving-object detection methods are based primarily on detection frameworks that estimate a single background model. The paper "Moving object detection in aerial video based on spatiotemporal saliency, Chinese Journal of Aeronautics, 2013, 26(5): 1211-1217" proposes an aerial-video moving-target detection algorithm based on spatiotemporal saliency analysis. The method first obtains salient regions in the temporal dimension through background-model estimation and frame differencing, as coarsely extracted candidate regions; it then performs saliency analysis in the spatial dimension to obtain the appearance details of the targets in the candidate regions; finally, the temporal and spatial saliency features are combined to obtain an accurate moving-object detection result. However, the quality of this method's detection result depends critically on the complexity of the scene: once the scene contains static objects that do not lie in the background plane, such as elevated buildings, utility poles, and overpasses, false alarms occur, and the detection error rate averages about 10%.
Content of the invention
Technical problem to be solved
Aerial-video moving-object detection methods in the prior art are easily affected by scene complexity: when multiple background models exist in the scene, parallax objects such as elevated static buildings and utility poles cause a very high false-alarm rate and thus detection errors. To avoid this, the present invention proposes an aerial-video moving-object detection method based on multiple-model estimation.
Technical scheme
An aerial-video moving-object detection method based on multiple-model estimation, characterized in that the steps are as follows:
Step 1: Perform color segmentation on the current frame image using the pyramid mean-shift method. Color blocks whose area exceeds a given threshold thresh are taken as background blocks {patchb_i | i = 1, 2, ..., bnum}; the rest are foreground blocks {patchf_j | j = 1, 2, ..., fnum}. Here bnum is the number of background blocks in the frame segmentation result; the i-th block is patchb_i = {area_i, pset_i, pset_i'}, where area_i is the number of points contained in the i-th block, pset_i is the set of coordinates of all points in the i-th block, and pset_i' is the set of coordinates of the points in the adjacent frame corresponding to pset_i. fnum is the number of foreground blocks; the j-th block is patchf_j = {area_j, pset_j, pset_j'}, where area_j is the number of points contained in the j-th foreground block, pset_j is the set of coordinates of all its points, and pset_j' is the set of coordinates of the points in the adjacent frame corresponding to pset_j;
Step 2: Apply the pyramid optical-flow algorithm to each background-block point set pset_i to extract dense optical-flow features, computing for each pixel (x0, y0) in pset_i the coordinates (x0', y0') of its corresponding point in the adjacent frame:

x0' = x0 + u(x0, y0)
y0' = y0 + v(x0, y0)

where u(x0, y0) and v(x0, y0) are the horizontal and vertical optical-flow components at pixel (x0, y0); the same method computes, for each pixel (x1, y1) in a foreground-block point set pset_j, the coordinates (x1', y1') of its corresponding point in the adjacent frame:

x1' = x1 + u(x1, y1)
y1' = y1 + v(x1, y1)
Step 3: Compute the affine transformation model af_i of each background block patchb_i using the RANSAC method, and compute for each pixel (x0, y0) in pset_i its projection error e under the model:

e = ||(x0', y0')^T - (r_i (x0, y0)^T + t_i)||

If e ≤ dis, then (x0, y0) is a background point satisfying the affine transformation model af_i; otherwise it is a noise point in the background. The multi-model set is denoted af = {af_i | i = 1, 2, ..., bnum}, with af_i = [r_i | t_i], where r_i and t_i are the rotation matrix and the translation vector respectively;
Step 4: Compute the motion vector of the k-th point (x_j,k, y_j,k) of the current j-th foreground block patchf_j under the i-th background model:

v_j,k = (x_j,k', y_j,k')^T - (r_i (x_j,k, y_j,k)^T + t_i)

where (x_j,k', y_j,k') is the coordinate of the point in pset_j' corresponding to (x_j,k, y_j,k), and v_j,k(x) and v_j,k(y) denote the horizontal and vertical movement velocities respectively;
Compute the membership degree m(i, j) of the j-th foreground block with respect to the i-th background model as the correlation coefficient of the point motion vectors within the block, a measure of intra-block motion consistency, with m(i, j) ∈ [0, 1];
Step 5: Compute the membership matrix p from the membership degrees:

p(i, j) = 1 if m(i, j) ≥ θ, and p(i, j) = 0 otherwise

where θ is the membership threshold. If any entry of column j of p equals 1, the j-th foreground block is determined to be a moving block. The moving blocks are then merged:
a) Initialize the model-set queue q to null and the object-set queue o to null;
b) Collect the background models to which the current j-th foreground block belongs, and denote the set of these background models s_j;
c) Traverse the model-set queue q; if there exists q(m) such that q(m) ∩ s_j ≠ null, merge s_j into q(m), q(m) = q(m) ∪ s_j, and add the j-th foreground block to the corresponding entry of the object-set queue o; otherwise execute step d);
d) Add s_j to the model-set queue q as a new member, and add the corresponding foreground block to the object-set queue o as a new member;
Step 6: Traverse the object-set queue o; if two block centers within the same entry are closer than the first threshold, merge the blocks. After merging, perform noise removal: moving blocks whose area in the detection result is below the second threshold are removed as false detections. Finally, display and output the detection result.
Described threshold value thresh=0.01 × imgwidth × imgheight, imgwidth and imgheight table respectively
Show width and the height of inputted video image.
Described dis takes 1.
Described θ=0.6.
Described first threshold takes 1.5 times of two block radius sums.
Described Second Threshold takes 100 pixels.
Beneficial effect
The aerial-video moving-object detection method based on multiple-model estimation proposed by the present invention partitions the scene into multiple models using color segmentation and extracts pyramid optical-flow features to perform multiple-model estimation on the scene, so that the detection result no longer depends on a single uniform background estimate and registration result, which improves the robustness of the detection algorithm. In addition, by computing the membership degree of each target with respect to the multiple models, false-alarm targets such as buildings, utility poles, and tall trees can be removed effectively, lowering the detection error rate to 5%.
Specific embodiment
The invention is further described below in conjunction with a specific embodiment:
1. Mean-shift color segmentation
A segment of aerial video is input, and color segmentation is first performed on the current frame image using the pyramid mean-shift method. To balance computational efficiency and accuracy, this embodiment uses 3 pyramid levels, a color window width of 10, and a spatial window width of 10. Color blocks whose area exceeds a given threshold thresh are taken as background blocks and the rest as foreground blocks; this embodiment takes thresh = 0.01 × imgwidth × imgheight, where imgwidth and imgheight denote the width and height of the input video image respectively. {patchb_i | i = 1, 2, ..., bnum} denotes the background blocks in the current frame segmentation result, where bnum is the number of background blocks; the i-th block is patchb_i = {area_i, pset_i, pset_i'}, where area_i is the number of points contained in the block, pset_i is the set of coordinates of all its points, and pset_i' is the set of coordinates of the corresponding points in the adjacent frame, initialized to null. {patchf_j | j = 1, 2, ..., fnum} denotes the foreground blocks in the current frame segmentation result, where fnum is the number of foreground blocks; each foreground block likewise comprises an area and point sets, patchf_j = {area_j, pset_j, pset_j'}, where area_j is the number of points contained in the j-th foreground block, pset_j is the set of coordinates of all its points, and pset_j' is the set of coordinates of the corresponding points in the adjacent frame, initialized to null.
2. Dense optical-flow feature tracking
Considering both runtime and accuracy, dense optical-flow features are extracted from the current frame and the adjacent frame using the pyramid optical-flow algorithm, computing the optical-flow vector of every pixel, i.e. predicting each pixel's position in the adjacent frame. This embodiment takes a pyramid scale factor of 0.5, 3 pyramid levels, a window width of 15, 3 iterations, a smoothing window width of 5, and a variance of 1.2. u(x, y) and v(x, y) denote the horizontal and vertical optical-flow components at pixel (x, y); the coordinates of the corresponding point of pixel (x, y) in the adjacent frame are then (x', y'):

x' = x + u(x, y)    (3)
y' = y + v(x, y)

From this, the point set pset_i' corresponding to each background-block point set pset_i in the adjacent frame can be computed: for every (x0, y0) ∈ pset_i there is (x0', y0') ∈ pset_i', where x0' = x0 + u(x0, y0) and y0' = y0 + v(x0, y0). The point set pset_j' corresponding to each foreground-block point set pset_j can be computed in the same way: for every (x1, y1) ∈ pset_j there is (x1', y1') ∈ pset_j', where x1' = x1 + u(x1, y1) and y1' = y1 + v(x1, y1).
3. Multiple background-model estimation
Each background block patchb_i in the segmentation result corresponds to one affine transformation model. Using the RANSAC method and the background block's pixel point sets pset_i and pset_i', the affine transformation model af_i between the current frame and the next frame is computed, and all points in pset_i are divided into inliers and outliers according to their projection error under the affine model af_i: inliers are background points satisfying the transformation model af_i, and outliers are noise points within that background. The projection error e corresponding to (x0, y0) is computed as follows:

e = ||(x0', y0')^T - (r_i (x0, y0)^T + t_i)||    (4)

If e ≤ dis, then (x0, y0) is an inlier; otherwise it is an outlier. This embodiment takes the distance threshold dis as 1 and the confidence level as 0.99. The multi-model set is denoted af = {af_i | i = 1, 2, ..., bnum}, with

af_i = [r_i | t_i]    (5)

where r_i and t_i are the rotation matrix and the translation vector respectively.
4. Membership-degree calculation
Compute the membership matrix m of size bnum × fnum, where m(i, j) denotes the membership degree of the j-th foreground block with respect to the i-th background model. Here the membership degree is defined as the correlation coefficient of the point motion vectors within the block, i.e. a measure of intra-block motion consistency. Analyzed theoretically, a true target's point motion vectors under its background model are relatively consistent, so it should have a large membership degree, whereas false-alarm objects such as buildings, utility poles, and trees do not possess this property. The motion vector of the k-th point (x_j,k, y_j,k) of the current j-th foreground block patchf_j under the i-th background model is computed as follows:

v_j,k = (x_j,k', y_j,k')^T - (r_i (x_j,k, y_j,k)^T + t_i)    (6)

where (x_j,k', y_j,k') is the coordinate of the point in pset_j' corresponding to (x_j,k, y_j,k), and v_j,k(x) and v_j,k(y) denote the horizontal and vertical movement velocities respectively. The mean movement velocity of the j-th foreground block is:

v̄_j = (1 / area_j) Σ_k v_j,k    (7)

The membership degree m(i, j) of the j-th foreground block with respect to the i-th background model is then the correlation coefficient of the motion vectors {v_j,k} within the block, computed using the block's mean movement velocity. m(i, j) ∈ [0, 1], and the closer m(i, j) is to 1, the higher the membership degree of the j-th foreground block with respect to the i-th background model, and the greater the probability that it is a moving target in that background.
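The exact correlation-coefficient formula is not reproduced in this text (the original equations appear as images in the patent), so the sketch below uses a normalized motion-consistency score in [0, 1] as an illustrative stand-in for m(i, j): motion vectors are the residuals of the tracked points against the model's prediction, and a block whose points all move the same way scores near 1.

```python
import numpy as np

def membership(pset, pset_next, r, t):
    """Illustrative membership score of one foreground block with respect
    to one background model [r | t].  NOT the patent's exact formula:
    it is the ratio of the mean motion vector's magnitude to the mean
    motion-vector magnitude, which is 1 for perfectly consistent motion
    and near 0 for incoherent (parallax / noise) motion."""
    v = pset_next - (pset @ r.T + t)   # per-point motion vectors v_j,k
    mean_v = v.mean(axis=0)            # block's mean movement velocity
    num = np.linalg.norm(mean_v)
    den = np.linalg.norm(v, axis=1).mean() + 1e-9
    return float(num / den)
```

A true moving target translates coherently relative to its background model, so its score approaches 1; a parallax object (building, pole) produces scattered residual vectors and a low score, matching the role m(i, j) plays in the text.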
5. Detection-result management and output
First, the membership matrix p is computed from the membership degrees as follows:

p(i, j) = 1 if m(i, j) ≥ θ, and p(i, j) = 0 otherwise

where θ is the membership threshold; considering noise and optical-flow errors at edges, this embodiment takes θ = 0.6. If any entry of column j of p equals 1, the j-th foreground block is determined to be a moving block. Since a moving target may itself be split into several moving blocks because of inconsistent color, the moving blocks need to be merged.
Initialize the model-set queue q to null and the object-set queue o to null. Traverse the membership matrix p row by row from top to bottom and perform the following operations:
a) Collect the background models to which the current j-th foreground block belongs, and denote the set of these background models s_j;
b) Traverse the model-set queue q; if there exists q(m) such that q(m) ∩ s_j ≠ null, merge s_j into q(m), q(m) = q(m) ∪ s_j, and add the j-th foreground block to the corresponding entry of the object-set queue o; otherwise execute step c);
c) Add s_j to the model-set queue q as a new member, and add the corresponding foreground block to the object-set queue o as a new member.
This yields the model-set queue and the corresponding object-set queue, after which the foreground blocks are merged. Traverse the object-set queue: if two block centers within the same entry are closer than a given threshold (this embodiment takes 1.5 times the sum of the two block radii), the blocks are merged. After merging, noise removal is carried out: moving blocks whose area in the detection result is below a given threshold (100 pixels in this embodiment) are removed as false detections. Finally, the detection result is displayed and output.
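Steps a) to c) above can be sketched in plain Python (the function name and data layout are ours; the queue semantics follow the text):

```python
def group_moving_blocks(P):
    """Group moving foreground blocks that share background models.
    P is the binary membership matrix (bnum rows x fnum columns).
    Returns the model-set queue q (list of model-index sets) and the
    parallel object-set queue o (lists of foreground-block indices)."""
    bnum = len(P)
    fnum = len(P[0]) if bnum else 0
    q, o = [], []                       # model-set queue, object-set queue
    for j in range(fnum):
        # step a): background models the j-th foreground block belongs to
        s_j = {i for i in range(bnum) if P[i][j]}
        if not s_j:                     # no membership: not a moving block
            continue
        for m in range(len(q)):         # step b): find an overlapping set
            if q[m] & s_j:
                q[m] |= s_j             # q(m) = q(m) U s_j
                o[m].append(j)
                break
        else:                           # step c): start a new member
            q.append(s_j)
            o.append([j])
    return q, o
```

Blocks that end up in the same entry of `o` are candidates for the final center-distance merge and small-area noise removal described above.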
Claims (6)
1. An aerial-video moving-object detection method based on multiple-model estimation, characterized in that the steps are as follows:
Step 1: Perform color segmentation on the current frame image using the pyramid mean-shift method. Color blocks whose area exceeds a given threshold thresh are taken as background blocks {patchb_i | i = 1, 2, ..., bnum}; the rest are foreground blocks {patchf_j | j = 1, 2, ..., fnum}. Here bnum is the number of background blocks in the frame segmentation result; the i-th block is patchb_i = {area_i, pset_i, pset_i'}, where area_i is the number of points contained in the i-th block, pset_i is the set of coordinates of all points in the i-th block, and pset_i' is the set of coordinates of the points in the adjacent frame corresponding to pset_i. fnum is the number of foreground blocks; the j-th block is patchf_j = {area_j, pset_j, pset_j'}, where area_j is the number of points contained in the j-th foreground block, pset_j is the set of coordinates of all its points, and pset_j' is the set of coordinates of the points in the adjacent frame corresponding to pset_j;
Step 2: Apply the pyramid optical-flow algorithm to each background-block point set pset_i to extract dense optical-flow features, computing for each pixel (x0, y0) in pset_i the coordinates (x0', y0') of its corresponding point in the adjacent frame:

x0' = x0 + u(x0, y0)
y0' = y0 + v(x0, y0)

where u(x0, y0) and v(x0, y0) are the horizontal and vertical optical-flow components at pixel (x0, y0); the same method computes, for each pixel (x1, y1) in a foreground-block point set pset_j, the coordinates (x1', y1') of its corresponding point in the adjacent frame:

x1' = x1 + u(x1, y1)
y1' = y1 + v(x1, y1)
Step 3: Compute the affine transformation model af_i of each background block patchb_i using the RANSAC method, and compute for each pixel (x0, y0) in pset_i its projection error e under the model:

e = ||(x0', y0')^T - (r_i (x0, y0)^T + t_i)||

If e ≤ dis, then (x0, y0) is a background point satisfying the affine transformation model af_i; otherwise it is a noise point in the background. The multi-model set is denoted af = {af_i | i = 1, 2, ..., bnum}, with af_i = [r_i | t_i], where r_i and t_i are the rotation matrix and the translation vector respectively;
Step 4: Compute the motion vector of the k-th point (x_j,k, y_j,k) of the current j-th foreground block patchf_j under the i-th background model:

v_j,k = (x_j,k', y_j,k')^T - (r_i (x_j,k, y_j,k)^T + t_i)

where (x_j,k', y_j,k') is the coordinate of the point in pset_j' corresponding to (x_j,k, y_j,k), and v_j,k(x) and v_j,k(y) denote the horizontal and vertical movement velocities respectively;
Compute the membership degree m(i, j) of the j-th foreground block with respect to the i-th background model as the correlation coefficient of the point motion vectors within the block, a measure of intra-block motion consistency, with m(i, j) ∈ [0, 1];
Step 5: Compute the membership matrix p from the membership degrees:

p(i, j) = 1 if m(i, j) ≥ θ, and p(i, j) = 0 otherwise

where θ is the membership threshold. If any entry of column j of p equals 1, the j-th foreground block is determined to be a moving block. The moving blocks are then merged:
a) Initialize the model-set queue q to null and the object-set queue o to null;
b) Collect the background models to which the current j-th foreground block belongs, and denote the set of these background models s_j;
c) Traverse the model-set queue q; if there exists q(m) such that q(m) ∩ s_j ≠ null, merge s_j into q(m), q(m) = q(m) ∪ s_j, and add the j-th foreground block to the corresponding entry of the object-set queue o; otherwise execute step d);
d) Add s_j to the model-set queue q as a new member, and add the corresponding foreground block to the object-set queue o as a new member;
Step 6: Traverse the object-set queue o; if two block centers within the same entry are closer than the first threshold, merge the blocks. After merging, perform noise removal: moving blocks whose area in the detection result is below the second threshold are removed as false detections. Finally, display and output the detection result.
2. The aerial-video moving-object detection method based on multiple-model estimation according to claim 1, characterized in that the threshold thresh = 0.01 × imgwidth × imgheight, where imgwidth and imgheight denote the width and height of the input video image respectively.
3. The aerial-video moving-object detection method based on multiple-model estimation according to claim 1, characterized in that dis is set to 1.
4. The aerial-video moving-object detection method based on multiple-model estimation according to claim 1, characterized in that θ = 0.6.
5. The aerial-video moving-object detection method based on multiple-model estimation according to claim 1, characterized in that the first threshold is set to 1.5 times the sum of the two block radii.
6. The aerial-video moving-object detection method based on multiple-model estimation according to claim 1, characterized in that the second threshold is set to 100 pixels.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410431932.1A CN104217442B (en) | 2014-08-28 | 2014-08-28 | Aerial video moving object detection method based on multiple model estimation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104217442A CN104217442A (en) | 2014-12-17 |
CN104217442B true CN104217442B (en) | 2017-01-25 |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105975918B (en) * | 2016-04-29 | 2019-04-02 | 厦门大学 | The moving target detecting method towards mobile camera based on multiple-model estimator |
CN107316030A (en) * | 2017-07-04 | 2017-11-03 | 西北工业大学深圳研究院 | Unmanned plane is to terrain vehicle automatic detection and sorting technique |
CN107330922A (en) * | 2017-07-04 | 2017-11-07 | 西北工业大学 | Video moving object detection method of taking photo by plane based on movable information and provincial characteristics |
CN108540769B (en) * | 2018-02-05 | 2019-03-29 | 东营金丰正阳科技发展有限公司 | Unmanned flight's platform real-time image transmission system |
CN108491818B (en) * | 2018-03-30 | 2019-07-05 | 北京三快在线科技有限公司 | Detection method, device and the electronic equipment of target object |
CN109102503A (en) * | 2018-08-13 | 2018-12-28 | 北京市遥感信息研究所 | It is a kind of based on color space smoothly and improve the significant model of frequency tuning high score image change detection method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20110020961A (en) * | 2009-08-25 | 2011-03-04 | 삼성전자주식회사 | Method of detecting and tracking moving object for mobile platform |
CN103413324A (en) * | 2013-07-29 | 2013-11-27 | 西北工业大学 | Automatic target tracking method for aerially photographed videos |
CN103426172A (en) * | 2013-08-08 | 2013-12-04 | 深圳一电科技有限公司 | Vision-based target tracking method and device |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9147260B2 (en) * | 2010-12-20 | 2015-09-29 | International Business Machines Corporation | Detection and tracking of moving objects |
Non-Patent Citations (6)
Title |
---|
Moving object detection in aerial video based on spatiotemporal saliency; Shen Hao et al.; Chinese Journal of Aeronautics; 2013; Vol. 26, No. 5; pp. 1211-1217 * |
Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography; Martin A. Fischler et al.; Communications of the ACM; June 1981; Vol. 24, No. 6; pp. 381-395 * |
Two-Frame Motion Estimation Based on Polynomial Expansion; Gunnar Farneback; SCIA 2003; 2003; Vol. 2749; pp. 363-370 * |
Infrared small-target background suppression by joint global-local filtering; Mo Jinhua et al.; Chinese Journal of Stereology and Image Analysis; 2011; Vol. 16, No. 3; pp. 223-231 * |
An adaptive mean-shift segmentation algorithm; Ma Yu et al.; Laser & Infrared; October 2013; Vol. 43, No. 10; pp. 1162-1165 * |
Fast locking of ground moving targets in aerial video images; Yu Yingxue; Telecom World; June 2014; pp. 8-9 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |