CN107330432B - Multi-view vehicle detection method based on weighted Hough voting - Google Patents

Multi-view vehicle detection method based on weighted Hough voting

Info

Publication number
CN107330432B
CN107330432B
Authority
CN
China
Prior art keywords
voting
image
view
sample
positive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710554766.8A
Other languages
Chinese (zh)
Other versions
CN107330432A (en)
Inventor
李冬梅
李涛
向涛
朱晓珺
张栋梁
曲豪
汪伟
郭航宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
YANCHENG CHANTU INTELLIGENT TECHNOLOGY Co.,Ltd.
Original Assignee
Yancheng Chantu Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yancheng Chantu Intelligent Technology Co., Ltd.
Priority to CN201710554766.8A
Publication of CN107330432A
Application granted
Publication of CN107330432B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G06V10/507 Summing image-intensity values; Histogram projection analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/259 Fusion by voting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08 Detecting or categorising vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a multi-view vehicle detection method based on weighted Hough voting, comprising the following steps. Step A: define a training sample image set. Step B: divide the positive sample set of the training sample image set into view-angle subclasses. Step C: calculate the contribution weight of each positive sample to the different view-angle subclasses. Step D: determine the voting score of an image block at a candidate position using weighted Hough voting. Step E: determine the vehicle detection frame in the test image. The method uses LLE and k-means to divide the vehicle samples automatically into subclasses of different view angles, uses this division to define the voting weights of the positive sample set under different view angles during Hough voting, and combines these voting weights to perform accurately localized Hough voting, thereby achieving accurate detection of vehicles under multiple view angles. Compared with the prior art, the method greatly improves detection speed and effectively exploits the information shared among the different view-angle subclasses, further improving the accuracy of vehicle detection.

Description

Multi-view vehicle detection method based on weighted Hough voting
Technical Field
The invention relates to the field of vehicle detection in a video traffic environment, in particular to a weighted Hough voting-based multi-view vehicle detection method.
Background
As automobiles become an ever more important tool in daily life, vehicle detection has become a key component of intelligent traffic systems in smart cities. In real scenes, however, multi-view vehicle detection remains a difficult problem: because vehicles appear in an image under different view angles as they move or as shooting positions change, their appearance characteristics vary greatly, and the accuracy of vehicle detection drops sharply as a result.
Prior-art approaches to multi-view vehicle detection fall into three main categories. First: dividing the image training set into different subclasses, either manually or based on the sample aspect ratio, where each subclass covers a certain range of view-angle variation, and building a separate detection model for each subclass. Second: automatic subclass-division methods, or embedding an unsupervised clustering process into the learning of the detector. Third: embedding 3D view-angle information in the model and estimating the target view angle. These three approaches attack the multi-view vehicle detection problem from different directions, but all have obvious shortcomings or limitations, such as ignoring the feature commonality of multi-view targets or depending on 3D view-angle information that is difficult to acquire, which leads to inaccurate multi-view vehicle detection results.
Disclosure of Invention
The invention aims to provide a multi-view vehicle detection method based on weighted Hough voting, which effectively addresses the inaccuracy of multi-view vehicle detection in the prior art caused by ignoring the common characteristics of multi-view targets or by the difficulty of acquiring 3D view-angle information.
In order to achieve the purpose, the invention adopts the following technical scheme:
a multi-view vehicle detection method based on weighted Hough voting comprises the following steps:
Step A: define a training sample image set D = {(I_i, f_i, y_i)}, i = 1, 2, …, N, with image size 128 × 64, where f_i is the image feature expression: HOG features are used for the multi-view division, and multi-channel pixel features are used when training visual words and performing Hough voting; y_i ∈ {−1, +1} is the training sample label: I_i is a background sample when y_i = −1 and a target sample when y_i = +1; N is the size of the training sample set;
Step B: divide the positive sample set D⁺ = {I_j}, j = 1, 2, …, N⁺, of the training sample image set D into view-angle subclasses, where N⁺ is the number of positive samples;
Step C: calculate the contribution weight of each positive sample to the different view-angle subclasses;
Step D: determine the voting score of an image block at a candidate position using weighted Hough voting;
Step E: determine the vehicle detection frame in the test image.
Step B comprises the following steps:
Step B1: use the LLE algorithm to embed the positive sample images of D⁺, expressed by their HOG features, into a two-dimensional space;
Step B2: select the center point of the ring formed by the sample-point distribution in the two-dimensional space, and regularize all samples onto a circle based on the relative angle of each sample point to the center point;
Step B3: cluster the samples on the circle with the k-means algorithm, dividing the positive sample set D⁺ into K view-angle subclasses.
Step C specifically adopts the following method:
In the LLE embedding space, the cluster center of the k-th view-angle subclass sample set is defined as o_k, k ∈ {1, 2, …, K}, and the expression of positive sample image I_j of the positive sample set D⁺ in the LLE embedding space is f′_j; the contribution weight w_jk of positive sample I_j to view-angle subclass k is then defined as:

[equation image in the original: definition of w_jk in terms of d(f′_j, o_k)]

where d(f′_j, o_k) is the distance between f′_j and o_k in the LLE embedding space; to ensure the correctness of the calculation, the contribution weights w_jk of sample image I_j under the view-angle subclasses are normalized so that they sum to 1, i.e.

Σ_{k=1}^{K} w_jk = 1
Step D comprises the following steps:
Step D1: define the visual word matched with image block p_t as L, let E_L be the set of offset vectors of the positive sample image blocks contained in L, and obtain the classification probability C_L by counting the proportion of positive sample image blocks in L; the voting score of image block p_t at candidate position h is then:

V(h | p_t) = (C_L / |E_L|) Σ_{e ∈ E_L} (1 / 2πσ²) exp(−‖h − (q_t − e)‖² / 2σ²)

where the vote of each voting unit e in E_L for candidate position h is estimated with a Gaussian Parzen window, |E_L| denotes the set size, q_t is the center position of image block p_t, and σ is the standard deviation of the Gaussian Parzen window; once the visual word L has been generated, its classification probability C_L is fixed, and the voting score of image block p_t at candidate position h depends mainly on the voting units e in the offset-vector set E_L; using the linear accumulation property of Hough voting scores, V(h | p_t), defined as the accumulated voting scores of the voting units in E_L for candidate position h, can be rewritten as the accumulated voting scores, at candidate position h, of the positive sample images I_j associated with the voting units in E_L, namely:

V(h | p_t) = Σ_{j=1}^{N⁺} V(h | p_t, I_j)

wherein

V(h | p_t, I_j) = (C_L / |E_L|) Σ_{e ∈ E_L} 1(e ∈ I_j) (1 / 2πσ²) exp(−‖h − (q_t − e)‖² / 2σ²)

and the indicator 1(e ∈ I_j) denotes that voting unit e in E_L comes from positive sample image I_j of the positive sample set D⁺; traversing the image blocks p_t in the test image G, the final voting score at candidate position h is:

V(h) = Σ_t V(h | p_t)
Step D2: because the view angles of the sample images in the positive sample set D⁺ differ too greatly, the final Hough map generated by the voting score V(h | p_t) defined above is cluttered and its bright spots are insufficiently concentrated, so the candidate position h cannot be determined accurately; this scheme therefore introduces a view-angle variable k ∈ {1, 2, …, K} into the voting model defined above to restrict voting to the same view angle and guarantee the view-angle consistency of the votes for candidate position h, namely:

V(h, k) = Σ_t Σ_{j: I_j ∈ subclass k} V(h | p_t, I_j)

The formula is computed as follows: first, the multi-view subclass-division method of step B is used to label each sample image of the positive sample set D⁺ with a view-angle subclass k ∈ {1, 2, …, K}; then the definition of step C is used to calculate the view-angle contribution weight w_jk, j ∈ {1, 2, …, N⁺}, of each sample image; with the positive sample set D⁺ finally calibrated with view angles and view-angle contribution weights, V(h, k) is redefined as:

V(h, k) = Σ_t Σ_{j=1}^{N⁺} w_jk · V(h | p_t, I_j)

where W denotes the matrix of the weights w_jk, of size N⁺ × K.
Step E comprises the following steps:
Step E1: first, decompose the test image over scales, defining the scale space of the test image as {λ_m}, m = 1, 2, …, M, where M is the number of discrete scales;
Step E2: in the test image at scale λ_m, densely sample image blocks of the same size, find the visual word matched with each image block, and use the formula V(h, k) above to obtain the Hough voting map at that scale;
Step E3: in the three-dimensional Hough space formed by (h, λ_m), determine the final target center position (h′, λ′_m) using the mean-shift algorithm and a decision threshold, and mark a detection frame of the corresponding size at the corresponding position in the original test image.
The invention has the following beneficial effects:
Compared with the prior art, the multi-view vehicle detection method based on weighted Hough voting uses Locally Linear Embedding (LLE) and k-means to divide the vehicle samples automatically into subclasses of different view angles, uses this division to define the voting weights of the positive sample set under different view angles during Hough voting, and combines these voting weights to perform accurately localized Hough voting, thereby achieving accurate detection of vehicles under multiple view angles; the method greatly improves detection speed and effectively exploits the information shared among the different view-angle subclasses, further improving the accuracy of vehicle detection.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic diagram of part of the detection results obtained with the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention, a multi-view vehicle detection method based on weighted Hough voting, comprises the following steps:
Step A: define a training sample image set D = {(I_i, f_i, y_i)}, i = 1, 2, …, N, with image size 128 × 64, where f_i is the image feature expression (HOG features are used for the multi-view division; multi-channel pixel features are used when training visual words and performing Hough voting); y_i ∈ {−1, +1} is the training sample label (I_i is a background sample when y_i = −1 and a target sample when y_i = +1); N is the size of the training sample set.
In collecting the training samples, the background images and target images should be consistent in size and close in number, and the diversity of the training samples should be ensured as far as possible; that is, the background images should cover the various scenes in which the target may appear, and the target images should cover the various view-angle forms the target may present.
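For concreteness, the feature extraction of step A can be sketched as follows. This is a minimal illustration assuming scikit-image's HOG implementation; the cell and block parameters are illustrative choices, not values specified by the invention.

```python
# Minimal sketch of the HOG feature expression f_i of step A, assuming
# scikit-image; the HOG parameters below are illustrative, not from the patent.
import numpy as np
from skimage.feature import hog
from skimage.transform import resize

def hog_feature(image_gray: np.ndarray) -> np.ndarray:
    """Normalize a sample to the 128 x 64 size and return its HOG descriptor f_i."""
    patch = resize(image_gray, (64, 128))  # 64 rows x 128 columns
    return hog(patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm='L2-Hys')
```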
Step B: divide the positive sample set D⁺ = {I_j}, j = 1, 2, …, N⁺, of the training sample image set D into multiple view-angle subclasses, where N⁺ is the number of positive samples. Specifically:
Step B1: use the LLE algorithm to embed the positive sample images of D⁺ (the multi-view vehicle images), expressed by their HOG features, into a two-dimensional space; the results show that after the multi-view vehicle images represented by HOG features are embedded into the two-dimensional space, the sample points are distributed in a ring, and the vehicle view angle changes smoothly along the ring;
Step B2: from the embedded samples (the positive sample images of D⁺ expressed by HOG features), select the center point of the ring formed by the sample-point distribution in the two-dimensional space, and regularize all samples onto a circle based on the relative angle of each sample point to the center point; samples in nearby regions of the circle then have similar view angles;
Step B3: as shown in Fig. 1, cluster the samples on the circle with the k-means algorithm, so that samples with similar view angles fall on the same arc of the circle; each arc of the circle represents a subclass of similar view angles, giving a division into K view-angle subclasses;
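A minimal sketch of steps B1–B3 follows, assuming scikit-learn's LocallyLinearEmbedding and KMeans as stand-ins for the LLE and k-means algorithms named above; the neighbor count is an illustrative choice.

```python
# Sketch of steps B1-B3 under stated assumptions: sklearn's LLE and k-means.
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.cluster import KMeans

def view_subclasses(F_pos: np.ndarray, K: int = 8, n_neighbors: int = 10):
    """F_pos: (N+, d) HOG features of the positive samples.
    Returns per-sample subclass labels and the K cluster centers on the circle."""
    # B1: embed the HOG features into a two-dimensional space with LLE.
    Y = LocallyLinearEmbedding(n_neighbors=n_neighbors,
                               n_components=2).fit_transform(F_pos)
    # B2: regularize each sample onto a circle via its angle about the ring center.
    center = Y.mean(axis=0)
    theta = np.arctan2(Y[:, 1] - center[1], Y[:, 0] - center[0])
    circle = np.c_[np.cos(theta), np.sin(theta)]  # points on the unit circle
    # B3: k-means on the circle; each cluster is one arc, i.e. one view subclass.
    km = KMeans(n_clusters=K, n_init=10).fit(circle)
    return km.labels_, km.cluster_centers_
```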
Step C: calculate the contribution weight of each sample to the different view-angle subclasses, specifically as follows:
Once the division of the positive sample set D⁺ of the training sample image set D into K view-angle subclasses has been determined, the contribution weight of each positive sample to each view-angle subclass is calculated, so as to make full use of the shared and distinct information among the view-angle subclasses. The calculation proceeds as follows:
In the LLE embedding space, the cluster center of the k-th view-angle subclass sample set is defined as o_k, k ∈ {1, 2, …, K}, and the expression of positive sample image I_j of D⁺ in the LLE embedding space is f′_j; the contribution weight w_jk of positive sample I_j to view-angle subclass k is then defined as:

[equation image in the original: definition of w_jk in terms of d(f′_j, o_k)]

where d(f′_j, o_k) is the distance between f′_j and o_k in the LLE embedding space; to ensure the correctness of the calculation, the contribution weights w_jk of sample image I_j under the view-angle subclasses are normalized so that they sum to 1, i.e.

Σ_{k=1}^{K} w_jk = 1
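Because the defining equation for w_jk survives only as an image, the sketch below assumes one common choice, a weight that decays exponentially with the embedding-space distance d(f′_j, o_k), and then applies the normalization the text does specify.

```python
# Sketch of step C; the exp(-d) form is an assumption, while the row
# normalization (sum over k of w_jk equals 1) is specified by the text.
import numpy as np

def contribution_weights(F_emb: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """F_emb: (N+, 2) embedded positives f'_j; centers: (K, 2) centers o_k.
    Returns W of shape (N+, K) with each row summing to 1."""
    d = np.linalg.norm(F_emb[:, None, :] - centers[None, :, :], axis=2)  # (N+, K)
    W = np.exp(-d)                           # assumed decreasing function of d
    return W / W.sum(axis=1, keepdims=True)  # enforce the stated normalization
```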
Step D: determining the voting score of the image block at the candidate position by using a weighted Hough voting method, which specifically comprises the following steps:
step D1: defining and testing local image blocks ptThe matched visual word is L, and the set of L containing the offset vector of the image block of the normal sample is ELObtaining the classification probability C by counting the proportion of the image blocks of the normal sample in LLThen image block ptVoting value at candidate position h is
Figure GDA0002519768650000061
Wherein E isLThe vote of each offset vector E for candidate position h is estimated using a Gaussian Parzen window, | ELI denotes set size, qtFor image block ptThe center position of (a) is the standard deviation of a gaussian Parzen window; after the visual word L is generated, its corresponding classification probability CLDetermined, local image block ptThe voting score at the candidate position h depends mainly on the set of offset vectors ELThe voting unit e in (1) can convert V (h | p) into V (h | p) by utilizing the linear accumulation characteristic of Hough voting scorest) Is defined by the accumulation of ELIn the method, the voting score of each voting unit to the candidate position h is rewritten into an accumulation ELPositive sample image associated with middle voting unitjThe form of voting score on candidate position h, namely:
Figure GDA0002519768650000062
wherein the content of the first and second substances,
Figure GDA0002519768650000063
wherein the content of the first and second substances,
Figure GDA0002519768650000064
represents ELThe voting unit e in (a) is from the positive sample set D+Mean sample imagej(ii) a Traversing all local image blocks p in test image GtThe final vote score for candidate position h is:
Figure GDA0002519768650000065
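A sketch of the step D1 accumulation under the Gaussian Parzen window model above; evaluating the window densely over the whole map is an illustrative implementation choice.

```python
# Sketch of step D1: one matched image block casts Gaussian Parzen votes at
# q_t - e for every offset vector e in E_L, scaled by C_L / |E_L|.
import numpy as np

def cast_votes(hough: np.ndarray, q_t, E_L: np.ndarray, C_L: float, sigma: float):
    """hough: (H, W) vote map, updated in place; q_t: (x, y) block center;
    E_L: (n, 2) offset vectors of the matched visual word."""
    H, W = hough.shape
    ys, xs = np.mgrid[0:H, 0:W]
    norm = C_L / (len(E_L) * 2 * np.pi * sigma ** 2)
    for e in E_L:
        cx, cy = q_t[0] - e[0], q_t[1] - e[1]  # candidate center voted by e
        hough += norm * np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2)
                               / (2 * sigma ** 2))
```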
Step D2: because the view angles of the sample images in the positive sample set D⁺ differ too greatly, the final Hough map generated by the voting score V(h | p_t) defined above is cluttered and its bright spots are insufficiently concentrated, so the candidate position h cannot be determined accurately; this scheme therefore introduces a view-angle variable k ∈ {1, 2, …, K} into the voting model defined above to restrict voting to the same view angle and guarantee the view-angle consistency of the votes for candidate position h, namely:

V(h, k) = Σ_t Σ_{j: I_j ∈ subclass k} V(h | p_t, I_j)

The formula is computed as follows: first, the view-angle subclass-division method is used to label each image of the positive sample set D⁺ with a view-angle subclass k ∈ {1, 2, …, K}; then the formula of step C is used to calculate the view-angle contribution weight w_jk, j ∈ {1, 2, …, N⁺}, of each image; finally, with the positive sample set D⁺ calibrated with view angles and view-angle contribution weights, V(h, k) is redefined as:

V(h, k) = Σ_t Σ_{j=1}^{N⁺} w_jk · V(h | p_t, I_j)

where W denotes the matrix of the weights w_jk, of size N⁺ × K.
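A sketch of the weighted voting of step D2 under stated assumptions: each vote is additionally scaled by w_jk, the contribution weight of the positive sample I_j that supplied the offset vector, producing one view-consistent Hough map per subclass k.

```python
# Sketch of step D2: per-view weighted votes. src_j[n] gives the index j of the
# positive sample that contributed offset E_L[n]; W is the (N+, K) weight matrix.
import numpy as np

def cast_weighted_votes(hough_k: np.ndarray, q_t, E_L, src_j, C_L: float,
                        sigma: float, W: np.ndarray, k: int):
    """hough_k: (H, Wd) Hough map for view-angle subclass k, updated in place."""
    H, Wd = hough_k.shape
    ys, xs = np.mgrid[0:H, 0:Wd]
    norm = C_L / (len(E_L) * 2 * np.pi * sigma ** 2)
    for e, j in zip(E_L, src_j):
        cx, cy = q_t[0] - e[0], q_t[1] - e[1]
        hough_k += W[j, k] * norm * np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2)
                                           / (2 * sigma ** 2))
```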
Step E: determining a final target detection frame in the vehicle detection image under the multiple view angles, wherein the steps are specifically described as follows:
step E1: firstly, the test image is subjected to scale decomposition, and the scale space of the test image is defined as
Figure GDA0002519768650000074
M is the number of discrete scales;
step E2: in the dimension λmThe same size image blocks in the test image are densely sampled, the visual word matching each image block is found, and then the above formula is used
Figure GDA0002519768650000075
Obtaining a Hough voting graph under the scale;
step E3: at (h, λ)m) Forming a three-dimensional Hough space (h ═ h)x,hy) Including the horizontal and vertical coordinate positions in the image), the final target center position (h ', lambda ') is determined by using the mean-shift algorithm and the judgment threshold value 'm) And in the original test pattern
Figure GDA0002519768650000076
Position mark size of
Figure GDA0002519768650000077
The detection frame of (1).
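A sketch of the step E peak search over the (h, λ_m) space; a simple local-maximum filter stands in here for the mean-shift mode seeking the invention specifies, and scipy is an assumed dependency.

```python
# Sketch of step E: find vote peaks above a decision threshold in the stack of
# per-scale Hough maps (local maxima approximate mean-shift mode seeking).
import numpy as np
from scipy.ndimage import maximum_filter

def detect(hough_stack: np.ndarray, scales, thresh: float):
    """hough_stack: (M, H, W) Hough maps, one per scale lambda_m.
    Returns a list of (x, y, scale, score) detections."""
    peaks = (hough_stack == maximum_filter(hough_stack, size=5)) \
            & (hough_stack > thresh)
    return [(x, y, scales[m], hough_stack[m, y, x])
            for m, y, x in zip(*np.nonzero(peaks))]
```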
Embodiment 1:
This embodiment comprises a number of image samples; when the multi-view vehicle detection method based on weighted Hough voting is used to detect the vehicle positions in these image samples, the following steps are adopted:
Step A: define a training sample image set D = {(I_i, f_i, y_i)}, i = 1, 2, …, N; the sample images used are all normalized to 128 × 64, where f_i is the image feature expression (HOG features are used for the multi-view division; multi-channel pixel features are used when training visual words and performing Hough voting); y_i ∈ {−1, +1} is the training sample label (I_i is a background sample when y_i = −1 and a target sample when y_i = +1); N is the size of the training sample set.
In collecting the training sample images, the background images and target images should be consistent in size and close in number, and the diversity of the training sample images should be ensured as far as possible; that is, the background images should cover the various scenes in which the target may appear, and the target images should cover the various forms the target may present.
Step B: divide the positive sample set D⁺ = {I_j}, j = 1, 2, …, N⁺, of the training sample image set D into multiple view-angle subclasses, where N⁺ is the number of positive samples; in this embodiment, the number of view-angle subclasses is set to 8. Specifically:
Step B1: use the LLE algorithm to embed the positive sample images of D⁺ (the multi-view vehicle images), expressed by their HOG features, into a two-dimensional space; the sample points are distributed in a ring, and the vehicle view angle changes smoothly along the ring;
Step B2: select the center point of the ring formed by the sample-point distribution of step B1, and regularize all samples onto a circle O based on the relative angles of the sample points to the center point; samples in nearby regions of circle O have similar view angles;
Step B3: cluster the samples on circle O with the k-means algorithm, so that samples with similar view angles fall on the same arc of circle O; each arc represents a subclass of similar view angles, giving a division into 8 view-angle subclasses;
Step C: calculate the contribution weight of each sample to the 8 view-angle subclasses, specifically as follows:
In the LLE embedding space, the cluster center of the k-th divided view-angle subclass sample set is defined as o_k, k ∈ {1, 2, …, 8}, and the expression of positive sample image I_j of D⁺ in the LLE embedding space is f′_j; the contribution weight w_jk of positive sample I_j to view-angle subclass k is then defined as:

[equation image in the original: definition of w_jk in terms of d(f′_j, o_k)]

where d(f′_j, o_k) is the distance between f′_j and o_k in the LLE embedding space; to ensure the correctness of the calculation, the contribution weights w_jk of sample image I_j under the view-angle subclasses are normalized so that they sum to 1, i.e.

Σ_{k=1}^{8} w_jk = 1
Step D: determining image block p by using weighted Hough voting methodtThe voting score at the candidate position h specifically comprises the following steps:
step D1: definition and image block ptThe matched visual word is L, and the set of L containing the offset vector of the image block of the normal sample is ELObtaining the classification probability C by counting the proportion of the image blocks of the normal sample in LLThen image block ptVoting value at candidate position h is
Figure GDA0002519768650000085
Wherein E isLThe vote of each offset vector E for candidate position h is estimated using a Gaussian Parzen window, | ELI denotes set size, qtFor image block ptThe center position of (a); local image block ptThe voting score at the candidate position h depends mainly on the set of offset vectors ELThe voting unit e in (1) can convert V (h | p) into V (h | p) by utilizing the linear accumulation characteristic of Hough voting scorest) Is defined by the accumulation of ELThe voting score of each voting unit to the candidate position h is rewritten into an accumulation sum ELPositive sample image associated with middle voting unitjThe form of voting score on candidate position h, namely:
Figure GDA0002519768650000091
wherein the content of the first and second substances,
Figure GDA0002519768650000092
wherein the content of the first and second substances,
Figure GDA0002519768650000093
represents ELThe voting unit e in (a) is from the positive sample set D+Mean sample imagej(ii) a Traversing all local image blocks p in test image GtThe final vote score for candidate position h is:
Figure GDA0002519768650000094
Step D2: because the view angles of the sample images in the positive sample set D⁺ differ too greatly, the final Hough map generated by the voting score V(h | p_t) defined above is cluttered and its bright spots are insufficiently concentrated, so the candidate position h cannot be determined accurately; this scheme therefore introduces a view-angle variable k ∈ {1, 2, …, K} into the voting model defined above to restrict voting to the same view angle and guarantee the view-angle consistency of the votes for candidate position h, namely:

V(h, k) = Σ_t Σ_{j: I_j ∈ subclass k} V(h | p_t, I_j)

The formula is computed as follows: first, the view-angle subclass-division method is used to label each image of the positive sample set D⁺ with a view-angle subclass k ∈ {1, 2, …, K}; then the formula of step C is used to calculate the view-angle contribution weight w_jk, j ∈ {1, 2, …, N⁺}, of each image; finally, with the positive sample set D⁺ calibrated with view angles and view-angle contribution weights, V(h, k) is redefined as:

V(h, k) = Σ_t Σ_{j=1}^{N⁺} w_jk · V(h | p_t, I_j)

where W denotes the matrix of the weights w_jk, of size N⁺ × K;
Step E: determine the final target detection frame in the multi-view vehicle detection image, specifically comprising the following steps:
Step E1: first, decompose the test image over scales, defining the scale space of the test image as {λ_m}, m = 1, 2, …, M, where M is the number of discrete scales;
Step E2: in the test image at scale λ_m, densely sample image blocks of the same size. In this embodiment, 50 image blocks of size 16 × 16 are randomly extracted from each sample image; the view-angle class of each image block is calibrated using step B; the visual words L are generated with a random forest in which the number of trees is 20, the maximum depth is 15, the minimum number of image blocks in a split node is 20, the maximum class purity is 99.5%, and the squared deviation of the positive sample image-block offset vectors in a node is 30; the scale space of each test image is set to 20 scales from 0.1 to 0.8. The formula V(h, k) of step D is then used to obtain the Hough voting map at each scale;
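For reference, the embodiment's hyperparameters are gathered below into a single configuration; the key names are hypothetical, since no public library implements this visual-word forest directly.

```python
# Hyperparameters stated in Embodiment 1; key names are hypothetical.
FOREST_CONFIG = dict(
    n_trees=20,            # trees in the random forest
    max_depth=15,          # maximum tree depth
    min_samples_split=20,  # minimum image blocks in a split node
    max_purity=0.995,      # stop splitting at 99.5% class purity
    max_offset_var=30.0,   # bound on squared deviation of positive offsets
)
PATCH_SIZE = (16, 16)      # densely sampled image-block size
PATCHES_PER_IMAGE = 50     # blocks randomly extracted per sample image
SCALES = [0.1 + i * 0.7 / 19 for i in range(20)]  # 20 scales from 0.1 to 0.8
```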
Step E3: in the three-dimensional Hough space formed by (h, λ_m), where h = (h_x, h_y) comprises the horizontal and vertical coordinate positions in the image, determine the final target center position (h′, λ′_m) using the mean-shift algorithm and a decision threshold, and mark a detection frame of the corresponding size at the corresponding position in the original test image; the position of this detection frame is the position of the vehicle in the target image. Part of the detection results are shown in FIG. 2.
Compared with the prior art, the multi-view vehicle detection method based on weighted Hough voting uses Locally Linear Embedding (LLE) and k-means to divide the vehicle samples automatically into subclasses of different view angles, uses this division to define the voting weights of the positive sample set under different view angles during Hough voting, and combines these voting weights to perform accurately localized Hough voting, thereby achieving accurate detection of vehicles under multiple view angles; the method greatly improves detection speed and effectively exploits the information shared among the different view-angle subclasses, further improving the accuracy of vehicle detection.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (3)

1. A multi-view vehicle detection method based on weighted Hough voting is characterized by comprising the following steps:
Step A: define a training sample image set D = {(I_i, f_i, y_i)}, i = 1, 2, …, N, with image size 128 × 64, where f_i is the image feature expression, HOG features being used for the multi-view division and multi-channel pixel features being used when training visual words and performing Hough voting; y_i ∈ {−1, +1} is the training sample label, I_i being a background sample when y_i = −1 and a target sample when y_i = +1; N is the size of the training sample set;
Step B: divide the positive sample set D⁺ = {I_j}, j = 1, 2, …, N⁺, of the training sample image set D into view-angle subclasses, where N⁺ is the number of positive samples;
Step C: calculate the contribution weight of each positive sample to the different view-angle subclasses;
Step C adopts the following method: in the LLE embedding space, the cluster center of the k-th view-angle subclass sample set is defined as o_k, k ∈ {1, 2, …, K}, and the expression of positive sample image I_j of the positive sample set D⁺ in the LLE embedding space is f′_j; the contribution weight w_jk of positive sample I_j to view-angle subclass k is then defined as:

[equation image in the original: definition of w_jk in terms of d(f′_j, o_k)]

where d(f′_j, o_k) is the distance between f′_j and o_k in the LLE embedding space; to ensure the correctness of the calculation, the contribution weights w_jk of sample image I_j under the view-angle subclasses are normalized so that they sum to 1, i.e.

Σ_{k=1}^{K} w_jk = 1;
Step D: determining the voting score of the image block at the candidate position by using a weighted Hough voting method;
the step D comprises the following steps:
step D1: definition and image block ptThe matched visual word is L, and the set of L containing the offset vectors of the positive sample image blocks is ELAnd obtaining the classification probability C by counting the proportion of the image blocks containing the positive samples in the LLThen image block ptThe vote score at candidate position h is:
Figure FDA0002519768640000021
wherein E isLFor each voting unit e to candidate bitThe vote to set h is estimated using a Gaussian Parzen window, | ELI denotes set size, qtFor image block ptThe center position of (a); after the visual word L is generated, its corresponding classification probability CLDetermined, image block ptThe voting score at the candidate position h depends mainly on the set of offset vectors ELThe voting unit e in (1) can convert V (h | p) into V (h | p) by utilizing the linear accumulation characteristic of Hough voting scorest) Is defined by the accumulation of ELThe voting score of each voting unit to the candidate position h is rewritten into an accumulation sum ELPositive sample image associated with middle voting unitjThe form of voting score on candidate position h, namely:
Figure FDA0002519768640000022
wherein the content of the first and second substances,
Figure FDA0002519768640000023
wherein, is the standard deviation of a Gaussian Parzen window,
Figure FDA0002519768640000024
represents ELThe voting unit e in (a) is from the positive sample set D+Mean sample imagej(ii) a Go through the image block p in the test image GtThe final vote score at candidate position h is:
Figure FDA0002519768640000025
Step D2: because the view angles of the sample images in the positive sample set D⁺ differ too greatly, the final Hough map generated by the voting score V(h | p_t) defined above is cluttered and its bright spots are insufficiently concentrated, so the candidate position h cannot be determined accurately; a view-angle variable k ∈ {1, 2, …, K} is therefore introduced into the voting model defined above to restrict voting to the same view angle and guarantee the view-angle consistency of the votes for candidate position h, namely:

V(h, k) = Σ_t Σ_{j: I_j ∈ subclass k} V(h | p_t, I_j)

The formula is computed as follows: first, the multi-view subclass-division method of step B is used to label each sample image of the positive sample set D⁺ with a view-angle subclass k ∈ {1, 2, …, K}; then the definition of step C is used to calculate the view-angle contribution weight w_jk, j ∈ {1, 2, …, N⁺}, of each sample image; with the positive sample set D⁺ finally calibrated with view angles and view-angle contribution weights, V(h, k) is redefined as:

V(h, k) = Σ_t Σ_{j=1}^{N⁺} w_jk · V(h | p_t, I_j)

where W denotes the matrix of the weights w_jk, of size N⁺ × K;
Step E: determine a vehicle detection frame in the test image.
2. The multi-view vehicle detection method based on weighted Hough voting according to claim 1, wherein step B comprises the following steps:
Step B1: use the LLE algorithm to embed the positive sample images of D⁺, expressed by their HOG features, into a two-dimensional space;
Step B2: select the center point of the ring formed by the sample-point distribution in the two-dimensional space, and regularize all samples onto a circle based on the relative angles of the sample points to the center point;
Step B3: cluster the samples on the circle with the k-means algorithm, dividing the positive sample set D⁺ into K view-angle subclasses.
3. The multi-view vehicle detection method based on weighted Hough voting according to claim 1, wherein step E comprises the following steps:
Step E1: first, decompose the test image over scales, defining the scale space of the test image as {λ_m}, m = 1, 2, …, M, where M is the number of discrete scales;
Step E2: in the test image at scale λ_m, densely sample image blocks of the same size, find the visual word matched with each image block, and use the formula V(h, k) above to obtain the Hough voting map at that scale;
Step E3: in the three-dimensional Hough space formed by (h, λ_m), where h = (h_x, h_y), determine the final target center position (h′, λ′_m) using the mean-shift algorithm and a decision threshold, and mark a detection frame of the corresponding size at the corresponding position in the original test image.
CN201710554766.8A 2017-07-07 2017-07-07 Multi-view vehicle detection method based on weighted Hough voting Active CN107330432B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710554766.8A CN107330432B (en) 2017-07-07 2017-07-07 Multi-view vehicle detection method based on weighted Hough voting

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710554766.8A CN107330432B (en) 2017-07-07 2017-07-07 Multi-view vehicle detection method based on weighted Hough voting

Publications (2)

Publication Number Publication Date
CN107330432A CN107330432A (en) 2017-11-07
CN107330432B (en) 2020-08-18

Family

ID=60197190

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710554766.8A Active CN107330432B (en) 2017-07-07 2017-07-07 Multi-view vehicle detection method based on weighted Hough voting

Country Status (1)

Country Link
CN (1) CN107330432B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109726761B (en) * 2018-12-29 2023-03-31 青岛海洋科学与技术国家实验室发展中心 CNN evolution method, CNN-based AUV cluster working method, CNN evolution device and CNN-based AUV cluster working device and storage medium
CN109948692B (en) * 2019-03-16 2020-12-15 四川大学 Computer-generated picture detection method based on multi-color space convolutional neural network and random forest
CN111723721A (en) * 2020-06-15 2020-09-29 中国传媒大学 Three-dimensional target detection method, system and device based on RGB-D

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104112282A (en) * 2014-07-14 2014-10-22 华中科技大学 A method for tracking a plurality of moving objects in a monitor video based on on-line study
US8953888B2 (en) * 2011-02-10 2015-02-10 Microsoft Corporation Detecting and localizing multiple objects in images using probabilistic inference
CN106529461A (en) * 2016-11-07 2017-03-22 湖南源信光电科技有限公司 Vehicle model identifying algorithm based on integral characteristic channel and SVM training device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9129152B2 (en) * 2013-11-14 2015-09-08 Adobe Systems Incorporated Exemplar-based feature weighting

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8953888B2 (en) * 2011-02-10 2015-02-10 Microsoft Corporation Detecting and localizing multiple objects in images using probabilistic inference
CN104112282A (en) * 2014-07-14 2014-10-22 华中科技大学 A method for tracking a plurality of moving objects in a monitor video based on on-line study
CN106529461A (en) * 2016-11-07 2017-03-22 湖南源信光电科技有限公司 Vehicle model identifying algorithm based on integral characteristic channel and SVM training device

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Recovering 6D object pose and predicting next-best-view in the crowd; Doumanoglou A, Kouskouridas R, Malassiotis S, et al.; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2016; full text *
Robust object detection with interleaved categorization and segmentation; Leibe B, Leonardis A, Schiele B; International Journal of Computer Vision; 2008; full text *
Object detection based on Hough transform and conditional random field model (in Chinese); 杜本汉; China Masters' Theses Full-text Database, Information Science and Technology; 2015-06-15; full text *
Research on object detection algorithms in complex scenes (in Chinese); 向涛; China Doctoral Dissertations Full-text Database; 2017-02-15; full text *

Also Published As

Publication number Publication date
CN107330432A (en) 2017-11-07

Similar Documents

Publication Publication Date Title
CN104850850B (en) A kind of binocular stereo vision image characteristic extracting method of combination shape and color
CN112257605B (en) Three-dimensional target detection method, system and device based on self-labeling training sample
CN102663411B (en) Recognition method for target human body
GB2532948A (en) Objection recognition in a 3D scene
CN109658442B (en) Multi-target tracking method, device, equipment and computer readable storage medium
CN111882586B (en) Multi-actor target tracking method oriented to theater environment
CN109118528A (en) Singular value decomposition image matching algorithm based on area dividing
CN107330432B (en) Multi-view vehicle detection method based on weighted Hough voting
CN110223310B (en) Line structure light center line and box edge detection method based on deep learning
CN112818905B (en) Finite pixel vehicle target detection method based on attention and spatio-temporal information
CN106446785A (en) Passable road detection method based on binocular vision
CN111695373B (en) Zebra stripes positioning method, system, medium and equipment
Zelener et al. Cnn-based object segmentation in urban lidar with missing points
CN108073940B (en) Method for detecting 3D target example object in unstructured environment
CN111126393A (en) Vehicle appearance refitting judgment method and device, computer equipment and storage medium
CN112150448B (en) Image processing method, device and equipment and storage medium
CN110969212A (en) ISAR image classification method based on spatial transformation three-channel convolution
CN111383286A (en) Positioning method, positioning device, electronic equipment and readable storage medium
CN111325184A (en) Intelligent interpretation and change information detection method for remote sensing image
CN110675442A (en) Local stereo matching method and system combined with target identification technology
CN106446832B (en) Video-based pedestrian real-time detection method
CN113052110A (en) Three-dimensional interest point extraction method based on multi-view projection and deep learning
CN108388854A (en) A kind of localization method based on improvement FAST-SURF algorithms
CN110334703B (en) Ship detection and identification method in day and night image
CN104484647B (en) A kind of high-resolution remote sensing image cloud height detection method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200722

Address after: 224000 in Jiangsu Province in the south of Yancheng City District Xindu street landscape Avenue branch building 22 North Building (CND)

Applicant after: YANCHENG CHANTU INTELLIGENT TECHNOLOGY Co.,Ltd.

Address before: 450016, Zhengzhou City, Henan Province, Second West Avenue, South Road, one South Road Xinghua science and Technology Industrial Park Building 2, 9, 908, -37 room

Applicant before: ZHENGZHOU CHANTU INTELLIGENT TECHNOLOGY Co.,Ltd.

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant