CN107301376A - Pedestrian detection method based on deep-learning multi-layer stimulation - Google Patents

Pedestrian detection method based on deep-learning multi-layer stimulation

Info

Publication number
CN107301376A
CN107301376A (application number CN201710385952.3A / CN201710385952A)
Authority
CN
China
Prior art keywords
pedestrian
multilayer
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710385952.3A
Other languages
Chinese (zh)
Other versions
CN107301376B (en)
Inventor
李玺 (Li Xi)
李健 (Li Jian)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201710385952.3A priority Critical patent/CN107301376B/en
Publication of CN107301376A publication Critical patent/CN107301376A/en
Application granted granted Critical
Publication of CN107301376B publication Critical patent/CN107301376B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a pedestrian detection method based on deep-learning multi-layer stimulation: given a surveillance video and the target class to be detected, it marks the positions at which the target appears in the video. The method comprises the following steps: obtain a pedestrian data set for training the target detection model, and define the algorithm objective; model the position deviation and apparent semantics of pedestrian targets; build the pedestrian multi-layer stimulation network model from the modelling results of step S2; and detect pedestrian positions in surveillance images with the detection model. The invention is suitable for pedestrian detection in real video-surveillance images and shows good accuracy and robustness under a variety of complex conditions.

Description

Pedestrian detection method based on deep-learning multi-layer stimulation
Technical field
The invention belongs to the field of computer vision, and particularly relates to a pedestrian detection method based on deep-learning multi-layer stimulation.
Background technology
Since the end of the 20th century, with the development of computer vision, intelligent video-processing technology has received wide attention and study. Pedestrian detection is one of its important and challenging tasks: its goal is to accurately detect the positions of pedestrians in video-surveillance images. The problem has high application value in fields such as video surveillance and intelligent robotics, and is the basis of many higher-level vision tasks. At the same time it poses considerable challenges: first, how to express the information of the target region; second, how to jointly model and optimise candidate-region extraction and target classification. These challenges place high demands on the performance and robustness of the corresponding algorithms.
A typical pedestrian detection algorithm has three parts: (1) find the candidate regions of the input image that may contain targets; (2) manually extract target features from the candidate regions; (3) apply a classification algorithm to the features to perform detection. Such methods mainly suffer from the following problems: (1) they rely on traditional visual features, which express only low-level visual information, whereas pedestrian detection requires the model to possess higher-level semantic understanding; (2) candidate-region extraction and feature classification are not jointly optimised end to end; (3) features extracted by deep learning are not combined through multi-layer stimulation, so the target features are not abstract or rich enough.
Summary of the invention
To solve the above problems, the object of the present invention is to provide a pedestrian detection method based on deep-learning multi-layer stimulation, for detecting pedestrian positions in a given surveillance image. The method is based on a deep neural network: it characterises target-region information with multi-layer-stimulated deep visual features and models pedestrian detection within the Faster R-CNN framework, so it adapts better to the complex conditions of real video-surveillance scenes.
To achieve the above object, the technical scheme of the invention is as follows:
A pedestrian detection method based on deep-learning multi-layer stimulation, comprising the following steps:
S1: obtain a pedestrian data set for training the target detection model, and define the algorithm objective;
S2: model the position deviation and apparent semantics of pedestrian targets;
S3: build the pedestrian multi-layer stimulation network model from the modelling results of step S2;
S4: detect pedestrian positions in surveillance images with the detection model.
Further, in step S1, the pedestrian data set used to train the target detection model comprises pedestrian images X_train and manually annotated pedestrian positions B;
The algorithm objective is defined as: detect the pedestrian positions P in a surveillance image X.
Further, in step S2, modelling the position deviation and apparent semantics of pedestrian targets specifically comprises:
S21: model the position deviation from the pedestrian data set X_train and the pedestrian positions P:

t_x = (x − x_a) / w_a,   t_y = (y − y_a) / h_a

where x, y are the centre coordinates of the annotated pedestrian box and w, h its width and height; x_a, y_a are the centre coordinates of a pedestrian candidate box and w_a, h_a its width and height; t_x is the x-offset between the annotated box and the candidate box, normalised by the candidate-box width; t_y is the y-offset, normalised by the candidate-box height; t_w is the ratio of the pedestrian-box width to the annotated-box width; t_h is the ratio of the pedestrian-box height to the annotated-box height;
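The deviation model above can be sketched in a few lines of numpy. The function name `deviation_targets` and the sample numbers are illustrative only; the width/height terms are taken as plain ratios, following the text, although Faster R-CNN itself more commonly uses log-ratios:

```python
import numpy as np

def deviation_targets(gt_box, anchor_box):
    """Position-deviation targets between an annotated pedestrian box and a
    candidate box, both given as (centre x, centre y, width, height)."""
    x, y, w, h = gt_box          # annotated (labelled) box
    xa, ya, wa, ha = anchor_box  # candidate box
    tx = (x - xa) / wa           # x offset, normalised by candidate width
    ty = (y - ya) / ha           # y offset, normalised by candidate height
    tw = w / wa                  # width ratio (log(w / wa) is the usual Faster R-CNN choice)
    th = h / ha                  # height ratio
    return np.array([tx, ty, tw, th])

# A candidate centred at (50, 100) of size 20x40 against a label at (54, 110) of size 22x44:
t = deviation_targets((54, 110, 22, 44), (50, 100, 20, 40))
# -> [0.2, 0.25, 1.1, 1.1]
```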
S22: model the apparent semantics from the pedestrian data set X_train and the pedestrian positions P:

s = ⟨w, d⟩

where s is the projection value of the feature d on the projection vector w; w is the pedestrian weight projection vector, d the pedestrian feature descriptor, and ⟨·,·⟩ the inner-product operator; p(C = k | d) is the softmax function, giving the probability of belonging to the k-th class; s_j is the projection value of feature d on the j-th projection vector; C is a discrete random variable taking k values; j indexes the projection vectors.
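As a minimal numpy illustration of this apparent-semantics model, the snippet below projects a feature descriptor d onto one projection vector per class and turns the projection values s_j into softmax probabilities; the feature dimension, the two classes, and all numbers are made up for the example:

```python
import numpy as np

def class_probabilities(d, W):
    """Projections s_j = <w_j, d> of feature d onto each class vector w_j,
    converted into softmax probabilities p(C = k | d)."""
    s = W @ d                    # one projection value per class
    e = np.exp(s - s.max())      # shift by the max for numerical stability
    return e / e.sum()

# Toy example: a 3-dimensional feature scored against k = 2 classes
# (index 0 = background, index 1 = pedestrian).
d = np.array([1.0, 0.5, -0.5])
W = np.array([[0.2, 0.1, 0.0],    # w_0: background projection vector
              [1.0, 0.8, -0.2]])  # w_1: pedestrian projection vector
p = class_probabilities(d, W)     # probabilities summing to 1
```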
Further, in step S3, building the pedestrian multi-layer stimulation network model from the modelling results of step S2 specifically comprises:
S31: build the multi-layer stimulation convolutional neural network. The input of the network is a surveillance image X together with the pedestrian annotation boxes B; the outputs are the probability p of each pedestrian candidate box and the pedestrian position deviations O in X. The network is expressed as the mapping X → (p, O).
S32: the sub-mapping X → p uses the softmax loss function, expressed as

p(C = k | d) = e^{s_k} / Σ_j e^{s_j}

L_cls(X, Y; θ) = − Σ_j Y_j log p(C | d)      (formula 3)

where Y is a binary one-hot vector whose entry for the class of the example is 1 and whose remaining entries are 0; L_cls(X, Y; θ) denotes the softmax loss over the whole training set;
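A worked instance of formula (3) for a single example, assuming the class probabilities p have already been computed and Y is one-hot; the helper name `softmax_loss` is illustrative:

```python
import numpy as np

def softmax_loss(p, Y):
    """L_cls = -sum_j Y_j log p_j for one example with one-hot label Y (formula 3)."""
    return -np.sum(Y * np.log(p))

p = np.array([0.2, 0.8])   # predicted probabilities for k = 2 classes
Y = np.array([0, 1])       # the example belongs to class 1 (pedestrian)
loss = softmax_loss(p, Y)  # = -log(0.8)
```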
S33: the sub-mapping X → O uses the smoothed Euclidean (smooth-L1) loss, expressed as

L_loc(t, v) = Σ_i smooth(t_i, v_i)

where t_i is the pedestrian position-deviation label and v_i the predicted pedestrian position deviation; i indexes the training samples;
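The `smooth` term is not spelled out in the text; assuming it is the standard smooth-L1 penalty from Fast R-CNN (quadratic near zero, linear beyond |x| = 1), the localisation loss can be sketched as:

```python
import numpy as np

def smooth_l1(t, v):
    """Elementwise smooth-L1 (Huber-like) penalty on the difference t - v."""
    x = np.asarray(t, dtype=float) - np.asarray(v, dtype=float)
    return np.where(np.abs(x) < 1, 0.5 * x ** 2, np.abs(x) - 0.5)

def l_loc(t, v):
    """L_loc(t, v) = sum_i smooth(t_i, v_i) over the deviation components."""
    return smooth_l1(t, v).sum()

# labels vs predictions for (t_x, t_y, t_w, t_h):
loss = l_loc([0.2, 0.25, 1.1, 1.1], [0.0, 0.25, 1.1, 3.1])
# components: 0.5 * 0.2**2 = 0.02, 0, 0, |-2.0| - 0.5 = 1.5  ->  1.52
```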
S34: the loss function of the whole multi-layer stimulation neural network is

L = L_cls + L_loc      (formula 5)

The whole network is trained under the loss L using stochastic gradient descent and back-propagation.
Further, in step S4, detecting the pedestrian positions in a surveillance image comprises: feed the surveillance image X to be detected into the trained neural network; decide whether each candidate box contains a pedestrian from the output candidate-box probability value; and finally correct the boxes with the predicted position deviations O to obtain the pedestrian positions P.
Compared with existing pedestrian detection methods, the pedestrian detection method of the present invention, applied to video-surveillance scenes, has the following beneficial effects:
First, the method builds its model on a deep convolutional neural network. The generation of candidate regions and the classification of features are unified in one network framework and optimised jointly, which improves the final performance of the method.
Second, the proposed multi-layer stimulation algorithm enriches the abstraction capability of the features, and the features it learns allow the classifier to learn more robust classification rules.
The method has good application value in intelligent video-analysis systems and can effectively improve the efficiency and accuracy of pedestrian detection. For example, in traffic-video surveillance, the method can quickly and correctly detect all pedestrian positions, providing data for subsequent pedestrian-search tasks and greatly freeing human resources.
Brief description of the drawings
Fig. 1 is a flow diagram of the pedestrian detection method of the present invention applied to video-surveillance scenes;
Fig. 2 is a schematic diagram of the loss function of the whole multi-layer stimulation neural network of the present invention.
Detailed description of the embodiments
To make the purpose, technical scheme and advantages of the present invention clearer, the invention is further elaborated below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are merely illustrative and do not limit the invention.
On the contrary, the invention covers any replacement, modification, equivalent method and scheme made within the spirit and scope of the invention as defined by the claims. Further, to give the public a better understanding of the invention, some specific details are described below; those skilled in the art can fully understand the invention even without these details.
With reference to Fig. 1, in a preferred embodiment, a pedestrian detection method based on deep-learning multi-layer stimulation comprises the following steps:
First, obtain a pedestrian data set for training the target detection model, comprising pedestrian images X_train and manually annotated pedestrian positions B;
The algorithm objective is defined as: detect the pedestrian positions P in a surveillance image X.
Second, modelling the position deviation and apparent semantics of pedestrian targets specifically comprises:
In the first step, model the position deviation from the pedestrian data set X_train and the pedestrian positions P:

t_x = (x − x_a) / w_a,   t_y = (y − y_a) / h_a

where x, y are the centre coordinates of the annotated pedestrian box and w, h its width and height; x_a, y_a are the centre coordinates of a pedestrian candidate box and w_a, h_a its width and height; t_x is the x-offset between the annotated box and the candidate box, normalised by the candidate-box width; t_y is the y-offset, normalised by the candidate-box height; t_w is the ratio of the pedestrian-box width to the annotated-box width; t_h is the ratio of the pedestrian-box height to the annotated-box height.
In the second step, model the apparent semantics from the pedestrian data set X_train and the pedestrian positions P:

s = ⟨w, d⟩

where s is the projection value of the feature d on the projection vector w; w is the pedestrian weight projection vector, d the pedestrian feature descriptor, and ⟨·,·⟩ the inner-product operator; p(C = k | d) is the softmax function, giving the probability of belonging to the k-th class; s_j is the projection value of feature d on the j-th projection vector; C is a discrete random variable taking k values; j indexes the projection vectors.
Afterwards, the pedestrian-target detection model is pre-trained according to the above modelling results. This specifically comprises:
In the first step, build the multi-layer stimulation convolutional neural network. The input of the network is a surveillance image X together with the pedestrian annotation boxes B; the outputs are the probability p of each pedestrian candidate box and the pedestrian position deviations O in X, so that the structure of the network can be expressed as the mapping X → (p, O).
In the second step, the sub-mapping X → p uses the softmax loss function, expressed as

L_cls(X, Y; θ) = − Σ_j Y_j log p(C | d)      (formula 3)

where Y is a binary one-hot vector whose entry for the class of the example is 1 and whose remaining entries are 0; L_cls(X, Y; θ) denotes the softmax loss over the whole training set.
In the third step, the sub-mapping X → O uses the smoothed Euclidean (smooth-L1) loss, expressed as

L_loc(t, v) = Σ_i smooth(t_i, v_i)

where t_i is the pedestrian position-deviation label, v_i the predicted pedestrian position deviation, and i indexes the training samples.
In the fourth step, with reference to Fig. 2, the loss function of the whole multi-layer stimulation neural network is

L = L_cls + L_loc      (formula 5)

The whole network is trained under the loss L using stochastic gradient descent and back-propagation.
Finally, detect the pedestrians in a surveillance image with the trained detection model. Specifically: the pre-processed image is fed into the multi-layer stimulation detection framework for computation. The framework extracts candidate boxes with 3 RPN networks; each RPN uses different feature information, so the sizes and scales of the resulting candidate boxes differ. The candidate boxes extracted by each RPN are first filtered by their confidence scores down to 300 candidate regions per network. The candidate regions of the 3 RPNs are then merged into 900 candidate regions, sorted by classification confidence in descending order, and filtered to the final 300 target candidate regions. Candidate boxes are kept only if their output class-probability value exceeds a given threshold; overlapping detection boxes are eliminated with the non-maximum suppression algorithm, and the boxes are finally corrected with the predicted position deviations O to obtain the pedestrian positions P.
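The non-maximum suppression step used above to eliminate overlapping detection boxes can be sketched as a greedy IoU filter. This is a generic numpy implementation under the usual (x1, y1, x2, y2) box convention, not the patent's own code, and the boxes and scores below are made-up examples:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: repeatedly keep the highest-scoring box
    and drop any remaining box that overlaps it by more than iou_thresh.
    boxes is an (n, 4) array of (x1, y1, x2, y2); returns indices of kept boxes."""
    def area(b):
        return (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])

    order = np.argsort(scores)[::-1]   # indices sorted by descending score
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # intersection of box i with every remaining box
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (area(boxes[i:i + 1])[0] + area(boxes[rest]) - inter)
        order = rest[iou <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 20], [1, 1, 11, 21], [30, 0, 40, 20]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)   # the second box overlaps the first and is suppressed
```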
In the above embodiment, the pedestrian detection method of the invention first models the position deviation and apparent semantics of pedestrian targets. On this basis, the original problem is converted into a multi-task learning problem, and the pedestrian detection model is built on a deep neural network. Finally, the trained detection model is used to detect the pedestrian positions in surveillance images.
Through the above technical scheme, the embodiment of the present invention, based on deep-learning technology, develops a pedestrian detection algorithm based on deep-learning multi-layer stimulation. The invention can effectively model the position deviation and apparent semantic information of targets at the same time, and thus detect accurate pedestrian positions.
The foregoing describes merely preferred embodiments of the present invention and is not intended to limit the invention; any modification, equivalent substitution and improvement made within the spirit and principle of the invention shall be included in its scope of protection.

Claims (5)

1. A pedestrian detection method based on deep-learning multi-layer stimulation, characterised by comprising the following steps:
S1: obtain a pedestrian data set for training the target detection model, and define the algorithm objective;
S2: model the position deviation and apparent semantics of pedestrian targets;
S3: build the pedestrian multi-layer stimulation network model from the modelling results of step S2;
S4: detect pedestrian positions in surveillance images with the detection model.
2. The pedestrian detection method based on deep-learning multi-layer stimulation of claim 1, characterised in that, in step S1, the pedestrian data set used to train the target detection model comprises pedestrian images X_train and manually annotated pedestrian positions B;
the algorithm objective is defined as: detect the pedestrian positions P in a surveillance image X.
3. The pedestrian detection method based on deep-learning multi-layer stimulation of claim 2, characterised in that, in step S2, modelling the position deviation and apparent semantics of pedestrian targets specifically comprises:
S21: model the position deviation from the pedestrian data set X_train and the pedestrian positions P:

t_x = (x − x_a) / w_a,   t_y = (y − y_a) / h_a

where x, y are the centre coordinates of the annotated pedestrian box and w, h its width and height; x_a, y_a are the centre coordinates of a pedestrian candidate box and w_a, h_a its width and height; t_x is the x-offset between the annotated box and the candidate box, normalised by the candidate-box width; t_y is the y-offset, normalised by the candidate-box height; t_w is the ratio of the pedestrian-box width to the annotated-box width; t_h is the ratio of the pedestrian-box height to the annotated-box height;
S22: model the apparent semantics from the pedestrian data set X_train and the pedestrian positions P:

s = ⟨w, d⟩

where s is the projection value of the feature d on the projection vector w; w is the pedestrian weight projection vector, d the pedestrian feature descriptor, and ⟨·,·⟩ the inner-product operator; p(C = k | d) is the softmax function, giving the probability of belonging to the k-th class; s_j is the projection value of feature d on the j-th projection vector; C is a discrete random variable taking k values; j indexes the projection vectors.
4. The pedestrian detection method based on deep-learning multi-layer stimulation of claim 3, characterised in that, in step S3, building the pedestrian multi-layer stimulation network model from the modelling results of step S2 specifically comprises:
S31: build the multi-layer stimulation convolutional neural network; the input of the network is a surveillance image X together with the pedestrian annotation boxes B, and the outputs are the probability p of each pedestrian candidate box and the pedestrian position deviations O in X; the network is expressed as the mapping X → (p, O);
S32: the sub-mapping X → p uses the softmax loss function, expressed as

p(C = k | d) = e^{s_k} / Σ_j e^{s_j}

L_cls(X, Y; θ) = − Σ_j Y_j log p(C | d)      (formula 3)

where Y is a binary one-hot vector whose entry for the class of the example is 1 and whose remaining entries are 0; L_cls(X, Y; θ) denotes the softmax loss over the whole training set;
S33: the sub-mapping X → O uses the smoothed Euclidean (smooth-L1) loss, expressed as

L_loc(t, v) = Σ_i smooth(t_i, v_i)

where t_i is the pedestrian position-deviation label and v_i the predicted pedestrian position deviation; i indexes the training samples;
S34: the loss function of the whole multi-layer stimulation neural network is

L = L_cls + L_loc      (formula 5)

and the whole network is trained under the loss L using stochastic gradient descent and back-propagation.
5. The pedestrian detection method based on deep-learning multi-layer stimulation of claim 4, characterised in that, in step S4, detecting the pedestrian positions in a surveillance image comprises: feed the surveillance image X to be detected into the trained neural network; decide whether each candidate box contains a pedestrian from the output candidate-box probability values; and finally correct with the predicted position deviations O to obtain the pedestrian positions P.
CN201710385952.3A 2017-05-26 2017-05-26 Pedestrian detection method based on deep learning multi-layer stimulation Active CN107301376B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710385952.3A CN107301376B (en) 2017-05-26 2017-05-26 Pedestrian detection method based on deep learning multi-layer stimulation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710385952.3A CN107301376B (en) 2017-05-26 2017-05-26 Pedestrian detection method based on deep learning multi-layer stimulation

Publications (2)

Publication Number Publication Date
CN107301376A true CN107301376A (en) 2017-10-27
CN107301376B CN107301376B (en) 2021-04-13

Family

ID=60138099

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710385952.3A Active CN107301376B (en) 2017-05-26 2017-05-26 Pedestrian detection method based on deep learning multi-layer stimulation

Country Status (1)

Country Link
CN (1) CN107301376B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108446662A (en) * 2018-04-02 2018-08-24 电子科技大学 A kind of pedestrian detection method based on semantic segmentation information
CN108537117A (en) * 2018-03-06 2018-09-14 哈尔滨思派科技有限公司 A kind of occupant detection method and system based on deep learning
CN110163224A (en) * 2018-01-23 2019-08-23 天津大学 Auxiliary data labeling method capable of online learning
CN110969657A (en) * 2018-09-29 2020-04-07 杭州海康威视数字技术股份有限公司 Gun and ball coordinate association method and device, electronic equipment and storage medium
CN111178267A (en) * 2019-12-30 2020-05-19 成都数之联科技有限公司 Video behavior identification method for monitoring illegal fishing
CN111476089A (en) * 2020-03-04 2020-07-31 上海交通大学 Pedestrian detection method, system and terminal based on multi-mode information fusion in image
CN111523478A (en) * 2020-04-24 2020-08-11 中山大学 Pedestrian image detection method acting on target detection system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016149881A1 (en) * 2015-03-20 2016-09-29 Intel Corporation Object recogntion based on boosting binary convolutional neural network features
CN106022237A (en) * 2016-05-13 2016-10-12 电子科技大学 Pedestrian detection method based on end-to-end convolutional neural network
CN106250812A (en) * 2016-07-15 2016-12-21 汤平 A kind of model recognizing method based on quick R CNN deep neural network
WO2017062610A1 (en) * 2015-10-06 2017-04-13 Evolv Technologies, Inc. Augmented machine decision making


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
JIE LIU ET AL.: "Deep Convolutional Neural Networks for Pedestrian Detection with Skip Pooling", 2017 International Joint Conference on Neural Networks *
JIFENG DAI ET AL.: "R-FCN: Object Detection via Region-based Fully Convolutional Networks", arXiv:1605.06409v2 *
ROSS GIRSHICK: "Fast R-CNN", arXiv:1504.08083v2 *
ZHAOWEI CAI ET AL.: "A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection", European Conference on Computer Vision *
任少卿 (REN SHAOQING): "Efficient object detection based on feature sharing", China Doctoral Dissertations Full-text Database, Information Science and Technology *


Also Published As

Publication number Publication date
CN107301376B (en) 2021-04-13

Similar Documents

Publication Publication Date Title
CN107301376A (en) A kind of pedestrian detection method stimulated based on deep learning multilayer
CN109635694B (en) Pedestrian detection method, device and equipment and computer readable storage medium
CN106447658B (en) Conspicuousness object detection method based on global and local convolutional network
CN107862261A (en) Image people counting method based on multiple dimensioned convolutional neural networks
CN104182772B (en) A kind of gesture identification method based on deep learning
CN104143079B (en) The method and system of face character identification
CN107506722A (en) One kind is based on depth sparse convolution neutral net face emotion identification method
CN107123123A (en) Image segmentation quality evaluating method based on convolutional neural networks
CN106682697A (en) End-to-end object detection method based on convolutional neural network
CN107704877A (en) A kind of image privacy cognitive method based on deep learning
CN106778835A (en) The airport target by using remote sensing image recognition methods of fusion scene information and depth characteristic
CN110111340A (en) The Weakly supervised example dividing method cut based on multichannel
CN107341452A Human behaviour recognition method based on quaternion spatio-temporal convolutional neural networks
CN105005774A (en) Face relative relation recognition method based on convolutional neural network and device thereof
CN107633511A (en) A kind of blower fan vision detection system based on own coding neutral net
CN106529448A (en) Method for performing multi-visual-angle face detection by means of integral channel features
CN107134144A (en) A kind of vehicle checking method for traffic monitoring
CN107945153A (en) A kind of road surface crack detection method based on deep learning
CN106295506A (en) A kind of age recognition methods based on integrated convolutional neural networks
CN107506786A (en) A kind of attributive classification recognition methods based on deep learning
CN105678278A (en) Scene recognition method based on single-hidden-layer neural network
CN108280397A (en) Human body image hair detection method based on depth convolutional neural networks
CN107766890A (en) The improved method that identification segment learns in a kind of fine granularity identification
CN104240256A (en) Image salient detecting method based on layering sparse modeling
CN108256462A (en) A kind of demographic method in market monitor video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant