CN108898047A - Pedestrian detection method and system based on block-wise occlusion perception - Google Patents
Pedestrian detection method and system based on block-wise occlusion perception
- Publication number
- Publication: CN108898047A; Application: CN201810393658.1A (CN201810393658A)
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- anchor box
- loss function
- preset
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
Abstract
The invention belongs to the field of pattern recognition, and specifically relates to a pedestrian detection method and system based on block-wise occlusion perception, intended to solve the technical problem of low pedestrian detection accuracy caused by pedestrians being occluded. To this end, the pedestrian detection method of the invention includes: based on a pre-built pedestrian detection model, and according to a pedestrian image to be detected, obtaining the image features corresponding to each preset human-body detection region; performing feature fusion on the obtained image features to obtain the overall features of the corresponding pedestrian; obtaining, according to the overall features, multiple detection result boxes of the pedestrian image; and selecting, among the obtained detection result boxes, those that satisfy preset screening conditions. Based on the above steps, occluded pedestrians in the image can be detected effectively. The pedestrian detection system of the invention can carry out and implement the above method.
Description
Technical field
The invention belongs to the field of pattern recognition, and specifically relates to a pedestrian detection method and system based on block-wise occlusion perception.
Background art
Pedestrian detection is a technology that automatically searches for the position and size of pedestrians in an arbitrary input image. It is widely applied in computer vision and pattern recognition fields such as autonomous driving, video surveillance, and biometric recognition. In the complex environments of real life, pedestrian occlusion is one of the greatest challenges currently facing pedestrian detection, and how to detect pedestrians efficiently and accurately, especially in crowd scenes, is a hot and difficult research topic. To address this problem, most current pedestrian detection methods adopt block-based models: they learn a series of part detectors and combine the results of the individual detectors to localize the pedestrian. However, these methods require each predicted detection window to be as close as possible to the annotated pedestrian box, without considering the inner links between the windows. The performance of these pedestrian detectors is therefore very sensitive to the threshold setting of the non-maximum suppression (NMS) method, and in large-scale crowded scenes the influence of the NMS threshold on detector performance is even greater.
Summary of the invention
In order to solve the above problems in the prior art, namely the technical problem of low pedestrian detection accuracy caused by pedestrians being occluded, one aspect of the present invention provides a pedestrian detection method based on block-wise occlusion perception, which includes:
based on a pre-built pedestrian detection model, and according to a pedestrian image to be detected, obtaining the image features corresponding to each preset human-body detection region;
performing feature fusion on the obtained image features to obtain the overall features of the corresponding pedestrian;
obtaining, according to the overall features, multiple detection result boxes of the pedestrian image;
selecting, among the obtained detection result boxes, those that satisfy preset screening conditions;
wherein the pedestrian detection model is a model built on the Faster R-CNN neural network, and anchor boxes are associated with a high convolutional layer of the Faster R-CNN neural network.
Further, before "based on a pre-built pedestrian detection model, and according to a pedestrian image to be detected, obtaining the image features corresponding to each preset human-body detection region", the method also includes:
performing data augmentation on the preset training images to obtain training samples;
matching anchor boxes with the pedestrian annotation boxes in the training samples, and dividing the anchor boxes into positive and negative samples according to the matching result, where a positive sample is an anchor box matched with a pedestrian annotation box and a negative sample is an anchor box not matched with any pedestrian annotation box;
selecting a preset first number of negative samples using hard negative mining;
calculating the loss value from the positive samples and the selected negative samples, and updating the Faster R-CNN neural network according to the loss value; re-training the updated Faster R-CNN neural network until it meets the preset convergence condition.
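The training procedure above (positive/negative division, hard negative mining, loss-driven update, repeat until convergence) can be sketched as follows. This is a toy illustration, not the patent's actual model: the anchor records, the single-parameter "network", and the loss values are stand-ins.

```python
def hard_negative_mining(negatives, num_neg):
    """Keep the num_neg negatives with the highest loss (the 'hard' ones)."""
    return sorted(negatives, key=lambda n: n["loss"], reverse=True)[:num_neg]

def training_step(positives, negatives, num_neg, lr, weights):
    """One update: average the loss over positives plus mined negatives.
    `weights` is a toy one-parameter 'network'; the update only marks
    where the gradient-descent step of the real network would occur."""
    mined = hard_negative_mining(negatives, num_neg)
    batch = positives + mined
    total_loss = sum(s["loss"] for s in batch) / len(batch)
    weights -= lr * total_loss          # stand-in for a gradient-descent update
    return weights, total_loss

# toy anchors: positives matched to an annotation box, negatives unmatched
pos = [{"loss": 0.2}, {"loss": 0.3}]
neg = [{"loss": l} for l in (0.9, 0.1, 0.7, 0.05)]
w, loss = training_step(pos, neg, num_neg=2, lr=0.1, weights=1.0)
print(round(loss, 3))  # mean of 0.2, 0.3, 0.9, 0.7 -> 0.525
```

In a real implementation the loop would re-run this step over the training samples until the preset convergence condition is met.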
Further, the Faster R-CNN neural network includes an RPN module; before "based on a pre-built pedestrian detection model, and according to a pedestrian image to be detected, obtaining the image features corresponding to each preset human-body detection region", the method also includes:
based on the preset training images, performing network training on the RPN module with the loss function shown in the following formula:

L({p_i}, {t_i}) = L_cls({p_i}, {p_i*}) + α1 · L_agg({t_i}, {t_i*})

where L_cls is the pedestrian classification loss function, L_agg is the aggregation loss function, i denotes the index of an anchor box, p_i and t_i respectively denote the predicted probability that the i-th anchor box is a pedestrian and the corresponding predicted pedestrian coordinates, p_i* and t_i* respectively denote the object category label associated with the i-th anchor box and the corresponding calibration coordinates, and α1 is the first hyper-parameter;
the pedestrian classification loss function is:

L_cls = (1 / N_cls) · Σ_i L_log(p_i, p_i*)

where N_cls is the total number of anchor boxes in the classification stage of the RPN module;
the aggregation loss function is:

L_agg = L_reg + β · L_com

where L_reg is the regression loss function, L_com is the compactness loss function, and β is the second hyper-parameter;
the regression loss function is:

L_reg = (1 / N_reg) · Σ_i p_i* · L1(t_i, t_i*)

where N_reg is the total number of anchor boxes in the regression stage and L1(t_i, t_i*) is the L1 loss value for the predicted detection window t_i;
the compactness loss function is:

L_com = (1 / N_com) · Σ_p L1( t_p*, (1 / |Φ_p|) · Σ_{j ∈ Φ_p} t_j )

where N_com is the number of calibrated pedestrians intersected by anchor boxes, |Φ_i| is the total number of anchor boxes associated with the i-th calibrated pedestrian, j is the anchor-box index, t_j is the predicted pedestrian coordinate of the j-th anchor box, p is the index of an anchor-box set associated with a calibrated pedestrian window, and Φ_p is the set of anchor boxes associated with the p-th calibrated pedestrian window.
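A minimal sketch of the compactness term of the aggregation loss described above, assuming an L1 penalty between each calibrated pedestrian box and the average of the predictions of its associated anchor set Φ_p. The box coordinates and the anchor-to-pedestrian assignments are invented for illustration.

```python
import numpy as np

def l1(a, b):
    """L1 distance summed over the box coordinates."""
    return np.abs(a - b).sum()

def compactness_loss(pred_boxes, assignments, gt_boxes):
    """L_com: for each calibrated pedestrian p, pull the *average* of the
    predictions of its associated anchors (the set Phi_p) toward the
    calibration coordinates t_p*. `assignments[p]` lists the anchor
    indices in Phi_p."""
    losses = []
    for p, phi in assignments.items():
        if not phi:
            continue
        mean_pred = pred_boxes[phi].mean(axis=0)   # (1/|Phi_p|) * sum_j t_j
        losses.append(l1(gt_boxes[p], mean_pred))
    return sum(losses) / len(losses)               # average over the N_com pedestrians

# two calibrated pedestrians, four anchor predictions (x1, y1, x2, y2)
preds = np.array([[0., 0., 10., 20.],
                  [2., 0., 12., 20.],
                  [50., 0., 60., 20.],
                  [54., 0., 64., 20.]])
gts = np.array([[1., 0., 11., 20.],
                [52., 0., 62., 20.]])
phi = {0: [0, 1], 1: [2, 3]}
print(compactness_loss(preds, phi, gts))  # 0.0 — each anchor group averages exactly to its pedestrian
```

The term is zero when each anchor group is already compact around its pedestrian, which is the behavior the aggregation loss rewards.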
Further, the Faster R-CNN neural network also includes a Fast R-CNN module; before "based on a pre-built pedestrian detection model, and according to a pedestrian image to be detected, obtaining the image features corresponding to each preset human-body detection region", the method also includes:
based on the preset training images, performing network training on the Fast R-CNN module with the loss function shown in the following formula:

L({p_i}, {t_i}) = L_cls({p_i}, {p_i*}) + α3 · L_agg({t_i}, {t_i*}) + λ · L_occ

where L_cls is the pedestrian classification loss function, L_agg is the aggregation loss function, L_occ is the occlusion-processing loss function, i denotes the index of an anchor box, p_i and t_i respectively denote the predicted probability that the i-th anchor box is a pedestrian and the corresponding predicted pedestrian coordinates, p_i* and t_i* respectively denote the object category label associated with the i-th anchor box and the corresponding calibration coordinates, α3 is the third hyper-parameter, and λ is the fourth hyper-parameter;
the pedestrian classification loss function is:

L_cls = (1 / N_cls) · Σ_i L_log(p_i, p_i*)

where N_cls is the total number of anchor boxes in the classification stage of the RPN module;
the aggregation loss function is:

L_agg = L_reg + β · L_com

where L_reg is the regression loss function, L_com is the compactness loss function, and β is the second hyper-parameter;
the regression loss function is:

L_reg = (1 / N_reg) · Σ_i p_i* · L1(t_i, t_i*)

where N_reg is the total number of anchor boxes in the regression stage and L1(t_i, t_i*) is the L1 loss value for the predicted detection window t_i;
the compactness loss function is:

L_com = (1 / N_com) · Σ_p L1( t_p*, (1 / |Φ_p|) · Σ_{j ∈ Φ_p} t_j )

where N_com is the number of calibrated pedestrians intersected by anchor boxes, |Φ_i| is the total number of anchor boxes associated with the i-th calibrated pedestrian, j is the anchor-box index, t_j is the predicted pedestrian coordinate of the j-th anchor box, p is the index of an anchor-box set associated with a calibrated pedestrian window, and Φ_p is the set of anchor boxes associated with the p-th calibrated pedestrian window.
Further, the step of "matching anchor boxes with the pedestrian annotation boxes in the training samples" specifically includes:
calculating the Intersection-over-Union (IoU) between each anchor box and each pedestrian annotation box;
for each pedestrian annotation box, selecting the anchor box with the largest IoU, and matching each selected anchor box with its corresponding pedestrian annotation box;
after removing the selected anchor boxes, judging whether the IoU between each remaining anchor box and each pedestrian annotation box is greater than a preset first threshold, and matching them if so;
obtaining the pedestrian annotation boxes whose number of matched anchor boxes is less than a preset second number, and selecting all anchor boxes whose IoU with each such annotation box is greater than a preset second threshold, the preset first threshold being greater than the preset second threshold;
sorting all the selected anchor boxes by IoU in descending order, and matching a preset third number of them with the corresponding pedestrian annotation boxes, where the value of the preset third number is the average number of matched anchor boxes of the pedestrian annotation boxes whose number of matched anchor boxes is greater than or equal to the preset second number.
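The first two matching passes (best-IoU anchor per annotation box, then a first-threshold pass over the remaining anchors) might be sketched as follows; `hi_thresh` stands in for the unspecified preset first threshold, and the compensation passes for under-matched annotation boxes are omitted for brevity.

```python
def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def match_anchors(anchors, gts, hi_thresh=0.5):
    """Pass 1: each annotation box grabs its best-IoU anchor.
    Pass 2: remaining anchors with IoU > hi_thresh are matched too."""
    matches = {}                                  # anchor index -> annotation index
    for g, gt in enumerate(gts):                  # best-anchor pass
        best = max(range(len(anchors)), key=lambda a: iou(anchors[a], gt))
        matches[best] = g
    for a, anc in enumerate(anchors):             # threshold pass
        if a in matches:
            continue
        for g, gt in enumerate(gts):
            if iou(anc, gt) > hi_thresh:
                matches[a] = g
                break
    return matches

anchors = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
gts = [(0, 0, 10, 10)]
print(match_anchors(anchors, gts))  # {0: 0, 1: 0} — the far-away anchor stays unmatched
```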
Another aspect of the present invention provides a pedestrian detection system based on block-wise occlusion perception, including:
an image feature acquisition module, configured to obtain, based on a pre-built pedestrian detection model and according to a pedestrian image to be detected, the image features corresponding to each preset human-body detection region;
a feature fusion module, configured to perform feature fusion on the image features obtained by the image feature acquisition module to obtain the overall features of the corresponding pedestrian;
a detection result box acquisition module, configured to obtain multiple detection result boxes of the pedestrian image according to the overall features obtained by the feature fusion module;
a detection result box screening module, configured to select, among the obtained detection result boxes, those that satisfy preset screening conditions;
wherein the pedestrian detection model is a model built on the Faster R-CNN neural network, and anchor boxes are associated with a high convolutional layer of the Faster R-CNN neural network.
Further, the pedestrian detection system also includes a model training module, which includes:
a training image processing unit, configured to perform data augmentation on the preset training images to obtain training samples;
a positive/negative sample division unit, configured to match anchor boxes with the pedestrian annotation boxes in the training samples and to divide the anchor boxes into positive and negative samples according to the matching result, where a positive sample is an anchor box matched with a pedestrian annotation box and a negative sample is an anchor box not matched with any pedestrian annotation box;
a negative sample screening unit, configured to select a preset first number of negative samples using hard negative mining;
a network update unit, configured to calculate the loss value from the positive samples and the selected negative samples, to update the Faster R-CNN neural network according to the loss value, and to re-train the updated Faster R-CNN neural network until it meets the preset convergence condition.
Further, the Faster R-CNN neural network includes an RPN module; in this case, the model training module is further configured to perform the following operations:
based on the preset training images, performing network training on the RPN module with the loss function shown in the following formula:

L({p_i}, {t_i}) = L_cls({p_i}, {p_i*}) + α1 · L_agg({t_i}, {t_i*})

where L_cls is the pedestrian classification loss function, L_agg is the aggregation loss function, i denotes the index of an anchor box, p_i and t_i respectively denote the predicted probability that the i-th anchor box is a pedestrian and the corresponding predicted pedestrian coordinates, p_i* and t_i* respectively denote the object category label associated with the i-th anchor box and the corresponding calibration coordinates, and α1 is the first hyper-parameter;
the pedestrian classification loss function is:

L_cls = (1 / N_cls) · Σ_i L_log(p_i, p_i*)

where N_cls is the total number of anchor boxes in the classification stage of the RPN module;
the aggregation loss function is:

L_agg = L_reg + β · L_com

where L_reg is the regression loss function, L_com is the compactness loss function, and β is the second hyper-parameter;
the regression loss function is:

L_reg = (1 / N_reg) · Σ_i p_i* · L1(t_i, t_i*)

where N_reg is the total number of anchor boxes in the regression stage and L1(t_i, t_i*) is the L1 loss value for the predicted detection window t_i;
the compactness loss function is:

L_com = (1 / N_com) · Σ_p L1( t_p*, (1 / |Φ_p|) · Σ_{j ∈ Φ_p} t_j )

where N_com is the number of calibrated pedestrians intersected by anchor boxes, |Φ_i| is the total number of anchor boxes associated with the i-th calibrated pedestrian, j is the anchor-box index, t_j is the predicted pedestrian coordinate of the j-th anchor box, p is the index of an anchor-box set associated with a calibrated pedestrian window, and Φ_p is the set of anchor boxes associated with the p-th calibrated pedestrian window.
Further, the Faster R-CNN neural network includes a Fast R-CNN module; in this case, the model training module is further configured to perform the following operations:
based on the preset training images, performing network training on the Fast R-CNN module with the loss function shown in the following formula:

L({p_i}, {t_i}) = L_cls({p_i}, {p_i*}) + α3 · L_agg({t_i}, {t_i*}) + λ · L_occ

where L_cls is the pedestrian classification loss function, L_agg is the aggregation loss function, L_occ is the occlusion-processing loss function, i denotes the index of an anchor box, p_i and t_i respectively denote the predicted probability that the i-th anchor box is a pedestrian and the corresponding predicted pedestrian coordinates, p_i* and t_i* respectively denote the object category label associated with the i-th anchor box and the corresponding calibration coordinates, α3 is the third hyper-parameter, and λ is the fourth hyper-parameter;
the pedestrian classification loss function is:

L_cls = (1 / N_cls) · Σ_i L_log(p_i, p_i*)

where N_cls is the total number of anchor boxes in the classification stage of the RPN module;
the aggregation loss function is:

L_agg = L_reg + β · L_com

where L_reg is the regression loss function, L_com is the compactness loss function, and β is the second hyper-parameter;
the regression loss function is:

L_reg = (1 / N_reg) · Σ_i p_i* · L1(t_i, t_i*)

where N_reg is the total number of anchor boxes in the regression stage and L1(t_i, t_i*) is the L1 loss value for the predicted detection window t_i;
the compactness loss function is:

L_com = (1 / N_com) · Σ_p L1( t_p*, (1 / |Φ_p|) · Σ_{j ∈ Φ_p} t_j )

where N_com is the number of calibrated pedestrians intersected by anchor boxes, |Φ_i| is the total number of anchor boxes associated with the i-th calibrated pedestrian, j is the anchor-box index, t_j is the predicted pedestrian coordinate of the j-th anchor box, p is the index of an anchor-box set associated with a calibrated pedestrian window, and Φ_p is the set of anchor boxes associated with the p-th calibrated pedestrian window.
Further, the positive/negative sample division unit includes:
an IoU calculation subunit, configured to calculate the Intersection-over-Union (IoU) between each anchor box and each pedestrian annotation box;
a first matching subunit, configured to select, for each pedestrian annotation box, the anchor box with the largest IoU and to match each selected anchor box with its corresponding pedestrian annotation box;
a second matching subunit, configured to judge, after the selected anchor boxes are removed, whether the IoU between each remaining anchor box and each pedestrian annotation box is greater than a preset first threshold, and to match them if so;
a third matching subunit, configured to obtain the pedestrian annotation boxes whose number of matched anchor boxes is less than a preset second number, and to select all anchor boxes whose IoU with each such annotation box is greater than a preset second threshold, the preset first threshold being greater than the preset second threshold;
a fourth matching subunit, configured to sort all the selected anchor boxes by IoU in descending order and to match a preset third number of them with the corresponding pedestrian annotation boxes, where the value of the preset third number is the average number of matched anchor boxes of the pedestrian annotation boxes whose number of matched anchor boxes is greater than or equal to the preset second number.
Compared with the closest prior art, the above technical solution has at least the following beneficial effects:
1. In the pedestrian detection method based on block-wise occlusion perception provided by the invention, the pedestrian detection model built on the Faster R-CNN neural network obtains the image features of a pedestrian according to the preset division into human-body detection regions and then fuses the obtained image features, so that occluded pedestrians in the image can be detected effectively.
2. In the pedestrian detection model provided by the invention, the high convolutional layer is associated with the anchor boxes; the high convolutional layer can extract deeper-level semantic information, which improves the precision of pedestrian detection.
3. The pedestrian detection system based on block-wise occlusion perception provided by the invention can implement the above pedestrian detection method based on block-wise occlusion perception.
Brief description of the drawings
Fig. 1 is a schematic diagram of the main steps of a pedestrian detection method based on block-wise occlusion perception in an embodiment of the present invention;
Fig. 2 is a schematic diagram of the main structure of a block-wise occlusion-aware RoI pooling unit in an embodiment of the present invention;
Fig. 3 is a schematic diagram of the main structure of an occlusion processing unit of block-wise occlusion perception in an embodiment of the present invention;
Fig. 4 is a schematic diagram of the main structure of a pedestrian detection system based on block-wise occlusion perception in an embodiment of the present invention.
Detailed description of the embodiments
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art will appreciate that these embodiments are only used to explain the technical principles of the invention and are not intended to limit its scope.
Pedestrians are easily occluded in large-scale crowded environments and are therefore difficult to detect effectively. Based on this, the present invention provides a pedestrian detection method based on block-wise occlusion perception; the method can detect pedestrians efficiently and accurately in complex environments, and can still produce satisfactory detection results in the presence of large-scale occlusion.
A pedestrian detection method based on block-wise occlusion perception provided by the invention is described below with reference to the accompanying drawings.
Fig. 1 illustrates the implementation flow of a pedestrian detection method based on block-wise occlusion perception in this embodiment. As shown in Fig. 1, the method may include the following steps:
Step S101: based on a pre-built pedestrian detection model, and according to a pedestrian image to be detected, obtain the image features corresponding to each preset human-body detection region.
Step S102: perform feature fusion on the obtained image features to obtain the overall features of the corresponding pedestrian.
Step S103: obtain multiple detection result boxes of the pedestrian image according to the overall features.
Step S104: select, among the obtained detection result boxes, those that satisfy preset screening conditions.
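Steps S101 to S104 can be summarized as a pipeline. In this sketch the `fuse`, `propose` and `keep` callables are placeholders for the model components described elsewhere in the embodiment; the feature vectors, scores and screening rule are invented purely to make the flow concrete.

```python
def detect_pedestrians(part_features, fuse, propose, keep):
    """S101-S104 as a pipeline: per-region features -> fused whole-body
    feature -> candidate result boxes -> boxes that pass the screening."""
    whole = fuse(part_features)                 # S102: feature fusion
    boxes = propose(whole)                      # S103: detection result boxes
    return [b for b in boxes if keep(b)]        # S104: screening condition

# toy stand-ins: five body-region feature vectors, score-threshold screening
parts = [[1.0], [0.5], [0.0], [0.8], [0.2]]    # S101: per-region features
fuse = lambda ps: [sum(v[0] for v in ps)]
propose = lambda f: [{"score": f[0] / 5}, {"score": 0.1}]
keep = lambda b: b["score"] > 0.3
print(detect_pedestrians(parts, fuse, propose, keep))  # [{'score': 0.5}]
```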
Specifically, the pedestrian detection model in this embodiment is a model built on the Faster R-CNN neural network, and anchor boxes are associated with a high convolutional layer of the network. The size and associated layer of the anchor boxes of the pedestrian detection model, and the designed basic network framework, are described in detail below.
When designing the size and associated layer of the anchor boxes, note that the feature maps extracted by different convolutional layers differ in the richness of their semantic and spatial information. Under large-scale occlusion, the feature information of the target pedestrian becomes difficult to extract because of the occlusion, so the support of more semantic information is needed. Moreover, in practical applications the images contain no pedestrian targets of the very small scales found in face detection, which greatly lowers the requirement for spatial information. The low-level features of a shallow neural network contain only shallow semantic information and, because of the small receptive field, have insufficient discrimination capability for large-scale objects; and since the extracted shallow features lack sufficient semantic information, feature extraction becomes more difficult under interference such as occlusion, which substantially reduces detector performance and robustness. By contrast, the deep layers of a neural network can extract deeper semantic information and global information; although some spatial information is lost, in complex environments, and especially under occlusion, these deep convolutional features can effectively overcome the insufficiency of feature extraction caused by occlusion.
Therefore, in this embodiment the topmost convolutional layer (i.e., the high convolutional layer) is chosen to be associated with the anchor boxes. For example, with the VGG-16 model as the basic framework, the chosen high-level convolutional layer is conv5_3; for a pedestrian image of size 1000 × 600, the corresponding feature map size is 60 × 40. In order to detect pedestrians of different sizes in the image, 11 anchor boxes of different sizes are densely laid at each position of the feature map: their areas are 32², 43², 58², 78², 106², 144², 194², 261², 353², 477² and 643², respectively, and the aspect ratio of all anchor boxes is 0.41 (the approximate human-body proportion), so as to detect pedestrians of various sizes in the image.
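Given the stated areas and the fixed aspect ratio of 0.41, the anchor shapes could be generated as follows. Deriving width and height as w = sqrt(A·r), h = sqrt(A/r) is an assumption about how the sizes are realized; the embodiment only states the areas and the ratio.

```python
import math

def make_anchors(areas, aspect_ratio):
    """Width/height of each anchor box for a given area A and aspect
    ratio r = w/h:  w = sqrt(A * r), h = sqrt(A / r)."""
    boxes = []
    for a in areas:
        w = math.sqrt(a * aspect_ratio)
        h = math.sqrt(a / aspect_ratio)
        boxes.append((round(w, 1), round(h, 1)))
    return boxes

areas = [s * s for s in (32, 43, 58, 78, 106, 144, 194, 261, 353, 477, 643)]
anchors = make_anchors(areas, aspect_ratio=0.41)
print(len(anchors))        # 11 anchor shapes per feature-map position
print(anchors[0])          # smallest anchor: (20.5, 50.0)
```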
To address the false-detection and missed-detection problems caused by occlusion between pedestrians, the network framework of the pedestrian detection model in this embodiment substitutes a block-wise occlusion-aware RoI pooling unit for the original RoI pooling layer of the Fast R-CNN module. This unit integrates the structural information of the different positions of the human body before inputting it into the Fast R-CNN module, and estimates the occlusion situation through a small neural network.
Referring to Fig. 2, Fig. 2 illustrates the main structure of the block-wise occlusion-aware RoI pooling unit in this embodiment. As shown in Fig. 2, the human-body region is first divided into five parts, and an RoI pooling layer samples the features of each part into a small fixed-size feature map (7 wide and 7 high). Then, based on the obtained feature maps of the different human-body regions, an occlusion processing unit estimates the visibility of each part. Referring to Fig. 3, Fig. 3 illustrates the main structure of the occlusion processing unit of block-wise occlusion perception in this embodiment. As shown in Fig. 3, the occlusion processing unit consists of three convolutional layers followed by a softmax layer, and its parameters are trained with a log loss function. Specifically, let c_{i,j} denote the j-th part of the i-th candidate window, o_{i,j} the predicted visibility score, and o*_{i,j} the ground-truth visibility score of the corresponding calibration. If more than half of c_{i,j} is visible, then o*_{i,j} = 1, otherwise 0; mathematically, if the overlap between c_{i,j} and the corresponding calibration window is greater than or equal to 0.5, then o*_{i,j} = 1, otherwise 0. Formula (1) gives the visibility score of each part used by the occlusion processing unit:

o*_{i,j} = 1 if Ω( U(c_{i,j}) ∩ U(c*_{i,j}) ) / Ω( U(c*_{i,j}) ) ≥ θ, and 0 otherwise    (1)

where Ω(·) is the area calculation function, U(c_{i,j}) is the region of c_{i,j}, U(c*_{i,j}) is the corresponding ground-truth calibration region, and θ is the set overlap threshold, here set to 0.5, indicating that o*_{i,j} = 1 if more than half of the part is visible and 0 otherwise. Accordingly, this embodiment defines the loss function of the occlusion processing unit by formula (2):

L_occ = − Σ_{i,j} ( o*_{i,j} · log o_{i,j} + (1 − o*_{i,j}) · log(1 − o_{i,j}) )    (2)

where i is the index of an anchor box, t_i is the predicted pedestrian coordinate of the i-th anchor box, and t_i* is the calibration coordinate of the object associated with the i-th anchor box.
Then, the feature map of each body part is multiplied element-wise by its corresponding predicted visibility score to obtain the final features, whose dimension is 512 × 7 × 7. Finally, the feature maps of the five body parts are summed element by element, and the result is used for the classification and window regression of the Fast R-CNN module.
Further, the pedestrian detection method of this embodiment shown in Fig. 1 may perform network training on the pedestrian detection model according to preset training images, so as to obtain a pedestrian detection model satisfying a preset convergence condition.

Specifically, in this embodiment the pedestrian detection model may be trained according to the following steps:
Step S201: Perform data augmentation on the preset training images to obtain training samples.

In this embodiment, the data augmentation applied to a training image may include a colour jitter operation, a random cropping operation, a horizontal flip operation and a scale transformation operation:

First, colour jitter is applied to the training image; specifically, parameters such as the brightness, contrast and saturation of the training image are each randomly adjusted with probability 0.6.

Second, random cropping is applied to the colour-jittered training image; specifically, five square sub-images are cropped at random. Among them, one sub-image is the largest square within the training image, and the side length of each of the remaining four sub-images is 0.4 to 1.0 times the short side of the training image. One of these five sub-images is randomly selected as the final training sample.

Third, a horizontal flip is applied to the selected training sample; specifically, the sample is flipped horizontally at random with probability 0.6.

Finally, a scale transformation is applied to the flipped training sample; specifically, the sample is scaled to a 1000 × 600 image.
In this embodiment, applying colour jitter, random cropping, horizontal flipping and scale transformation to the training images in sequence increases the amount of data without changing the image categories, and can improve the generalization ability of the model.
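The four augmentation decisions above can be sketched as follows. This is a minimal sketch of the sampling logic only (which crop, whether to jitter and flip, target scale); actual pixel operations are omitted, and the anchoring of the maximal square at the origin is a simplifying assumption not stated in the patent.

```python
import random

def sample_crop(width, height):
    """Sample one square crop per the scheme above: one maximal square
    plus four squares whose side is 0.4-1.0x the short side, then pick
    one of the five at random."""
    short = min(width, height)
    crops = [(0, 0, short)]                       # maximal square (anchored at origin for simplicity)
    for _ in range(4):
        side = int(short * random.uniform(0.4, 1.0))
        x = random.randint(0, width - side)
        y = random.randint(0, height - side)
        crops.append((x, y, side))
    return random.choice(crops)

def augment_params(width, height, p=0.6):
    """Return the augmentation decisions for one training image."""
    jitter = random.random() < p                  # colour jitter with prob 0.6
    x, y, side = sample_crop(width, height)
    flip = random.random() < p                    # horizontal flip with prob 0.6
    return {"jitter": jitter, "crop": (x, y, side),
            "flip": flip, "resize": (1000, 600)}  # final scale 1000 x 600

random.seed(0)
params = augment_params(1920, 1080)
```

Each call yields one randomized training sample specification, matching the "select one of five sub-images" step.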
Step S202: Match the anchor boxes with the pedestrian annotation boxes in the training samples, and divide the anchor boxes into positive samples and negative samples according to the matching result; a positive sample is an anchor box matched to a pedestrian annotation box, and a negative sample is an anchor box not matched to any pedestrian annotation box.
Specifically, under the existing matching strategy some pedestrians cannot be matched to a sufficient number of anchor boxes; to solve this problem, the present invention applies a compensation strategy to the annotation boxes. The steps for matching the anchor boxes with the pedestrian annotation boxes in the training samples are as follows:

First, compute the intersection-over-union (IoU) between each anchor box and each pedestrian annotation box.

Second, for each pedestrian annotation box, select the anchor box with the largest IoU, and match each selected anchor box with its corresponding pedestrian annotation box.

Third, after removing the selected anchor boxes, judge whether the IoU between each remaining anchor box and each pedestrian annotation box is greater than a preset first threshold: if so, match them. In this embodiment the first threshold is 0.4. It should be noted that the average number of matched anchor boxes over all pedestrian annotation boxes that are matched to enough anchor boxes is denoted Np.
Next, obtain the pedestrian annotation boxes whose number of matched anchor boxes is less than a preset second quantity, and select all anchor boxes whose IoU with each such pedestrian annotation box is greater than a preset second threshold, the preset first threshold being greater than the preset second threshold. In this embodiment this step is a compensation operation that fills the gap, and the second threshold is set to 0.1: for a pedestrian annotation box not matched to enough anchor boxes, all anchor boxes whose IoU with that annotation box is greater than 0.1 are selected. Formula (3) shows the sequence of all anchor boxes whose IoU is greater than 0.1:

[a1, a2, a3, ..., aN]    (3)

where each aN comprises the position and size of an anchor box.

Finally, sort all the selected anchor boxes in descending order of IoU, and match a preset third quantity of anchor boxes with the corresponding pedestrian annotation box. In this embodiment they are sorted in descending order of their IoU with the pedestrian annotation box, as in formula (4):

[A1, A2, A3, ..., AN]    (4)

and the first Np anchor boxes are chosen as the anchor boxes matched to that pedestrian annotation box. Np is an adjustable parameter, set by default to the mean match count of the pedestrian annotation boxes.

Here, the value of the preset third quantity is the mean number of matched anchor boxes of the pedestrian annotation boxes whose number of matched anchor boxes is greater than or equal to the preset second quantity.
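The matching-with-compensation scheme above can be sketched as follows. This is a simplified illustration under assumptions: the best-anchor-per-ground-truth step is folded into the threshold pass, and Np is estimated from all matched ground truths rather than only the well-matched ones; all names are illustrative.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def match_with_compensation(anchors, gts, t_hi=0.4, t_lo=0.1):
    """Match anchors to ground truths at IoU > t_hi, then compensate
    under-matched ground truths with their top-Np anchors at IoU > t_lo
    (Np approximated here as the mean match count over matched GTs)."""
    matches = {g: [] for g in range(len(gts))}
    for ai, a in enumerate(anchors):
        for gi, g in enumerate(gts):
            if iou(a, g) > t_hi:
                matches[gi].append(ai)
    counts = [len(v) for v in matches.values() if v]
    np_mean = max(1, round(sum(counts) / len(counts))) if counts else 1
    for gi, g in enumerate(gts):
        if len(matches[gi]) < np_mean:          # under-matched pedestrian
            cands = sorted(((iou(a, g), ai)
                            for ai, a in enumerate(anchors) if iou(a, g) > t_lo),
                           reverse=True)         # formula (4): descending IoU
            matches[gi] = [ai for _, ai in cands[:np_mean]]
    return matches

anchors = [(0, 0, 10, 10), (1, 1, 11, 11), (2, 0, 12, 10), (100, 100, 103, 103)]
gts = [(0, 0, 10, 10), (100, 100, 104, 104)]    # second pedestrian is small
m = match_with_compensation(anchors, gts)
```

In the toy example the small second pedestrian attracts only one anchor above 0.4, so the compensation pass re-selects its anchors from the lower 0.1 threshold.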
Step S203: Select a preset first quantity of negative samples using a hard negative mining method.

Specifically, for all negative samples, the error value incurred by their classification prediction is computed and the samples are sorted in descending order of error; the batch of negative samples with the largest errors is chosen as the negative samples of the training dataset, and all remaining negative samples are discarded, keeping the quantity ratio of positive to negative samples at 1:3. In this way there is a relatively balanced quantitative relationship between positive and negative samples, which is conducive to the stable progress of network training.
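The hard negative mining step above can be sketched as follows; a minimal sketch assuming the per-negative classification losses have already been computed, with illustrative names.

```python
def mine_hard_negatives(pos_indices, neg_losses, ratio=3):
    """Keep the hardest negatives (largest classification loss) so that
    negatives outnumber positives by at most `ratio`:1; discard the rest."""
    keep = ratio * len(pos_indices)
    order = sorted(range(len(neg_losses)),
                   key=lambda i: neg_losses[i], reverse=True)
    return order[:keep]

# toy example: 2 positives, 8 negatives -> keep the 6 hardest negatives
hard = mine_hard_negatives([0, 1], [0.1, 0.9, 0.3, 0.8, 0.05, 0.7, 0.2, 0.6])
```

The returned indices identify the negatives that enter the training batch; easy negatives (low loss) are dropped, preserving the 1:3 ratio.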
Step S204: Compute the loss function value from the positive samples and the selected negative samples, and update the Faster R-CNN neural network according to the loss function value; perform network training again on the updated Faster R-CNN neural network until it satisfies the preset convergence condition.

Specifically, in order to reduce false detections caused by mutual occlusion between adjacent pedestrians, a candidate window should lie closer to the annotated position of the pedestrian with which it is associated in the dataset. The traditional Faster R-CNN detection framework consists of two parts, a region proposal network (RPN) module and a Fast R-CNN module. The former generates high-quality candidate windows, while the latter performs object classification and regression on these candidate windows so as to compute a more accurate object position.
To address the false detections caused by occlusion between adjacent pedestrians, in this embodiment the loss function of the region proposal network (RPN) module is adjusted and redefined; the loss function of the RPN module is as shown in formula (5):

L({p_i}, {t_i}) = L_cls({p_i, p_i*}) + α1 · L_agg({p_i*, t_i, t_i*})    (5)

where i is the index of an anchor box, p_i and t_i are the predicted probability that the i-th anchor box is a pedestrian and the corresponding predicted pedestrian coordinates; p_i* and t_i* are the object category label associated with the i-th anchor box and the corresponding annotated coordinates (this is a binary classification problem: the pedestrian category is 1 and the background category is 0); α1 is an introduced first hyperparameter used to weight the two loss terms. L_cls is the pedestrian classification loss function and L_agg is the aggregation loss function.

The classification loss is estimated with a log loss function, defined as in formula (6):

L_cls({p_i, p_i*}) = −(1/N_cls) · Σ_i [ p_i* · log p_i + (1 − p_i*) · log(1 − p_i) ]    (6)

where N_cls is the total number of anchor boxes in the classification process.
To enable the RPN module to generate correct candidate windows more effectively, the present invention introduces in this module a new loss function, called the aggregation loss (aggregation loss). This loss function not only makes the candidate windows locate the annotated position of the associated pedestrian more accurately, but also reduces the distance between candidate windows associated with the same pedestrian. The aggregation loss function is defined as in formula (7):

L_agg({p_i*, t_i, t_i*}) = L_reg({p_i*, t_i, t_i*}) + β · L_com({p_i*, t_i, t_i*})    (7)

where L_reg is the regression loss function, which constrains a candidate window to come closer to the ground-truth window of its target; L_com is the compactness loss function, which constrains the candidate windows to locate the position of the annotated target object more tightly; β is a second hyperparameter used to weight the two loss terms.
The present invention defines the regression loss function L_reg using a smooth L1 loss, to measure the accuracy of the predicted detection windows, as shown in formula (8):

L_reg({p_i*, t_i, t_i*}) = (1/N_reg) · Σ_i p_i* · smoothL1(t_i − t_i*)    (8)

where N_reg is the total number of anchor boxes in the regression stage, and smoothL1(t_i − t_i*) is the value of the smooth L1 loss for the predicted detection window t_i.
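The smooth L1 regression term can be sketched as follows; a minimal sketch in which boxes are 4-tuples of coordinates and the positive-label mask p* selects which anchors contribute, with illustrative names.

```python
def smooth_l1(x):
    """Smooth-L1 penalty: 0.5*x^2 for |x| < 1, |x| - 0.5 otherwise."""
    ax = abs(x)
    return 0.5 * ax * ax if ax < 1 else ax - 0.5

def reg_loss(preds, targets, labels):
    """Regression term of formula (8): average the per-coordinate
    smooth-L1 penalty over positive anchors only (label p* = 1)."""
    total, n = 0.0, 0
    for t, tt, p in zip(preds, targets, labels):
        if p:  # only anchors labelled as pedestrian contribute
            total += sum(smooth_l1(a - b) for a, b in zip(t, tt))
            n += 1
    return total / n if n else 0.0

# one positive anchor off by 0.5 in x1, one ignored negative anchor
loss = reg_loss([(0.5, 0, 0, 0), (2, 0, 0, 0)],
                [(0, 0, 0, 0), (0, 0, 0, 0)],
                [1, 0])
```

The quadratic region near zero keeps gradients small for nearly correct windows, while the linear region limits the influence of outliers.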
The compactness loss function L_com evaluates the confidence of all candidate windows associated with the same annotated pedestrian. Specifically, suppose {φ1, ..., φp} is the sequence of annotated pedestrians whose annotated windows have associated anchor boxes, i.e. for each of them there is at least one anchor box intersecting the annotated window; {Φ1, ..., Φp} is the sequence of anchor-box index sets associated with these annotated pedestrian windows, i.e. an anchor box with index in Φk is associated with the pedestrian φk. Here a smooth L1 loss measures the error between the positions predicted by the anchor boxes and the actually annotated position; the concrete form of the compactness loss function, which describes the compactness between the predicted detection windows and the actual annotated window, is as shown in formula (9):

L_com({p_i*, t_i, t_i*}) = (1/N_com) · Σ_i smoothL1( t_i* − (1/|Φ_i|) · Σ_{j∈Φ_i} t_j )    (9)

where N_com is the total number of annotated pedestrians intersected by anchor boxes, |Φ_i| is the total number of anchor boxes associated with the i-th annotated pedestrian, t_j is the predicted pedestrian coordinates of the j-th anchor box, p is the number of anchor-box index sets associated with annotated pedestrian windows, and Φ_p is the anchor-box index set associated with the p-th annotated pedestrian window.
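The compactness term can be sketched as follows; a minimal sketch under the reading that each ground truth is compared with the average of the predictions of its associated anchors, with illustrative names.

```python
def smooth_l1_vec(u, v):
    """Sum of per-coordinate smooth-L1 penalties between two boxes."""
    def f(x):
        ax = abs(x)
        return 0.5 * ax * ax if ax < 1 else ax - 0.5
    return sum(f(a - b) for a, b in zip(u, v))

def compactness_loss(gt_boxes, assoc_preds):
    """Compactness term of formula (9): for each annotated pedestrian,
    penalise the gap between its ground-truth box and the *average* of
    the predicted boxes of its associated anchors, then average over
    the N_com annotated pedestrians."""
    total = 0.0
    for gt, preds in zip(gt_boxes, assoc_preds):
        mean = [sum(c) / len(preds) for c in zip(*preds)]  # average prediction
        total += smooth_l1_vec(gt, mean)
    return total / len(gt_boxes)

# one pedestrian, two associated predictions straddling the ground truth
loss = compactness_loss([(0, 0, 10, 10)],
                        [[(0, 0, 10, 10), (1, 1, 9, 9)]])
```

Penalising the averaged prediction pulls all candidate windows of one pedestrian toward a single tight location, which is what reduces confusion between overlapping pedestrians.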
At the same time, in order to further improve the accuracy of window regression and strengthen the pedestrian detection capability of the model in occluded environments, the present invention likewise introduces an aggregation loss term into the loss function of the Fast R-CNN module; the loss function is as shown in formula (10):

L = L_cls + α3 · L_agg + λ · L_occ    (10)

where α3 is a third hyperparameter, λ is a fourth hyperparameter, the classification loss function L_cls and the aggregation loss function L_agg are defined as in the RPN network, and L_occ is the occlusion processing loss function shown in formula (2). By simultaneously introducing the aggregation loss term into the RPN module and the Fast R-CNN module of the pedestrian detector, the localization ability of the detection windows can be strengthened, improving the overall detection performance.
Afterwards, stochastic gradient descent is used to back-propagate the error and iteratively update the network parameters until training converges or the preset maximum number of training iterations is reached, yielding the final network model parameters.

In the test phase, a test image is input into the trained network model for pedestrian detection, and detection result boxes are output. Since the number of output detection boxes is very large, most detection boxes are first screened out with a confidence threshold T = 0.05, and the top Na = 400 detection boxes are then selected by confidence. Duplicate detection boxes are then removed by non-maximum suppression, and the top Nb = 200 detection boxes by confidence are selected, giving the final detection results.
The present invention addresses the problem of pedestrian detection under large-scale occlusion by introducing an occlusion-aware R-CNN model to improve the accuracy of pedestrian detection. Specifically, the present invention designs a new aggregation loss function to reduce the false detections caused by overlap between adjacent pedestrians, and to make the candidate windows locate the target pedestrian positions more compactly and precisely. At the same time, to address the detection problems caused by occlusion, the present invention designs a part-based occlusion-aware RoI pooling unit to replace the RoI pooling layer used in the traditional Fast R-CNN; this unit reduces the influence of occlusion on pedestrian detection by combining the predicted visibility values of the different human-body parts. When training the convolutional neural network, the pedestrian annotation boxes need to be matched with anchor boxes, but under the existing matching strategy pedestrian annotation boxes at some scales cannot be matched to enough anchor boxes; the present invention applies a compensation to these annotation boxes and thereby solves this problem well. The present invention thus realizes a pedestrian detection method based on part-based occlusion perception, which can detect pedestrians in images efficiently and accurately, and in particular significantly improves the pedestrian detection capability under large-scale occlusion.
The present invention also provides a pedestrian detection system based on part-based occlusion perception. Referring to Fig. 4, Fig. 4 exemplarily shows a schematic diagram of such a pedestrian detection system in this embodiment. As shown in Fig. 4, the system includes:

an image feature acquisition module, configured to obtain, based on a pre-constructed pedestrian detection model and according to a pedestrian image to be detected, the image features corresponding to each preset human detection region;

a feature fusion module, configured to perform feature fusion on the image features acquired by the image feature acquisition module to obtain the overall features of the corresponding pedestrian;

a detection result box acquisition module, configured to obtain multiple detection result boxes of the pedestrian image to be detected according to the overall features obtained by the feature fusion module;

a detection result box screening module, configured to select, from the acquired multiple detection result boxes, the detection result boxes that satisfy preset screening conditions;

wherein the pedestrian detection model is a model constructed based on a Faster R-CNN neural network, and anchor boxes are associated with the high convolutional layers of the Faster R-CNN neural network.
In a preferred embodiment of the above pedestrian detection system based on part-based occlusion perception, the pedestrian detection system further includes a model training module, which includes:

a training image processing unit, configured to perform data augmentation on preset training images to obtain training samples;

a positive/negative sample division unit, configured to match anchor boxes with the pedestrian annotation boxes in the training samples and divide the anchor boxes into positive samples and negative samples according to the matching result; a positive sample is an anchor box matched to a pedestrian annotation box, and a negative sample is an anchor box not matched to any pedestrian annotation box;

a negative sample screening unit, configured to select a preset first quantity of negative samples using a hard negative mining method;

a network updating unit, configured to compute the loss function value from the positive samples and the selected negative samples, and update the Faster R-CNN neural network according to the loss function value; and to perform network training again on the updated Faster R-CNN neural network until it satisfies the preset convergence condition.
In a preferred embodiment of the above pedestrian detection system based on part-based occlusion perception, the Faster R-CNN neural network includes an RPN module; in this case, the model training module is further configured to perform the following operations:

based on the preset training images, perform network training on the RPN module according to the loss function shown in formula (11):

L({p_i}, {t_i}) = L_cls({p_i, p_i*}) + α1 · L_agg({p_i*, t_i, t_i*})    (11)

where L_cls is the pedestrian classification loss function, L_agg is the aggregation loss function, i denotes the index of an anchor box, p_i and t_i respectively denote the predicted probability that the i-th anchor box is a pedestrian and the corresponding predicted pedestrian coordinates; p_i* and t_i* respectively denote the object category label associated with the i-th anchor box and the corresponding annotated coordinates, and α1 is the first hyperparameter;

the pedestrian classification loss function is as shown in formula (12):

L_cls({p_i, p_i*}) = −(1/N_cls) · Σ_i [ p_i* · log p_i + (1 − p_i*) · log(1 − p_i) ]    (12)

where N_cls is the total number of anchor boxes in the classification process of the RPN module;

the aggregation loss function is as shown in formula (13):

L_agg({p_i*, t_i, t_i*}) = L_reg({p_i*, t_i, t_i*}) + β · L_com({p_i*, t_i, t_i*})    (13)

where L_reg is the regression loss function, L_com is the compactness loss function, and β is the second hyperparameter;

the regression loss function is as shown in formula (14):

L_reg({p_i*, t_i, t_i*}) = (1/N_reg) · Σ_i p_i* · smoothL1(t_i − t_i*)    (14)

where N_reg is the total number of anchor boxes in the regression stage, and smoothL1(t_i − t_i*) is the value of the smooth L1 loss for the predicted detection window t_i;

the compactness loss function is as shown in formula (15):

L_com({p_i*, t_i, t_i*}) = (1/N_com) · Σ_i smoothL1( t_i* − (1/|Φ_i|) · Σ_{j∈Φ_i} t_j )    (15)

where N_com is the total number of annotated pedestrians intersected by anchor boxes, |Φ_i| is the total number of anchor boxes associated with the i-th annotated pedestrian, j is an anchor-box index, t_j is the predicted pedestrian coordinates of the j-th anchor box, p is the number of anchor-box index sets associated with annotated pedestrian windows, and Φ_p is the anchor-box index set associated with the p-th annotated pedestrian window.
In a preferred embodiment of the above pedestrian detection system based on part-based occlusion perception, the Faster R-CNN neural network includes a Fast R-CNN module; in this case, the model training module is further configured to perform the following operations:

based on the preset training images, perform network training on the Fast R-CNN module according to the loss function shown in formula (16):

L = L_cls + α3 · L_agg + λ · L_occ    (16)

where L_cls is the pedestrian classification loss function, L_agg is the aggregation loss function, L_occ is the occlusion processing loss function, i denotes the index of an anchor box, p_i and t_i respectively denote the predicted probability that the i-th anchor box is a pedestrian and the corresponding predicted pedestrian coordinates; p_i* and t_i* respectively denote the object category label associated with the i-th anchor box and the corresponding annotated coordinates, α3 is the third hyperparameter and λ is the fourth hyperparameter;

the pedestrian classification loss function is as shown in formula (17):

L_cls({p_i, p_i*}) = −(1/N_cls) · Σ_i [ p_i* · log p_i + (1 − p_i*) · log(1 − p_i) ]    (17)

where N_cls is the total number of anchor boxes in the classification process;

the aggregation loss function is as shown in formula (18):

L_agg({p_i*, t_i, t_i*}) = L_reg({p_i*, t_i, t_i*}) + β · L_com({p_i*, t_i, t_i*})    (18)

where L_reg is the regression loss function, L_com is the compactness loss function, and β is the second hyperparameter;

the regression loss function is as shown in formula (19):

L_reg({p_i*, t_i, t_i*}) = (1/N_reg) · Σ_i p_i* · smoothL1(t_i − t_i*)    (19)

where N_reg is the total number of anchor boxes in the regression stage, and smoothL1(t_i − t_i*) is the value of the smooth L1 loss for the predicted detection window t_i;

the compactness loss function is as shown in formula (20):

L_com({p_i*, t_i, t_i*}) = (1/N_com) · Σ_i smoothL1( t_i* − (1/|Φ_i|) · Σ_{j∈Φ_i} t_j )    (20)

where N_com is the total number of annotated pedestrians intersected by anchor boxes, |Φ_i| is the total number of anchor boxes associated with the i-th annotated pedestrian, j is an anchor-box index, t_j is the predicted pedestrian coordinates of the j-th anchor box, p is the number of anchor-box index sets associated with annotated pedestrian windows, and Φ_p is the anchor-box index set associated with the p-th annotated pedestrian window.
In a preferred embodiment of the above pedestrian detection system based on part-based occlusion perception, the positive/negative sample division unit includes:

an IoU computation subunit, configured to compute the intersection-over-union between each anchor box and each pedestrian annotation box;

a first matching subunit, configured to select, for each pedestrian annotation box, the anchor box with the largest IoU, and to match each selected anchor box with its corresponding pedestrian annotation box;

a second matching subunit, configured to judge, after removing the selected anchor boxes, whether the IoU between each remaining anchor box and each pedestrian annotation box is greater than the preset first threshold, and if so, to match them;

a third matching subunit, configured to obtain the pedestrian annotation boxes whose number of matched anchor boxes is less than the preset second quantity, and to select all anchor boxes whose IoU with each such pedestrian annotation box is greater than the preset second threshold, the preset first threshold being greater than the preset second threshold;

a fourth matching subunit, configured to match, in descending order of IoU over all the selected anchor boxes, a preset third quantity of anchor boxes with the corresponding pedestrian annotation box; the value of the preset third quantity is the mean number of matched anchor boxes of the pedestrian annotation boxes whose number of matched anchor boxes is greater than or equal to the preset second quantity.
Those skilled in the art should recognize that the systems, methods and steps described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of electronic hardware and software, the composition and steps of each example have been described above generally in terms of their functions. Whether these functions are executed in electronic hardware or software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods for each specific application to implement the described functions, but such implementations should not be considered beyond the scope of the present invention.
The terms "first", "second" and the like are used to distinguish similar objects, not to describe or indicate a specific order or precedence.
The term "comprising" or any other similar term is intended to cover a non-exclusive inclusion, so that a process, method, article or apparatus/device including a series of elements includes not only those elements but also other elements not explicitly listed, or further includes elements intrinsic to such a process, method, article or apparatus/device.
Thus far, the technical solutions of the present invention have been described with reference to the preferred embodiments shown in the drawings. However, those skilled in the art will readily understand that the protection scope of the present invention is obviously not limited to these specific embodiments. Without departing from the principles of the present invention, those skilled in the art may make equivalent changes or substitutions to the relevant technical features, and the technical solutions after such changes or substitutions will fall within the protection scope of the present invention.
Claims (10)
1. A pedestrian detection method based on part-based occlusion perception, characterized by comprising:
based on a pre-constructed pedestrian detection model, and according to a pedestrian image to be detected, obtaining the image features corresponding to each preset human detection region;
performing feature fusion on the acquired image features to obtain the overall features of the corresponding pedestrian;
obtaining, according to the overall features, multiple detection result boxes of the pedestrian image to be detected;
selecting, from the acquired multiple detection result boxes, the detection result boxes that satisfy preset screening conditions;
wherein the pedestrian detection model is a model constructed based on a Faster R-CNN neural network, and anchor boxes are associated with the high convolutional layers of the Faster R-CNN neural network.
2. The pedestrian detection method based on part-based occlusion perception according to claim 1, characterized in that, before "based on a pre-constructed pedestrian detection model, and according to a pedestrian image to be detected, obtaining the image features corresponding to each preset human detection region", the method further comprises:
performing data augmentation on the preset training images to obtain training samples;
matching anchor boxes with the pedestrian annotation boxes in the training samples, and dividing the anchor boxes into positive samples and negative samples according to the matching result; a positive sample is an anchor box matched to a pedestrian annotation box, and a negative sample is an anchor box not matched to any pedestrian annotation box;
selecting a preset first quantity of negative samples using a hard negative mining method;
computing a loss function value from the positive samples and the selected negative samples, and updating the Faster R-CNN neural network according to the loss function value; and performing network training again on the updated Faster R-CNN neural network until it satisfies a preset convergence condition.
3. The pedestrian detection method based on part-based occlusion perception according to claim 2, characterized in that the Faster R-CNN neural network includes an RPN module; before "based on a pre-constructed pedestrian detection model, and according to a pedestrian image to be detected, obtaining the image features corresponding to each preset human detection region", the method further comprises:
based on the preset training images, performing network training on the RPN module according to the loss function shown in the following formula:

L({p_i}, {t_i}) = L_cls({p_i, p_i*}) + α1 · L_agg({p_i*, t_i, t_i*})

wherein L_cls is the pedestrian classification loss function, L_agg is the aggregation loss function, i denotes the index of an anchor box, p_i and t_i respectively denote the predicted probability that the i-th anchor box is a pedestrian and the corresponding predicted pedestrian coordinates; p_i* and t_i* respectively denote the object category label associated with the i-th anchor box and the corresponding annotated coordinates, and α1 is the first hyperparameter;
the pedestrian classification loss function is:

L_cls({p_i, p_i*}) = −(1/N_cls) · Σ_i [ p_i* · log p_i + (1 − p_i*) · log(1 − p_i) ]

wherein N_cls is the total number of anchor boxes in the classification process of the RPN module;
the aggregation loss function is:

L_agg({p_i*, t_i, t_i*}) = L_reg({p_i*, t_i, t_i*}) + β · L_com({p_i*, t_i, t_i*})

wherein L_reg is the regression loss function, L_com is the compactness loss function, and β is the second hyperparameter;
the regression loss function is:

L_reg({p_i*, t_i, t_i*}) = (1/N_reg) · Σ_i p_i* · smoothL1(t_i − t_i*)

wherein N_reg is the total number of anchor boxes in the regression stage, and smoothL1(t_i − t_i*) is the value of the smooth L1 loss for the predicted detection window t_i;
the compactness loss function is:

L_com({p_i*, t_i, t_i*}) = (1/N_com) · Σ_i smoothL1( t_i* − (1/|Φ_i|) · Σ_{j∈Φ_i} t_j )

wherein N_com is the total number of annotated pedestrians intersected by anchor boxes, |Φ_i| is the total number of anchor boxes associated with the i-th annotated pedestrian, j is an anchor-box index, t_j is the predicted pedestrian coordinates of the j-th anchor box, p is the number of anchor-box index sets associated with annotated pedestrian windows, and Φ_p is the anchor-box index set associated with the p-th annotated pedestrian window.
4. The pedestrian detection method based on part-based occlusion perception according to claim 2, characterized in that the Faster R-CNN neural network further includes a Fast R-CNN module; before "based on a pre-constructed pedestrian detection model, and according to a pedestrian image to be detected, obtaining the image features corresponding to each preset human detection region", the method further comprises:
based on the preset training images, performing network training on the Fast R-CNN module according to the loss function shown in the following formula:

L = L_cls + α3 · L_agg + λ · L_occ

wherein L_cls is the pedestrian classification loss function, L_agg is the aggregation loss function, L_occ is the occlusion processing loss function, i denotes the index of an anchor box, p_i and t_i respectively denote the predicted probability that the i-th anchor box is a pedestrian and the corresponding predicted pedestrian coordinates; p_i* and t_i* respectively denote the object category label associated with the i-th anchor box and the corresponding annotated coordinates, α3 is the third hyperparameter and λ is the fourth hyperparameter;
the pedestrian classification loss function is:

L_cls({p_i, p_i*}) = −(1/N_cls) · Σ_i [ p_i* · log p_i + (1 − p_i*) · log(1 − p_i) ]

wherein N_cls is the total number of anchor boxes in the classification process;
the aggregation loss function is:

L_agg({p_i*, t_i, t_i*}) = L_reg({p_i*, t_i, t_i*}) + β · L_com({p_i*, t_i, t_i*})

wherein L_reg is the regression loss function, L_com is the compactness loss function, and β is the second hyperparameter;
the regression loss function is:

L_reg({p_i*, t_i, t_i*}) = (1/N_reg) · Σ_i p_i* · smoothL1(t_i − t_i*)

wherein N_reg is the total number of anchor boxes in the regression stage, and smoothL1(t_i − t_i*) is the value of the smooth L1 loss for the predicted detection window t_i;
the compactness loss function is:

L_com({p_i*, t_i, t_i*}) = (1/N_com) · Σ_i smoothL1( t_i* − (1/|Φ_i|) · Σ_{j∈Φ_i} t_j )

wherein N_com is the total number of annotated pedestrians intersected by anchor boxes, |Φ_i| is the total number of anchor boxes associated with the i-th annotated pedestrian, j is an anchor-box index, t_j is the predicted pedestrian coordinates of the j-th anchor box, p is the number of anchor-box index sets associated with annotated pedestrian windows, and Φ_p is the anchor-box index set associated with the p-th annotated pedestrian window.
5. The pedestrian detection method based on blocking and shielding perception according to any one of claims 2-4, characterized in that the step of "matching anchor boxes with the pedestrian annotation boxes in the training sample" specifically comprises:
calculating the intersection-over-union (IoU) between each anchor box and each pedestrian annotation box;
selecting, for each pedestrian annotation box, the anchor box with the largest IoU, and matching each selected anchor box with its corresponding pedestrian annotation box;
after removing the selected anchor boxes, judging whether the IoU between each remaining anchor box and each pedestrian annotation box is greater than a preset first threshold, and if so, matching them;
obtaining the pedestrian annotation boxes whose number of matched anchor boxes is less than a preset second quantity, and selecting all anchor boxes whose IoU with each such pedestrian annotation box is greater than a preset second threshold, the preset first threshold being greater than the preset second threshold;
sorting the selected anchor boxes in descending order of IoU, and matching a preset third quantity of anchor boxes with the corresponding pedestrian annotation boxes, the value of the preset third quantity being the average number of matched anchor boxes of the pedestrian annotation boxes whose number of matched anchor boxes is greater than or equal to the preset second quantity.
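The three-stage matching procedure of this claim can be sketched in Python as follows. The thresholds t1 and t2 and the quantity n2 are illustrative stand-ins for the "preset" values, which the claim does not fix numerically.

```python
import numpy as np

def iou_matrix(anchors, gts):
    """IoU between every anchor box and every annotation box.
    Boxes are (x1, y1, x2, y2) rows."""
    ious = np.zeros((len(anchors), len(gts)))
    for i, a in enumerate(anchors):
        for j, g in enumerate(gts):
            x1, y1 = max(a[0], g[0]), max(a[1], g[1])
            x2, y2 = min(a[2], g[2]), min(a[3], g[3])
            inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
            area_a = (a[2] - a[0]) * (a[3] - a[1])
            area_g = (g[2] - g[0]) * (g[3] - g[1])
            ious[i, j] = inter / (area_a + area_g - inter)
    return ious

def match_anchors(anchors, gts, t1=0.5, t2=0.3, n2=3):
    """Match anchors to annotation boxes per the claimed steps:
    1) the best-IoU anchor for each annotation box;
    2) remaining anchors whose IoU exceeds t1;
    3) for under-matched boxes (< n2 matches), the top anchors with
       IoU > t2, up to the average match count n3 of well-matched boxes."""
    ious = iou_matrix(anchors, gts)
    matches = {j: [] for j in range(len(gts))}
    used = set()
    # step 1: the anchor with the largest IoU for each annotation box
    for j in range(len(gts)):
        i = int(np.argmax(ious[:, j]))
        matches[j].append(i)
        used.add(i)
    # step 2: remaining anchors whose IoU exceeds the first threshold
    for i in range(len(anchors)):
        if i in used:
            continue
        j = int(np.argmax(ious[i]))
        if ious[i, j] > t1:
            matches[j].append(i)
            used.add(i)
    # step 3: compensate annotation boxes with fewer than n2 matches
    well = [len(m) for m in matches.values() if len(m) >= n2]
    n3 = int(np.mean(well)) if well else n2
    for j in range(len(gts)):
        if len(matches[j]) >= n2:
            continue
        cand = [(ious[i, j], i) for i in range(len(anchors))
                if i not in used and ious[i, j] > t2]
        for _, i in sorted(cand, reverse=True)[:n3]:
            matches[j].append(i)
            used.add(i)
    return matches
```

The compensation in step 3 is the point of the scheme: heavily occluded pedestrians, which few anchors overlap strongly, still receive a fair share of training anchors at the lower threshold.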
6. A pedestrian detection system based on blocking and shielding perception, characterized by comprising:
an image feature acquisition module, configured to acquire, on the basis of a pre-constructed pedestrian detection model and according to a pedestrian image to be detected, the image features corresponding to each preset human-body detection region;
an image feature fusion module, configured to perform feature fusion on the image features acquired by the image feature acquisition module to obtain the overall features of the corresponding pedestrian;
a detection result box acquisition module, configured to acquire multiple detection result boxes of the pedestrian image to be detected according to the overall features obtained by the image feature fusion module;
a detection result box screening module, configured to select, from the multiple acquired detection result boxes, the detection result boxes that satisfy a preset screening condition;
wherein the pedestrian detection model is a model constructed on the basis of the Faster R-CNN neural network, and anchor boxes are associated with the high-level convolutional layers of the Faster R-CNN neural network.
7. The pedestrian detection system based on blocking and shielding perception according to claim 6, characterized in that the pedestrian detection system further comprises a model training module, the model training module comprising:
a training image processing unit, configured to perform data augmentation on the preset training images to obtain training samples;
a positive/negative sample division unit, configured to match anchor boxes with the pedestrian annotation boxes in the training samples, and to divide the anchor boxes into positive samples and negative samples according to the matching result, the positive samples being the anchor boxes matched with pedestrian annotation boxes and the negative samples being the anchor boxes not matched with any pedestrian annotation box;
a negative sample screening unit, configured to select a preset first quantity of negative samples by hard negative mining;
a network updating unit, configured to calculate the loss function value according to the positive samples and the selected negative samples, update the Faster R-CNN neural network according to the loss function value, and perform network training on the updated Faster R-CNN neural network again until it satisfies a preset convergence condition.
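The negative sample screening unit relies on hard negative mining. A minimal sketch under the common assumption that "hard" negatives are the background anchors with the largest current loss values (the preset first quantity is a parameter):

```python
import numpy as np

def hard_negative_mining(neg_losses, first_quantity):
    """Pick the preset first quantity of negative anchors with the
    highest loss values, i.e. the hardest background examples."""
    order = np.argsort(neg_losses)[::-1]  # indices by descending loss
    return order[:first_quantity]
```

A common refinement caps the selected count at a fixed negative-to-positive ratio (e.g. 3:1) so that easy background anchors do not dominate the gradient.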
8. The pedestrian detection system based on blocking and shielding perception according to claim 7, characterized in that the Faster R-CNN neural network comprises an RPN module, and in this case the model training module is further configured to perform the following operation:
on the basis of the preset training images, performing network training on the RPN module according to the loss function shown below:
L({p_i}, {t_i}) = L_cls + α_1 · L_agg
Wherein, L_cls is the pedestrian classification loss function and L_agg is the aggregation loss function; i denotes the index of an anchor box; p_i and t_i respectively denote the predicted probability that the i-th anchor box is a pedestrian and the corresponding predicted pedestrian coordinates; p_i* and t_i* respectively denote the object category label associated with the i-th anchor box and the corresponding calibrated coordinates; α_1 is the first hyperparameter;
The pedestrian classification loss function is:
L_cls = (1 / N_cls) · Σ_i L(p_i, p_i*)
wherein N_cls is the total number of anchor boxes in the classification stage of the RPN module, and L(p_i, p_i*) is the log loss of the prediction p_i against the label p_i*;
The aggregation loss function is:
L_agg = L_reg + β · L_com
wherein L_reg is the regression loss function, L_com is the compactness loss function, and β is the second hyperparameter;
The regression loss function is:
L_reg = (1 / N_reg) · Σ_i p_i* · SmoothL1(t_i − t_i*)
wherein N_reg is the total number of anchor boxes in the regression stage, and SmoothL1(t_i − t_i*) is the smooth-L1 loss penalty of the predicted detection window t_i against the calibrated window t_i*;
The compactness loss function is:
L_com = (1 / N_com) · Σ_p SmoothL1(t_p* − (1 / |Φ_p|) · Σ_{j∈Φ_p} t_j)
wherein N_com is the total number of calibrated pedestrians that intersect at least one anchor box, p indexes the calibrated pedestrian windows, Φ_p is the set of anchor boxes associated with the p-th calibrated pedestrian window, |Φ_p| is the number of anchor boxes in that set, j indexes anchor boxes, t_j is the pedestrian coordinates predicted from the j-th anchor box, and t_p* is the calibrated coordinates of the p-th pedestrian window.
9. The pedestrian detection system based on blocking and shielding perception according to claim 7, characterized in that the Faster R-CNN neural network comprises a Fast R-CNN module, and in this case the model training module is further configured to perform the following operation:
on the basis of the preset training images, performing network training on the Fast R-CNN module according to the loss function shown below:
L({p_i}, {t_i}) = L_cls + α_3 · L_agg + λ · L_occ
Wherein, L_cls is the pedestrian classification loss function, L_agg is the aggregation loss function, and L_occ is the occlusion-processing loss function; i denotes the index of an anchor box; p_i and t_i respectively denote the predicted probability that the i-th anchor box is a pedestrian and the corresponding predicted pedestrian coordinates; p_i* and t_i* respectively denote the object category label associated with the i-th anchor box and the corresponding calibrated coordinates; α_3 is the third hyperparameter and λ is the fourth hyperparameter;
The pedestrian classification loss function is:
L_cls = (1 / N_cls) · Σ_i L(p_i, p_i*)
wherein N_cls is the total number of anchor boxes in the classification stage of the RPN module, and L(p_i, p_i*) is the log loss of the prediction p_i against the label p_i*;
The aggregation loss function is:
L_agg = L_reg + β · L_com
wherein L_reg is the regression loss function, L_com is the compactness loss function, and β is the second hyperparameter;
The regression loss function is:
L_reg = (1 / N_reg) · Σ_i p_i* · SmoothL1(t_i − t_i*)
wherein N_reg is the total number of anchor boxes in the regression stage, and SmoothL1(t_i − t_i*) is the smooth-L1 loss penalty of the predicted detection window t_i against the calibrated window t_i*;
The compactness loss function is:
L_com = (1 / N_com) · Σ_p SmoothL1(t_p* − (1 / |Φ_p|) · Σ_{j∈Φ_p} t_j)
wherein N_com is the total number of calibrated pedestrians that intersect at least one anchor box, p indexes the calibrated pedestrian windows, Φ_p is the set of anchor boxes associated with the p-th calibrated pedestrian window, |Φ_p| is the number of anchor boxes in that set, j indexes anchor boxes, t_j is the pedestrian coordinates predicted from the j-th anchor box, and t_p* is the calibrated coordinates of the p-th pedestrian window.
10. The pedestrian detection system based on blocking and shielding perception according to any one of claims 7-9, characterized in that the positive/negative sample division unit comprises:
an IoU calculation subunit, configured to calculate the intersection-over-union (IoU) between each anchor box and each pedestrian annotation box;
a first matching subunit, configured to select, for each pedestrian annotation box, the anchor box with the largest IoU, and to match each selected anchor box with its corresponding pedestrian annotation box;
a second matching subunit, configured to judge, after the selected anchor boxes are removed, whether the IoU between each remaining anchor box and each pedestrian annotation box is greater than a preset first threshold, and if so, to match them;
a third matching subunit, configured to obtain the pedestrian annotation boxes whose number of matched anchor boxes is less than a preset second quantity, and to select all anchor boxes whose IoU with each such pedestrian annotation box is greater than a preset second threshold, the preset first threshold being greater than the preset second threshold;
a fourth matching subunit, configured to sort the selected anchor boxes in descending order of IoU and to match a preset third quantity of anchor boxes with the corresponding pedestrian annotation boxes, the value of the preset third quantity being the average number of matched anchor boxes of the pedestrian annotation boxes whose number of matched anchor boxes is greater than or equal to the preset second quantity.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810393658.1A CN108898047B (en) | 2018-04-27 | 2018-04-27 | Pedestrian detection method and system based on blocking and shielding perception |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810393658.1A CN108898047B (en) | 2018-04-27 | 2018-04-27 | Pedestrian detection method and system based on blocking and shielding perception |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108898047A true CN108898047A (en) | 2018-11-27 |
CN108898047B CN108898047B (en) | 2021-03-19 |
Family
ID=64342527
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810393658.1A Active CN108898047B (en) | 2018-04-27 | 2018-04-27 | Pedestrian detection method and system based on blocking and shielding perception |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108898047B (en) |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109583512A (en) * | 2018-12-14 | 2019-04-05 | 北京旷视科技有限公司 | Image processing method, apparatus and system |
CN109766796A (en) * | 2018-12-20 | 2019-05-17 | 西华大学 | A kind of depth pedestrian detection method towards dense population |
CN110222657A (en) * | 2019-06-11 | 2019-09-10 | 中国科学院自动化研究所 | Single step human-face detector optimization system, method, apparatus |
CN110222764A (en) * | 2019-06-10 | 2019-09-10 | 中南民族大学 | Shelter target detection method, system, equipment and storage medium |
CN110532985A (en) * | 2019-09-02 | 2019-12-03 | 北京迈格威科技有限公司 | Object detection method, apparatus and system |
CN110796069A (en) * | 2019-10-28 | 2020-02-14 | 广州博衍智能科技有限公司 | Behavior detection method, system, equipment and machine readable medium |
CN110796071A (en) * | 2019-10-28 | 2020-02-14 | 广州博衍智能科技有限公司 | Behavior detection method, system, machine-readable medium and device |
CN110796127A (en) * | 2020-01-06 | 2020-02-14 | 四川通信科研规划设计有限责任公司 | Embryo prokaryotic detection system based on occlusion sensing, storage medium and terminal |
CN110880177A (en) * | 2019-11-26 | 2020-03-13 | 北京推想科技有限公司 | Image identification method and device |
CN111144203A (en) * | 2019-11-19 | 2020-05-12 | 浙江工商大学 | Pedestrian shielding detection method based on deep learning |
CN111832515A (en) * | 2020-07-21 | 2020-10-27 | 上海有个机器人有限公司 | Dense pedestrian detection method, medium, terminal and device |
CN112307826A (en) * | 2019-07-30 | 2021-02-02 | 华为技术有限公司 | Pedestrian detection method, device, computer-readable storage medium and chip |
CN112465799A (en) * | 2020-12-09 | 2021-03-09 | 南京甄视智能科技有限公司 | Optimization of object detector and object detection |
CN112528995A (en) * | 2020-12-22 | 2021-03-19 | 北京百度网讯科技有限公司 | Method for training target detection model, target detection method and device |
CN112906732A (en) * | 2020-12-31 | 2021-06-04 | 杭州旷云金智科技有限公司 | Target detection method and device, electronic equipment and storage medium |
CN113569726A (en) * | 2021-07-27 | 2021-10-29 | 湖南大学 | Pedestrian detection method combining automatic data amplification and loss function search |
CN114550221A (en) * | 2022-04-22 | 2022-05-27 | 苏州浪潮智能科技有限公司 | Pedestrian re-identification method, device, equipment and storage medium |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105354547A (en) * | 2015-10-30 | 2016-02-24 | 河海大学 | Pedestrian detection method in combination of texture and color features |
CN106022237A (en) * | 2016-05-13 | 2016-10-12 | 电子科技大学 | Pedestrian detection method based on end-to-end convolutional neural network |
CN106599939A (en) * | 2016-12-30 | 2017-04-26 | 深圳市唯特视科技有限公司 | Real-time target detection method based on region convolutional neural network |
US20170206431A1 (en) * | 2016-01-20 | 2017-07-20 | Microsoft Technology Licensing, Llc | Object detection and classification in images |
CN107358182A (en) * | 2017-06-29 | 2017-11-17 | 维拓智能科技(深圳)有限公司 | Pedestrian detection method and terminal device |
CN107403141A (en) * | 2017-07-05 | 2017-11-28 | 中国科学院自动化研究所 | Method for detecting human face and device, computer-readable recording medium, equipment |
CN107463892A (en) * | 2017-07-27 | 2017-12-12 | 北京大学深圳研究生院 | Pedestrian detection method in a kind of image of combination contextual information and multi-stage characteristics |
CN107679250A (en) * | 2017-11-01 | 2018-02-09 | 浙江工业大学 | A kind of multitask layered image search method based on depth own coding convolutional neural networks |
CN107730881A (en) * | 2017-06-13 | 2018-02-23 | 银江股份有限公司 | Traffic congestion vision detection system based on depth convolutional neural networks |
- 2018-04-27: CN application CN201810393658.1A filed; granted as patent CN108898047B (status: Active)
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105354547A (en) * | 2015-10-30 | 2016-02-24 | 河海大学 | Pedestrian detection method in combination of texture and color features |
US20170206431A1 (en) * | 2016-01-20 | 2017-07-20 | Microsoft Technology Licensing, Llc | Object detection and classification in images |
CN106022237A (en) * | 2016-05-13 | 2016-10-12 | 电子科技大学 | Pedestrian detection method based on end-to-end convolutional neural network |
CN106599939A (en) * | 2016-12-30 | 2017-04-26 | 深圳市唯特视科技有限公司 | Real-time target detection method based on region convolutional neural network |
CN107730881A (en) * | 2017-06-13 | 2018-02-23 | 银江股份有限公司 | Traffic congestion vision detection system based on depth convolutional neural networks |
CN107358182A (en) * | 2017-06-29 | 2017-11-17 | 维拓智能科技(深圳)有限公司 | Pedestrian detection method and terminal device |
CN107403141A (en) * | 2017-07-05 | 2017-11-28 | 中国科学院自动化研究所 | Method for detecting human face and device, computer-readable recording medium, equipment |
CN107463892A (en) * | 2017-07-27 | 2017-12-12 | 北京大学深圳研究生院 | Pedestrian detection method in a kind of image of combination contextual information and multi-stage characteristics |
CN107679250A (en) * | 2017-11-01 | 2018-02-09 | 浙江工业大学 | A kind of multitask layered image search method based on depth own coding convolutional neural networks |
Non-Patent Citations (2)
Title |
---|
SHAOQING REN等: "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", 《ARXIV:1506.01497V3》 * |
朱建清等: "基于新型三元卷积神经网络的行人再辨识算法", 《电子与信息学报》 * |
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109583512A (en) * | 2018-12-14 | 2019-04-05 | 北京旷视科技有限公司 | Image processing method, apparatus and system |
CN109766796A (en) * | 2018-12-20 | 2019-05-17 | 西华大学 | A kind of depth pedestrian detection method towards dense population |
CN109766796B (en) * | 2018-12-20 | 2023-04-18 | 西华大学 | Deep pedestrian detection method for dense crowd |
CN110222764A (en) * | 2019-06-10 | 2019-09-10 | 中南民族大学 | Shelter target detection method, system, equipment and storage medium |
CN110222657A (en) * | 2019-06-11 | 2019-09-10 | 中国科学院自动化研究所 | Single step human-face detector optimization system, method, apparatus |
CN110222657B (en) * | 2019-06-11 | 2021-07-20 | 中国科学院自动化研究所 | Single-step face detector optimization system, method and device |
CN112307826A (en) * | 2019-07-30 | 2021-02-02 | 华为技术有限公司 | Pedestrian detection method, device, computer-readable storage medium and chip |
CN110532985A (en) * | 2019-09-02 | 2019-12-03 | 北京迈格威科技有限公司 | Object detection method, apparatus and system |
CN110532985B (en) * | 2019-09-02 | 2022-07-22 | 北京迈格威科技有限公司 | Target detection method, device and system |
CN110796069B (en) * | 2019-10-28 | 2021-02-05 | 广州云从博衍智能科技有限公司 | Behavior detection method, system, equipment and machine readable medium |
CN110796071A (en) * | 2019-10-28 | 2020-02-14 | 广州博衍智能科技有限公司 | Behavior detection method, system, machine-readable medium and device |
CN110796071B (en) * | 2019-10-28 | 2021-02-19 | 广州云从博衍智能科技有限公司 | Behavior detection method, system, machine-readable medium and device |
CN110796069A (en) * | 2019-10-28 | 2020-02-14 | 广州博衍智能科技有限公司 | Behavior detection method, system, equipment and machine readable medium |
CN111144203A (en) * | 2019-11-19 | 2020-05-12 | 浙江工商大学 | Pedestrian shielding detection method based on deep learning |
CN111144203B (en) * | 2019-11-19 | 2023-06-16 | 浙江工商大学 | Pedestrian shielding detection method based on deep learning |
CN110880177A (en) * | 2019-11-26 | 2020-03-13 | 北京推想科技有限公司 | Image identification method and device |
CN110796127A (en) * | 2020-01-06 | 2020-02-14 | 四川通信科研规划设计有限责任公司 | Embryo prokaryotic detection system based on occlusion sensing, storage medium and terminal |
CN111832515A (en) * | 2020-07-21 | 2020-10-27 | 上海有个机器人有限公司 | Dense pedestrian detection method, medium, terminal and device |
CN112465799A (en) * | 2020-12-09 | 2021-03-09 | 南京甄视智能科技有限公司 | Optimization of object detector and object detection |
CN112528995A (en) * | 2020-12-22 | 2021-03-19 | 北京百度网讯科技有限公司 | Method for training target detection model, target detection method and device |
CN112528995B (en) * | 2020-12-22 | 2023-08-04 | 北京百度网讯科技有限公司 | Method for training target detection model, target detection method and device |
CN112906732A (en) * | 2020-12-31 | 2021-06-04 | 杭州旷云金智科技有限公司 | Target detection method and device, electronic equipment and storage medium |
CN112906732B (en) * | 2020-12-31 | 2023-12-15 | 杭州旷云金智科技有限公司 | Target detection method, target detection device, electronic equipment and storage medium |
CN113569726A (en) * | 2021-07-27 | 2021-10-29 | 湖南大学 | Pedestrian detection method combining automatic data amplification and loss function search |
CN114550221A (en) * | 2022-04-22 | 2022-05-27 | 苏州浪潮智能科技有限公司 | Pedestrian re-identification method, device, equipment and storage medium |
CN114550221B (en) * | 2022-04-22 | 2022-07-22 | 苏州浪潮智能科技有限公司 | Pedestrian re-identification method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN108898047B (en) | 2021-03-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108898047A (en) | Pedestrian detection method and system based on blocking and shielding perception | |
JP6830707B1 (en) | Person re-identification method that combines random batch mask and multi-scale expression learning | |
CN106127204B (en) | A kind of multi-direction meter reading Region detection algorithms of full convolutional neural networks | |
CN107145908B (en) | A kind of small target detecting method based on R-FCN | |
CN106897738B (en) | A kind of pedestrian detection method based on semi-supervised learning | |
CN109446925A (en) | A kind of electric device maintenance algorithm based on convolutional neural networks | |
CN109829429A (en) | Security protection sensitive articles detection method under monitoring scene based on YOLOv3 | |
CN109800778A (en) | A kind of Faster RCNN object detection method for dividing sample to excavate based on hardly possible | |
CN108334849A (en) | A kind of recognition methods again of the pedestrian based on Riemann manifold | |
CN109002783A (en) | Rescue the human testing in environment and gesture recognition method | |
CN104008399B (en) | The recognition methodss of the gauge pointer shake based on support vector machine in a kind of instrument detection | |
CN107133960A (en) | Image crack dividing method based on depth convolutional neural networks | |
CN108447080A (en) | Method for tracking target, system and storage medium based on individual-layer data association and convolutional neural networks | |
CN108898085A (en) | Intelligent road disease detection method based on mobile phone video | |
CN106529499A (en) | Fourier descriptor and gait energy image fusion feature-based gait identification method | |
CN108154159B (en) | A kind of method for tracking target with automatic recovery ability based on Multistage Detector | |
CN105426908B (en) | A kind of substation's attributive classification method based on convolutional neural networks | |
CN104599286B (en) | A kind of characteristic tracking method and device based on light stream | |
CN105208325B (en) | The land resources monitoring and early warning method captured and compare analysis is pinpointed based on image | |
CN109753946A (en) | A kind of real scene pedestrian's small target deteection network and detection method based on the supervision of body key point | |
CN109919045A (en) | Small scale pedestrian detection recognition methods based on concatenated convolutional network | |
CN106408030A (en) | SAR image classification method based on middle lamella semantic attribute and convolution neural network | |
CN109886356A (en) | A kind of target tracking method based on three branch's neural networks | |
CN110096981A (en) | A kind of video big data traffic scene analysis method based on deep learning | |
CN110414464A (en) | A kind of intensive pedestrian detection method of small scale |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||