CN107463892A

CN107463892A - Pedestrian detection method in images combining contextual information and multi-level features

Info

Publication number: CN107463892A
Application number: CN201710624030.3A
Authority: CN
Inventors: 李革; 孔伟杰; 李楠楠; 臧祥浩; 王文敏; 王荣刚
Original assignee: Peking University Shenzhen Graduate School
Current assignee: Peking University Shenzhen Graduate School
Priority date: 2017-07-27
Filing date: 2017-07-27
Publication date: 2017-12-12

Abstract

The invention discloses a pedestrian detection method based on deep learning that combines image context information with multi-level image features. The deep object detection model Faster R-CNN is applied to pedestrian detection, and context information from the region surrounding each pedestrian is fed to the feature classifier. The multi-level features of VGG16, the deep feature extraction model in Faster R-CNN, are then combined: coarse high-level features are fused with fine low-level features so that the resulting feature carries richer information and small pedestrians are detected more reliably. The method has a low miss rate and wide applicability; it is suitable for pedestrian detection in intelligent surveillance systems and autonomous driving, and has significant practical value.

Description

Pedestrian detection method in images combining contextual information and multi-level features

Technical field

The present invention relates to the technical field of image analysis, and in particular to a pedestrian detection method based on deep learning that combines image context information with multi-level image features.

Background art

Pedestrian detection refers to having a computer, by means of image processing and related machine learning algorithms, analyze the content of an image or video to determine whether pedestrians are present and, if so, locate and mark each pedestrian accurately. Because a video is composed of individual image frames, the performance of pedestrian detection on images naturally determines the performance of pedestrian detection on video; the present invention therefore mainly addresses pedestrian detection on still images extracted from video. Such images are typically street scenes captured by vehicle-mounted cameras: the background is complex, the illumination varies, pedestrians differ widely in clothing and pose, and pedestrians are sometimes occluded. Many challenges therefore remain in pedestrian detection, and the analysis of such video images is of great significance to the field.

According to how pedestrian features are extracted, existing pedestrian detection models fall into two classes:

The first class comprises pedestrian detection methods based on hand-crafted features. In contrast to the deep learning methods of recent years, these are also called traditional methods. For a given image region, such a method first extracts pedestrian features with a pre-designed hand-crafted feature extraction algorithm, then feeds the features to a support vector machine (SVM) or an adaptive boosting (AdaBoost) classifier for training, classification, and localization, thereby detecting pedestrians from the features. Commonly used hand-crafted features include Haar-like features, HOG features, DPM features, and ICF features. Hand-crafted features are all low-level features. Although these methods perform well under certain assumptions, for video images from real scenes with complex backgrounds such low-level features cannot effectively extract and represent the pedestrians in the image.

The second class comprises pedestrian detection methods based on deep learning. As deep learning has achieved outstanding research results in image, speech, and text processing in recent years, many deep-learning-based pedestrian detection methods have emerged. These methods use a deep model to learn pedestrian features automatically: trained on large amounts of data, the model learns features comprising many thousands of parameters from high-dimensional data, and the resulting features are then classified and localized to detect pedestrians. At present, deep-learning-based pedestrian detection methods far outperform methods based on hand-crafted features, and performance can be improved further by designing better deep detection models.

Existing object detection models include Faster R-CNN. However, the Faster R-CNN model itself has two shortcomings. First, its feature classifier uses only the features of the pedestrian when classifying, although the region surrounding a pedestrian often contains further information that would help the classifier decide. Second, Faster R-CNN cannot detect small pedestrians in an image well. As a result, Faster R-CNN performs poorly on pedestrian detection and has a relatively high miss rate.

Summary of the invention

To overcome the above deficiencies of the prior art, the present invention provides a new method for detecting pedestrians in images, based on deep learning and combining image context information with multi-level image features. The method can be applied in intelligent surveillance systems or autonomous driving to detect pedestrians in the images or video captured by a camera, yielding the positions at which pedestrians may be present and facilitating subsequent analysis and operation by the system.

The principle of the present invention is as follows. The method is based on deep learning and combines image context information with multi-level image features to detect pedestrians in images. First, drawing on deep learning research in object detection, it applies a currently outstanding object detection model, the region-based convolutional neural network Faster R-CNN (Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." Advances in Neural Information Processing Systems. 2015), to pedestrian detection to obtain good detection results. Second, it uses the image context information around each pedestrian to let the feature classifier in Faster R-CNN "see" a wider area and make more accurate decisions. Finally, it combines the multi-level features of VGG16 (Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014)), the deep feature extraction model in Faster R-CNN, fusing coarse high-level features with fine low-level features so that the feature carries richer information and Faster R-CNN can detect small pedestrians well.

The present invention performs pedestrian detection mainly on the basis of the deep object detection model Faster R-CNN, using image context information to give the classifier information about a pedestrian's surroundings and using the multi-level features of VGG16 to give the classifier richer features. The method was tested on the 4024 test images of the Caltech dataset, and its miss rate is lower than that of most current methods.

The technical solution provided by the present invention is as follows:

A pedestrian detection method combining image context information and multi-level image features: features are extracted from the image with the VGG16 model (which has a fixed 13 convolutional layers), and the feature map produced by each convolutional layer is kept in memory. A region proposal network (RPN) is run on the last feature map, conv5_3, to obtain many high-quality regions of interest (RoIs), that is, candidate boxes, that may contain pedestrians. For each RoI, we first extract a context feature at the corresponding position on conv5_3, then extract the RoI's features at the corresponding positions on the three feature maps conv3_3, conv4_3, and conv5_3 to form the multi-level features. The context feature and the multi-level features are concatenated along the channel dimension and fed to the classifier for classification and localization. With continued training, pedestrians in the image can be detected accurately. The method comprises the following steps:

1) Input: a still video image to be detected;

2) Feature extraction: features are extracted from the input image with the VGG16 deep convolutional network model, which has 13 convolutional layers; each convolutional layer produces a feature map, and the last feature map is conv5_3;

3) Candidate box extraction: a spatial window of size n × n slides over the last feature map, conv5_3, with stride 1 along both the height and the width; at each position, k reference boxes (called anchor boxes) of different scales and aspect ratios are predicted; each candidate box is given a score according to the likelihood that it contains a target, the boxes are sorted by score in descending order, and the top N (for example, the top 2000) candidate boxes (RoIs) most likely to contain a pedestrian are retained;

4) Image context information extraction: for each RoI, on the last feature map, conv5_3, an RoI pooling operation over a region l (l > 1) times the size of the RoI candidate box is applied at the corresponding position to extract the context feature of the RoI at that position;

5) Image multi-level feature extraction: for each RoI, an RoI pooling operation is applied at the corresponding position on each of the three feature maps conv3_3, conv4_3, and conv5_3 produced by VGG16 to extract the feature at each level;

6) Feature concatenation: L₂ normalization and scaling are applied separately to the extracted image context feature and image multi-level features, and the features are then concatenated along the channel dimension, that is, the context feature is combined with the multi-level features so that the feature contains more information;

7) Detection: the combined feature is fed to the classifier for classification and bounding-box regression; the detection result is the score with which the candidate box is classified as a pedestrian together with the box coordinates after bounding-box regression; the score threshold is set to 0.01, and the candidate boxes scoring above the threshold are output with their coordinates, which achieves pedestrian detection.

Compared with the prior art, the beneficial effects of the present invention are as follows:

The invention provides a new method for detecting pedestrians in images. It draws on the outstanding research results that deep learning has achieved in object detection and applies them to pedestrian detection, obtaining good detection results. To address the shortcomings of the existing Faster R-CNN model, the invention combines image context information and multi-level image features so that the feature carries richer information and helps the classifier classify and localize better. The method can be applied in intelligent surveillance systems or autonomous driving to detect pedestrians in the images or video captured by a camera, yielding the positions at which pedestrians may be present and facilitating subsequent analysis and operation by the system.

The invention was evaluated on the test portion of the Caltech dataset, currently the most widely used pedestrian detection dataset. Its miss rate is lower than that of most current methods and only about 6 percentage points higher than that of the best method, which demonstrates the technical merit of the method.

Brief description of the drawings

Fig. 1 is the overall framework diagram of the pedestrian detection method provided by the invention;

In Fig. 1: 1, input the image to be detected; 2, extract features with VGG16; 3, extract candidate boxes (RoIs) with the RPN; 4, extract the context feature with an RoI pooling operation over l (l > 1) times the RoI; 5, extract the multi-level image features on the different feature maps with an RoI pooling operation over 1 times the RoI; 6, apply L₂ normalization and scaling to each of the multi-level features; 7, apply L₂ normalization and scaling to the context feature; 8, concatenate the context feature and the multi-level features along the channel dimension; 9, reduce the dimensionality of the combined feature; 10, feed the reduced feature to a fully connected layer for classification; 11, feed the reduced feature to a fully connected layer for bounding-box regression.

Fig. 2 is the flowchart of the pedestrian detection method provided by the invention.

Fig. 3 is a schematic diagram of image context information extraction in the present invention;

The labeled elements in Fig. 3 are: the VGG16 multi-level feature maps; RoI pooling of the pedestrian feature; pooling of the pedestrian's image context feature; and the RoI pooling operation.

Fig. 4 shows example results of detecting pedestrians in some images of the Caltech test set in a specific implementation of the invention.

Detailed description of the embodiments

The present invention is further described below through embodiments with reference to the accompanying drawings, which do not limit the scope of the invention in any way.

Pedestrian detection is a subproblem of object detection and provides key technical support in fields such as intelligent surveillance systems, intelligent transportation systems, and autonomous driving. Because a video is composed of individual image frames, the performance of pedestrian detection on images naturally determines the performance of pedestrian detection on video; the present invention therefore proposes a novel pedestrian detection method mainly for still images extracted from video. The method first applies Faster R-CNN, an outstanding deep model from object detection, to pedestrian detection to obtain good detection results. We then use the image context information around each pedestrian to help the classifier "see" a wider area. Next, we combine the multi-level features of VGG16, fusing coarse high-level features with fine low-level features so that the feature carries richer information and Faster R-CNN can detect small pedestrians better. Finally, tests on the Caltech dataset show that our method achieves a miss rate of 14.0%, lower than that of most current state-of-the-art algorithms.

Fig. 1 is the overall framework diagram of the pedestrian detection method provided by the invention, and Fig. 2 is its flowchart. The method comprises the following steps:

First, the image to be detected is input. If the input is video data, the video is first preprocessed into multiple still images, and each still image is input for detection.

Second, feature extraction. We extract features from the input image using a deep image classification model pre-trained on ImageNet. Commonly used models include the ZF model with 5 convolutional layers (Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." European Conference on Computer Vision. Springer, Cham, 2014), VGG16 with 13 convolutional layers, and the residual network ResNet101 with 101 convolutional layers (He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016). Because VGG16 performs well in both training speed and accuracy, we select VGG16 as the feature extraction network. The parameters of the 13 convolutional layers of VGG16 are listed in Table 1.

Table 1: Convolutional and pooling layer parameter settings in VGG16

Layer name	Type	Kernel size	Stride	Padding	Number of kernels
conv1_1	Convolutional	3	1	1	64
conv1_2	Convolutional	3	1	1	64
pool1	Pooling	2	2	0	128
conv2_1	Convolutional	3	1	1	128
conv2_2	Convolutional	3	1	1	128
pool2	Pooling	2	2	0	256
conv3_1	Convolutional	3	1	1	256
conv3_2	Convolutional	3	1	1	256
conv3_3	Convolutional	3	1	1	256
pool3	Pooling	2	2	0	512
conv4_1	Convolutional	3	1	1	512
conv4_2	Convolutional	3	1	1	512
conv4_3	Convolutional	3	1	1	512
pool4	Pooling	2	2	0	512
conv5_1	Convolutional	3	1	1	512
conv5_2	Convolutional	3	1	1	512
conv5_3	Convolutional	3	1	1	512
pool5	Pooling	2	2	0	512
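
The kernel, stride, and padding values in Table 1 determine how large each feature map is and how many input pixels one feature-map cell spans. The following is a minimal sketch that traces these sizes through the layers; the 480 × 640 input size and the floor-based size formula are assumptions made for illustration and are not stated in the text.

```python
# Layer list transcribed from Table 1: (name, kernel size, stride, padding).
VGG16_LAYERS = [
    ("conv1_1", 3, 1, 1), ("conv1_2", 3, 1, 1), ("pool1", 2, 2, 0),
    ("conv2_1", 3, 1, 1), ("conv2_2", 3, 1, 1), ("pool2", 2, 2, 0),
    ("conv3_1", 3, 1, 1), ("conv3_2", 3, 1, 1), ("conv3_3", 3, 1, 1),
    ("pool3", 2, 2, 0),
    ("conv4_1", 3, 1, 1), ("conv4_2", 3, 1, 1), ("conv4_3", 3, 1, 1),
    ("pool4", 2, 2, 0),
    ("conv5_1", 3, 1, 1), ("conv5_2", 3, 1, 1), ("conv5_3", 3, 1, 1),
    ("pool5", 2, 2, 0),
]


def output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer (floor rounding assumed)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_sizes(height, width):
    """Return {layer name: (height, width, cumulative stride)} for every layer."""
    sizes = {}
    cumulative_stride = 1
    for name, kernel, stride, padding in VGG16_LAYERS:
        height = output_size(height, kernel, stride, padding)
        width = output_size(width, kernel, stride, padding)
        cumulative_stride *= stride
        sizes[name] = (height, width, cumulative_stride)
    return sizes


# Hypothetical 480 x 640 input image.
sizes = trace_sizes(480, 640)
for layer in ("conv3_3", "conv4_3", "conv5_3"):
    print(layer, sizes[layer])
```

Under these assumptions the three feature maps used later in the method have cumulative strides of 4, 8, and 16 pixels, which is why conv5_3 is the coarsest of the three.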

Third, candidate box extraction. Candidate boxes are extracted by the module of Faster R-CNN called the region proposal network (RPN). On the last VGG16 feature map, conv5_3, this module slides an n × n spatial window with stride 1 along both the height and the width. We preset p scales and q aspect ratios, so at each sliding position a fixed number k = p × q of reference boxes (called anchor boxes) is predicted; for a feature map of size w × h, w × h × k reference boxes are produced in total. Based on experiments, we set the 10 scales and 1 aspect ratio shown in Table 2, so each position produces 10 reference boxes of different sizes. Each reference box is assigned a score representing the likelihood that its corresponding candidate box contains a target; the boxes are sorted by score in descending order, and the top 2000 candidate boxes (RoIs) most likely to contain a pedestrian are retained.

Table 2: Reference box scale and aspect ratio settings

Scale	Aspect ratio	Scale	Aspect ratio
2.0²	2.44	5.7122²	2.44
2.6²	2.44	7.4259²	2.44
3.38²	2.44	9.653²	2.44
3.5431²	2.44	12.5497²	2.44
4.394²	2.44	16.3146²	2.44
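
The reference box layout can be sketched as follows: k = p × q boxes per feature-map position, w × h × k boxes in total, and a top-N selection by score. This is an illustrative sketch, not the patented implementation. Several points are assumptions: each scale s in Table 2 is taken to give a box of area (16·s)² pixels, with 16 being the conv5_3 stride; the aspect ratio 2.44 is read as height divided by width; box centers are placed at the center of each feature-map cell; and the scores are placeholders rather than RPN outputs.

```python
import numpy as np

# Scales and aspect ratio transcribed from Table 2 (p = 10, q = 1, so k = 10).
SCALES = [2.0, 2.6, 3.38, 3.5431, 4.394, 5.7122, 7.4259, 9.653, 12.5497, 16.3146]
RATIOS = [2.44]   # assumed to mean height / width
STRIDE = 16       # assumed base size in pixels (stride of conv5_3)


def base_anchors(scales=SCALES, ratios=RATIOS, base=STRIDE):
    """Return (k, 4) boxes [x1, y1, x2, y2] centered on the origin."""
    boxes = []
    for ratio in ratios:
        for scale in scales:
            area = (base * scale) ** 2      # assumed interpretation of the scale
            width = np.sqrt(area / ratio)
            height = width * ratio
            boxes.append([-width / 2, -height / 2, width / 2, height / 2])
    return np.array(boxes)


def all_anchors(feat_h, feat_w, stride=STRIDE):
    """Place the k base boxes at every feature-map position: (feat_h * feat_w * k, 4)."""
    base = base_anchors()
    xs = (np.arange(feat_w) + 0.5) * stride
    ys = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(xs, ys)
    shifts = np.stack([cx, cy, cx, cy], axis=-1).reshape(-1, 1, 4)
    return (shifts + base[None, :, :]).reshape(-1, 4)


def keep_top_n(boxes, scores, n=2000):
    """Sort by score in descending order and keep the n highest-scoring boxes."""
    order = np.argsort(-scores)[:n]
    return boxes[order]


anchors = all_anchors(30, 40)                  # hypothetical 30 x 40 conv5_3 map
scores = np.linspace(0.0, 1.0, len(anchors))   # placeholder scores
rois = keep_top_n(anchors, scores, n=2000)
```

With a single aspect ratio every reference box has the tall, narrow shape of an upright pedestrian, and only the scale varies across the 10 boxes at each position.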

Fourth, image context information extraction. Fig. 3 is a schematic diagram of image context information extraction in the present invention. For each RoI, we first enlarge the RoI region by a factor of l (l > 1) about its center, then apply an RoI pooling operation to the enlarged RoI at the corresponding position on the last feature map, conv5_3, to extract the context feature of the RoI at that position. Here we set l to 1.5. The feature classifier in Faster R-CNN uses only the features of the pedestrian when classifying, whereas the region surrounding a pedestrian often contains further information that helps the classifier decide. The enlarged RoI contains more information than the original RoI, so the classifier can "see" a wider area and make more accurate decisions.
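
The enlargement of an RoI about its center can be sketched as below. The text does not say whether l multiplies the side lengths or the area, so this sketch assumes that the width and height are each multiplied by l; if the area were meant instead, each side would be multiplied by the square root of l. The function name, the corner-coordinate box format, and the optional clipping to the image bounds are hypothetical.

```python
def expand_roi(x1, y1, x2, y2, l=1.5, img_w=None, img_h=None):
    """Enlarge a box [x1, y1, x2, y2] about its center.

    Assumption: width and height are each multiplied by l.
    If img_w and img_h are given, the result is clipped to the image.
    """
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    new_w, new_h = (x2 - x1) * l, (y2 - y1) * l
    nx1, ny1 = cx - new_w / 2.0, cy - new_h / 2.0
    nx2, ny2 = cx + new_w / 2.0, cy + new_h / 2.0
    if img_w is not None and img_h is not None:
        nx1, ny1 = max(nx1, 0.0), max(ny1, 0.0)
        nx2, ny2 = min(nx2, float(img_w)), min(ny2, float(img_h))
    return nx1, ny1, nx2, ny2


# A 40 x 100 pixel RoI enlarged with l = 1.5 becomes 60 x 150 around the same center.
context_box = expand_roi(100, 100, 140, 200, l=1.5)
```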

The RoI pooling operation is performed by an RoI pooling layer, which converts the features inside any valid RoI into a small feature map of fixed spatial size H × W (for example, 7 × 7), where H and W are layer hyperparameters independent of any particular RoI. Here an RoI is a candidate box produced by the RPN on the feature map; each RoI is defined by a four-tuple (x, y, w, h) giving its top-left corner (x, y) and its height and width (h, w). The RoI pooling layer divides the h × w RoI window into an H × W grid of sub-windows, each of approximate size h/H × w/W, and takes the maximum value in each sub-window as the output of the corresponding grid cell, which is equivalent to a standard max pooling layer.
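
The RoI pooling operation described above can be illustrated with a small NumPy sketch. It assumes the RoI is given in integer feature-map cells as (x, y, w, h) and lies entirely inside the feature map; the floor and ceiling rounding of sub-window edges is one common convention and is an assumption here, not something the text specifies.

```python
import numpy as np


def roi_max_pool(feature, roi, out_h=7, out_w=7):
    """Max-pool the RoI region of a (C, Hf, Wf) feature map to (C, out_h, out_w).

    roi = (x, y, w, h): top-left corner and size in feature-map cells.
    Each output cell takes the maximum over a sub-window of roughly
    (h / out_h) x (w / out_w) cells.
    """
    x, y, w, h = roi
    channels = feature.shape[0]
    out = np.empty((channels, out_h, out_w), dtype=feature.dtype)
    for i in range(out_h):
        y0 = y + int(np.floor(i * h / out_h))
        y1 = y + int(np.ceil((i + 1) * h / out_h))
        for j in range(out_w):
            x0 = x + int(np.floor(j * w / out_w))
            x1 = x + int(np.ceil((j + 1) * w / out_w))
            out[:, i, j] = feature[:, y0:y1, x0:x1].max(axis=(1, 2))
    return out


# A 2-channel 14 x 14 map pooled over its full extent gives 2 x 2 sub-windows.
feature = np.arange(2 * 14 * 14, dtype=float).reshape(2, 14, 14)
pooled = roi_max_pool(feature, roi=(0, 0, 14, 14))
```

Because the output size is fixed regardless of the RoI size, features pooled from different RoIs and from different feature maps can later be stacked along the channel dimension.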

Fifth, image multi-level feature extraction. For each RoI, we apply an RoI pooling operation at the corresponding position on each of the three feature maps conv3_3, conv4_3, and conv5_3 produced by VGG16 to extract the RoI's feature at each level. By default, Faster R-CNN extracts features only from conv5_3. For small pedestrians the features on conv5_3 are very coarse, so Faster R-CNN cannot detect small pedestrians well, performs poorly on pedestrian detection, and has a relatively high miss rate. Combining fine low-level features with coarse high-level features enriches the features of small pedestrians and thereby improves the detection of small pedestrians by Faster R-CNN.
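
To see why conv5_3 alone is coarse for small pedestrians, the sketch below projects one RoI from image coordinates onto the three feature maps. The strides of 4, 8, and 16 follow from the 2 × 2, stride-2 pooling layers in Table 1; the outward rounding rule, the function name, and the example box are assumptions for illustration.

```python
import math

# Cumulative stride of each feature map, derived from the pooling layers in Table 1.
LEVEL_STRIDES = {"conv3_3": 4, "conv4_3": 8, "conv5_3": 16}


def project_roi(box, stride):
    """Map an image-space box [x1, y1, x2, y2] to feature-map cells.

    Rounds outward (floor for the top-left corner, ceil for the bottom-right)
    and guarantees the projected box covers at least one cell.
    """
    x1, y1, x2, y2 = box
    fx1, fy1 = math.floor(x1 / stride), math.floor(y1 / stride)
    fx2, fy2 = math.ceil(x2 / stride), math.ceil(y2 / stride)
    return fx1, fy1, max(fx2, fx1 + 1), max(fy2, fy1 + 1)


# Hypothetical small pedestrian: 25 pixels wide and 61 pixels tall.
roi = (320, 200, 345, 261)
for name, stride in LEVEL_STRIDES.items():
    fx1, fy1, fx2, fy2 = project_roi(roi, stride)
    print(f"{name}: {fx2 - fx1} x {fy2 - fy1} cells")
```

In this example the pedestrian covers 7 × 16 cells on conv3_3 but only 2 × 5 cells on conv5_3, which illustrates how much spatial detail the lower-level maps add for small targets.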

Sixth, feature concatenation. The extracted image context feature and multi-level features have different norms and scales. Simply concatenating them along the channel dimension would degrade detection performance, because features with large values would dominate. To solve this problem, we first apply L₂ normalization according to Formula 1 to the four groups of features (three levels of features plus the context feature).

Suppose the feature is d-dimensional, written x = (x₁, x₂, …, x_d). We normalize it using Formula 1:

$$\hat{x} = \frac{x}{\lVert x \rVert_2}, \qquad \lVert x \rVert_2 = \Bigl(\sum_{i=1}^{d} \lvert x_i \rvert^2\Bigr)^{1/2} \tag{1}$$

where $\hat{x}$ is the normalized value of the feature x.

However, normalization alone changes the scale of each feature and can slow down learning, so a scaling factor γ_i is introduced for each input channel, and the scaled normalized value is $y_i = \gamma_i \hat{x}_i$. In the training stage, each feature x and each scaling factor γ are learned separately using the back-propagation algorithm and the chain rule, so that after training the features have similar norms and scales.

Applying L₂ normalization and scaling to the features makes training more stable and can improve performance. Finally, the L₂-normalized and scaled context feature and multi-level features are concatenated along the channel dimension to form the final pedestrian feature, which is passed to the classifier.
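
The normalization, scaling, and concatenation steps can be sketched in NumPy as follows. This is a forward-pass illustration only. It assumes that the d-dimensional vector x is the channel vector at each spatial location of a pooled C × 7 × 7 feature, and it uses a fixed, hypothetical γ of 10 for every channel, whereas the text states that γ is learned during training.

```python
import numpy as np


def l2_normalize_scale(feature, gamma, eps=1e-12):
    """L2-normalize a (C, H, W) feature across channels, then scale per channel.

    Assumption: the normalized vector x is the C-dimensional channel vector at
    each spatial location. gamma has shape (C,), one factor per channel.
    """
    norm = np.sqrt((feature ** 2).sum(axis=0, keepdims=True)) + eps
    return gamma[:, None, None] * feature / norm


def fuse(features, gammas):
    """Normalize and scale each feature group, then concatenate along channels."""
    scaled = [l2_normalize_scale(f, g) for f, g in zip(features, gammas)]
    return np.concatenate(scaled, axis=0)


# Four groups with deliberately different magnitudes: the conv3_3, conv4_3 and
# conv5_3 RoI features plus the context feature (pooled from conv5_3).
rng = np.random.default_rng(0)
groups = [(256, 100.0), (512, 10.0), (512, 1.0), (512, 1.0)]
features = [rng.normal(scale=s, size=(c, 7, 7)) for c, s in groups]
gammas = [np.full(f.shape[0], 10.0) for f in features]   # hypothetical fixed value

fused = fuse(features, gammas)   # shape (1792, 7, 7)
```

After this step every group has the same norm at each location, so no single feature map dominates the concatenated feature merely because its raw activations are larger.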

Seventh, detection. The final combined pedestrian feature is first compressed by a 1 × 1 convolutional layer to 512 × 7 × 7 dimensions, then fed to the classifier for classification and bounding-box regression. A threshold is set, and the positive samples scoring above the threshold are output together with their coordinates, which achieves pedestrian detection.
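
A minimal sketch of the compression and thresholding steps is given below. The input channel count of 1792 assumes the four concatenated groups have 256, 512, 512, and 512 channels as in Table 1; the weights are random placeholders rather than trained parameters, the classifier itself is omitted, and the threshold of 0.01 is the value stated earlier in the document.

```python
import numpy as np


def conv1x1(feature, weight, bias):
    """Apply a 1 x 1 convolution: (C_in, H, W) -> (C_out, H, W).

    A 1 x 1 convolution is a per-location linear map over channels.
    """
    return np.einsum("oc,chw->ohw", weight, feature) + bias[:, None, None]


def filter_detections(boxes, scores, threshold=0.01):
    """Keep the boxes whose pedestrian score is greater than the threshold."""
    keep = scores > threshold
    return boxes[keep], scores[keep]


rng = np.random.default_rng(0)
fused = rng.normal(size=(1792, 7, 7))          # placeholder concatenated feature
weight = rng.normal(size=(512, 1792)) * 0.01   # placeholder 1 x 1 kernel weights
bias = np.zeros(512)
compressed = conv1x1(fused, weight, bias)      # shape (512, 7, 7)

# Placeholder classifier outputs for three candidate boxes.
boxes = np.array([[0, 0, 10, 20], [5, 5, 15, 30], [1, 1, 4, 9]], dtype=float)
scores = np.array([0.9, 0.005, 0.02])
kept_boxes, kept_scores = filter_detections(boxes, scores)
```

The compressed shape of 512 × 7 × 7 matches what a standard VGG16-based classifier head expects from a single conv5_3 RoI feature, which is presumably why the combined feature is reduced to this size.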

The above is the specific embodiment in which the present invention combines image context information and multi-level features for pedestrian detection. Both training and testing in this embodiment are carried out on single-scale images, without a scale pyramid strategy. The method is trained on the Caltech-10x dataset for 60,000 iterations with learning rate lr = 0.001, then for 20,000 iterations with learning rate lr = 0.0001. It is then tested on the Caltech test set of 4024 images and evaluated under the Reasonable setting (pedestrian height greater than 50 pixels, occluded region no more than 35%), using miss rate versus false positives per image (FPPI) as the evaluation criterion. Table 3 compares the evaluation results of the method and seven other algorithms on the Caltech dataset; all values are average miss rates, and a lower value indicates better performance. The results show that the miss rate of the method is lower than that of most methods and only about 6 percentage points higher than that of the best-performing current method, which demonstrates the strong performance of the method.
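
The training schedule and the Reasonable filter stated above can be written down compactly. This is only a restatement of the numbers in the text as code; the function names are hypothetical, and how the schedule is wired into a particular training framework is left open.

```python
def learning_rate(iteration):
    """Step schedule from the text: 60,000 iterations at 0.001, then 20,000 at 0.0001."""
    if iteration < 0:
        raise ValueError("iteration must be non-negative")
    if iteration < 60_000:
        return 1e-3
    if iteration < 80_000:
        return 1e-4
    raise ValueError("training ends after 80,000 iterations")


def is_reasonable(height_px, occluded_fraction):
    """Reasonable setting as stated in the text: taller than 50 px, at most 35% occluded."""
    return height_px > 50 and occluded_fraction <= 0.35


# Example: keep only the annotations that count under the Reasonable setting.
annotations = [(120, 0.0), (45, 0.0), (80, 0.5), (60, 0.35)]
evaluated = [a for a in annotations if is_reasonable(*a)]
```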

Table 3: Evaluation results of different methods on the Caltech dataset

Algorithm	VJ[1]	HOG[2]	ACF++[3]	Checkboards[4]	LDCF[5]	RPN+BF[6]	F-DNN[7]	Ours
Miss rate	94.7%	68.5%	17.7%	17.1%	15.0%	9.6%	8.2%	14.0%

Finally, Fig. 4 shows visual examples of detections obtained by applying the present invention to some images from the Caltech test set.

The existing methods compared in Table 3 are described in the following references:

[1] Viola, Paul, and Michael J. Jones. "Robust real-time face detection." International Journal of Computer Vision 57.2 (2004): 137-154.

[2] Dalal, Navneet, and Bill Triggs. "Histograms of oriented gradients for human detection." Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on. Vol. 1. IEEE, 2005.

[3] Ohn-Bar, Eshed, and Mohan M. Trivedi. "To boost or not to boost? On the limits of boosted trees for object detection." Pattern Recognition (ICPR), 2016 23rd International Conference on. IEEE, 2016.

[4] Zhang, Shanshan, Rodrigo Benenson, and Bernt Schiele. "Filtered channel features for pedestrian detection." Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on. IEEE, 2015.

[5] Nam, Woonhyun, Piotr Dollár, and Joon Hee Han. "Local decorrelation for improved pedestrian detection." Advances in Neural Information Processing Systems. 2014.

[6] Zhang, Liliang, et al. "Is Faster R-CNN doing well for pedestrian detection?" European Conference on Computer Vision. Springer International Publishing, 2016.

[7] Du, Xianzhi, et al. "Fused DNN: A deep neural network fusion approach to fast and robust pedestrian detection." Applications of Computer Vision (WACV), 2017 IEEE Winter Conference on. IEEE, 2017.

It should be noted that the embodiments are disclosed to aid further understanding of the present invention, and those skilled in the art will appreciate that various substitutions and modifications are possible without departing from the spirit and scope of the invention and the appended claims. The invention should therefore not be limited to what is disclosed in the embodiments; the scope of protection claimed is defined by the claims.

Claims

1. A pedestrian detection method combining image context information and multi-level image features, in which, for an image to be detected, features are extracted with a deep image classification model, and the feature map produced by each convolutional layer of the deep model is kept in memory; a region proposal network (RPN) is run on the last feature map to obtain multiple high-quality regions of interest (RoIs), that is, candidate boxes, that may contain pedestrians; for each RoI, a context feature is first extracted at the corresponding position on the last feature map, and multi-level image feature extraction is then performed, extracting the RoI's features at the corresponding positions on multiple feature maps to form multi-level features; the context feature and the multi-level features are concatenated along the channel dimension and fed to a classifier for classification training and localization; pedestrians are identified through continued training, thereby achieving accurate detection of pedestrians in the image.

2. The pedestrian detection method of claim 1, characterized in that the deep image classification model is the VGG16 deep convolutional network model with 13 convolutional layers described in (Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014)); the last feature map is conv5_3.

3. The pedestrian detection method of claim 2, characterized in that candidate box extraction is specifically: a spatial window of size n × n slides over the last feature map, conv5_3, with stride 1 along both the height and the width; at each position, k reference boxes (anchor boxes) of different scales and aspect ratios are predicted; for each candidate box, a score is predicted according to the likelihood that the box contains a target; the boxes are sorted by score in descending order, and the top-ranked candidate boxes (RoIs) most likely to contain a pedestrian are retained.

4. The pedestrian detection method of claim 2, characterized in that extracting the context feature at the corresponding position on conv5_3 is specifically: for each candidate box (RoI), on the last feature map, conv5_3, an RoI pooling operation over a region l (l > 1) times the size of the candidate box is applied to extract the context feature of the RoI at the corresponding position.

5. The pedestrian detection method of claim 2, characterized in that image multi-level feature extraction is specifically: for each candidate box (RoI), an RoI pooling operation is applied at the corresponding position on each of the three feature maps conv3_3, conv4_3, and conv5_3 produced by VGG16 to extract the feature at each level, and these features form the multi-level features.

6. The pedestrian detection method of claim 2, characterized in that the feature concatenation is: L₂ normalization and scaling are applied separately to the extracted image context feature and image multi-level features, and the features are then concatenated along the channel dimension, that is, the context feature is combined with the multi-level features so that the resulting feature contains more information; specifically comprising the following process:

1) L₂ normalization is applied to the three levels of features and the context feature according to Formula 1:

Let the feature be d-dimensional, written x = (x₁, x₂, …, x_d); the feature is normalized using Formula 1:

$$\hat{x} = \frac{x}{\lVert x \rVert_2}, \qquad \lVert x \rVert_2 = \Bigl(\sum_{i=1}^{d} \lvert x_i \rvert^2\Bigr)^{1/2} \tag{1}$$

where $\hat{x}$ is the normalized value of the feature x;

2) a scaling factor γ_i is introduced for each input channel, and the scaled normalized value is $y_i = \gamma_i \hat{x}_i$;

3) the L₂-normalized and scaled context feature and multi-level features are concatenated along the channel dimension to form the final pedestrian feature, which is passed to the classifier;

4) in the training stage, each feature x and each scaling factor γ are learned separately using the back-propagation algorithm and the chain rule, so that after training the features have similar norms and scales.

7. The pedestrian detection method of claim 2, characterized in that, when detecting pedestrians, the combined feature is fed to the classifier for classification and bounding-box regression; the detection result is the score with which the candidate box is classified as a pedestrian together with the box coordinates after bounding-box regression; a score threshold is set, and the candidate boxes scoring above the threshold are output with their coordinates, thereby achieving pedestrian detection.

8. The pedestrian detection method of claim 7, characterized in that, before the combined feature is fed to the classifier, the feature is compressed by a 1 × 1 convolutional layer to 512 × 7 × 7 dimensions.

9. The pedestrian detection method of claim 7, characterized in that the score threshold is set to 0.01.