CN108399362A - Rapid pedestrian detection method and device

Info

Publication number: CN108399362A
Application number: CN201810069322.XA
Authority: CN
Inventors: 林倞; 尹森堂; 张冬雨; 王青
Original assignee: Sun Yat Sen University
Current assignee: Sun Yat Sen University
Priority date: 2018-01-24
Filing date: 2018-01-24
Publication date: 2018-08-14
Anticipated expiration: 2038-01-24
Also published as: WO2019144575A1; CN108399362B

Abstract

The invention discloses a rapid pedestrian detection method and device. The method comprises the following steps: Step S1, constructing a configurable deep model based on a convolutional neural network and learning the network parameters of the constructed model from training samples, thereby obtaining the model used in the test process; Step S2, inputting a test sample and, with the trained model, exploiting the way the receptive field of a neural network changes with depth, using different intermediate layers to detect target objects in different scale ranges and predicting the bounding boxes of the target objects in the image. By exploiting the variation of the receptive field and detecting target objects within a particular scale range at each intermediate layer, the invention better matches the receptive field to the object size and effectively improves the detection results.

Description

Rapid pedestrian detection method and device

Technical field

The present invention relates to the technical field of pedestrian detection, and more particularly to a deep-learning-based rapid pedestrian detection method and device for embedded systems.

Background art

As a part of object detection in computer vision, pedestrian detection is of great significance in real-world applications. As image acquisition technology matures and the cost of storage falls, more and more cameras are deployed in public places; at the same time, with the adoption of autonomous driving and intelligent transportation, vehicle-mounted cameras also produce massive amounts of video. Traditional manual screening and processing is not only inefficient and costly in manpower and material resources, but may also introduce human factors that cause deviations. In recent years, deep learning has achieved unprecedented breakthroughs in computer vision: it is far more efficient than manual work, and in many areas its accuracy exceeds that of humans. Using deep learning effectively for pedestrian detection has therefore attracted attention.

People are among the most important targets in video surveillance and autonomous driving, and the primary task of pedestrian detection is to recognize the presence of a human body and provide the corresponding annotation. Because the quality of images captured in the real world is uneven, detecting small objects and occluded objects has always been a difficulty in pedestrian detection. In addition, vehicle-mounted cameras often capture blurred images, which contain many objects that resemble pedestrians but are not pedestrians. For embedded systems specifically, large neural network models with strong recognition ability are generally difficult to run efficiently on devices with limited computing resources, while applications on embedded devices demand real-time operation. Balancing detection accuracy and efficiency is therefore the central concern of rapid pedestrian detection for embedded systems.

Summary of the invention

In order to overcome the above deficiencies of the prior art, one object of the present invention is to provide a rapid pedestrian detection method and device that exploit the variation of the receptive field of a neural network and use different intermediate layers to detect target objects within particular scale ranges, thereby better matching the receptive field to the object size and effectively improving the detection results.

Another object of the present invention is to provide a rapid pedestrian detection method and device that, by adjusting and training the VGG-16 network, obtain a Squeeze VGG-16 network meeting the requirements of embedded systems, effectively reducing the number of parameters of the network model and speeding up computation.

A further object of the present invention is to provide a rapid pedestrian detection method and device that enlarge the feature map of a particular network layer by deconvolution, enhancing the detection of small objects while adding almost no GPU memory or computation compared with the conventional approach of enlarging the input image.

Yet another object of the present invention is to provide a rapid pedestrian detection method and device that add a region 1.5 times the size of the target object to the network as a background semantic feature, which gives excellent performance in detecting blurred targets and distant small objects.

To achieve the above and other objects, the present invention proposes a rapid pedestrian detection method comprising the following steps:

Step S1: construct a configurable deep model based on a convolutional neural network, and learn the network parameters of the constructed model from training samples to obtain the model used in the test process;

Step S2: input a test sample and, with the trained model, exploit the variation of the receptive field of the neural network, using different intermediate layers to detect target objects in different scale ranges and predicting the bounding boxes of the target objects in the image.

Preferably, step S1 further comprises:

constructing the configurable deep model based on a convolutional neural network;

inputting training samples;

initializing the convolutional neural network and its parameters, including the weights and biases of every connection in each network layer;

using the forward propagation and backpropagation algorithms to learn the network parameters of the constructed model from the training samples, thereby obtaining the model used in the test process.

Preferably, the deep model comprises a multi-scale target candidate network and a target detection network. The target candidate network exploits the differences between the features extracted by different layers of the convolutional neural network and generates, at intermediate layers, candidate bounding boxes for target objects of different scales; the target detection network performs refined classification and detection on the basis of the candidate bounding boxes output by the target candidate network.

Preferably, the convolutional neural network is formed by stacking convolutional layers, downsampling layers and upsampling layers. A convolutional layer performs a convolution over the two-dimensional space of the input image or feature map to extract hierarchical features; a downsampling layer uses non-overlapping max-pooling, which extracts features that are invariant to shape and offset while reducing the feature map size and improving computational efficiency; an upsampling layer performs a deconvolution over the two-dimensional space of the input feature map to increase the number of pixels of the feature map.

Preferably, the deep model uses a Squeeze VGG-16 convolutional neural network as the backbone network; the Squeeze VGG-16 convolutional neural network uses the conv1-1 layer followed by 12 Fire module layers as its feature extraction network structure.

Preferably, the target candidate network builds on the Squeeze VGG-16 convolutional neural network and, according to the characteristics of the convolutional layers, generates network branches at Fire9, Fire12, conv6 and an added pooling layer to regress the candidate boxes of objects detected at different scales.

Preferably, the target detection network, on the basis of the target candidate regions, takes an image region a preset multiple of the size of the target candidate region as the background semantic information of the target, upsamples the Fire9 feature map once as information that enhances the perception of small objects, passes the background semantic information and the upsampled information through region-of-interest pooling to obtain fixed-size features, and then adds one fully connected layer to perform classification and regression of the final candidate boxes.

Preferably, the training samples comprise RGB image data and annotations of the pedestrian regions in the images; the image data actually used in training are small patches cropped according to the regions where the pedestrians are located.

Preferably, the backpropagation algorithm first computes the loss function $\mathcal{L}(W)$ between the target bounding boxes predicted by forward propagation and the ground-truth bounding boxes of the image, then computes its gradient with respect to the parameters $W$ and updates $W$ by gradient descent to minimize $\mathcal{L}(W)$. Assume that the intermediate layers have $M$ branches that can output target candidate regions, $l^m$ denotes the loss function of branch $m$, $\alpha_m$ denotes the weight of $l^m$, and $S = \{S^1, S^2, \ldots, S^M\}$ denotes the target objects of the corresponding scales; the loss function $\mathcal{L}(W)$ may then be defined as:

$$\mathcal{L}(W) = \sum_{m=1}^{M} \sum_{i \in S^m} \alpha_m \, l^m(X_i, Y_i \mid W)$$

To achieve the above objects, the present invention also provides a rapid pedestrian detection device, comprising:

a training unit, configured to construct a configurable deep model based on a convolutional neural network and to learn the network parameters of the constructed model from training samples, obtaining the model used in the test process;

a detection unit, configured to input a test sample and, with the trained model, exploit the variation of the receptive field of the neural network, using different intermediate layers to detect target objects in different scale ranges and predicting the bounding boxes of the target objects in the image.

Compared with the prior art, the rapid pedestrian detection method and device of the present invention draw on network compression methods: the VGG-16 network is adjusted and trained to obtain a Squeeze VGG-16 network that meets the requirements of embedded systems, effectively reducing the number of parameters of the network model and speeding up computation. In addition, to address the mismatch between receptive field and object size in traditional detection methods, the invention exploits the variation of the receptive field of a neural network (the deeper the layer, the larger the receptive field, and the better suited it is to detecting larger target objects) and uses different intermediate layers to detect target objects within particular scale ranges, better matching the receptive field to the object size and effectively improving the detection results. Furthermore, to enhance the detection of small objects, the invention enlarges the feature map of a particular network layer by deconvolution, which adds almost no GPU memory or computation compared with the conventional approach of enlarging the input image. Finally, to enhance the detection of blurred targets, a region 1.5 times the size of the target object on the feature map of that layer is added to the network as a background semantic feature, giving excellent performance in detecting blurred targets and distant small objects.

Description of the drawings

Fig. 1 is a flow chart of the steps of the rapid pedestrian detection method of the present invention;

Fig. 2 is a schematic diagram of the Squeeze VGG-16 neural network structure in a specific embodiment of the present invention;

Fig. 3 is a schematic diagram of the Fire module in a specific embodiment of the present invention;

Fig. 4 is a schematic diagram of the structure of the target candidate network in a specific embodiment of the present invention;

Fig. 5 is a schematic diagram of the structure of the target detection network in a specific embodiment of the present invention;

Fig. 6 is a schematic diagram of the rapid pedestrian detection process in a specific embodiment of the present invention;

Fig. 7 is a system architecture diagram of the rapid pedestrian detection device of the present invention;

Fig. 8 is a detailed structure diagram of the training unit in a specific embodiment of the present invention;

Fig. 9 is a detailed structure diagram of the detection unit in a specific embodiment of the present invention.

Detailed description of the embodiments

The embodiments of the present invention are described below through specific examples with reference to the drawings; those skilled in the art can readily understand other advantages and effects of the invention from the content disclosed in this specification. The invention may also be implemented or applied through other, different specific examples, and the details in this specification may be modified and changed in various ways for different viewpoints and applications without departing from the spirit of the invention.

Fig. 1 is a flow chart of the steps of the rapid pedestrian detection method of the present invention. As shown in Fig. 1, the rapid pedestrian detection method of the present invention comprises the following steps:

Step S1: construct a configurable deep model based on a convolutional neural network, and learn the network parameters of the constructed model from training samples to obtain the model used in the test process. In a specific embodiment of the invention, the deep model consists of two sub-networks. The first sub-network is a multi-scale target candidate network, used to extract person features and provide candidate regions; specifically, it exploits the differences between the features extracted by different layers of the convolutional neural network and generates, at intermediate layers, candidate bounding boxes for pedestrians of different scales. The second sub-network is a target detection network, which improves the detection performance; it shares parameters with the target candidate network and performs refined classification and detection on the basis of the candidate bounding boxes. Specifically, step S1 further comprises:

Step S100: construct the configurable deep model based on a convolutional neural network.

The convolutional neural network is formed by stacking convolutional layers, downsampling layers and upsampling layers. A convolutional layer performs a convolution over the two-dimensional space of the input image or feature map to extract hierarchical features. A downsampling layer uses non-overlapping max-pooling, which extracts features that are invariant to shape and offset while reducing the feature map size and improving computational efficiency. An upsampling layer performs a deconvolution over the two-dimensional space of the input feature map to increase the number of pixels of the feature map; it is used mainly in the target detection network to improve detection performance. In a specific embodiment of the invention, a Squeeze VGG-16 convolutional neural network is used as the backbone network. As shown in Fig. 2, the Squeeze VGG-16 convolutional neural network uses the conv1-1 layer followed by 12 Fire module layers as convolutional layers to extract features; pool1 to pool5 are the downsampling layers. A model pre-trained on the ImageNet dataset is used for initialization; that is, the invention first pre-trains Squeeze VGG-16 on the ImageNet dataset to initialize the network.
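As a minimal sketch of the non-overlapping max-pooling used by the downsampling layers, the following Python function halves each spatial dimension of a single-channel feature map. The 2 × 2 window with stride 2 and the name `max_pool_2x2` are assumptions for illustration; the description states only that the pooling windows do not overlap.

```python
def max_pool_2x2(feature_map):
    """Non-overlapping 2 x 2 max-pooling over a 2-D list (assumed window size).

    Rows and columns that do not fill a complete window are dropped.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r][c],
                feature_map[r][c + 1],
                feature_map[r + 1][c],
                feature_map[r + 1][c + 1],
            )
            for c in range(0, cols - 1, 2)  # step 2: windows never overlap
        ]
        for r in range(0, rows - 1, 2)
    ]


# Illustrative 4 x 4 input; each output value is the maximum of one 2 x 2 window.
example = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pooled = max_pool_2x2(example)
```

Because the windows do not overlap, each pooling layer reduces the number of feature map positions by a factor of four, which is where the computational saving comes from.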

Fig. 3 is a schematic diagram of the structure of the Fire module in a specific embodiment of the present invention. As shown in Fig. 3, the Fire module consists of two convolutional layers with 1 × 1 kernels and one convolutional layer with 3 × 3 kernels. The aim is to replace 3 × 3 kernels with 1 × 1 kernels, which reduces the number of parameters of each replaced kernel by a factor of 9. So as not to impair the representational capacity of the network, the replacement is only partial: some kernels are 1 × 1 and some remain 3 × 3. A further benefit is that the number of input channels of the 3 × 3 kernels is reduced, which also reduces the number of parameters. Specifically, the Fire module first uses a 1 × 1 convolutional layer to reduce the dimensionality of the input; next, following the GoogLeNet structure, it extracts features with 1 × 1 and 3 × 3 convolutional layers; finally, it concatenates the two sets of features. This greatly reduces the amount of computation and the number of model parameters.
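To make the parameter saving concrete, the sketch below counts weights and biases for a plain 3 × 3 convolution and for a Fire module with the same input and output width. The channel sizes (256 in, 32 after the reducing 1 × 1 layer, 128 + 128 out) and the function names are hypothetical and are not taken from the patent.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights plus biases of a k x k convolution."""
    return c_in * c_out * k * k + c_out


def fire_params(c_in: int, squeeze: int, expand1x1: int, expand3x3: int) -> int:
    """Parameters of a Fire module: a 1x1 reducing layer, then 1x1 and 3x3 layers."""
    return (
        conv_params(c_in, squeeze, 1)          # 1x1 dimensionality reduction
        + conv_params(squeeze, expand1x1, 1)   # 1x1 feature extraction
        + conv_params(squeeze, expand3x3, 3)   # 3x3 feature extraction on fewer channels
    )


# Hypothetical channel sizes, chosen only for illustration.
plain = conv_params(256, 256, 3)
fire = fire_params(256, squeeze=32, expand1x1=128, expand3x3=128)
print(plain, fire)
```

With these illustrative sizes the Fire module needs roughly one twelfth of the parameters of the plain 3 × 3 layer; most of the saving comes from feeding the 3 × 3 kernels 32 input channels instead of 256.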

Fig. 4 is a schematic diagram of the architecture of the target candidate network in a specific embodiment of the present invention. In a specific embodiment of the invention, the target candidate network builds on the Squeeze VGG-16 convolutional neural network and, according to the characteristics of the convolutional layers, generates network branches at four layers in total, namely Fire9, Fire12, conv6 and an added pooling layer; the branches regress the candidate boxes of objects detected at different scales. Fire9, however, is a relatively low layer of the backbone network, so its influence on the gradient is much greater than that of the other layers and makes the learning process unstable. An additional buffer layer is therefore inserted, shown as the det-conv layer in Fig. 4; the buffer layer prevents the gradient of the detection branch from being back-propagated directly to the backbone layers.

The present invention exploits the variation of the receptive field of a neural network (the deeper the layer, the larger the receptive field, and the better suited the layer is to detecting larger target objects) and uses different intermediate layers to detect target objects within particular scale ranges, which better matches the receptive field to the object size and effectively improves the detection results.
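The growth of the receptive field with depth can be illustrated with the standard recurrence for stacked convolution and pooling layers. The sketch below is not the patent's network: the layer stack (a 3 × 3 convolution followed by 2 × 2 pooling with stride 2, repeated four times) and the name `receptive_fields` are assumptions chosen only to show the trend.

```python
def receptive_fields(layers):
    """Return the receptive field size (in input pixels) after each layer.

    `layers` is a sequence of (kernel, stride) pairs.
    """
    rf, jump, sizes = 1, 1, []
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field by (k - 1) jumps
        jump *= stride             # strides compound the spacing between outputs
        sizes.append(rf)
    return sizes


# Hypothetical stack: 3x3 convolution then non-overlapping 2x2 pooling, four times.
stack = [(3, 1), (2, 2)] * 4
fields = receptive_fields(stack)
print(fields)
```

The sizes increase monotonically with depth, which is the property the invention relies on when it assigns small targets to shallow branches and large targets to deep ones.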

Fig. 5 is a schematic diagram of the architecture of the target detection network in a specific embodiment of the present invention. The target detection network shares parameters with the target candidate network and aggregates the candidate boxes of the target candidate network, so as to enhance the detection network's ability to distinguish objects from background. In a specific embodiment of the invention, the target detection network, on the basis of the target candidate regions, takes an image region 1.5 times the size of the target candidate region as the background semantic information of the target; the Fire9 feature map is upsampled once as information that enhances the perception of small objects; the background semantic information and the upsampled information are passed through region-of-interest pooling (ROI pooling) to obtain fixed-size features; one fully connected layer is then added to perform classification and regression of the final candidate boxes. Specifically, the CNN layers of the backbone are connected to a proposals node, which aggregates the candidate box information obtained by the target candidate network. For the Fire9 feature map, W and H are the width and height of the input image, cube 1 represents the mapping of the object region onto the feature map, and cube 2 represents the mapping of the context region onto the feature map; the context region is about 1.5 times the object region. To strengthen the detection of small objects, Fire9 is also upsampled once; then, as in the Faster R-CNN algorithm, region-of-interest pooling yields fixed-size features. The processed Fire9 features are concatenated (concat) with the features aggregated by proposals, one fully connected layer is added, and classification and regression of the final candidate boxes are performed; the details are not repeated here.
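A minimal sketch of how a context region might be derived from a candidate box is shown below. It assumes that "1.5 times" scales the width and height about the box center, that boxes are stored as (x, y, w, h) with (x, y) the upper-left corner, and that the region is clipped to a 640 × 480 image; the patent does not spell out these details, and the name `context_box` is hypothetical.

```python
def context_box(box, scale=1.5, image_w=640, image_h=480):
    """Expand (x, y, w, h) about its center by `scale` and clip to the image.

    Assumption: the scale applies to each side length, not to the area.
    """
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0
    new_w, new_h = w * scale, h * scale
    x0 = max(0.0, cx - new_w / 2.0)
    y0 = max(0.0, cy - new_h / 2.0)
    x1 = min(float(image_w), cx + new_w / 2.0)
    y1 = min(float(image_h), cy + new_h / 2.0)
    return (x0, y0, x1 - x0, y1 - y0)


# A 40 x 80 candidate box well inside the image, and one touching the corner.
inner = context_box((100, 100, 40, 80))
corner = context_box((0, 0, 40, 80))
```

Both the object box and its context box would then be mapped onto the feature map and passed through region-of-interest pooling, so that the classifier sees the surroundings of the candidate as well as the candidate itself.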

Step S101: input the training samples.

Training requires the bounding boxes of the reference persons in each image. To speed up training, the regions containing reference persons are cropped from the original image to form individual patches (image blocks); because a patch is smaller than the original image, training on patches is effectively faster. Specifically, in the present invention the input training samples comprise RGB image data and annotations of the pedestrian regions in the images, and the image data actually used for training are small patches cropped according to the regions where the pedestrians are located. In mathematical terms, the training samples are $S = \{(X_i, Y_i)\}_{i=1}^{N}$, where $X_i$ denotes one patch of a training image. In practical applications there are other classes besides the pedestrian class, such as background, people riding bicycles and seated people (K classes); the annotation $Y_i = (y_i, b_i)$ therefore consists of a class label $y_i \in \{0, 1, 2, \ldots, K\}$ and bounding box coordinates $b_i = (b_i^x, b_i^y, b_i^w, b_i^h)$, where $(b_i^x, b_i^y)$ is the starting coordinate of the upper-left corner of the bounding box and $(b_i^w, b_i^h)$ are the width and height of the bounding box.
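The cropping step can be sketched as follows. This is only one plausible policy, assumed for illustration: the patch is centered on the annotated pedestrian and shifted back inside the image if necessary, and the box is re-expressed in patch coordinates. The 448-pixel patch and 640 × 480 image sizes follow the figures given for forward propagation in this description; the function name `crop_patch` and the centering rule are hypothetical.

```python
def crop_patch(box, patch=448, image_w=640, image_h=480):
    """Return the patch origin and the box in patch coordinates.

    `box` is (x, y, w, h) with (x, y) the upper-left corner in image coordinates.
    Assumed policy: center the patch on the box, then clamp it to the image.
    """
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0
    px = min(max(0.0, cx - patch / 2.0), float(image_w - patch))
    py = min(max(0.0, cy - patch / 2.0), float(image_h - patch))
    return (px, py), (x - px, y - py, w, h)


origin, local_box = crop_patch((300, 200, 40, 80))
```

Only the box origin changes under cropping; the width and height of the annotation are preserved, so the label remains valid for the patch.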

Step S102: initialize the convolutional neural network and its parameters, including the weights and biases of every connection in each network layer. Specifically, the invention pre-trains the Squeeze VGG-16 convolutional neural network on the ImageNet dataset to initialize the network.

Step S103: using the forward propagation and backpropagation algorithms, learn the network parameters of the constructed model from the training samples, obtaining the model used in the test process.

In the present invention, the forward propagation algorithm first normalizes the input image to a size of 3 × 480 × 640, then crops a patch of size 3 × 448 × 448 and uses it, together with the corresponding annotation, as the input of the convolutional neural network. After the convolutional layers, downsampling layers and rectified linear unit layers (ReLU nonlinearity layers), the image feature map at Fire9 has size 512 × 60 × 80; at Fire12 the feature map size is 512 × 30 × 40, and the feature maps of the two subsequent branches have sizes 512 × 15 × 20 and 512 × 8 × 10 respectively. On each feature map, the four coordinates of the target bounding box and the class information are obtained by convolution. For Fire9, assuming that only pedestrian and background are detected, the output feature has size 6 × 60 × 80, where the 6 channels comprise the two classes (background and pedestrian) and the four coordinates of the candidate bounding box. In the target detection network, the candidate bounding boxes obtained by each branch layer are aggregated at the proposals node and combined with the features obtained by applying region-of-interest pooling to the background semantic information and upsampled information of Fire9, after which the final bounding box regression and classification are performed.
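The feature map sizes quoted above are consistent with downsampling a 480 × 640 input by total strides of 8, 16, 32 and 64, rounding up. The sketch below reproduces that arithmetic; the stride values are inferred from the quoted sizes rather than stated in the patent, and the function names are hypothetical.

```python
import math


def branch_shapes(height=480, width=640, strides=(8, 16, 32, 64)):
    """Spatial size of each detection branch for the assumed total strides."""
    return [(math.ceil(height / s), math.ceil(width / s)) for s in strides]


def head_channels(num_classes, box_coords=4):
    """Output channels of a branch: one score per class plus the box coordinates."""
    return num_classes + box_coords


shapes = branch_shapes()
channels = head_channels(num_classes=2)  # background and pedestrian
print(shapes, channels)
```

The last branch shows why rounding up is needed: 480 / 64 is 7.5, which matches the quoted height of 8 only with ceiling division.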

In the present invention, the backpropagation algorithm first computes the loss function $\mathcal{L}(W)$ between the target bounding boxes predicted by forward propagation and the ground-truth bounding boxes of the image, then computes its gradient with respect to the parameters $W$ and updates $W$ by gradient descent to minimize $\mathcal{L}(W)$. Assume that the intermediate layers have $M$ branches that can output target candidate regions (receptive fields at $M$ scales can approximately cover all target objects in an image), $l^m$ denotes the loss function of branch $m$, $\alpha_m$ denotes the weight of $l^m$, and $S = \{S^1, S^2, \ldots, S^M\}$ denotes the target objects of the corresponding scales; the loss function $\mathcal{L}(W)$ may then be defined as:

$$\mathcal{L}(W) = \sum_{m=1}^{M} \sum_{i \in S^m} \alpha_m \, l^m(X_i, Y_i \mid W) \qquad (1)$$

For a particular detection layer $m$, only targets whose scale lies within the range that layer $m$ can detect contribute to the loss. The loss function of each sample is therefore defined as

$$l(X, Y \mid W) = L_{cls}(p(X), y) + \lambda \, [y \geq 1] \, L_{loc}(b, \hat{b}) \qquad (2)$$

where $p(X) = (p_0(X), \ldots, p_K(X))$ denotes the probability distribution over the target classes; $\lambda$ is the balance coefficient; $b$ denotes the 4 coordinates of the bounding box and $\hat{b}$ the coordinates obtained by forward propagation; and $[y \geq 1]$ equals 1 for non-background samples and 0 otherwise. In the loss function, the classification term is defined by the cross-entropy loss, i.e.

$$L_{cls}(p(X), y) = -\log p_y(X) \qquad (3)$$

The bounding box regression uses the smooth Manhattan distance criterion (smooth L1 criterion), defined as follows

$$L_{loc}(b, \hat{b}) = \frac{1}{4} \sum_{j \in \{x, y, w, h\}} \mathrm{smooth}_{L1}(b_j - \hat{b}_j), \qquad \mathrm{smooth}_{L1}(z) = \begin{cases} 0.5\, z^2 & \text{if } |z| < 1 \\ |z| - 0.5 & \text{otherwise} \end{cases} \qquad (4)$$
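The multi-branch loss described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the patent's implementation: the localization term is averaged over the four box coordinates, it is applied only to samples whose label is at least 1 (label 0 is assumed to be background), and all function names are hypothetical.

```python
import math


def cross_entropy(probs, label):
    """Classification loss: negative log-probability of the true class."""
    return -math.log(probs[label])


def smooth_l1(z):
    """Smooth L1: quadratic near zero, linear elsewhere."""
    az = abs(z)
    return 0.5 * az * az if az < 1.0 else az - 0.5


def loc_loss(box, box_pred):
    """Mean smooth L1 distance over the box coordinates (assumed averaging)."""
    return sum(smooth_l1(b - p) for b, p in zip(box, box_pred)) / len(box)


def sample_loss(probs, label, box, box_pred, lam=1.0):
    """Per-sample loss; the box term is skipped for background (label 0, assumed)."""
    loss = cross_entropy(probs, label)
    if label >= 1:
        loss += lam * loc_loss(box, box_pred)
    return loss


def total_loss(branches, alphas, lam=1.0):
    """Weighted sum over branches; branches[m] holds the samples assigned to scale m."""
    return sum(
        alpha * sum(sample_loss(*sample, lam=lam) for sample in samples)
        for alpha, samples in zip(alphas, branches)
    )
```

A sample contributes only to the branch whose scale range contains it, which is expressed here by placing it in exactly one of the per-branch lists passed to `total_loss`.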

Step S2: with the trained model, exploit the variation of the receptive field of the neural network, use different intermediate layers to detect target objects in different scale ranges, and predict the bounding boxes of the target objects (such as pedestrians) in the image.

Specifically, step S2 further comprises:

Step S200: load the trained model;

Step S201: input the test sample;

Step S202: with the trained model, exploit the variation of the receptive field of the neural network, use different intermediate layers to detect pedestrians in different scale ranges, and predict the bounding boxes of the pedestrians in the image. Fig. 6 is a schematic diagram of the rapid pedestrian detection process in a specific embodiment of the present invention. The target candidate network in the model, building on the Squeeze VGG-16 convolutional neural network and according to the characteristics of the convolutional layers, generates network branches at four layers in total (Fire9, Fire12, conv6 and the added pooling layer) to produce target candidate regions for objects detected at different scales (intermediate layer a, intermediate layer b, intermediate layer c). The target detection network then, on the basis of the target candidate regions, takes an image region 1.5 times the size of the target candidate region as the background semantic information of the target, upsamples the Fire9 feature map once as information that enhances the perception of small objects, passes the background semantic information and the upsampled information through region-of-interest pooling to obtain fixed-size features, and adds one fully connected layer to perform classification and regression of the final candidate boxes. Preferably, in step S202, the feature map of a particular network layer is also enlarged by deconvolution.
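The enlargement of a feature map by deconvolution (transposed convolution) follows a simple size formula. The sketch below applies it to the 60 × 80 Fire9 feature map; the kernel size 4, stride 2 and padding 1 are hypothetical settings chosen to give a 2× enlargement, since the patent does not state the deconvolution hyperparameters.

```python
def deconv_output_size(size, kernel=4, stride=2, padding=1):
    """Output length of a transposed convolution along one dimension.

    Assumes dilation 1 and no output padding.
    """
    return (size - 1) * stride - 2 * padding + kernel


# Fire9 feature map of 60 x 80, upsampled once with the assumed settings.
up_h = deconv_output_size(60)
up_w = deconv_output_size(80)
print(up_h, up_w)
```

Under these assumed settings the 60 × 80 map becomes 120 × 160. Only this one feature map grows, which is why the approach is cheaper than enlarging the input image and recomputing every layer.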

The pedestrian detection method proposed by the present invention is evaluated with two metrics: mean average precision (mAP) and frames per second (FPS). mAP evaluates the intersection over union (IoU) between the final detected regions and the ground-truth person regions, and is the average of the precision at different IoU thresholds; FPS is primarily an efficiency metric and refers to the number of images that can be processed per second.
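The two quantities underlying these metrics can be sketched as follows. Boxes are assumed to be (x, y, w, h) tuples with (x, y) the upper-left corner, matching the annotation format of the training samples; the function names are hypothetical, and the full mAP computation (matching detections to ground truth and averaging precision over thresholds) is omitted.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def frames_per_second(num_frames, elapsed_seconds):
    """Throughput: images processed per second of wall-clock time."""
    return num_frames / elapsed_seconds


overlap = iou((0, 0, 2, 2), (1, 1, 2, 2))
```

A detection is typically counted as correct when its IoU with a ground-truth box exceeds a chosen threshold, and precision is then computed at each threshold.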

Fig. 7 is a system architecture diagram of the rapid pedestrian detection device of the present invention. As shown in Fig. 7, the rapid pedestrian detection device of the present invention comprises:

Training unit 70, configured to construct a configurable deep model based on a convolutional neural network and to learn the network parameters of the constructed model from training samples, obtaining the model used in the test process. In a specific embodiment of the invention, the deep model built by training unit 70 consists of two sub-networks. The first sub-network is a multi-scale target candidate network, used to extract person features and provide candidate regions; specifically, it exploits the differences between the features extracted by different layers of the convolutional neural network and generates, at intermediate layers, candidate bounding boxes for pedestrians of different scales. The second sub-network is a target detection network, which improves the detection performance; it shares parameters with the target candidate network and performs refined classification and detection on the basis of the candidate bounding boxes. Specifically, as shown in Fig. 8, training unit 70 further comprises:

Model construction unit 701, configured to construct the configurable deep model based on a convolutional neural network.

The convolutional neural network is formed by stacking convolutional layers, downsampling layers and upsampling layers. A convolutional layer performs a convolution over the two-dimensional space of the input image or feature map to extract hierarchical features. A downsampling layer uses non-overlapping max-pooling, which extracts features that are invariant to shape and offset while reducing the feature map size and improving computational efficiency. An upsampling layer performs a deconvolution over the two-dimensional space of the input feature map to increase the number of pixels of the feature map. In a specific embodiment of the invention, a Squeeze VGG-16 convolutional neural network is used as the backbone network.

In a specific embodiment of the invention, the target candidate network builds on the Squeeze VGG-16 convolutional neural network and, according to the characteristics of the convolutional layers, generates network branches at four layers in total, namely Fire9, Fire12, conv6 and an added pooling layer; the branches regress the candidate boxes of objects detected at different scales. Fire9, however, is a relatively low layer of the backbone network, so its influence on the gradient is much greater than that of the other layers and makes the learning process unstable. An additional buffer layer is therefore inserted; the buffer layer prevents the gradient of the detection branch from being back-propagated directly to the backbone layers.

The target detection network shares parameters with the target candidate network and aggregates the candidate boxes of the target candidate network, so as to enhance the detection network's ability to distinguish objects from background. In a specific embodiment of the invention, the target detection network, on the basis of the target candidate regions, takes an image region 1.5 times the size of the target candidate region as the background semantic information of the target; the Fire9 feature map is upsampled once as information that enhances the perception of small objects; the background semantic information and the upsampled information are passed through region-of-interest pooling to obtain fixed-size features; one fully connected layer is then added to perform classification and regression of the final candidate boxes. Specifically, the CNN layers of the backbone are connected to a proposal sub-network; W and H are the width and height of the input image, cube 1 represents the pooling of the object region, and cube 2 represents the pooling of the context region; the context region is about 1.5 times the object region. To strengthen the detection of small objects, Fire9 is also upsampled once; then, as in the Faster R-CNN algorithm, region-of-interest pooling yields fixed-size features, after which one fully connected layer is added to perform classification and regression of the final candidate boxes.

Training sample input unit 702, configured to input the training samples.

Specifically, the training samples are $S = \{(X_i, Y_i)\}_{i=1}^{N}$, where $X_i$ denotes one patch of a training image, and the annotation $Y_i = (y_i, b_i)$ consists of a class label $y_i$ and bounding box coordinates $b_i = (b_i^x, b_i^y, b_i^w, b_i^h)$.

Initialization unit 703, configured to initialize the convolutional neural network and its parameters, including the weights and biases of every connection in each network layer. Specifically, the invention pre-trains the Squeeze VGG-16 convolutional neural network on the ImageNet dataset to initialize the network.

Sample training unit 704, configured to use the forward propagation and backpropagation algorithms to learn the network parameters of the constructed model from the training samples, obtaining the model used in the test process.

The back propagation algorithm first computes the loss function between the target bounding boxes predicted by forward propagation and the ground-truth target bounding boxes of the image, then computes the gradient of that loss with respect to the parameters W, and updates W by gradient descent so as to minimize the loss function. Assume that M branches in the intermediate layers can output target candidate regions (receptive fields at M scales can approximately cover all target objects in the image); l^m denotes the loss function of branch m, α_m denotes the weight of l^m, and S = {S¹, S², ..., S^M} denotes the target objects of the corresponding scales. The loss function may then be defined as:
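The formula announced here is missing from this text. One form consistent with the definitions above is given as an assumption, not as the patent's own equation. The symbol \mathcal{L}(W), the sample index i over S^m, the balancing factor \lambda, the indicator [y \ge 1] (box regression only for non-background samples), the name L_{loc} and the predicted box \hat{b} are all assumed, following the usual multi-task form of detection losses:

```latex
\mathcal{L}(W) = \sum_{m=1}^{M} \sum_{i \in S^{m}} \alpha_m \, l^{m}(X_i, Y_i \mid W),
\qquad
l^{m}(X, Y \mid W) = L_{cls}\bigl(p(X), y\bigr) + \lambda \, [y \ge 1] \, L_{loc}\bigl(b, \hat{b}\bigr)
```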

Here, p(X) = (p₀(X), ..., p_K(X)) is the probability distribution over the target classes. In the loss function, the classification term is defined by the cross-entropy loss, i.e.,

L_cls(p(X), y) = -log p_y(X)

The regression of the target bounding box uses the smooth L1 criterion, defined as follows:
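The defining formula is not reproduced in this text. The sketch below therefore uses the standard smooth L1 function as commonly defined for box regression (0.5x² for |x| < 1, |x| - 0.5 otherwise), which is an assumption about what the missing formula contained. The function names and the averaging over coordinates in `box_regression_loss` are likewise illustrative choices, not taken from the patent.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], y: int) -> float:
    """Classification loss L_cls(p(X), y) = -log p_y(X)."""
    return -math.log(probs[y])


def smooth_l1(x: float) -> float:
    """Standard smooth L1: quadratic near zero, linear elsewhere."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def box_regression_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean smooth L1 over the box coordinates (averaging is an assumption)."""
    return sum(smooth_l1(p - t) for p, t in zip(pred, target)) / len(pred)
```

The quadratic region keeps gradients small for nearly correct boxes, while the linear region limits the influence of large coordinate errors.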

A detection unit 71, for inputting test samples and, with the trained model, using the variation pattern of the neural network's receptive field so that different intermediate layers detect target objects (e.g. pedestrians) in different scale ranges, thereby predicting the bounding boxes of the target objects (e.g. pedestrians) in the image.

Specifically, as shown in figure 9, detection unit 71 further comprises：

A model loading unit 710, for loading the trained model;

A test sample input unit 711, for inputting test samples;

An image prediction unit 712, for using the trained model to detect pedestrians in different scale ranges with different intermediate layers, based on the variation pattern of the neural network's receptive field, and to predict the bounding boxes of pedestrians in the image. Specifically, the image prediction unit 712 first uses the target candidate network of the model: on the basis of the Squeeze VGG-16 convolutional neural network and according to the convolutional layer features, network branches are generated at four layers in total (Fire9, Fire12, conv6 and an added pooling layer) to produce target candidate regions for objects at different scales. It then uses the target detection network: starting from the target candidate regions, the image region 1.5 times the size of the candidate region is taken as the background semantic information of the target, the Fire9 feature map is up-sampled once to enhance the information for perceiving small objects, the background semantic information and the up-sampled information are passed through region-of-interest pooling to obtain features of fixed size, and one fully connected layer is then added to perform classification and regression of the final candidate box.
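Assigning different scale ranges to different intermediate layers relies on the receptive field growing with depth. The sketch below computes that growth with the standard recurrence for stacked convolution and pooling layers. The layer list in the usage note is a hypothetical toy stack, not the Squeeze VGG-16 configuration, and the function name is illustrative.

```python
from typing import List, Tuple


def receptive_fields(layers: List[Tuple[str, int, int]]) -> List[Tuple[str, int]]:
    """Return (name, receptive field in input pixels) after each layer.

    Each layer is (name, kernel_size, stride). Padding does not change the
    receptive-field size; dilation is not modelled in this sketch.
    """
    rf = 1    # receptive field of a single input pixel
    jump = 1  # distance in input pixels between adjacent output positions
    out = []
    for name, kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
        out.append((name, rf))
    return out
```

For a toy stack of 3x3 convolutions (stride 1) alternating with 2x2 poolings (stride 2), conv, pool, conv, pool, conv, the receptive fields are 3, 4, 8, 10 and 18 pixels, so branches attached to deeper layers see larger regions and suit larger objects.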

In conclusion the method that a kind of rapid pedestrian detection method of the present invention and device use for reference compression network, adjusts and instructs The network for practicing VGG-16 obtains adapting to the squeeze VGG-16 networks that embedded system requires, and effectively reduces network model Parameter amount simultaneously accelerates computational efficiency；On the other hand, for perceiving domain in traditional detection method and article size is inconsistent asks Topic, the present invention using neural network perception domain changing rule (i.e. neural net layer is deeper, perception domain it is bigger, be suitble to detection greatly The target object of some), the target object within the scope of particular dimensions is detected using different middle layers, is preferably adapted to The relationship in perception domain and article size, effectively increases testing result；In addition, in order to enhance the detection to wisp, this hair It is bright that the characteristic pattern of particular network layer is amplified using the method deconvoluted, compared to the method for conventional pictures amplification, almost Do not increase video memory and calculation amount；In order to enhance the detection for fuzzy objective, on the characteristic pattern of this layer, target object is used The region of 1.5 times of sizes increases to as background semantic feature in network, the detection for fuzzy objective and remote wisp, There is splendid performance.

The above embodiments merely illustrate the principles and effects of the invention and are not intended to limit it. Any person skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the invention. The scope of protection of the invention shall therefore be as set out in the claims.

Claims

1. A rapid pedestrian detection method, comprising the following steps:

Step S1: constructing a configurable deep model based on a convolutional neural network, and learning the parameters of the constructed network from training samples to obtain a model for the test process;

Step S2: inputting test samples and, with the trained model, using the variation pattern of the neural network's receptive field so that different intermediate layers detect target objects in different scale ranges, thereby predicting the bounding boxes of the target objects in the image.

2. The rapid pedestrian detection method according to claim 1, characterized in that step S1 further comprises:

inputting training samples;

learning the parameters of the constructed network from the training samples using the forward propagation and back propagation algorithms, thereby obtaining the model used in the test process.

3. The rapid pedestrian detection method according to claim 2, characterized in that the deep model comprises a multi-scale target candidate network and a target detection network; the target candidate network, based on the differences among the features extracted by different layers of the convolutional neural network, generates candidate boxes for target objects of different scales at the intermediate layers; and the target detection network performs refined classification and detection on the basis of the candidate boxes output by the target candidate network.

4. The rapid pedestrian detection method according to claim 3, characterized in that: the convolutional neural network is formed by stacking convolutional layers, down-sampling layers and up-sampling layers; a convolutional layer performs a convolution operation in two-dimensional space on the input image or feature map to extract hierarchical features; a down-sampling layer uses a non-overlapping max-pooling operation, which extracts shape- and offset-invariant features while reducing the size of the feature map and improving computational efficiency; and an up-sampling layer performs a deconvolution operation in two-dimensional space on the input feature map to increase the number of pixels of the feature map.

5. The rapid pedestrian detection method according to claim 4, characterized in that: the deep model uses a Squeeze VGG-16 convolutional neural network as the backbone network, the Squeeze VGG-16 convolutional neural network being a network structure in which a conv1-1 layer is followed by 12 Fire module layers for feature extraction.

6. The rapid pedestrian detection method according to claim 5, characterized in that: the target candidate network, on the basis of the Squeeze VGG-16 convolutional neural network and according to the convolutional layer features, generates network branches at Fire9, Fire12, conv6 and an added pooling layer, so as to regress candidate boxes for objects at different scales.

7. The rapid pedestrian detection method according to claim 5, characterized in that: the target detection network, on the basis of the target candidate regions, takes the image region that is a preset multiple of the size of the target candidate region as the background semantic information of the target, up-samples the feature map of the Fire9 layer once to enhance the information for perceiving small objects, passes the background semantic information and the up-sampled information through region-of-interest pooling to obtain features of fixed size, and then adds one fully connected layer to perform classification and regression of the final candidate box.

8. The rapid pedestrian detection method according to claim 1, characterized in that: the training samples comprise RGB image data and annotation information for the pedestrian regions in the images, and the image data actually used for training are small patches cropped according to the regions where pedestrians are located.

9. The rapid pedestrian detection method according to claim 1, characterized in that: the back propagation algorithm first computes the loss function between the target bounding boxes predicted by forward propagation and the ground-truth target bounding boxes of the image, then computes the gradient of that loss with respect to the parameters W, and updates W by gradient descent so as to minimize the loss function; assuming that M branches in the intermediate layers can output target candidate regions, that l^m denotes the loss function of branch m, that α_m denotes the weight of l^m, and that S = {S¹, S², ..., S^M} denotes the target objects of the corresponding scales, the loss function may be defined as:

10. A rapid pedestrian detection system, comprising:

a training unit, for constructing a configurable deep model based on a convolutional neural network and learning the parameters of the constructed network from training samples to obtain a model for the test process;

a detection unit, for inputting test samples and, with the trained model, using the variation pattern of the neural network's receptive field so that different intermediate layers detect target objects in different scale ranges, thereby predicting the bounding boxes of the target objects in the image.