CN110263731A - A single-step face detection system - Google Patents
A single-step face detection system
- Publication number
- CN110263731A (application CN201910550738.8A)
- Authority
- CN
- China
- Prior art keywords
- convolution module
- depth
- face
- module
- crop box
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Biomedical Technology (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a single-step face detection system. It proposes YOMO, a real-time face detection network built from depthwise separable convolutions that contains several top-down feature fusion structures, with each detection module responsible only for faces within its own scale range. A random cropping strategy better suited to the multi-scale detection structure lets each detection module train on a more adequate number of samples. The proposed ellipse regressor substantially improves detection recall under the ContROC evaluation criterion. While remaining competitive in detection accuracy, the YOMO model processes 544 × 544 images at 51 FPS, and the model occupies only 21M of memory.
Description
Technical field
The present invention relates to the technical field of face detection, and in particular to a single-step face detection system.
Background art
Face detection is a key component of people-oriented smart cities and underpins technologies such as identity recognition, personalized services, pedestrian detection and tracking, and crowd counting. Although it has been studied extensively, face detection in unconstrained scenes remains an open research problem because of the many challenges involved.
Early face detection focused on hand-designing effective features and building efficient classifiers on top of them. This generally yields suboptimal detection models, and detection accuracy may drop considerably as the application scenario changes. In recent years deep learning has been applied to face detection with notable success, but producing a real-time face detection model that has high accuracy and works in unconstrained scenes remains a considerable challenge.
Faster R-CNN replaces the sliding window with a region proposal algorithm and integrates candidate box generation, feature extraction, box regression, and classification into a single network; it is the model with the highest detection speed and accuracy in the R-CNN family. However, because the proposal network generates many face candidate boxes and the complex network structure brings a large computational overhead, it cannot achieve real-time face detection.
Another class of face detection methods, such as YOLO, converts the detection problem into a regression problem. These methods therefore contain no proposal network and regress face boxes directly on the feature map of the feature extraction network, which gives faster detection but leaves accuracy to be improved. To improve accuracy, SSD uses multi-scale feature maps from different layers to jointly predict box class and position. Multi-layer feature prediction helps detect faces of different scales, but the individual stages are not specifically trained to handle faces within a particular scale range. That is, during training, faces of all scales can generate loss in every detection module. In contrast, each detection module of YOMO is trained only on faces within its suitable scale range.
To address small-scale face detection in single-step methods, HR uses an image pyramid and trains several separate single-scale detectors, each responsible for faces of a particular scale. At test time, however, the image must be rescaled to multiple scales and each scale must pass through a very deep network, so this multi-step single-scale approach is computationally very expensive.
Single-step multi-scale detectors such as S3FD detect faces using the multi-scale features of a deep convolutional network and need only a single forward pass of the image at test time. S3FD still shares a problem with SSD, however: each feature map of a different scale is used for prediction on its own. When small-scale faces are predicted from the shallow layers, the lack of semantic features leaves S3FD's detection of small-scale faces unsatisfactory.
Summary of the invention
In view of the above deficiencies in the prior art, the single-step face detection system provided by the invention solves the problem of unsatisfactory detection performance in face detection systems.
To achieve the above object, the invention adopts the following technical solution: a single-step face detection system comprising, connected in sequence from left to right, a regular convolution module conv0, depthwise separable convolution modules conv1 through conv14, a deconvolution layer conv15, depthwise separable convolution modules conv16 and conv17, a deconvolution layer conv18, and depthwise separable convolution modules conv19 and conv20;
The output of depthwise separable convolution module conv14 is connected to detection module det-32, the output of depthwise separable convolution module conv17 is connected to detection module det-16, and the output of depthwise separable convolution module conv20 is connected to detection module det-8;
The output of depthwise separable convolution module conv11 is connected to the input of depthwise separable convolution module conv16; the output of conv16 is fused with the output of deconvolution layer conv15, and the fused features are connected to the input of depthwise separable convolution module conv17. The output of depthwise separable convolution module conv5 is connected to the input of depthwise separable convolution module conv19; the output of conv19 is fused with the output of deconvolution layer conv18, and the fused features are connected to the input of depthwise separable convolution module conv20.
Further: the regular convolution module conv0 comprises, connected in sequence from top to bottom, a 3 × 3 convolution layer, a BatchNorm layer, and a LeakyReLU activation layer.
Further:
The input image of the regular convolution module conv0 is cropped for training using a crop box SelectCrop_bbox selected by the "medium-soft" random cropping algorithm, with the following specific steps:
S1. Generate several crop boxes Sampled_bboxes with an aspect ratio of 1 using the random cropping algorithm. Crop the original image to obtain cropped images, scale each cropped image to the input size required by the network, scale the valid ground-truth boxes inside the crop box by the same ratio, and count the number of faces of each scale according to the face scale ranges. The counting formula is:
In the above formula, Num_ic is the number of faces of the c-th scale class in the i-th crop box; N is the number of face scale classes, N = 3, namely small-scale, medium-scale, and large-scale faces; M is the total number of crop boxes; 1(·) is the indicator function, equal to 1 when its condition is true and 0 otherwise; MinScale_c and MaxScale_c are the lower and upper bounds of the c-th face scale class; bbox_k is the side length of the crop box; and K is the total number of crop boxes generated;
S2. For each crop box, arrange the face scale classes in descending order of the number of faces of each class:
S_i1 ≥ S_i2 ≥ … ≥ S_iN
In the above formula, i is the crop box index and S_ic is one of the N face scale classes in the i-th crop box;
S3. Count the number of faces of each scale class seen in actual network training, and arrange the face scale classes in ascending order of these counts:
A_1 ≤ A_2 ≤ … ≤ A_N
In the above formula, A_c is one of the N face scale classes;
S4. Among the M face scale class orderings of the crop boxes Sampled_bboxes, find the crop boxes satisfying S_ic = A_c and randomly select one of them as SelectCrop_bbox;
S5. When no crop box satisfying step S4 is found, find among the M face scale class orderings of the crop boxes Sampled_bboxes the crop boxes satisfying S_i1 = A_1 and S_iN = A_N, and randomly select one of them as SelectCrop_bbox;
S6. When no crop box satisfying step S5 is found, randomly select one crop box from Sampled_bboxes as SelectCrop_bbox;
S7. Add the number Num_sc of faces of each scale class in SelectCrop_bbox to the running count of each face scale class in actual training, that is:
In the above formula, the previous value is the number of faces of each scale class obtained in the preceding training iteration, and s denotes the index of the selected crop box.
Further: the depthwise separable convolution modules conv1 through conv14, conv16, conv17, conv19, and conv20 share the same structure, each comprising, connected in sequence from top to bottom, a 3 × 3 convolution layer, a BatchNorm layer, a LeakyReLU activation layer, a 1 × 1 convolution layer, a BatchNorm layer, and a LeakyReLU activation layer.
Further: the number of output channels of depthwise separable convolution modules conv14, conv17, and conv20 is 1024.
Further: detection module det-32 is used for large-scale face detection, detection module det-16 for medium-scale face detection, and detection module det-8 for small-scale face detection.
Further: detection modules det-32, det-16, and det-8 each comprise a regular convolution layer and an output layer;
The number of output channels of the regular convolution layer is 18;
The center coordinates and side lengths of the predicted box at the output layer are calculated as:
b_x = σ(t_x) + C_x, b_y = σ(t_y) + C_y
In the above formula, (b_x, b_y) are the center coordinates of the predicted box, b_w and b_h are the width and height of the predicted box, t_x and t_y are the offsets of the abscissa and ordinate of the predicted box center, (C_x, C_y) are the coordinates of the top-left corner of the grid cell containing the Anchor, σ(·) is the sigmoid function, and p_w and p_h are the width and height of the Anchor.
Further: the outputs of detection modules det-32, det-16, and det-8 are connected to an ellipse regressor, which converts the output-layer predicted box into an elliptical ground-truth box. The elliptical ground-truth box is calculated as:
Y = XW + ε
In the above formula, Y is the coordinate vector of the elliptical ground-truth box, comprising the semi-major axis r_a, the semi-minor axis r_b, the angle θ, and the center abscissa c_x and ordinate c_y; X is the coordinate vector of the output-layer predicted box, comprising the center coordinates b_x and b_y and the width b_w and height b_h of the predicted box; W is the regression coefficient matrix; and ε is the random error;
The regression coefficient matrix W is calculated as:
In the above formula, J(·) denotes the mean squared error function, X′ is the normalized coordinate vector of the predicted box, and Y′ is the normalized coordinate vector of the ground-truth box;
In the above formula, U_X and σ_X are the mean and standard deviation of the predicted-box coordinate vector X, and U_Y and σ_Y are the mean and standard deviation of the ground-truth-box coordinate vector Y.
The beneficial effects of the invention are:
1. The invention proposes YOMO, a real-time face detection network built from depthwise separable convolutions that contains several top-down feature fusion structures, with each detection module responsible only for faces within its own scale range.
2. By using a random cropping strategy better suited to the multi-scale detection structure, the invention lets each detection module train on a more adequate number of samples.
3. The ellipse regressor proposed by the invention substantially improves detection recall under the ContROC evaluation criterion.
4. While remaining competitive in detection accuracy, the proposed YOMO model detects 544 × 544 images at 51 FPS.
Brief description of the drawings
Fig. 1 is a structural diagram of the invention;
Fig. 2 shows the evaluation results of the invention on the FDDB data set;
Fig. 3 shows visualization results of the invention on the WIDER FACE and FDDB data sets.
Detailed description of the embodiments
Specific embodiments of the invention are described below to help those skilled in the art understand the invention. It should be clear, however, that the invention is not limited to the scope of the specific embodiments. To those of ordinary skill in the art, variations are obvious as long as they fall within the spirit and scope of the invention as defined and determined by the appended claims, and all innovations that make use of the inventive concept are protected.
As shown in Fig. 1, a single-step face detection system comprises, connected in sequence from left to right, a regular convolution module conv0, depthwise separable convolution modules conv1 through conv14, a deconvolution layer conv15, depthwise separable convolution modules conv16 and conv17, a deconvolution layer conv18, and depthwise separable convolution modules conv19 and conv20;
The output of depthwise separable convolution module conv14 is connected to detection module det-32, the output of depthwise separable convolution module conv17 is connected to detection module det-16, and the output of depthwise separable convolution module conv20 is connected to detection module det-8;
The output of depthwise separable convolution module conv11 is connected to the input of depthwise separable convolution module conv16; the output of conv16 is fused with the output of deconvolution layer conv15, and the fused features are connected to the input of depthwise separable convolution module conv17. The output of depthwise separable convolution module conv5 is connected to the input of depthwise separable convolution module conv19; the output of conv19 is fused with the output of deconvolution layer conv18, and the fused features are connected to the input of depthwise separable convolution module conv20.
Relative to the original image, the output feature maps of conv14, conv17, and conv20 have downsampling strides of 32, 16, and 8, respectively. Detection module det-32 is used for large-scale face detection, det-16 for medium-scale face detection, and det-8 for small-scale face detection; the face scale range handled by each detection module is shown in Table 1.
Table 1: Face scale ranges handled by the detection modules
Scale class | Det-8 (small-scale faces) | Det-16 (medium-scale faces) | Det-32 (large-scale faces) |
Minimum MinScale | 10 | 40 | 100 |
Maximum MaxScale | 39 | 99 | 350 |
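The assignment in Table 1 can be sketched in a few lines of Python. This is only an illustration: the names `DETECTION_MODULES`, `module_for_face`, and `grid_size` are hypothetical, and the grid sizes simply divide the 544 × 544 input size stated elsewhere in the description by each stride.

```python
# Scale ranges copied from Table 1; strides as stated for conv20, conv17, conv14.
DETECTION_MODULES = {
    "det-8": {"stride": 8, "min_scale": 10, "max_scale": 39},
    "det-16": {"stride": 16, "min_scale": 40, "max_scale": 99},
    "det-32": {"stride": 32, "min_scale": 100, "max_scale": 350},
}


def module_for_face(side_length):
    """Return the module whose scale range contains the face side length, else None."""
    for name, cfg in DETECTION_MODULES.items():
        if cfg["min_scale"] <= side_length <= cfg["max_scale"]:
            return name
    return None


def grid_size(input_size, stride):
    """Side length of the feature map produced at the given downsampling stride."""
    return input_size // stride


if __name__ == "__main__":
    for name, cfg in DETECTION_MODULES.items():
        print(name, grid_size(544, cfg["stride"]))
```

Faces outside all three ranges (for example a side length of 400) map to no module in this sketch; how the patent treats such faces is not stated.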
The invention trains the network with the RMSProp gradient optimization method, with parameters set as in Table 2. The 3 detection modules are placed on layers with different strides to enhance the multi-scale detection ability of the model. During training, the loss function of each detection module is a multi-task loss comprising 5 parts. So that each detection module handles only faces within its own scale range, during gradient back-propagation the detection branch containing the anchor with the largest IoU with the ground-truth box is found, and only that anchor generates box regression loss. To make training more effective, each ground-truth box is matched to the one anchor with which it has the highest IoU.
Table 2: Training configuration parameters
base_lr | step_value | gamma | batch_size | iter_size | type | weight_decay | max_iter |
0.001 | 40000 | 0.1 | 9 | 3 | RMSProp | 0.00005 | 200000 |
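As a rough reading of Table 2, the sketch below computes the learning rate at a given iteration and the effective batch size. It assumes a step-style schedule in which `base_lr` is multiplied by `gamma` once the iteration count reaches each step value, and that gradients are accumulated over `iter_size` mini-batches; the learning rate policy itself is not named in the text, so both function names and the policy are assumptions.

```python
def learning_rate(iteration, base_lr=0.001, gamma=0.1, step_values=(40000,)):
    """Multiply base_lr by gamma once for every step value already reached."""
    drops = sum(1 for step in step_values if iteration >= step)
    return base_lr * gamma ** drops


def effective_batch_size(batch_size=9, iter_size=3):
    """Images contributing to one weight update when gradients are accumulated."""
    return batch_size * iter_size


if __name__ == "__main__":
    for it in (0, 39999, 40000, 200000):
        print(it, learning_rate(it))
    print(effective_batch_size())
```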
The multi-task loss function of YOMO comprises 5 parts: the non-object loss, the anchor pre-training loss, the object localization loss, the object confidence loss, and the object classification loss, as shown in formula (3).
Here W and H are the width and height of the feature map, A is the number of Anchors, and t is the iteration count. 1(x) denotes the indicator function, which takes the value 1 when x is true and 0 otherwise. λ_noobj, λ_prior, λ_coord, λ_obj, and λ_class are the weights of the sub-tasks: the non-object loss weight, the Anchor pre-training loss weight, the coordinate loss weight, the object loss weight, and the classification loss weight, respectively. b^r denotes the 4 coordinate offsets predicted by the network and prior^r the 4 coordinates of the Anchor, namely the abscissa x and ordinate y of the box center, the box width w, and the box height h. When the IoU between a predicted box and every ground-truth box is less than or equal to the threshold Thresh, the region of the input image corresponding to that predicted box is non-object, i.e. background, and its predicted confidence is b^o. To make the network adapt to the Anchors as quickly as possible, the Anchor pre-training loss weight is introduced in the early stage of training. In the YOMO model, 1 epoch is defined as the early stage of training.
The regular convolution module conv0 comprises, connected in sequence from top to bottom, a 3 × 3 convolution layer, a BatchNorm layer, and a LeakyReLU activation layer.
The input image of the regular convolution module conv0 is cropped for training using a crop box SelectCrop_bbox selected by the "medium-soft" random cropping algorithm, with the following specific steps:
S1. Generate several crop boxes Sampled_bboxes with an aspect ratio of 1 using the random cropping algorithm. Crop the original image to obtain cropped images, scale each cropped image to the input size required by the network, scale the valid ground-truth boxes inside the crop box by the same ratio, and count the number of faces of each scale according to the face scale ranges. The counting formula is:
In the above formula, Num_ic is the number of faces of the c-th scale class in the i-th crop box; N is the number of face scale classes, N = 3, namely small-scale, medium-scale, and large-scale faces; M is the total number of crop boxes; 1(·) is the indicator function, equal to 1 when its condition is true and 0 otherwise; MinScale_c and MaxScale_c are the lower and upper bounds of the c-th face scale class; bbox_k is the side length of the crop box; and K is the total number of crop boxes generated;
S2. For each crop box, arrange the face scale classes in descending order of the number of faces of each class:
S_i1 ≥ S_i2 ≥ … ≥ S_iN
In the above formula, i is the crop box index and S_ic is one of the N face scale classes in the i-th crop box;
S3. Count the number of faces of each scale class seen in actual network training, and arrange the face scale classes in ascending order of these counts:
A_1 ≤ A_2 ≤ … ≤ A_N
In the above formula, A_c is one of the N face scale classes;
S4. Among the M face scale class orderings of the crop boxes Sampled_bboxes, find the crop boxes satisfying S_ic = A_c and randomly select one of them as SelectCrop_bbox;
S5. When no crop box satisfying step S4 is found, find among the M face scale class orderings of the crop boxes Sampled_bboxes the crop boxes satisfying S_i1 = A_1 and S_iN = A_N, and randomly select one of them as SelectCrop_bbox;
S6. When no crop box satisfying step S5 is found, randomly select one crop box from Sampled_bboxes as SelectCrop_bbox;
S7. Add the number Num_sc of faces of each scale class in SelectCrop_bbox to the running count of each face scale class in actual training, that is:
In the above formula, the previous value is the number of faces of each scale class obtained in the preceding training iteration, and s denotes the index of the selected crop box.
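Steps S1 to S7 can be sketched as follows. Two readings are assumed here because the corresponding formulas are not reproduced in the text: the count Num_ic is taken as the number of scaled face side lengths falling in [MinScale_c, MaxScale_c], and the update in S7 is taken as adding Num_sc to the running count of class c. All function names are hypothetical, and ties in the orderings are broken by class index.

```python
import random

SCALE_RANGES = [(10, 39), (40, 99), (100, 350)]  # small, medium, large (Table 1)


def count_scales(face_sides):
    """S1: number of faces in each scale class for one crop box."""
    return [sum(1 for s in face_sides if lo <= s <= hi) for lo, hi in SCALE_RANGES]


def descending_classes(counts):
    """S2: class indices ordered by descending face count within a crop box."""
    return sorted(range(len(counts)), key=lambda c: -counts[c])


def ascending_classes(totals):
    """S3: class indices ordered by ascending running count seen in training."""
    return sorted(range(len(totals)), key=lambda c: totals[c])


def select_crop(crop_counts, totals, rng=random):
    """S4-S7: choose a crop index and add its counts to the running totals."""
    target = ascending_classes(totals)
    orders = [descending_classes(c) for c in crop_counts]
    # S4: the whole ordering matches.
    full = [i for i, o in enumerate(orders) if o == target]
    # S5: only the first and last classes match.
    ends = [i for i, o in enumerate(orders)
            if o[0] == target[0] and o[-1] == target[-1]]
    # S6: fall back to any crop box.
    pool = full or ends or list(range(len(crop_counts)))
    s = rng.choice(pool)
    # S7: update the running totals in place.
    for c, n in enumerate(crop_counts[s]):
        totals[c] += n
    return s


if __name__ == "__main__":
    crops = [count_scales([20, 25, 50]), count_scales([120, 150, 60]), count_scales([15])]
    totals = [10, 5, 1]
    print(select_crop(crops, totals), totals)
```

The effect of the matching rule is that crops dominated by the scale class seen least often so far are preferred, which pushes the running counts toward balance across the three detection modules.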
The depthwise separable convolution modules conv1 through conv14, conv16, conv17, conv19, and conv20 share the same structure, each comprising, connected in sequence from top to bottom, a 3 × 3 convolution layer, a BatchNorm layer, a LeakyReLU activation layer, a 1 × 1 convolution layer, a BatchNorm layer, and a LeakyReLU activation layer.
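To see why this structure is lightweight, the sketch below compares weight counts for a standard 3 × 3 convolution and a depthwise separable one. It assumes the 3 × 3 layer is a depthwise convolution (one filter per input channel) and ignores biases and BatchNorm parameters; the channel sizes in the example are hypothetical except for the 1024 output channels mentioned in the description.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights of a standard k x k convolution (no bias)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k=3):
    """Weights of a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution mixing the channels
    return depthwise + pointwise


if __name__ == "__main__":
    c_in, c_out = 512, 1024  # c_in is a hypothetical value
    std = standard_conv_params(c_in, c_out)
    sep = depthwise_separable_params(c_in, c_out)
    print(std, sep, round(std / sep, 2))
```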
The number of output channels of depthwise separable convolution modules conv14, conv17, and conv20 is 1024.
Detection modules det-32, det-16, and det-8 each comprise a regular convolution layer and an output layer;
The number of output channels of the regular convolution layer is calculated as:
num_output = (num_coordinate + num_confidence + num_classes) × num_Anchors
where coordinate, confidence, classes, and Anchors denote the box coordinates, the confidence, the classes, and the anchors, respectively. More Anchors give better detection accuracy but reduce training and test speed. Considering that YOMO has 3 detection modules handling faces of 3 scales, num_Anchors = 3 is chosen to balance speed and accuracy. The regular convolution layer in every detection module therefore has 18 output channels.
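As a worked check of the channel count, assume 4 box coordinates, 1 confidence value, and 1 class (face) per anchor; these three values are inferred from the stated result of 18 rather than given explicitly in the text:

```latex
\mathrm{num}_{output}
  = (\mathrm{num}_{coordinate} + \mathrm{num}_{confidence} + \mathrm{num}_{classes}) \times \mathrm{num}_{Anchors}
  = (4 + 1 + 1) \times 3
  = 18
```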
The center coordinates and side lengths of the predicted box at the output layer are calculated as:
b_x = σ(t_x) + C_x, b_y = σ(t_y) + C_y
In the above formula, (b_x, b_y) are the center coordinates of the predicted box, b_w and b_h are the width and height of the predicted box, t_x and t_y are the offsets of the abscissa and ordinate of the predicted box center, (C_x, C_y) are the coordinates of the top-left corner of the grid cell containing the Anchor, σ(·) is the sigmoid function, and p_w and p_h are the width and height of the Anchor.
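A minimal decoding sketch follows. The center formulas are taken from the text; the width and height formulas are not reproduced there, so the sketch assumes the exponential form b_w = p_w · exp(t_w), b_h = p_h · exp(t_h) used by YOLOv2-style detectors, with t_w and t_h as hypothetical names for the predicted size offsets. Coordinates are assumed to be in grid units and are converted to pixels by the stride.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t, cell, anchor, stride):
    """Decode raw outputs into a pixel-space box (center x, center y, width, height).

    t      -- (t_x, t_y, t_w, t_h) raw network outputs
    cell   -- (C_x, C_y) top-left corner of the grid cell, in grid units
    anchor -- (p_w, p_h) anchor size, in grid units
    stride -- downsampling stride of the detection module (8, 16 or 32)
    """
    t_x, t_y, t_w, t_h = t
    c_x, c_y = cell
    p_w, p_h = anchor
    b_x = sigmoid(t_x) + c_x      # from the text
    b_y = sigmoid(t_y) + c_y      # from the text
    b_w = p_w * math.exp(t_w)     # assumed form
    b_h = p_h * math.exp(t_h)     # assumed form
    return (b_x * stride, b_y * stride, b_w * stride, b_h * stride)


if __name__ == "__main__":
    print(decode_box((0.0, 0.0, 0.0, 0.0), (3, 4), (2, 2), 8))
```

Because the sigmoid keeps the center offset between 0 and 1, the predicted center always stays inside the grid cell that owns the anchor.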
The outputs of detection modules det-32, det-16, and det-8 are all connected to an ellipse regressor, which converts the output-layer predicted box into an elliptical ground-truth box. The elliptical ground-truth box is calculated as:
Y = XW + ε
In the above formula, Y is the coordinate vector of the elliptical ground-truth box, comprising the semi-major axis r_a, the semi-minor axis r_b, the angle θ, and the center abscissa c_x and ordinate c_y; X is the coordinate vector of the output-layer predicted box, comprising the center coordinates b_x and b_y and the width b_w and height b_h of the predicted box; W is the regression coefficient matrix; and ε is the random error;
The regression coefficient matrix W is calculated as:
In the above formula, J(·) denotes the mean squared error function, X′ is the normalized coordinate vector of the predicted box, and Y′ is the normalized coordinate vector of the ground-truth box;
In the above formula, U_X and σ_X are the mean and standard deviation of the predicted-box coordinate vector X, and U_Y and σ_Y are the mean and standard deviation of the ground-truth-box coordinate vector Y.
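One form consistent with the symbol definitions above, offered as an assumption because the formulas for W and for the normalization are not reproduced in the text, is a standardization of both coordinate vectors followed by a least-squares fit:

```latex
\begin{aligned}
X' &= \frac{X - U_X}{\sigma_X}, \qquad Y' = \frac{Y - U_Y}{\sigma_Y},\\
W  &= \arg\min_{W} \; J\left(X'W,\; Y'\right).
\end{aligned}
```

Under this reading W maps the 4 normalized box coordinates to the 5 normalized ellipse parameters, and a prediction is mapped back to image coordinates by multiplying by σ_Y and adding U_Y.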
When training the ellipse regressor, how to match predicted boxes to ground-truth boxes is the key. In practice, for each ground-truth box in each FDDB image, the predicted box with the highest IoU is matched to it, and only ground-truth boxes and their matched predicted boxes are considered during training.
The experimental environment is a 64-bit Ubuntu 14.04 LTS system with 16 GB of RAM and an 8-core Intel Core i7-7700K CPU with a single-core frequency of 4.20 GHz. All models are based on the Caffe framework and trained on a single GPU, model NVIDIA GeForce GTX 1080 Ti.
The feature extraction network of the YOMO model is pre-trained on ImageNet and then fine-tuned for 200K iterations. The other training parameters are set as shown in Table 2. The anchor with the largest IoU with a ground-truth box is a positive example, and anchors with IoU < 0.3 are treated as background. Considering detection speed and the face scale ranges, each detection module has 3 anchors, whose values are obtained by clustering on the training set. The loss weights are λ_noobj = 1, λ_prior = 1, λ_coord = 1, λ_obj = 5, and λ_class = 1. The NMS threshold of each detection module is set to 0.7 during training and 0.45 during testing. The training images of all models in the invention are scaled to a resolution of 544 × 544.
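The role of the NMS threshold can be illustrated with a plain greedy non-maximum suppression sketch. This is a generic version, not necessarily the exact procedure used for YOMO; boxes are assumed to be given as (x1, y1, x2, y2) corners.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, threshold=0.45):
    """Keep boxes in descending score order, dropping any box whose IoU
    with an already kept box exceeds the threshold. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores, threshold=0.45))  # test-time threshold
    print(nms(boxes, scores, threshold=0.7))   # training-time threshold
```

With the looser training threshold of 0.7 the two overlapping boxes both survive, while the test threshold of 0.45 suppresses the lower-scoring one.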
WIDER FACE is a face detection benchmark data set whose images were collected from the internet and have fairly complex backgrounds. The data set contains 32203 images with 393703 annotated faces in total, with high variability in face size, pose, and occlusion. The images are grouped into 61 event classes, and within each class the data are split into training, validation, and test sets in proportions of 40%, 10%, and 50%. All models in the invention are trained on the training set.
The images in the FDDB data set were collected from the Faces in the Wild data set; there are 2845 images containing 5171 faces. The data set poses some difficulty, including occlusion, difficult poses, low resolution, and out-of-focus faces, and includes both black-and-white and color images. Unlike other face detection data sets, the annotated regions are ellipses rather than rectangles. All models in the invention are tested on the FDDB data set.
When testing on the FDDB data set, every image is scaled with its aspect ratio preserved and embedded in a 544 × 544 black background so that the image is not distorted. As shown in Fig. 2(a) and 2(b), YOMO is compared with the MTCNN, ScaleFace, HR, HR-ER, ICC-CNN, and FANet models under DiscROC and ContROC, respectively.
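The aspect-preserving embedding can be sketched as a computation of the scale factor and padding. The function name `letterbox_params` is hypothetical, and centering the scaled image on the black canvas is an assumption; the text only states that the image is embedded in a 544 × 544 black background.

```python
def letterbox_params(width, height, target=544):
    """Return (scale, new_w, new_h, pad_x, pad_y) for fitting an image into
    a target x target canvas without changing its aspect ratio."""
    scale = min(target / width, target / height)
    new_w = round(width * scale)
    new_h = round(height * scale)
    pad_x = (target - new_w) // 2  # left padding, assuming a centered image
    pad_y = (target - new_h) // 2  # top padding, assuming a centered image
    return scale, new_w, new_h, pad_x, pad_y


if __name__ == "__main__":
    print(letterbox_params(1088, 544))
```

A box predicted on the padded canvas can be mapped back to the original image by subtracting the padding and dividing by the scale.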
In Fig. 2, YOMO-Fit denotes the detection results of YOMO after the ellipse regressor. The FDDB evaluation shows that, under the DiscROC and ContROC evaluation criteria with the number of false positives fixed at 1000, YOMO-Fit achieves recall rates of 97.7% and 83.6%, respectively, second only to FANet. Even though HR-ER uses FDDB as training data with 10-fold cross-validation, its recall under DiscROC equals that of YOMO, and its recall under ContROC is 4.9% lower than that of YOMO-Fit. Notably, the ellipse regressor raises the recall of YOMO under DiscROC and ContROC by 0.1% and 8.6%, respectively.
Fig. 3(a) and (b) show visualization results for individual test images from the WIDER FACE and FDDB data sets, respectively. The rectangular boxes in Fig. 3(a) are the predicted boxes of the YOMO model. In Fig. 3(b), the rectangles and ellipses are the predicted boxes of YOMO and the ground-truth boxes, respectively.
Claims (8)
1. A single-step face detection system, characterized by comprising, connected in sequence from left to right, a regular convolution module conv0, depthwise separable convolution modules conv1 through conv14, a deconvolution layer conv15, depthwise separable convolution modules conv16 and conv17, a deconvolution layer conv18, and depthwise separable convolution modules conv19 and conv20;
The output end that the depth separates convolution module conv14 is connect with detection module det-32, the separable volume of the depth
The output end of volume module conv17 is connect with detection module det-16, and the depth separates the output end of convolution module conv20
It is connect with detection module det-8;
The input terminal that the depth separates the output end of convolution module conv11 and depth separates convolution module conv16 connects
It connects, the depth separates output end and the output end Fusion Features of warp lamination conv15 of convolution module conv16 and connect
Depth separates the input terminal of convolution module conv17, and the depth separates the output end of convolution module conv5 and depth can
The input terminal connection of convolution module conv19 is separated, the depth separates the output end and warp lamination of convolution module conv19
The output end Fusion Features of conv18 simultaneously connect the input terminal that depth separates convolution module conv20.
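The connections recited in claim 1 can be written down as a small wiring table. This is only one reading of the claim: it assumes that conv16 and conv19 take their input solely from the skip connections (conv11 and conv5), and that each deconvolution layer is fed by the module immediately before it. The names `build_wiring` and `module_kind` are hypothetical.

```python
def build_wiring():
    """Map each module name to the list of modules that feed it."""
    wiring = {"conv0": ["input"]}
    # Default: each module is fed by its left-hand neighbour.
    for i in range(1, 21):
        wiring[f"conv{i}"] = [f"conv{i - 1}"]
    # Skip connections and feature fusion recited in the claim.
    wiring["conv16"] = ["conv11"]
    wiring["conv17"] = ["conv16", "conv15"]  # fused with deconvolution conv15
    wiring["conv19"] = ["conv5"]
    wiring["conv20"] = ["conv19", "conv18"]  # fused with deconvolution conv18
    # Detection heads.
    wiring["det-32"] = ["conv14"]
    wiring["det-16"] = ["conv17"]
    wiring["det-8"] = ["conv20"]
    return wiring


def module_kind(name):
    """Return the module type named in the claim for a given module."""
    if name == "conv0":
        return "conventional convolution"
    if name in ("conv15", "conv18"):
        return "deconvolution"
    if name.startswith("det-"):
        return "detection"
    return "depthwise separable convolution"


if __name__ == "__main__":
    for name, inputs in build_wiring().items():
        print(f"{name:7s} ({module_kind(name)}) <- {', '.join(inputs)}")
```

Listing the wiring this way makes the three detection branches easy to trace: det-32 hangs off the deepest backbone output, while det-16 and det-8 each combine an upsampled deep feature with a shallower backbone feature.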
2. The single-step face detection system according to claim 1, characterized in that the conventional convolution module conv0 comprises, connected in sequence from top to bottom, a 3 × 3 convolutional layer, a BatchNorm layer, and a LeakyReLU activation layer.
3. The single-step face detection system according to claim 1, characterized in that the input picture of the conventional convolution module conv0 is cropped with a crop box $SelectCrop_{bbox}$ chosen by a random cropping algorithm and then used for training, the specific steps being:
S1. Generate several crop boxes $Sampled_{bboxes}$ with an aspect ratio of 1 using the random cropping algorithm, and crop the original image to obtain the cropped pictures; scale each cropped picture to the input size required by the network, scale the valid ground-truth boxes in the crop box by the same ratio, and count the number of faces of each scale according to the face scale ranges, the statistical formula being:
In the above formula, $Num_{ic}$ is the number of faces of scale class $c$ in the $i$-th crop box; $N$ is the number of face scale classes, $N = 3$, namely small-scale, medium-scale, and large-scale faces; $M$ is the total number of crop boxes; $1(\cdot)$ is the indicator function, equal to 1 when its condition is true and 0 otherwise; $MinScale_c$ and $MaxScale_c$ are the lower and upper bounds of face scale class $c$; $bbox_k$ is the side length of the crop box; and $K$ is the total number of crop boxes generated;
S2. For each crop box, sort the face scale classes in descending order of face count:
$$S_{i1} \ge S_{i2} \ge \dots \ge S_{iN}$$
In the above formula, $i$ is the crop box index and $S_{ic}$ is one of the $N$ face scale classes in the $i$-th crop box;
S3. Count the number of faces of each scale class during actual network training, and sort the face scale classes in ascending order of that count:
$$A_1 \le A_2 \le \dots \le A_N$$
In the above formula, $A_c$ is one of the $N$ face scale classes;
S4. Among the $M$ face scale class orderings of the crop boxes $Sampled_{bboxes}$, find the crop boxes satisfying $S_{ic} = A_c$, and randomly select one of them as $SelectCrop_{bbox}$;
S5. When no crop box satisfying step S4 is found, find among the $M$ face scale class orderings of the crop boxes $Sampled_{bboxes}$ the crop boxes satisfying $S_{i1} = A_1$ and $S_{iN} = A_N$, and randomly select one of them as $SelectCrop_{bbox}$;
S6. When no crop box satisfying step S5 is found, randomly select one crop box from $Sampled_{bboxes}$ as $SelectCrop_{bbox}$;
S7. Add the face count $Num_{sc}$ of each scale class in $SelectCrop_{bbox}$ to the running count of each face scale class during actual training, namely:
In the above formula, the previous-count term is the number of faces of each scale class obtained in the preceding training iteration, and $s$ denotes the index of the selected crop box.
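Steps S1 to S7 can be sketched in a few lines of Python. This is an illustrative reading rather than the patented implementation. The function names `count_scales` and `select_crop`, the half-open scale ranges, the tie-breaking order when two classes have equal counts, and the additive update in S7 are all assumptions, since the corresponding formulas are not reproduced in the text. The sketch also treats the counted side lengths as those of the face boxes inside a crop.

```python
import random


def count_scales(face_sides, bounds):
    """S1: count faces whose side length falls in each [low, high) range."""
    return [sum(low <= s < high for s in face_sides) for low, high in bounds]


def select_crop(crop_counts, train_counts, rng=random):
    """S2-S7: pick a crop whose dominant scales are the under-trained ones.

    crop_counts  -- per crop box, a list of face counts per scale class
    train_counts -- running face counts per scale class seen in training
    Returns (selected crop index, updated running counts).
    """
    n = len(train_counts)
    # S3: classes ordered from least to most frequent in training so far.
    target = sorted(range(n), key=lambda c: train_counts[c])
    # S2: for each crop, classes ordered from most to least frequent.
    orders = [sorted(range(n), key=lambda c: -counts[c]) for counts in crop_counts]
    # S4: crops whose whole ordering matches the target ordering.
    full = [i for i, order in enumerate(orders) if order == target]
    if full:
        s = rng.choice(full)
    else:
        # S5: crops matching only on the first and last positions.
        partial = [i for i, order in enumerate(orders)
                   if order[0] == target[0] and order[-1] == target[-1]]
        # S6: fall back to a uniformly random crop.
        s = rng.choice(partial) if partial else rng.randrange(len(crop_counts))
    # S7 (assumed additive update of the running counts).
    updated = [t + c for t, c in zip(train_counts, crop_counts[s])]
    return s, updated


if __name__ == "__main__":
    crops = [[5, 1, 0], [0, 1, 5], [2, 2, 2]]  # small, medium, large counts
    print(select_crop(crops, [10, 3, 1]))
```

In the usage example the training history is dominated by small faces, so the crop dominated by large faces is the only one whose ordering matches and is therefore selected; the running counts then move toward balance.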
4. The single-step face detection system according to claim 1, characterized in that depthwise separable convolution modules conv1, conv2, conv3, conv4, conv5, conv6, conv7, conv8, conv9, conv10, conv11, conv12, conv13, conv14, conv16, conv17, conv19, and conv20 have the same structure, each comprising, connected in sequence from top to bottom, a 3 × 3 convolutional layer, a BatchNorm layer, a LeakyReLU activation layer, a 1 × 1 convolutional layer, a BatchNorm layer, and a LeakyReLU activation layer.
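The benefit of this 3 × 3 plus 1 × 1 structure can be seen from a weight count. The sketch below assumes that the 3 × 3 layer is a depthwise convolution (one filter per input channel), as the module name suggests, and that the convolutions carry no bias; BatchNorm parameters are ignored. The function names and the 512-channel input are hypothetical.

```python
def standard_conv_weights(kernel, c_in, c_out):
    """Weights of an ordinary kernel x kernel convolution."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_weights(kernel, c_in, c_out):
    """Weights of a depthwise kernel x kernel layer plus a 1 x 1 pointwise layer."""
    depthwise = kernel * kernel * c_in  # one kernel x kernel filter per channel
    pointwise = c_in * c_out            # 1 x 1 convolution mixing channels
    return depthwise + pointwise


if __name__ == "__main__":
    std = standard_conv_weights(3, 512, 1024)
    sep = depthwise_separable_weights(3, 512, 1024)
    print(std, sep, round(std / sep, 2))
```

For a hypothetical 512-to-1024 channel layer, the separable form needs roughly one ninth of the weights of an ordinary 3 × 3 convolution, which is the usual motivation for this module in lightweight detectors.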
5. The single-step face detection system according to claim 1, characterized in that the number of output channels of depthwise separable convolution modules conv14, conv17, and conv20 is 1024.
6. The single-step face detection system according to claim 1, characterized in that detection module det-32 is used for large-scale face detection, detection module det-16 is used for medium-scale face detection, and detection module det-8 is used for small-scale face detection.
7. The single-step face detection system according to claim 1, characterized in that detection module det-32, detection module det-16, and detection module det-8 each comprise a regular convolutional layer and an output layer;
the number of output channels of the regular convolutional layer is 18;
the center coordinates and side lengths of the prediction box at the output layer are calculated as:
$$b_x = \sigma(t_x) + C_x,\quad b_y = \sigma(t_y) + C_y$$
In the above formula, $(b_x, b_y)$ are the center coordinates of the prediction box, $b_w$ and $b_h$ are the width and height of the prediction box, $t_x$ and $t_y$ are the offsets of the abscissa and ordinate of the prediction box center, $(C_x, C_y)$ are the coordinates of the top-left corner of the grid cell containing the anchor, $\sigma(\cdot)$ is the sigmoid function, and $p_w$ and $p_h$ are the width and height of the anchor.
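A minimal decoding sketch follows. The center formulas are those given in claim 7. The width and height formulas are not reproduced in the text, so the sketch assumes the exponential scaling of the anchor used in YOLO-style detectors, with `tw` and `th` as hypothetical raw width and height outputs; the function name `decode_box` is also hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Turn raw network outputs into a prediction box in grid units.

    (cx, cy) is the top-left corner of the grid cell and (pw, ph) is the
    anchor size. The center stays inside the cell because the sigmoid
    output lies in (0, 1).
    """
    bx = sigmoid(tx) + cx
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)  # assumed YOLO-style width scaling
    bh = ph * math.exp(th)  # assumed YOLO-style height scaling
    return bx, by, bw, bh


if __name__ == "__main__":
    print(decode_box(0.0, 0.0, 0.0, 0.0, cx=3, cy=4, pw=2.0, ph=5.0))
```

With all raw outputs at zero, the box sits at the center of its grid cell and has exactly the anchor size, which is a convenient sanity check for a decoder of this form.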
8. The single-step face detection system according to claim 7, characterized in that the outputs of detection module det-32, detection module det-16, and detection module det-8 are each connected to an ellipse regressor, the ellipse regressor converts the prediction boxes of the output layer into elliptical ground-truth boxes, and the elliptical ground-truth box is calculated as:
$$Y = XW + \varepsilon$$
In the above formula, $Y$ is the coordinate vector of the elliptical ground-truth box, comprising the semi-major axis $r_a$, the semi-minor axis $r_b$, the angle $\theta$, the center abscissa $c_x$, and the center ordinate $c_y$; $X$ is the coordinate vector of the output-layer prediction box, comprising the center coordinates $b_x$ and $b_y$, the width $b_w$, and the height $b_h$ of the prediction box; $W$ is the regression coefficient matrix; and $\varepsilon$ is the random error;
wherein the regression coefficient matrix $W$ is calculated as:
In the above formula, $J(\cdot)$ denotes the mean squared error function, $X'$ is the standardized coordinate vector of the prediction box, and $Y'$ is the standardized coordinate vector of the ground-truth box;
In the above formula, $U_X$ and $\sigma_X$ are the mean and standard deviation of the prediction box coordinate vector $X$, and $U_Y$ and $\sigma_Y$ are the mean and standard deviation of the ground-truth box coordinate vector $Y$.
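A linear ellipse regressor of this kind can be sketched with NumPy. The sketch assumes z-score standardization (subtract the mean, divide by the standard deviation) and an ordinary least-squares fit as the minimizer of the mean squared error; the exact formulas for $W$ and for the standardization are not reproduced in the text. The function names are hypothetical, and the demo data are synthetic, not results from the patent.

```python
import numpy as np


def fit_ellipse_regressor(X, Y):
    """Fit W so that standardized Y is approximated by standardized X times W.

    X -- (n, 4) prediction boxes: bx, by, bw, bh
    Y -- (n, 5) ellipses: ra, rb, theta, cx, cy
    Assumes every column of X and Y varies (non-zero standard deviation).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    ux, sx = X.mean(axis=0), X.std(axis=0)
    uy, sy = Y.mean(axis=0), Y.std(axis=0)
    Xs = (X - ux) / sx
    Ys = (Y - uy) / sy
    # Least-squares solution minimizes the mean squared error ||Xs W - Ys||^2.
    W, *_ = np.linalg.lstsq(Xs, Ys, rcond=None)
    return W, (ux, sx, uy, sy)


def predict_ellipse(X, W, stats):
    """Map prediction boxes to ellipse parameters in the original units."""
    ux, sx, uy, sy = stats
    Xs = (np.asarray(X, dtype=float) - ux) / sx
    return Xs @ W * sy + uy


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 4))             # synthetic boxes
    Y = X @ rng.normal(size=(4, 5)) + 1.0    # synthetic linear relation
    W, stats = fit_ellipse_regressor(X, Y)
    print(W.shape, np.allclose(predict_ellipse(X, W, stats), Y))
```

Standardizing both sides removes the need for an explicit intercept, because the means are restored when the prediction is mapped back to the original units.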
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910550738.8A CN110263731B (en) | 2019-06-24 | 2019-06-24 | Single step human face detection system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910550738.8A CN110263731B (en) | 2019-06-24 | 2019-06-24 | Single step human face detection system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110263731A (en) | 2019-09-20 |
CN110263731B (en) | 2021-03-16 |
Family
ID=67920979
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910550738.8A Active CN110263731B (en) | 2019-06-24 | 2019-06-24 | Single step human face detection system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110263731B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110807385A (en) * | 2019-10-24 | 2020-02-18 | 腾讯科技(深圳)有限公司 | Target detection method and device, electronic equipment and storage medium |
CN111401292A (en) * | 2020-03-25 | 2020-07-10 | 成都东方天呈智能科技有限公司 | Face recognition network construction method fusing infrared image training |
CN111489332A (en) * | 2020-03-31 | 2020-08-04 | 成都数之联科技有限公司 | Multi-scale IOF random cutting data enhancement method for target detection |
CN112699826A (en) * | 2021-01-05 | 2021-04-23 | 风变科技(深圳)有限公司 | Face detection method and device, computer equipment and storage medium |
Citations (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104866833A (en) * | 2015-05-29 | 2015-08-26 | 中国科学院上海高等研究院 | Video stream face detection method and apparatus thereof |
US9392257B2 (en) * | 2011-11-28 | 2016-07-12 | Sony Corporation | Image processing device and method, recording medium, and program |
CN106599797A (en) * | 2016-11-24 | 2017-04-26 | 北京航空航天大学 | Infrared face identification method based on local parallel nerve network |
CN106709568A (en) * | 2016-12-16 | 2017-05-24 | 北京工业大学 | RGB-D image object detection and semantic segmentation method based on deep convolution network |
CN108182397A (en) * | 2017-12-26 | 2018-06-19 | 王华锋 | A kind of multiple dimensioned face verification method of multi-pose |
CN108564030A (en) * | 2018-04-12 | 2018-09-21 | 广州飒特红外股份有限公司 | Classifier training method and apparatus towards vehicle-mounted thermal imaging pedestrian detection |
CN108647649A (en) * | 2018-05-14 | 2018-10-12 | 中国科学技术大学 | The detection method of abnormal behaviour in a kind of video |
CN108664893A (en) * | 2018-04-03 | 2018-10-16 | 福州海景科技开发有限公司 | A kind of method for detecting human face and storage medium |
WO2018213841A1 (en) * | 2017-05-19 | 2018-11-22 | Google Llc | Multi-task multi-modal machine learning model |
CN109101899A (en) * | 2018-07-23 | 2018-12-28 | 北京飞搜科技有限公司 | A kind of method for detecting human face and system based on convolutional neural networks |
CN109272487A (en) * | 2018-08-16 | 2019-01-25 | 北京此时此地信息科技有限公司 | The quantity statistics method of crowd in a kind of public domain based on video |
CN109284670A (en) * | 2018-08-01 | 2019-01-29 | 清华大学 | A kind of pedestrian detection method and device based on multiple dimensioned attention mechanism |
CN109598290A (en) * | 2018-11-22 | 2019-04-09 | 上海交通大学 | A kind of image small target detecting method combined based on hierarchical detection |
WO2019079895A1 (en) * | 2017-10-24 | 2019-05-02 | Modiface Inc. | System and method for image processing using deep neural networks |
CN109711384A (en) * | 2019-01-09 | 2019-05-03 | 江苏星云网格信息技术有限公司 | A kind of face identification method based on depth convolutional neural networks |
CN109753927A (en) * | 2019-01-02 | 2019-05-14 | 腾讯科技(深圳)有限公司 | A kind of method for detecting human face and device |
CN109815886A (en) * | 2019-01-21 | 2019-05-28 | 南京邮电大学 | A kind of pedestrian and vehicle checking method and system based on improvement YOLOv3 |
CN109919097A (en) * | 2019-03-08 | 2019-06-21 | 中国科学院自动化研究所 | Face and key point combined detection system, method based on multi-task learning |
CN109919308A (en) * | 2017-12-13 | 2019-06-21 | 腾讯科技(深圳)有限公司 | A kind of neural network model dispositions method, prediction technique and relevant device |
Patent Citations (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9392257B2 (en) * | 2011-11-28 | 2016-07-12 | Sony Corporation | Image processing device and method, recording medium, and program |
CN104866833A (en) * | 2015-05-29 | 2015-08-26 | 中国科学院上海高等研究院 | Video stream face detection method and apparatus thereof |
CN106599797A (en) * | 2016-11-24 | 2017-04-26 | 北京航空航天大学 | Infrared face identification method based on local parallel nerve network |
CN106709568A (en) * | 2016-12-16 | 2017-05-24 | 北京工业大学 | RGB-D image object detection and semantic segmentation method based on deep convolution network |
WO2018213841A1 (en) * | 2017-05-19 | 2018-11-22 | Google Llc | Multi-task multi-modal machine learning model |
WO2019079895A1 (en) * | 2017-10-24 | 2019-05-02 | Modiface Inc. | System and method for image processing using deep neural networks |
CN109919308A (en) * | 2017-12-13 | 2019-06-21 | 腾讯科技(深圳)有限公司 | A kind of neural network model dispositions method, prediction technique and relevant device |
CN108182397A (en) * | 2017-12-26 | 2018-06-19 | 王华锋 | A kind of multiple dimensioned face verification method of multi-pose |
CN108664893A (en) * | 2018-04-03 | 2018-10-16 | 福州海景科技开发有限公司 | A kind of method for detecting human face and storage medium |
CN108564030A (en) * | 2018-04-12 | 2018-09-21 | 广州飒特红外股份有限公司 | Classifier training method and apparatus towards vehicle-mounted thermal imaging pedestrian detection |
CN108647649A (en) * | 2018-05-14 | 2018-10-12 | 中国科学技术大学 | The detection method of abnormal behaviour in a kind of video |
CN109101899A (en) * | 2018-07-23 | 2018-12-28 | 北京飞搜科技有限公司 | A kind of method for detecting human face and system based on convolutional neural networks |
CN109284670A (en) * | 2018-08-01 | 2019-01-29 | 清华大学 | A kind of pedestrian detection method and device based on multiple dimensioned attention mechanism |
CN109272487A (en) * | 2018-08-16 | 2019-01-25 | 北京此时此地信息科技有限公司 | The quantity statistics method of crowd in a kind of public domain based on video |
CN109598290A (en) * | 2018-11-22 | 2019-04-09 | 上海交通大学 | A kind of image small target detecting method combined based on hierarchical detection |
CN109753927A (en) * | 2019-01-02 | 2019-05-14 | 腾讯科技(深圳)有限公司 | A kind of method for detecting human face and device |
CN109711384A (en) * | 2019-01-09 | 2019-05-03 | 江苏星云网格信息技术有限公司 | A kind of face identification method based on depth convolutional neural networks |
CN109815886A (en) * | 2019-01-21 | 2019-05-28 | 南京邮电大学 | A kind of pedestrian and vehicle checking method and system based on improvement YOLOv3 |
CN109919097A (en) * | 2019-03-08 | 2019-06-21 | 中国科学院自动化研究所 | Face and key point combined detection system, method based on multi-task learning |
Non-Patent Citations (2)
Title |
---|
BARRET ZOPH ET.AL.: "Learning Transferable Architectures for Scalable Image Recognition", 《2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 * |
林鹏: "基于Adaboost算法的人脸检测研究及实现", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110807385A (en) * | 2019-10-24 | 2020-02-18 | 腾讯科技(深圳)有限公司 | Target detection method and device, electronic equipment and storage medium |
CN110807385B (en) * | 2019-10-24 | 2024-01-12 | 腾讯科技(深圳)有限公司 | Target detection method, target detection device, electronic equipment and storage medium |
CN111401292A (en) * | 2020-03-25 | 2020-07-10 | 成都东方天呈智能科技有限公司 | Face recognition network construction method fusing infrared image training |
CN111489332A (en) * | 2020-03-31 | 2020-08-04 | 成都数之联科技有限公司 | Multi-scale IOF random cutting data enhancement method for target detection |
CN112699826A (en) * | 2021-01-05 | 2021-04-23 | 风变科技(深圳)有限公司 | Face detection method and device, computer equipment and storage medium |
CN112699826B (en) * | 2021-01-05 | 2024-05-28 | 风变科技(深圳)有限公司 | Face detection method, device, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110263731B (en) | 2021-03-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110263774B (en) | A kind of method for detecting human face | |
JP6830707B1 (en) | Person re-identification method that combines random batch mask and multi-scale expression learning | |
CN110263731A (en) | A kind of single step face detection system | |
US20210089752A1 (en) | Face detection training method and apparatus, and electronic device | |
Ghamisi et al. | A novel feature selection approach based on FODPSO and SVM | |
CN110287960A (en) | The detection recognition method of curve text in natural scene image | |
CN110458165B (en) | Natural scene text detection method introducing attention mechanism | |
CN109101930A (en) | A kind of people counting method and system | |
CN108171233A (en) | Use the method and apparatus of the object detection of the deep learning model based on region | |
CN109117876A (en) | A kind of dense small target deteection model building method, model and detection method | |
CN106960195A (en) | A kind of people counting method and device based on deep learning | |
CN108960404B (en) | Image-based crowd counting method and device | |
CN112949572A (en) | Slim-YOLOv 3-based mask wearing condition detection method | |
US20050163344A1 (en) | System, program, and method for generating visual-guidance information | |
CN105354595A (en) | Robust visual image classification method and system | |
CN111507248A (en) | Face forehead area detection and positioning method and system of low-resolution thermodynamic diagram | |
CN109558902A (en) | A kind of fast target detection method | |
CN109815979A (en) | A kind of weak label semantic segmentation nominal data generation method and system | |
CN109785298A (en) | A kind of multi-angle object detecting method and system | |
CN110826379A (en) | Target detection method based on feature multiplexing and YOLOv3 | |
CN110084211B (en) | Action recognition method | |
CN110751027B (en) | Pedestrian re-identification method based on deep multi-instance learning | |
CN107590427A (en) | Monitor video accident detection method based on space-time interest points noise reduction | |
CN109614990A (en) | A kind of object detecting device | |
CN107944437A (en) | A kind of Face detection method based on neutral net and integral image |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||