CN107871102A

CN107871102A - Face detection method and device

Info

Publication number: CN107871102A
Application number: CN201610849655.5A
Authority: CN
Inventors: 宋丽; 段旭; 张祥德
Original assignee: Beijing Eyecool Technology Co Ltd
Current assignee: Beijing Eyecool Technology Co Ltd
Priority date: 2016-09-23
Filing date: 2016-09-23
Publication date: 2018-04-03

Abstract

The present invention provides a face detection method and device. The method includes: inputting an image to be detected into a first convolutional neural network, which detects face regions in the image and marks first face candidate windows on the detected face regions; scaling the (M−1)-th face candidate windows to the scale of the M-th convolutional neural network and inputting them into the M-th convolutional neural network, which detects face regions in the (M−1)-th face candidate windows and marks M-th face candidate windows on the detected face regions, where M = 2, 3, ..., N and N ≥ 3; and merging the N-th face candidate windows and the (N−1)-th face candidate windows into selected face windows by global non-maximum suppression. The first to N-th convolutional neural networks are cascaded in order of increasing network scale. The present invention also provides another face detection method and device. The method and device can detect face regions accurately and efficiently.

Description

Face detection method and device

Technical field

The present invention relates to the technical field of face detection, and in particular to a face detection method and device.

Background

With the rapid development of artificial intelligence and information technology, problems such as human-computer interaction and information security have highlighted the importance of computer vision. Using the interaction between computers and users to simulate "communication" between people has become a key problem that technology urgently needs to solve. Face detection is one of the basic techniques involved, and it is a key step in many face analysis problems in human-computer interaction.

Face detection means that, given any image, a detection algorithm determines whether the image contains a face and, if so, determines information such as the position, size and pose of each face in the image. Compared with other biometric detection techniques, it is user-friendly and convenient. Its applications are not limited to face recognition systems; it is also valuable in image retrieval, video processing, surveillance and other fields. In recent years, face detection has been a research hotspot for many research and commercial institutions.

Face detection is easily affected by skin color, expression, occlusion, illumination and similar factors. Moreover, in wide practical use, face detection must cope with the diversity and variability of faces and with complex backgrounds while improving detection speed and accuracy, so the technology has received increasing attention.

Early face detection methods fall into four major categories: methods based on prior knowledge, methods based on invariant features, template matching, and methods based on statistical theory. However, early face recognition research mainly targeted face images under strong constraints (for example, images without background) and usually assumed that the face position was known or easy to obtain, so face detection itself received little attention. Methods based on statistical theory are currently popular and can effectively handle face detection against complex backgrounds; the cascaded AdaBoost method proposed by Viola and Jones was the first to achieve real-time face detection, making practical applications such as digital cameras feasible. In recent years, improvements in computer hardware have steadily strengthened the ability to store and analyze data, and the application of deep learning to face detection has produced a number of new face detection algorithms. Accordingly, existing face detection methods fall mainly into two groups: traditional face detection methods and deep learning algorithms based on region proposals.

Traditional face detection methods first use a sliding-window strategy to traverse the entire image at different scales and aspect ratios and select a number of candidate regions, then extract features such as SIFT and HOG from these regions, and finally classify them with a trained classifier such as an SVM or AdaBoost. However, the sliding-window region selection strategy is not targeted, has high time complexity and produces a large number of redundant windows. In addition, hand-crafted features are not robust to the diversity and variability of faces or to complex backgrounds.

Deep learning algorithms based on region proposals first use region proposals to find, in advance, the positions in the image where faces may appear, selecting a small number of windows while maintaining a high recall rate. A convolutional neural network (CNN) then examines the candidate regions, which amounts to feature extraction followed by binary classification. The main idea is to generate high-quality face pre-selection windows by some method and then classify them. However, training such methods is cumbersome, they occupy a large amount of storage, and they process images slowly.

Summary of the invention

Embodiments of the present invention provide a face detection method to solve the problems of low efficiency and low accuracy of prior-art face detection methods.

Embodiments of the present invention provide a face detection device to solve the problems of low efficiency and low accuracy with which prior-art face detection devices detect face regions.

In a first aspect, a face detection method is provided, including: inputting an image to be detected into a first convolutional neural network, the first convolutional neural network detecting face regions in the image and marking first face candidate windows on the detected face regions; scaling the (M−1)-th face candidate windows to the scale of an M-th convolutional neural network and inputting them into the M-th convolutional neural network, the M-th convolutional neural network detecting face regions in the (M−1)-th face candidate windows and marking M-th face candidate windows on the detected face regions, where M = 2, 3, ..., N and N ≥ 3; and merging the N-th face candidate windows and the (N−1)-th face candidate windows into selected face windows by global non-maximum suppression; wherein the first to N-th convolutional neural networks are cascaded in order of increasing network scale.

In a second aspect, a face detection method is provided, including: inputting an image to be detected into a first convolutional neural network, the first convolutional neural network detecting face regions in the image and marking first face candidate windows on the detected face regions; decomposing the image to be detected into a plurality of M-th face candidate windows and inputting them into an M-th convolutional neural network, the M-th convolutional neural network detecting whether each M-th face candidate window is a face window, wherein the scale of the M-th face candidate windows is the scale of the M-th convolutional neural network, M = 2, 3, ..., N−1 and N ≥ 3; merging, by global non-maximum suppression, the first to (N−1)-th face candidate windows detected as face windows to obtain N-th face candidate windows; and scaling the N-th face candidate windows to the scale of an N-th convolutional neural network and inputting them into the N-th convolutional neural network, the N-th convolutional neural network detecting face regions in the N-th face candidate windows and marking selected face windows on the detected face regions; wherein the N-th convolutional neural network has the largest scale of all the convolutional neural networks.

In a third aspect, a face detection device is provided, including: a first face candidate window marking module, configured to input an image to be detected into a first convolutional neural network, the first convolutional neural network detecting face regions in the image and marking first face candidate windows on the detected face regions; an M-th face candidate window marking module, configured to scale the (M−1)-th face candidate windows to the scale of an M-th convolutional neural network and input them into the M-th convolutional neural network, the M-th convolutional neural network detecting face regions in the (M−1)-th face candidate windows and marking M-th face candidate windows on the detected face regions, where M = 2, 3, ..., N and N ≥ 3; and a selected face window marking module, configured to merge the N-th face candidate windows and the (N−1)-th face candidate windows into selected face windows by global non-maximum suppression; wherein the first to N-th convolutional neural networks are cascaded in order of increasing network scale.

In a fourth aspect, a face detection device is provided, including: a first face candidate window marking module, configured to input an image to be detected into a first convolutional neural network, the first convolutional neural network detecting face regions in the image and marking first face candidate windows on the detected face regions; an M-th face candidate window marking module, configured to decompose the image to be detected into a plurality of M-th face candidate windows and input them into an M-th convolutional neural network, the M-th convolutional neural network detecting whether each M-th face candidate window is a face window, wherein the scale of the M-th face candidate windows is the scale of the M-th convolutional neural network, M = 2, 3, ..., N−1 and N ≥ 3; an N-th face candidate window merging module, configured to merge, by global non-maximum suppression, the first to (N−1)-th face candidate windows detected as face windows to obtain N-th face candidate windows; and a selected face window marking module, configured to scale the N-th face candidate windows to the scale of an N-th convolutional neural network and input them into the N-th convolutional neural network, the N-th convolutional neural network detecting face regions in the N-th face candidate windows and marking selected face windows on the detected face regions; wherein the N-th convolutional neural network has the largest scale of all the convolutional neural networks.

In one face detection method and device of the embodiments of the present invention, the first to N-th convolutional neural networks are cascaded in order of increasing network scale. The networks detect face regions one after another and screen the face candidate windows step by step, so each later network only examines the windows that survived screening. Face regions in the image to be detected can therefore be detected more efficiently while detection accuracy is maintained.

In another face detection method and device of the embodiments of the present invention, the N-th convolutional neural network has the largest scale of all the networks. The first to (N−1)-th networks each detect face regions in the image to be detected, and the detected face regions are merged and then input into the N-th network for face region detection. As a result, the face candidate windows input into the N-th network miss fewer face regions, and face regions in the image can be detected more accurately while detection efficiency is maintained.

Brief description of the drawings

To describe the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.

Fig. 1 is a flowchart of the face detection method according to the first embodiment of the present invention;

Fig. 2 is a schematic diagram of the window types involved in detection with the classification detection method according to an embodiment of the present invention;

Fig. 3 is a structural diagram of the multi-stage cascade network according to an embodiment of the present invention;

Fig. 4 is another structural diagram of the 12-net convolutional neural network according to an embodiment of the present invention;

Fig. 5 is a flowchart of the face detection method according to the second embodiment of the present invention;

Fig. 6 is a structural block diagram of a face detection device according to the third embodiment of the present invention;

Fig. 7 is another structural block diagram of the face detection device according to the third embodiment of the present invention;

Fig. 8 is a structural block diagram of a face detection device according to the fourth embodiment of the present invention;

Fig. 9 is another structural block diagram of the face detection device according to the fourth embodiment of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

First embodiment

In one aspect, the first embodiment of the present invention provides a face detection method. Among the multiple convolutional neural networks of the first embodiment, the first to N-th networks are cascaded in order of increasing scale. The small-scale networks mainly extract and screen face regions; the medium-scale networks further filter the candidate windows and reject a large number of redundant windows; and the large-scale network detects and marks face regions more precisely. Fig. 1 is a flowchart of the face detection method of the first embodiment, which specifically includes the following steps:

Step S101: input the image to be detected into the first convolutional neural network; the first network detects face regions in the image and marks first face candidate windows on the detected face regions.

Each face region may be marked with at least one first face candidate window.

Because the first convolutional neural network is small in scale, the image to be detected is large relative to it, and the first network only needs to perform a preliminary screening of face candidate windows, the image input into the first network can be the original image; that is, it need not be scaled as a whole or decomposed into multiple small images. This improves the efficiency with which the first network detects face regions.

Preferably, to obtain more accurate first face candidate windows, the image to be detected may instead be decomposed into multiple small images, each with the scale of the first convolutional neural network. The small images are obtained by splitting the image to be detected, and together they make up the whole image.

Preferably, the first convolutional neural network is a 12-net convolutional neural network, i.e., its scale is 12 × 12.
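The splitting described above can be sketched as follows, assuming non-overlapping tiles and zero padding on the bottom and right edges so that the height and width become multiples of the network scale. Neither the padding nor the function name `split_into_tiles` comes from the text; both are illustrative.

```python
import numpy as np

def split_into_tiles(image, tile=12):
    """Split an H x W (x C) image into non-overlapping tile x tile patches.

    Hypothetical helper: the image is zero-padded on the bottom/right so the
    patches reassemble exactly into the padded image. Returns (x, y, patch)
    triples, where (x, y) is the patch's top-left corner.
    """
    h, w = image.shape[:2]
    pad_h = (-h) % tile
    pad_w = (-w) % tile
    pad = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad)
    tiles = []
    for y in range(0, padded.shape[0], tile):
        for x in range(0, padded.shape[1], tile):
            tiles.append((x, y, padded[y:y + tile, x:x + tile]))
    return tiles
```

Keeping the top-left corner with each patch lets detections in a patch be mapped back to image coordinates by a simple offset.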

Step S102: scale the (M−1)-th face candidate windows to the scale of the M-th convolutional neural network and input them into the M-th network; the M-th network detects face regions in the (M−1)-th face candidate windows and marks M-th face candidate windows on the detected face regions.

Here M = 2, 3, ..., N and N ≥ 3. Each face region may be marked with at least one M-th face candidate window.

"Scaling" in this step means resizing each (M−1)-th face candidate window as a whole to the scale of the M-th convolutional neural network. The last layer of the M-th network produces a 3 × 3 feature map. If a window input into the network were larger than the network's scale, the final feature map would be larger than 3 × 3, and when it is fully connected to the 128 × 1 fc layer, some features would go unused. If the window were smaller than the network's scale, then after processing by the network it might become smaller than the input scale required by the next layer, for example smaller than 3 × 3, and the network could not continue to process it. Therefore, in this step each face candidate window is preferably first scaled as a whole to the scale of the convolutional neural network.

Preferably, N = 3, so the cascade consists of three convolutional neural networks. More preferably, the second network is a 24-net (scale 24 × 24) and the third network is a 48-net (scale 48 × 48).
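The "scale as a whole" operation of step S102 can be sketched as a crop followed by a resize to the next network's input size (24 for 24-net, 48 for 48-net). The text does not name an interpolation method; nearest-neighbour sampling is used here only to keep the example dependency-free, and a real pipeline would more likely use bilinear interpolation (for example OpenCV's `cv2.resize`). The window is assumed to be given as integer corners already clipped to the image; `crop_and_resize` is a hypothetical name.

```python
import numpy as np

def crop_and_resize(image, box, size):
    """Crop box = (x1, y1, x2, y2) (x2, y2 exclusive, inside the image) and
    rescale the crop as a whole to size x size by nearest-neighbour sampling."""
    x1, y1, x2, y2 = box
    patch = image[y1:y2, x1:x2]
    ph, pw = patch.shape[:2]
    rows = np.arange(size) * ph // size   # source row for each output row
    cols = np.arange(size) * pw // size   # source column for each output column
    return patch[rows][:, cols]
```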

Step S103: merge the N-th face candidate windows and the (N−1)-th face candidate windows into selected face windows by global non-maximum suppression.

The N-th and (N−1)-th face candidate windows being merged correspond to the same face region. The windows processed by global non-maximum suppression may have different scales. Global non-maximum suppression proceeds as follows:

1. Store the top-left and bottom-right coordinates and the score of each N-th and (N−1)-th face candidate window, compute the area of each window, and sort the windows that have not been suppressed by classification score. The classification score indicates the probability that a window is a face region: the higher the score, the higher the probability.

2. Take the highest-scoring N-th or (N−1)-th face candidate window as the reference window, and compute the overlap ratio between the reference window and each remaining window in order of score from low to high, where overlap ratio = intersection area of the two windows / area of the smaller window.

3. Set an overlap ratio threshold; when the overlap ratio exceeds the threshold, suppress the lower-scoring window. Repeat from step 1 until no window can be suppressed.

4. Handle containment between N-th and (N−1)-th face candidate windows of different scales: a small window completely contained in a larger window is suppressed.
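The four steps above can be sketched as follows, with windows as (x1, y1, x2, y2) corners. One ordering choice is an assumption: because the overlap ratio divides by the smaller window's area, a fully contained window always has ratio 1 and would already be suppressed by steps 1 to 3 whenever the threshold is below 1, so step 4 is applied first here, which lets it override the score (the larger window survives even if the contained one scored higher). The threshold value 0.7 is illustrative, not from the text.

```python
import numpy as np

def global_nms(boxes, scores, thresh=0.7):
    """Global NMS over windows of possibly different scales (sketch).

    boxes: (K, 4) array of (x1, y1, x2, y2); scores: (K,) face scores.
    Overlap ratio = intersection area / area of the smaller window.
    Returns indices of the kept windows, highest score first.
    """
    b = np.asarray(boxes, dtype=float)
    s = np.asarray(scores, dtype=float)
    x1, y1, x2, y2 = b.T
    area = (x2 - x1) * (y2 - y1)

    # Step 4 (applied first here): drop any window lying entirely inside a
    # strictly larger window, whatever its score.
    alive = []
    for i in range(len(b)):
        inside = any(
            area[j] > area[i] and x1[j] <= x1[i] and y1[j] <= y1[i]
            and x2[j] >= x2[i] and y2[j] >= y2[i]
            for j in range(len(b)))
        if not inside:
            alive.append(i)

    # Steps 1 to 3: greedy suppression around the best remaining window.
    order = sorted(alive, key=lambda i: -s[i])
    keep = []
    while order:
        ref = order.pop(0)                    # reference (benchmark) window
        keep.append(ref)
        survivors = []
        for j in order:
            iw = max(0.0, min(x2[ref], x2[j]) - max(x1[ref], x1[j]))
            ih = max(0.0, min(y2[ref], y2[j]) - max(y1[ref], y1[j]))
            if iw * ih / min(area[ref], area[j]) <= thresh:
                survivors.append(j)
        order = survivors
    return keep
```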

Preferably, face regions in steps S101 and S102 are detected by a classification detection method or a heat map detection method. If the classification detection method is used, the first to (N−1)-th face candidate windows are windows processed by bounding-box regression and non-maximum suppression, while the N-th face candidate windows are windows processed by bounding-box regression only.

In the classification detection method, the convolutional neural network classifies detection regions into face regions and non-face regions, maps each detected face region back to its position in the image to be detected, and marks it with a rectangular window.

The preliminary annotation windows obtained in this way deviate somewhat from the true positions of the face regions, so each preliminary annotation window needs to be fine-tuned so that the regression window obtained after fine-tuning is closer to the true position of the face region, which is usually also represented by a rectangular window. This fine-tuning can be implemented by bounding-box regression.

Bounding-box regression works as follows:

A rectangular window is represented by a four-dimensional vector (x, y, w, h): the coordinates of the window center, and its width and height. In Fig. 2, the dashed box P is the preliminary annotation window and the solid box G is the ground-truth window; there is clearly some error between them. The goal of bounding-box regression is to find a mapping that transforms the preliminary annotation window P into a regression window G' that is closer to the ground-truth window G. That is, given $(P_x, P_y, P_w, P_h)$, find a mapping $f$ such that $f(P_x, P_y, P_w, P_h) = (G'_x, G'_y, G'_w, G'_h)$ and $(G'_x, G'_y, G'_w, G'_h) \approx (G_x, G_y, G_w, G_h)$.

A simple way to transform the preliminary annotation window P into the regression window G' is to translate it first and then scale it:

1. First consider a scale-invariant translation $(\Delta x, \Delta y)$, where $\Delta x = P_w d_x(P)$ and $\Delta y = P_h d_y(P)$. Then

$$G'_x = P_w d_x(P) + P_x \qquad (1)$$

$$G'_y = P_h d_y(P) + P_y \qquad (2)$$

2. Then consider a log-space scaling $(S_w, S_h)$, where $S_w = \exp(d_w(P))$ and $S_h = \exp(d_h(P))$. Then

$$G'_w = P_w \exp(d_w(P)) \qquad (3)$$

$$G'_h = P_h \exp(d_h(P)) \qquad (4)$$
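Formulas (1) to (4) translate directly into code. The sketch below applies predicted transforms $(d_x, d_y, d_w, d_h)$ to a preliminary window given in center format; the function name `apply_deltas` is illustrative.

```python
import numpy as np

def apply_deltas(p, d):
    """p = (P_x, P_y, P_w, P_h): center and size of the preliminary window.
    d = (d_x, d_y, d_w, d_h): predicted transforms.
    Returns the regression window G' = (G'_x, G'_y, G'_w, G'_h)."""
    px, py, pw, ph = p
    dx, dy, dw, dh = d
    return (pw * dx + px,          # formula (1): translate x by a width-relative shift
            ph * dy + py,          # formula (2): translate y by a height-relative shift
            pw * np.exp(dw),       # formula (3): log-space width scaling
            ph * np.exp(dh))       # formula (4): log-space height scaling
```

For example, a window centered at (50, 50) with size 20 × 40 and transforms (0.1, −0.25, 0, ln 2) becomes a window centered at (52, 40) with size 20 × 80.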

Formulas (1) to (4) show that four transformations need to be learned: $d_x(P)$, $d_y(P)$, $d_w(P)$ and $d_h(P)$. A preliminary annotation window can be used to train the linear regression model that fine-tunes window positions only when it is relatively close to the ground-truth window; windows whose IoU with the ground-truth window is > 0.6 are selected.

Linear regression learns, from the features of an input vector x, a set of parameters w such that the regressed result is close to the true value, i.e., $Y(x) = w^T \phi(x)$. The input to bounding-box regression is the feature vector produced by the network before the fully connected layer, taken as the feature vector of the preliminary annotation window; $\phi(x)$ denotes the feature obtained after the series of nonlinear transformations in the network. The true translation $(t_x, t_y)$ and scaling $(t_w, t_h)$ are computed from the parameters of the ground-truth window G and the preliminary annotation window P:

$$t_x = (G_x - P_x) / P_w \qquad (5)$$

$$t_y = (G_y - P_y) / P_h \qquad (6)$$

$$t_w = \log(G_w / P_w) \qquad (7)$$

$$t_h = \log(G_h / P_h) \qquad (8)$$
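Formulas (5) to (8) are the inverse of formulas (1) to (4): they give the transforms that would map P exactly onto G, and these serve as the regression targets during training. A minimal sketch with center-format windows; `regression_targets` is an illustrative name.

```python
import numpy as np

def regression_targets(p, g):
    """p = (P_x, P_y, P_w, P_h), g = (G_x, G_y, G_w, G_h), both center format.
    Returns (t_x, t_y, t_w, t_h) per formulas (5) to (8)."""
    px, py, pw, ph = p
    gx, gy, gw, gh = g
    return ((gx - px) / pw,        # (5)
            (gy - py) / ph,        # (6)
            np.log(gw / pw),       # (7)
            np.log(gh / ph))       # (8)
```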

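The objective function is therefore $d_*(P) = w_*^T \phi(P)$, where $*$ stands for each of $x, y, w, h$ (each transformation has its own objective function), $w_*$ is the parameter vector to be learned, $\phi(P)$ is the feature vector of the input preliminary annotation window P, and $d_*(P)$ is the predicted value. The loss function is:

$$\mathrm{Loss} = \sum_{i=1}^{n} \left(t_*^i - w_*^T \phi(P^i)\right)^2$$

The optimization objective is:

$$w_* = \arg\min_{w_*} \sum_{i=1}^{n} \left(t_*^i - w_*^T \phi(P^i)\right)^2 + \lambda \lVert w_* \rVert^2$$

where $i$ indexes the n training pairs and $\lambda$ is a regularization coefficient. $w_*$ can be obtained by gradient descent or by the least squares method.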
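The least-squares route can be sketched with a closed-form solve. This assumes the features $\phi(P^i)$ are stacked as rows of a matrix and fits one regressor per transform $*$; the optional ridge coefficient `lam` (λ) is included as an assumption, and `lam=0` reduces to ordinary least squares, which requires the feature matrix to have full column rank. `fit_regressor` is a hypothetical name.

```python
import numpy as np

def fit_regressor(features, targets, lam=0.0):
    """features: (n, k) array whose rows are phi(P^i); targets: (n,) values t_*^i.

    Minimises sum_i (t_*^i - w^T phi(P^i))^2 + lam * ||w||^2 in closed form
    via the normal equations (X^T X + lam I) w = X^T t.
    """
    X = np.asarray(features, dtype=float)
    t = np.asarray(targets, dtype=float)
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ t)
```

In a full pipeline this would be called four times, once for each of $t_x$, $t_y$, $t_w$ and $t_h$, on the same feature matrix.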

With the learned parameters $w_*$, a test image is first passed through the network to extract the feature $\phi(P)$; the output is the predicted transformation $d_*(P) = w_*^T \phi(P)$, and the regression window G', close to the true value, is computed from formulas (1) to (4).

The N-th face candidate windows are obtained through the above process.

After detection by the convolutional neural network and bounding-box regression, one face region may still have multiple windows. Therefore, the regression windows of each face region in the first to (N−1)-th face candidate windows are further processed by non-maximum suppression (NMS) to reduce the number of windows and find the best face detection region. The essence of NMS is to search for local maxima and suppress non-maximum elements. Preferably, the windows processed by NMS all have the same scale. NMS proceeds as follows:

1. Store the top-left and bottom-right coordinates and the score of each regression window, compute the area of each regression window, and sort the regression windows that have not been suppressed by classification score. The classification score indicates the probability that a regression window is a face region: the higher the score, the higher the probability.

2. Take the highest-scoring regression window as the reference window, and compute the overlap ratio between the reference window and each remaining regression window in order of score from low to high.

Here, overlap ratio = intersection area of the two windows / area of their union.

3. Set an overlap ratio threshold; when the overlap ratio exceeds the threshold, suppress the lower-scoring window. Repeat steps 1 and 2 until no window can be suppressed.

This process finally yields the first to (N−1)-th face candidate windows.
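A minimal vectorised sketch of this per-scale NMS, assuming windows as (x1, y1, x2, y2) corners and overlap measured as intersection over union. The threshold 0.5 is only an example value; the text does not give one.

```python
import numpy as np

def nms(boxes, scores, thresh=0.5):
    """Greedy IoU-based NMS; returns indices of kept windows, best first."""
    b = np.asarray(boxes, dtype=float)
    x1, y1, x2, y2 = b.T
    area = (x2 - x1) * (y2 - y1)
    order = np.argsort(scores)[::-1]          # highest score first
    keep = []
    while order.size:
        i = order[0]                          # reference window
        keep.append(int(i))
        rest = order[1:]
        iw = np.maximum(0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]))
        ih = np.maximum(0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]))
        inter = iw * ih
        iou = inter / (area[i] + area[rest] - inter)
        order = rest[iou <= thresh]           # suppress windows above the threshold
    return keep
```

For two 10 × 10 windows offset by one pixel, the IoU is 81 / 119 ≈ 0.68, so the lower-scoring one is suppressed at a 0.5 threshold.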

In the heat map detection method, each convolutional neural network is a fully convolutional network, and the face heat map it produces when detecting face regions is a visualization of the feature map responses obtained by deconvolution in that network. The visualization uses color to indicate how likely each region is to be a face, with regions of different face probability shown in different colors. The color of the face heat map therefore shows the probability that a region is a face, and regions whose probability exceeds a certain threshold are marked with face candidate windows. The heat map detection method thus finds high-probability regions as face candidate windows directly, without further processing by bounding-box regression and non-maximum suppression. It is particularly suitable for detecting large images.
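The thresholding step can be sketched as follows. The text does not say how heat map cells map to image windows, so this assumes the fully convolutional 12-net of Fig. 3 (two 3 × 3 convolutions, a 2 × 2 stride-2 pooling, a 4 × 4 convolution), for which each output cell (r, c) covers a 12 × 12 input window whose top-left corner is (2c, 2r); dividing by the pyramid scale maps the window back to the original image. The threshold and the name `heatmap_to_windows` are illustrative.

```python
import numpy as np

def heatmap_to_windows(heat, thresh=0.6, stride=2, cell=12, scale=1.0):
    """Turn a face-probability map into (x1, y1, x2, y2, score) windows in
    original-image coordinates. stride/cell assume the 12-net of Fig. 3;
    scale is the pyramid factor of the image the map was computed on."""
    rows, cols = np.nonzero(heat > thresh)
    windows = []
    for r, c in zip(rows, cols):
        x1, y1 = c * stride / scale, r * stride / scale
        side = cell / scale
        windows.append((x1, y1, x1 + side, y1 + side, float(heat[r, c])))
    return windows
```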

The classification detection process is described below using a three-level cascade of 12-net, 24-net and 48-net convolutional neural networks as an example. In Fig. 3, "@" denotes a number of feature maps that can be set to any reasonable value, and fc denotes a fully connected layer.

The image to be detected that is input into the first network, 12-net, passes through two consecutive convolution (conv) layers with 3 × 3 kernels, yielding an 8 × 8 feature map (the figure shows the intermediate 10 × 10 feature map produced by the first 3 × 3 convolution). A 2 × 2 max-pooling (max-p) layer with stride 2 then produces a 4 × 4 feature map, and a 4 × 4 conv layer produces a 16-dimensional feature vector. 12-net is a fully convolutional network: convolutional layers replace fully connected layers, which saves computation and makes it easy to obtain candidate window positions. The feature vector feeds two parallel layers: a classification layer (cls) and a regression layer (bbox reg, bounding-box regression). The classification layer outputs a two-dimensional vector giving the probability that the detected region is a face. The regression layer fine-tunes the position of the preliminary annotation window of the face region by bounding-box regression to obtain a regression window, and outputs a four-dimensional vector (center coordinates, width and height) representing that regression window.

The first face candidate windows input into the second network, 24-net, have a scale of 24 × 24. Two consecutive 3 × 3 conv layers produce a 20 × 20 feature map; a 3 × 3 max-pooling layer with stride 2 produces a 9 × 9 feature map; two more 3 × 3 conv layers produce a 5 × 5 feature map; and a 3 × 3 max-pooling layer with stride 1 produces a 3 × 3 feature map. The final fully connected layer flattens this feature map into a 128-dimensional feature vector, which feeds the classification and regression layers. The regression layer outputs the regression windows of the second network.

The second face candidate windows input into the third network, 48-net, have a scale of 48 × 48. Three alternating groups of conv and max-pooling layers produce a 4 × 4 feature map, and a 2 × 2 conv layer then produces a 3 × 3 feature map. The fully connected layer flattens it into a 256-dimensional feature vector, which feeds the classification and regression layers. The regression layer outputs the regression windows of the third network.
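The feature map sizes quoted above can be checked with the usual output-size rule for unpadded ("valid") convolution and pooling with floor rounding, out = (n − k) // s + 1. The layer lists below are transcribed from the 12-net and 24-net descriptions; the kernels of 48-net's three conv/pool groups are not given, so only its final 2 × 2 convolution is checked. Helper names are illustrative.

```python
def out_size(n, k, s=1):
    """Spatial size after an unpadded conv or pool: kernel k, stride s, floor rounding."""
    return (n - k) // s + 1

def trace(n, layers):
    """Return the spatial size after each (kernel, stride) layer, starting from n."""
    sizes = [n]
    for k, s in layers:
        n = out_size(n, k, s)
        sizes.append(n)
    return sizes

# (kernel, stride) per layer, as described in the text
NET12 = [(3, 1), (3, 1), (2, 2), (4, 1)]                   # conv, conv, pool, conv
NET24 = [(3, 1), (3, 1), (3, 2), (3, 1), (3, 1), (3, 1)]   # conv, conv, pool, conv, conv, pool
```

Some frameworks round pooling outputs up instead of down (Caffe, for example, would give 10 rather than 9 after the first 24-net pooling), so the stated sizes assume floor rounding.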

Preferably, in order that forward calculation is more convenient, the core length and step-length of pond pooling layers is can adjust, makes 12-net Convolution conv and pond pooling operation formed by the core of 3x3 sizes, network structure can use the network knot that Fig. 4 is represented Structure is replaced.

Preferably, before step S101, the method of the first embodiment may further include the following steps:

Training each convolutional neural network.

The sample training library of each convolutional neural network is constructed as follows.

Positive samples: the positive samples in the training libraries of all the convolutional neural networks can be the same. Specifically, face images are used as the positive samples in the training library of every convolutional neural network.

Negative samples: the negative samples differ between the training libraries of the convolutional neural networks. Specifically, for the first convolutional neural network, randomly cropped images whose IoU with the ground-truth face bounding box is less than 0.3 are used as the negative samples in its training library. Here IoU (intersection over union) is, for two overlapping bounding boxes, the ratio of the area of their intersection to the area of their union.
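A minimal sketch of the IoU test and of random negative cropping for the first network follows. Boxes are assumed to be (x1, y1, x2, y2) tuples in pixels; the fixed square crop size, the uniform sampling of crop positions, the retry limit and the helper names are assumptions for illustration. Only the IoU definition and the 0.3 threshold come from the text; in practice the accepted crops would then be resized to the network's 12 × 12 input scale.

```python
import random


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def random_negative_crops(img_w, img_h, face_boxes, size, count,
                          thresh=0.3, seed=0, max_tries=10000):
    """Randomly place size x size crops and keep those whose IoU with every
    ground-truth face box is below thresh (negatives for the first network)."""
    rng = random.Random(seed)
    crops = []
    for _ in range(max_tries):
        if len(crops) == count:
            break
        x = rng.randint(0, img_w - size)  # randint is inclusive, so x + size <= img_w
        y = rng.randint(0, img_h - size)
        box = (x, y, x + size, y + size)
        if all(iou(box, face) < thresh for face in face_boxes):
            crops.append(box)
    return crops
```

For instance, iou((0, 0, 10, 10), (5, 0, 15, 10)) is 50 / 150 = 1/3, so a crop overlapping a face that much would not qualify as a negative.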

Negative sample images that the (M-1)-th convolutional neural network detects as positive samples are used as negative samples for the M-th convolutional neural network.

For example, in a three-level cascade of 12-net, 24-net and 48-net, if a negative sample image (i.e., a non-face image) is detected as a positive sample by the 12-net, that image is used as a negative sample in the training library of the 24-net. Similarly, if a negative sample image (i.e., a non-face image) is detected as a positive sample by the 24-net, that image is used as a negative sample in the training library of the 48-net.
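The cascade rule for negatives amounts to keeping the non-face images that the previous stage wrongly accepts. The sketch below assumes each trained stage is exposed as a callable returning a face probability from its classification layer, and that "detected as a positive sample" means a probability at or above 0.5; both the callable interface and the threshold are hypothetical.

```python
def next_stage_negatives(negatives, stage_predict, threshold=0.5):
    """Return the non-face images that the current stage wrongly accepts as faces.

    stage_predict(image) -> face probability from the stage's classification layer
    (hypothetical interface). The returned images join the negative samples of the
    next stage's training library.
    """
    return [img for img in negatives if stage_predict(img) >= threshold]


# Hypothetical 12-net scores for three non-face images
scores_12net = {"n1": 0.9, "n2": 0.1, "n3": 0.6}
negatives_24net = next_stage_negatives(["n1", "n2", "n3"], scores_12net.get)
print(negatives_24net)  # ['n1', 'n3']
```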

Building the training library of each convolutional neural network in this way makes every trained stage of the cascade more accurate.

Constructing an image pyramid from the image to be detected to obtain images at multiple scales.

For example, the image to be detected is scaled down by factors of 0.3, 0.5, 0.6, 0.8 and 1 to form a 5-level image pyramid.
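For a concrete picture of the pyramid in this example, the sketch below lists the level sizes produced by the five scaling factors. Rounding each side to the nearest pixel is an assumption, and the function name pyramid_sizes is hypothetical; for the 120 × 120 image used later in this description it reproduces the 36 × 36 to 120 × 120 levels.

```python
def pyramid_sizes(width, height, factors=(0.3, 0.5, 0.6, 0.8, 1.0)):
    """(width, height) of each pyramid level, each side rounded to the nearest pixel."""
    return [(round(width * f), round(height * f)) for f in factors]


print(pyramid_sizes(120, 120))
# [(36, 36), (60, 60), (72, 72), (96, 96), (120, 120)]
```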

Constructing an image pyramid yields images at different scales, which benefits multi-scale detection of face regions. Furthermore, because the scales of the samples used in training are limited, if the image to be detected were examined at only one scale, that scale might fall outside the range of training sample scales and the detection results of the convolutional neural networks would be less accurate. It is therefore preferable to construct an image pyramid to obtain images to be detected at different scales; this raises the probability that one of the scales matches the size of the training samples and thus further improves detection accuracy.

After the image pyramid is constructed, the face regions detected by the first convolutional neural network are mapped back onto the image to be detected at its original scale according to the scaling factor of the corresponding pyramid level. The face regions detected by the M-th convolutional neural network are mapped back onto the original-scale image according to the ratio between the scale of the (M-1)-th face candidate window input to the M-th network and the scale of the M-th network.

For example, suppose the image to be detected is 120 × 120 and the image pyramid yields images at 5 scales, the first convolutional neural network is a 12-net (i.e., its scale is 12 × 12), the scaling factors are 0.3, 0.5, 0.6, 0.8 and 1, and the classification detection method is used. The 5 pyramid images are then 36 × 36, 60 × 60, 72 × 72, 96 × 96 and 120 × 120, and the face regions detected by the first convolutional neural network are mapped back onto the 120 × 120 image according to these 5 scaling factors. Likewise, if the second convolutional neural network is a 24-net (i.e., its scale is 24 × 24) and a first face candidate window input to it is 40 × 40, the face region detected by the second network is mapped back onto the 120 × 120 image according to the ratio 40:24.
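The two mappings in this example can be written as small coordinate transforms. This is a sketch under stated assumptions: boxes are (x1, y1, x2, y2) tuples, a pyramid level is the original image scaled uniformly by factor, and a candidate window fed to the M-th network is its crop from the original image resized to net_size × net_size. The offset by the window's top-left corner is not spelled out in the text but is needed to place the 40:24-scaled box back in the 120 × 120 image. The function names are hypothetical.

```python
def from_pyramid(box, factor):
    """Map a box found on a pyramid level scaled by `factor` back to the original image."""
    return tuple(v / factor for v in box)


def from_window(box, window, net_size):
    """Map a box in net_size x net_size network-input coordinates back to the original image.

    `window` is the candidate window (x1, y1, x2, y2) in original-image coordinates
    that was resized to net_size x net_size before entering the network.
    """
    sx = (window[2] - window[0]) / net_size  # e.g. 40 / 24 in the example above
    sy = (window[3] - window[1]) / net_size
    return (window[0] + box[0] * sx, window[1] + box[1] * sy,
            window[0] + box[2] * sx, window[1] + box[3] * sy)


# A 12 x 12 detection on the 60 x 60 level (factor 0.5) covers 24 x 24 original pixels
print(from_pyramid((6, 6, 18, 18), 0.5))  # (12.0, 12.0, 36.0, 36.0)
```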

In summary, in the face detection method of the first embodiment of the invention, the first to N-th convolutional neural networks are cascaded in ascending order of scale, and each network detects face regions in turn, screening the face candidate windows step by step so that face regions are detected only within the screened windows. Face regions in the image to be detected can thus be detected more efficiently while detection accuracy is maintained.

Second embodiment

The second embodiment of the invention provides a face detection method. Among the multiple convolutional neural networks of the second embodiment, the N-th convolutional neural network has the largest scale of all the networks. The small-scale convolutional neural networks mainly extract and screen face regions; the middle-scale networks further filter the candidate windows and reject a large number of redundant windows; and the large-scale network detects and marks face regions more accurately. Fig. 5 is a flowchart of the face detection method of the second embodiment of the invention. The method specifically includes the following steps:

Step S201: input the image to be detected into the first convolutional neural network, which detects the face regions in the image to be detected, and mark first face candidate windows in the detected face regions.

Preferably, the first convolutional neural network is a 12-net convolutional neural network, i.e., its scale is 12 × 12.

Step S202: decompose the image to be detected into multiple M-th face candidate windows and input them into the M-th convolutional neural network, which determines whether each M-th face candidate window is a face window.

Here the scale of the M-th face candidate windows equals the scale of the M-th convolutional neural network, M = 2, 3, ..., N-1, and N ≥ 3.

As in step S201, each sub-image obtained by decomposition and input to the M-th convolutional neural network in step S202 has the scale of the M-th network. These sub-images are obtained by partitioning the image to be detected, and together they make up the whole image.
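One way to realize the decomposition in step S202 is a grid of non-overlapping tiles whose side equals the M-th network's scale. The sketch below is hypothetical: the text does not say how borders are handled when an image side is not a multiple of the tile size, so here the last tile in each row or column is shifted inward to end at the image border (it then overlaps its neighbour).

```python
def tile_starts(length, size):
    """Start offsets of size-long tiles covering [0, length); requires length >= size."""
    starts = list(range(0, length - size + 1, size))
    if starts[-1] + size < length:
        starts.append(length - size)  # shift the last tile inward to reach the border
    return starts


def decompose(img_w, img_h, size):
    """Split an img_w x img_h image into size x size windows (x1, y1, x2, y2), row by row."""
    if img_w < size or img_h < size:
        raise ValueError("image is smaller than the network input scale")
    return [(x, y, x + size, y + size)
            for y in tile_starts(img_h, size)
            for x in tile_starts(img_w, size)]


print(decompose(48, 48, 24))
# [(0, 0, 24, 24), (24, 0, 48, 24), (0, 24, 24, 48), (24, 24, 48, 48)]
```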

Step S203: using global non-maximum suppression, merge the first to (N-1)-th face candidate windows that were detected as face windows to obtain N-th face candidate windows.

Here, the first to (N-1)-th face candidate windows that are merged are the candidate windows corresponding to the same face region. The windows processed by global non-maximum suppression may have different scales. The global non-maximum suppression method is as follows:

1. Store the upper-left and lower-right corner coordinates and the score of each of the first to (N-1)-th face candidate windows, compute the area of each window, and sort the unsuppressed windows by classification score. The classification score indicates the probability that the window is a face region: the higher the score, the higher the probability.

2. Taking the highest-scoring window among the first to (N-1)-th face candidate windows as the reference window, compute the overlap ratio of each remaining window with the reference window in turn, in order of score from low to high, where overlap ratio = area of the intersection of the two windows / area of the smaller window.

4. To handle containment among first to (N-1)-th face candidate windows of different scales, suppress any small window that is completely contained in a larger window.
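The global non-maximum suppression above can be sketched as follows. Boxes are (x1, y1, x2, y2) tuples with positive area, and overlap is measured against the smaller window as in step 2. Step 3 is absent from the text, so the rule used here, suppressing a window whose overlap with the reference exceeds a threshold (0.7, chosen arbitrarily), is an assumption. The containment rule of step 4 is applied before the greedy pass: with the smaller-window overlap, a contained window always scores 1.0 against its container, so applying the rule afterwards would change nothing.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def overlap_min(a, b):
    """Intersection area divided by the area of the smaller window (step 2)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    return iw * ih / min(area(a), area(b))


def contains(big, small):
    return (big[0] <= small[0] and big[1] <= small[1]
            and big[2] >= small[2] and big[3] >= small[3])


def global_nms(windows, thresh=0.7):
    """windows: list of (box, score) pooled from all stages and scales; returns the kept ones."""
    # Step 4 (applied first here): drop windows lying entirely inside a larger window.
    windows = [w for w in windows
               if not any(contains(o[0], w[0]) and area(o[0]) > area(w[0]) for o in windows)]
    # Step 1: sort the unsuppressed windows by classification score, best first.
    remaining = sorted(windows, key=lambda w: w[1], reverse=True)
    kept = []
    while remaining:
        ref = remaining.pop(0)  # step 2: highest-scoring window becomes the reference
        kept.append(ref)
        # Assumed step 3: suppress windows whose overlap with the reference exceeds thresh.
        remaining = [w for w in remaining if overlap_min(ref[0], w[0]) <= thresh]
    return kept


windows = [((10, 10, 50, 50), 0.9), ((12, 12, 48, 48), 0.95),
           ((60, 60, 100, 100), 0.8), ((15, 10, 55, 50), 0.7)]
print(global_nms(windows))  # [((10, 10, 50, 50), 0.9), ((60, 60, 100, 100), 0.8)]
```

Here (12, 12, 48, 48) is removed by the containment rule despite its higher score, and (15, 10, 55, 50) is suppressed because it overlaps the reference by 0.875.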

Step S204: resize the N-th face candidate windows to the scale of the N-th convolutional neural network and input them into the N-th network, which detects the face regions in the N-th face candidate windows, and mark selected face windows in the detected face regions.
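Step S204, like the resizing steps of the first embodiment, feeds each window to a network at that network's fixed input scale. The sketch below crops a window from a grayscale image stored as a list of rows and resizes it with nearest-neighbour sampling; the interpolation method and the name crop_resize are assumptions, since the text does not specify how windows are rescaled (practical systems usually use bilinear interpolation).

```python
def crop_resize(image, box, size):
    """Crop box (x1, y1, x2, y2) from a 2-D image (list of rows) and resize the crop
    to size x size using nearest-neighbour sampling."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    # Output pixel (r, c) samples source pixel (y1 + r*h//size, x1 + c*w//size),
    # which always lies inside the crop because r, c < size.
    return [[image[y1 + (r * h) // size][x1 + (c * w) // size] for c in range(size)]
            for r in range(size)]


img = [[r * 4 + c for c in range(4)] for r in range(4)]
print(crop_resize(img, (0, 0, 4, 4), 2))  # [[0, 2], [8, 10]]
```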

Preferably, the face regions in steps S201, S202 and S204 are detected by the classification detection method or the heat-map detection method, both of which are the same as in the first embodiment and are not described again here. If the classification detection method is used, the first to (N-1)-th face candidate windows and the selected face windows are windows that have been processed by bounding-box regression and non-maximum suppression. More preferably, the windows processed by non-maximum suppression have the same scale. The bounding-box regression and non-maximum suppression methods are the same as in the first embodiment and are not described again here.

Preferably, the method of the second embodiment also includes the steps of training each convolutional neural network and of constructing an image pyramid from the image to be detected to obtain images at multiple scales; these steps are the same as in the first embodiment and are not described again here.

In summary, in the face detection method of the second embodiment of the invention, the N-th convolutional neural network has the largest scale of all the networks. The face regions of the image to be detected are detected separately by the first to (N-1)-th networks, merged, and then input to the N-th network, which detects the final face regions. As a result, fewer face regions are missed among the face candidate windows input to the N-th network, so face regions in the image to be detected are detected more accurately while detection efficiency is maintained.

Third embodiment

The present invention also provides a face detection device. Among the multiple convolutional neural networks used by the device, the first to N-th convolutional neural networks are cascaded in ascending order of scale. The small-scale networks mainly extract and screen face regions; the middle-scale networks further filter the candidate windows and reject a large number of redundant windows; and the large-scale network detects and marks face regions more accurately. Fig. 6 is a structural block diagram of the face detection device of the third embodiment of the invention. The device specifically includes the following modules:

First face candidate window labeling module 301, configured to input the image to be detected into the first convolutional neural network, which detects the face regions in the image to be detected, and to mark first face candidate windows in the detected face regions.

(M-1)-th face candidate window labeling module 302, configured to resize the (M-1)-th face candidate windows to the scale of the M-th convolutional neural network and input them into the M-th network, which detects the face regions in the (M-1)-th face candidate windows, and to mark M-th face candidate windows in the detected face regions.

Here M = 2, 3, ..., N and N ≥ 3.

Selected face window labeling module 303, configured to merge the N-th face candidate windows and the (N-1)-th face candidate windows into selected face windows using global non-maximum suppression.

Preferably, N = 3: the first convolutional neural network is a 12-net convolutional neural network, i.e., its scale is 12 × 12; the second is a 24-net convolutional neural network, i.e., its scale is 24 × 24; and the third is a 48-net convolutional neural network, i.e., its scale is 48 × 48.

Preferably, the first face candidate window labeling module 301 and the (M-1)-th face candidate window labeling module 302 detect face regions by the classification detection method or the heat-map detection method.

Preferably, if the first face candidate window labeling module 301 and the (M-1)-th face candidate window labeling module 302 detect face regions by the classification detection method, the first to (N-1)-th face candidate windows are windows processed by bounding-box regression and non-maximum suppression, and the N-th face candidate windows are windows processed by bounding-box regression. More preferably, the windows processed by non-maximum suppression have the same scale.

Preferably, as shown in Fig. 7, the face detection device further includes:

Training module 304, configured to train each convolutional neural network before the step of inputting the image to be detected into the first convolutional neural network, detecting the face regions with the first network, and marking the first face candidate windows in the detected face regions.

Preferably, the face detection device further includes:

Construction module 305, configured to construct an image pyramid from the image to be detected, to obtain images at multiple scales, before the step of inputting the image to be detected into the first convolutional neural network, detecting the face regions with the first network, and marking the first face candidate windows in the detected face regions.

Because the device embodiment is substantially similar to the method of the first embodiment, it is described relatively briefly; for related details, refer to the description of the method embodiment.

In summary, in the face detection device of the third embodiment of the invention, the first to N-th convolutional neural networks used by the device are cascaded in ascending order of scale, and each network detects face regions in turn, screening the face candidate windows step by step so that face regions are detected only within the screened windows. Face regions in the image to be detected can thus be detected more efficiently while detection accuracy is maintained.

Fourth embodiment

The present invention also provides a face detection device. Among the multiple convolutional neural networks used by the device, the N-th convolutional neural network has the largest scale of all the networks. The small-scale networks mainly extract and screen face regions; the middle-scale networks further filter the candidate windows and reject a large number of redundant windows; and the large-scale network detects and marks face regions more accurately. Fig. 8 is a structural block diagram of the face detection device of the fourth embodiment of the invention. The device specifically includes the following modules:

First face candidate window labeling module 401, configured to input the image to be detected into the first convolutional neural network, which detects the face regions in the image to be detected, and to mark first face candidate windows in the detected face regions.

M-th face candidate window labeling module 402, configured to decompose the image to be detected into multiple M-th face candidate windows and input them into the M-th convolutional neural network, which determines whether each M-th face candidate window is a face window.

N-th face candidate window merging module 403, configured to merge, using global non-maximum suppression, the first to (N-1)-th face candidate windows that were detected as face windows to obtain N-th face candidate windows.

Selected face window labeling module 404, configured to resize the N-th face candidate windows to the scale of the N-th convolutional neural network and input them into the N-th network, which detects the face regions in the N-th face candidate windows, and to mark selected face windows in the detected face regions.

Preferably, the first face candidate window labeling module 401, the M-th face candidate window labeling module 402 and the selected face window labeling module 404 detect face regions by the classification detection method or the heat-map detection method.

Preferably, if modules 401, 402 and 404 detect face regions by the classification detection method, the first to (N-1)-th face candidate windows and the selected face windows are windows processed by bounding-box regression and non-maximum suppression. More preferably, the windows processed by non-maximum suppression have the same scale.

Preferably, as shown in Fig. 9, the face detection device further includes:

Training module 405, configured to train each convolutional neural network before the step of inputting the image to be detected into the first convolutional neural network, detecting the face regions with the first network, and marking the first face candidate windows in the detected face regions.

Preferably, the face detection device further includes:

Construction module 406, configured to construct an image pyramid from the image to be detected, to obtain images at multiple scales, before the step of inputting the image to be detected into the first convolutional neural network, detecting the face regions with the first network, and marking the first face candidate windows in the detected face regions.

Because the device embodiment is substantially similar to the method of the second embodiment, it is described relatively briefly; for related details, refer to the description of the method embodiment.

In summary, in the face detection device of the fourth embodiment of the invention, the N-th convolutional neural network used by the device has the largest scale of all the networks. The face regions of the image to be detected are detected separately by the first to (N-1)-th networks, merged, and then input to the N-th network, which detects the final face regions. As a result, fewer face regions are missed among the face candidate windows input to the N-th network, so face regions in the image to be detected are detected more accurately while detection efficiency is maintained.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the invention.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.

In the embodiments provided in this application, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices or units, and may be electrical, mechanical or of other forms.

Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solution of the embodiment.

In addition, the functional units in the embodiments of the invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.

If the functions are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the invention in essence, or the part that contributes over the prior art, or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or some of the steps of the methods described in the embodiments of the invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, ROM, RAM, a magnetic disk or an optical disc.

The foregoing is only a specific embodiment of the invention, but the scope of protection of the invention is not limited thereto. Any change or replacement that a person familiar with the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. Therefore, the scope of protection of the invention shall be defined by the scope of the claims.

Claims

1. A face detection method, characterized by comprising:

inputting an image to be detected into a first convolutional neural network, the first convolutional neural network detecting face regions in the image to be detected, and marking first face candidate windows in the detected face regions;

resizing the (M-1)-th face candidate windows to the scale of an M-th convolutional neural network and then inputting them into the M-th convolutional neural network, the M-th convolutional neural network detecting face regions in the (M-1)-th face candidate windows, and marking M-th face candidate windows in the detected face regions, where M = 2, 3, ..., N and N ≥ 3;

merging the N-th face candidate windows and the (N-1)-th face candidate windows into selected face windows using global non-maximum suppression;

wherein the first to N-th convolutional neural networks are cascaded in ascending order of scale.
2. The method according to claim 1, characterized in that the face regions are detected by a classification detection method or a heat-map detection method.
3. The method according to claim 2, characterized in that, if the face regions are detected by the classification detection method, the first to (N-1)-th face candidate windows are windows processed by bounding-box regression and non-maximum suppression, and the N-th face candidate windows are windows processed by bounding-box regression.
4. The method according to claim 3, characterized in that the windows processed by non-maximum suppression have the same scale.
5. A face detection method, characterized by comprising:

inputting an image to be detected into a first convolutional neural network, the first convolutional neural network detecting face regions in the image to be detected, and marking first face candidate windows in the detected face regions;

decomposing the image to be detected into multiple M-th face candidate windows and inputting the M-th face candidate windows into an M-th convolutional neural network, the M-th convolutional neural network determining whether each M-th face candidate window is a face window, wherein the scale of the M-th face candidate windows is the scale of the M-th convolutional neural network, M = 2, 3, ..., N-1 and N ≥ 3;

merging, using global non-maximum suppression, the first to (N-1)-th face candidate windows detected as face windows to obtain N-th face candidate windows;

resizing the N-th face candidate windows to the scale of an N-th convolutional neural network and then inputting them into the N-th convolutional neural network, the N-th convolutional neural network detecting face regions in the N-th face candidate windows, and marking selected face windows in the detected face regions;

wherein the N-th convolutional neural network has the largest scale of all the convolutional neural networks.
6. The method according to claim 5, characterized in that the face regions are detected by a classification detection method or a heat-map detection method.
7. The method according to claim 6, characterized in that, if the face regions are detected by the classification detection method, the first to (N-1)-th face candidate windows and the selected face windows are windows processed by bounding-box regression and non-maximum suppression.
8. The method according to claim 7, characterized in that the windows processed by non-maximum suppression have the same scale.
9. A face detection device, characterized by comprising:

a first face candidate window labeling module, configured to input an image to be detected into a first convolutional neural network, the first convolutional neural network detecting face regions in the image to be detected, and to mark first face candidate windows in the detected face regions;

an (M-1)-th face candidate window labeling module, configured to resize the (M-1)-th face candidate windows to the scale of an M-th convolutional neural network and then input them into the M-th convolutional neural network, the M-th convolutional neural network detecting face regions in the (M-1)-th face candidate windows, and to mark M-th face candidate windows in the detected face regions, where M = 2, 3, ..., N and N ≥ 3;

a selected face window labeling module, configured to merge the N-th face candidate windows and the (N-1)-th face candidate windows into selected face windows using global non-maximum suppression;

wherein the first to N-th convolutional neural networks are cascaded in ascending order of scale.
10. A face detection device, characterized by comprising:

a first face candidate window labeling module, configured to input an image to be detected into a first convolutional neural network, the first convolutional neural network detecting face regions in the image to be detected, and to mark first face candidate windows in the detected face regions;

an M-th face candidate window labeling module, configured to decompose the image to be detected into multiple M-th face candidate windows and input the M-th face candidate windows into an M-th convolutional neural network, the M-th convolutional neural network determining whether each M-th face candidate window is a face window, wherein the scale of the M-th face candidate windows is the scale of the M-th convolutional neural network, M = 2, 3, ..., N-1 and N ≥ 3;

an N-th face candidate window merging module, configured to merge, using global non-maximum suppression, the first to (N-1)-th face candidate windows detected as face windows to obtain N-th face candidate windows;

a selected face window labeling module, configured to resize the N-th face candidate windows to the scale of an N-th convolutional neural network and then input them into the N-th convolutional neural network, the N-th convolutional neural network detecting face regions in the N-th face candidate windows, and to mark selected face windows in the detected face regions;

wherein the N-th convolutional neural network has the largest scale of all the convolutional neural networks.