CN108664893A - Face detection method and storage medium - Google Patents

Face detection method and storage medium

Info

Publication number
CN108664893A
CN108664893A, CN201810290187.1A
Authority
CN
China
Prior art keywords
network
loss function
lightweight
complex network
face detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810290187.1A
Other languages
Chinese (zh)
Other versions
CN108664893B (en)
Inventor
Huang Haiqing
Liu Zhiyong
Zheng Suiwu
Yang Xu
Huang Zhiming
Xie Dekun
Tian Jian
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
FUZHOU HAIJING TECHNOLOGY DEVELOPMENT Co Ltd
Original Assignee
FUZHOU HAIJING TECHNOLOGY DEVELOPMENT Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by FUZHOU HAIJING TECHNOLOGY DEVELOPMENT Co Ltd filed Critical FUZHOU HAIJING TECHNOLOGY DEVELOPMENT Co Ltd
Priority to CN201810290187.1A priority Critical patent/CN108664893B/en
Publication of CN108664893A publication Critical patent/CN108664893A/en
Application granted granted Critical
Publication of CN108664893B publication Critical patent/CN108664893B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 - Detection; Localisation; Normalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/172 - Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A face detection method and a storage medium. The method includes the following steps: Step 102, input the same batch of training images into both a lightweight network and a complex network; Step 104, filter the output results of the classification maps of the lightweight network and the complex network using a hard sample mining method; Step 106, construct a combined loss function, where the combined loss function includes a knowledge distillation loss function or a label-based face detection loss function, and the knowledge distillation loss function is obtained from the output results of the classification maps of the lightweight network and the complex network; Step 108, based on the loss function, update the parameters of the lightweight network without updating the parameters of the complex network; Step 110, repeat the above steps until the lightweight network is trained to convergence. A neural network algorithm with fast parameter tuning is provided, which solves the problem of performing face detection with a lightweight neural network.

Description

Face detection method and storage medium
Technical field
The invention belongs to the fields of image processing and pattern recognition, and in particular relates to a face detection method, which can be applied to numerous fields such as security monitoring and human-computer interaction.
Background art
Face detection is an important technology that is required in many computer vision applications, such as face tracking, face alignment, and face recognition. In recent years, thanks to the development of convolutional neural networks, the performance of face detection has improved markedly. However, existing face detection models are usually slow, because they need a large neural network to maintain good detection performance. Although one-stage detection frameworks (such as SSD and YOLO) have been proposed to accelerate detection, they are still not fast enough for practical application scenarios, especially in CPU-based environments. On the other hand, if the speed requirement is met by reducing the parameters of the convolutional network, the detector's performance drops noticeably. Obtaining a lightweight face detector with good performance is therefore a highly challenging task.
Knowledge distillation is a technique that lets a small network learn by imitating a large network, thereby improving the small network's performance. Its effectiveness has been verified in classification and metric learning tasks. For detection tasks, the original knowledge distillation technique cannot be applied directly, because the detector output suffers from class imbalance (the background class far outnumbers the other classes); if all outputs are imitated as in a classification task, good performance cannot be obtained. Most lightweight detectors are one-stage rather than two-stage, because the former has a speed advantage. Compared with two-stage methods, one-stage methods lack the region proposal network that eliminates negative samples, so the class imbalance problem is even more severe.
Summary of the invention
Therefore, there is a need to provide a novel neural network algorithm that can be applied to one-stage methods to improve the performance of lightweight detection models and that supports fast parameter tuning, so as to solve the problem of performing face detection with a lightweight neural network. In the present invention, the inventors provide a face detection method that includes the following steps:
Step 102: input the same batch of training images into both a lightweight network and a complex network;
Step 104: filter the output results of the classification maps of the lightweight network and the complex network using a hard sample mining method;
Step 106: construct a combined loss function, where the combined loss function includes a knowledge distillation loss function or a label-based face detection loss function, and the knowledge distillation loss function is obtained from the output results of the classification maps of the lightweight network and the complex network;
Step 108: update the parameters of the lightweight network based on the loss function, without updating the parameters of the complex network;
Step 110: repeat the above steps until the lightweight network is trained to convergence.
Preferably, the hard sample mining filter works as follows:
Set a threshold T for judging whether a given probability in the classification map is sufficiently confident; T is a hyperparameter whose value ranges from 0 to 1. Traverse every index in the classification map: when the probability at an index in the lightweight network is greater than T while the probability in the complex network is less than T, add that index to a set S_m; alternatively, when the probability at an index in the lightweight network is less than T while the probability in the complex network is greater than T, also add that index to S_m.
Optionally, the knowledge distillation loss function is:
where p^(i) is the i-th probability score in the classification map of the complex network, and q^(i) is the i-th probability score in the classification map of the lightweight network.
Further,
the label-based face detection loss function is:
L_G = L_cls + L_reg
where L_cls is a two-class softmax loss function for classification, and L_reg is a robust regression loss function for bounding-box regression;
the combined loss function is the knowledge distillation loss function weighted together with the label-based face detection loss function:
L = L_G + c·L_KD
where c is a balance coefficient.
Specifically, the method further includes the steps of building the lightweight network and the complex network:
build a face detection model based on a convolutional neural network as the complex network, and train it to convergence;
build a face detection model whose convolutional neural network has the same architecture as the complex network as the lightweight network, where the number of filters in every layer of the lightweight network's architecture is smaller than in the complex network.
A face detection storage medium based on knowledge distillation stores a computer program that, when run, performs the following steps:
Step 102: input the same batch of training images into both a lightweight network and a complex network;
Step 104: filter the output results of the classification maps of the lightweight network and the complex network using a hard sample mining method;
Step 106: construct a combined loss function, where the combined loss function includes a knowledge distillation loss function or a label-based face detection loss function, and the knowledge distillation loss function is obtained from the output results of the classification maps of the lightweight network and the complex network;
Step 108: update the parameters of the lightweight network based on the loss function, without updating the parameters of the complex network;
Step 110: repeat the above steps until the lightweight network is trained to convergence.
Specifically, the hard sample mining filter works as follows:
Set a threshold T for judging whether a given probability in the classification map is sufficiently confident; T is a hyperparameter whose value ranges from 0 to 1. Traverse every index in the classification map: when the probability at an index in the lightweight network is greater than T while the probability in the complex network is less than T, add that index to a set S_m; alternatively, when the probability at an index in the lightweight network is less than T while the probability in the complex network is greater than T, also add that index to S_m.
Optionally, the knowledge distillation loss function is:
where p^(i) is the i-th probability score in the classification map of the complex network, and q^(i) is the i-th probability score in the classification map of the lightweight network.
Further,
the label-based face detection loss function is:
L_G = L_cls + L_reg
where L_cls is a two-class softmax loss function for classification, and L_reg is a robust regression loss function for bounding-box regression;
the combined loss function is the knowledge distillation loss function weighted together with the label-based face detection loss function:
L = L_G + c·L_KD
where c is a balance coefficient.
Specifically, when run, the computer program also performs the steps of building the lightweight network and the complex network:
build a face detection model based on a convolutional neural network as the complex network, and train it to convergence;
build a face detection model whose convolutional neural network has the same architecture as the complex network as the lightweight network, where the number of filters in every layer of the lightweight network's architecture is smaller than in the complex network.
Different from the prior art, the above technique effectively improves the accuracy of a lightweight face detector, so that face detection can deliver satisfactory results even on devices with limited computing resources.
Description of the drawings
Fig. 1 is a flowchart of the face detection method described in the detailed embodiments.
Detailed description of the embodiments
To describe in detail the technical content, structural features, objects, and effects of the technical solution, a detailed explanation is given below in conjunction with specific embodiments and the accompanying drawings.
In the embodiment shown in Fig. 1, a face detection method includes the following steps:
Step 100: build a face detection model based on a convolutional neural network as the teacher network, and train the model until convergence.
The architecture of the teacher network is usually the same as that of the student network, but the number of filters in each layer is several times larger, so its performance is better. Because the goal here is to simplify the complexity of a conventional convolutional neural network, in this text "teacher network" is used interchangeably with "complex network", and "student network" is likewise used interchangeably with "lightweight network". The lightweight network is characterized as a face detection model whose convolutional neural network has the same architecture as the complex network, with the number of filters in every layer of the lightweight network being smaller than in the complex network. The teacher network is trained in exactly the same way as a conventional detection model; taking the present invention as an example, the loss function of the teacher network is L_G = L_cls + L_reg, where L_cls is a two-class softmax loss function for classification and L_reg is a robust regression loss function (smooth L1) for bounding-box regression. In this way the lightweight network and the complex network are built.
The student network is the detection model that is ultimately to be obtained; its parameters are now randomly initialized with the Xavier method.
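As an illustration of the relationship between the two networks, the following Python sketch (assuming PyTorch) uses a toy backbone in which the student shares the teacher's layer layout but has fewer filters per layer, is Xavier-initialized, and the teacher is frozen. The backbone, layer counts, and widths are placeholders, not the detector architecture actually used in this embodiment.

    import torch
    import torch.nn as nn

    def make_backbone(width: int) -> nn.Sequential:
        # Toy convolutional stack standing in for "same architecture, fewer filters";
        # the real detector architecture is not specified in this text.
        return nn.Sequential(
            nn.Conv2d(3, width, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width * 2, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width * 2, width * 4, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    teacher = make_backbone(width=64)   # complex (teacher) network: more filters per layer
    student = make_backbone(width=16)   # lightweight (student) network: same layout, fewer filters

    # Xavier initialization of the student, as described above.
    for m in student.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.xavier_uniform_(m.weight)
            nn.init.zeros_(m.bias)

    # The teacher is assumed to be already trained to convergence and is kept frozen.
    for p in teacher.parameters():
        p.requires_grad = False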
In some other embodiments, the above preparation steps can be carried out in advance, and the method can start directly from Step 102: input the same batch of training images into both the lightweight network and the complex network.
The training images can be used without processing, or a data augmentation technique can be applied here, as follows (a code sketch follows the list):
For every input training image, data augmentation is applied to increase the generalization ability of the model. Taking the present invention as an example, data augmentation includes the following steps:
(1) Color jitter: with probability 0.5 each, randomly adjust parameters of the training image such as brightness, contrast, and saturation.
(2) Random cropping: randomly crop 5 square sub-images from the training image. One of them is the largest square sub-image, and the side lengths of the other 4 square sub-images are 0.3 to 1.0 times the short side of the training image. One of these 5 square sub-images is randomly chosen as the final training sample.
(3) Horizontal flip: the chosen training sample is horizontally flipped with probability 0.5.
(4) Scale change: the training sample obtained by the above operations is resized to 1024 × 1024 and fed into the network for training.
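The following Python sketch (assuming Pillow is available) illustrates steps (1)-(4) above. The jitter ranges are assumptions; the text only fixes the probabilities, the crop ratios, and the final 1024 × 1024 size. In a real detector the crop, flip, and resize would also have to be applied to the ground-truth face boxes, which the sketch omits.

    import random
    from PIL import Image, ImageEnhance

    def augment(img: Image.Image) -> Image.Image:
        # (1) Color jitter: each adjustment is applied with probability 0.5.
        for enhancer in (ImageEnhance.Brightness, ImageEnhance.Contrast, ImageEnhance.Color):
            if random.random() < 0.5:
                img = enhancer(img).enhance(random.uniform(0.7, 1.3))

        # (2) Random cropping: 5 square candidates, one being the largest square,
        #     the other four with sides 0.3-1.0 times the short side; pick one at random.
        w, h = img.size
        short = min(w, h)
        sides = [short] + [max(1, int(short * random.uniform(0.3, 1.0))) for _ in range(4)]
        side = random.choice(sides)
        x = random.randint(0, w - side)
        y = random.randint(0, h - side)
        img = img.crop((x, y, x + side, y + side))

        # (3) Horizontal flip with probability 0.5.
        if random.random() < 0.5:
            img = img.transpose(Image.FLIP_LEFT_RIGHT)

        # (4) Scale to the fixed 1024 x 1024 training resolution.
        return img.resize((1024, 1024))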
Step 104: for the output results of the classification maps of the lightweight network and the complex network, filter them using a hard sample mining method, so as to address the problems of class imbalance and low fitting efficiency.
The idea of knowledge distillation is that the student network imitates the teacher network, so that its outputs approach the teacher network's results as closely as possible. In a neural network, the layers closer to the end are more closely tied to the final prediction and can provide better supervision for imitation learning; the last layer is therefore well suited for the student network to imitate. In a one-stage face detection framework, the last layer has two modules: a classification map and a regression map. Knowledge distillation is effective because it provides the student network with the soft label information learned by the teacher network; compared with the original annotation labels, these soft labels are more accurate and smoother, and therefore more conducive to learning. In face detection, the annotation labels of the regression boxes are already real numbers and relatively accurate, whereas the annotation labels of the classification task are only 0 and 1, which are not very precise. The classification map is therefore particularly suitable for knowledge distillation learning.
The classification map output of a typical one-stage detector has size 2N × H × W, where N is the number of anchor boxes, 2 indicates that each anchor box must predict the probabilities of the positive class and the negative class, H is the height of the classification map, and W is its width. Because the probabilities of the positive and negative classes are normalized and always sum to 1, knowledge distillation only needs to consider the positive-class probability, so the classification map output can be reduced to N × H × W. During training, the teacher network and the student network each output a classification map; for these two results, it must be determined which indices of the classification map should be filtered out and which can be used for knowledge distillation.
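The filtering rule stated earlier, namely that an index enters S_m when exactly one of the two networks is above the confidence threshold T, can be sketched as follows; the function name and the use of NumPy are illustrative.

    import numpy as np

    def mine_hard_samples(q_student: np.ndarray, p_teacher: np.ndarray, T: float = 0.5) -> np.ndarray:
        # Build the index set S_m from two flattened N*H*W positive-class probability
        # maps: an index is kept when one network is above the confidence threshold T
        # while the other is below it. T is a hyperparameter in (0, 1); the default
        # value 0.5 used here is an assumption, not a value fixed by the text.
        q = q_student.reshape(-1)
        p = p_teacher.reshape(-1)
        disagree = ((q > T) & (p < T)) | ((q < T) & (p > T))
        return np.flatnonzero(disagree)  # indices forming S_m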
Step 106 is then carried out: construct the combined loss function, where the combined loss function includes the knowledge distillation loss function or the label-based face detection loss function, and the knowledge distillation loss function is obtained from the output results of the classification maps of the lightweight network and the complex network. In some optional embodiments, a knowledge distillation loss function is constructed so that the classification map result of the current student network is drawn as close as possible to the classification map of the teacher network. As a specific embodiment, the knowledge distillation loss function is:
where p^(i) is the i-th probability score in the classification map of the complex network, and q^(i) is the i-th probability score in the classification map of the lightweight network.
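The formula for L_KD is given as an image in the original publication and is not reproduced in this text. As a stand-in only, the sketch below uses a binary cross-entropy between the teacher's scores p^(i) and the student's scores q^(i), averaged over the mined set S_m; this particular form is a common choice consistent with the surrounding definitions, but it is an assumption, not the patent's stated formula.

    import numpy as np

    def kd_loss(q_student: np.ndarray, p_teacher: np.ndarray, s_m: np.ndarray, eps: float = 1e-7) -> float:
        # Assumed form of L_KD: binary cross-entropy between the teacher's soft scores
        # p(i) and the student's scores q(i), averaged over the mined index set S_m.
        if s_m.size == 0:
            return 0.0
        p = np.clip(p_teacher.reshape(-1)[s_m], eps, 1.0 - eps)
        q = np.clip(q_student.reshape(-1)[s_m], eps, 1.0 - eps)
        return float(np.mean(-(p * np.log(q) + (1.0 - p) * np.log(1.0 - q))))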
Further, during training, in addition to the knowledge distillation loss function, there is also the conventional label-based face detection loss function, which is consistent with the region proposal network in the classic detection framework Faster R-CNN:
The label-based face detection loss function is:
L_G = L_cls + L_reg
where L_cls is a two-class softmax loss function for classification, and L_reg is a robust regression loss function for bounding-box regression. During training, the knowledge distillation loss function and the label-based loss function are added to form the final combined loss function.
The combined loss function is the knowledge distillation loss function weighted together with the label-based face detection loss function:
L = L_G + c·L_KD
where c is a coefficient that balances the two loss functions; it is fixed at 50 in the present invention, and its optimal value should be determined by the specific scenario.
As shown in Fig. 1, Step 108 is then carried out: based on the loss function, update the parameters of the lightweight network without updating the parameters of the teacher network. In this step, the back-propagation algorithm is used, according to the combined loss function obtained, to update the parameters of the student network, completing one training iteration. The parameters of the teacher network do not need to be updated and are therefore frozen during training.
Step 110: repeat Steps 102-108 until the lightweight network is trained to convergence.
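Putting Steps 102-110 together, a schematic training loop might look like the following (assuming PyTorch; student, teacher, loader, and detection_loss are placeholders supplied by the caller, and mine_hard_samples/kd_loss are tensor adaptations of the sketches above). All names are illustrative.

    import torch

    def train_student(student, teacher, loader, detection_loss, epochs=1, c=50.0, lr=1e-3):
        # `student` and `teacher` are assumed to map an image batch to
        # (classification_map, regression_map); `detection_loss` computes
        # L_G = L_cls + L_reg from the ground-truth targets.
        optimizer = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9)
        teacher.eval()                                           # the teacher stays frozen
        for _ in range(epochs):
            for images, targets in loader:                       # Step 102: same batch to both networks
                q_cls, q_reg = student(images)
                with torch.no_grad():                            # no gradients flow into the teacher
                    p_cls, _ = teacher(images)
                s_m = mine_hard_samples(q_cls.detach(), p_cls)   # Step 104: hard sample mining
                loss = detection_loss(q_cls, q_reg, targets)     # Step 106: L_G = L_cls + L_reg ...
                loss = loss + c * kd_loss(q_cls, p_cls, s_m)     # ... plus c * L_KD, with c fixed at 50 here
                optimizer.zero_grad()                            # Step 108: update the student only
                loss.backward()
                optimizer.step()
        return student                                           # Step 110: repeat until convergence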
By applying the above steps, the present invention can effectively improve the accuracy of a lightweight face detector, so that face detection can also achieve satisfactory detection results on devices with limited computing resources. Because there are differences in network structure between detection models and classification or metric learning models, the knowledge distillation method cannot be applied to the detection task directly. The inventors found that the regression map of a one-stage face detection model does not carry enough useful information for imitation, whereas the classification map can provide effective soft label information; the classification map is therefore used as the medium through which knowledge is transferred between the student network and the teacher network. In addition, the output of the classification map contains a large number of negative-class samples, which causes class imbalance. The present invention proposes a hard sample mining method to filter out easy negative samples so that the classes become balanced, and also to filter out easy positive samples so that knowledge distillation becomes more efficient. During training, the knowledge distillation loss function and the label-based loss function are added with an appropriate ratio to form the complete loss function. In a specific implementation, the method of the present invention also performs the following steps: a test image is input to the trained student network model, and detection result boxes are output. Because a very large number of detection boxes are output, they need to be screened and merged. In this embodiment, most detection boxes are first filtered out with a confidence threshold T = 0.05, and the top N_a = 400 detection boxes are selected by confidence. Non-maximum suppression is then used to remove duplicated boxes, and the top N_b = 200 boxes are selected by confidence to obtain the final detection result.
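The inference-time screening described above (confidence threshold 0.05, keep the top 400 boxes, non-maximum suppression, keep the top 200) can be sketched as follows; the IoU threshold of 0.3 inside the NMS is an assumption, since the text does not specify it.

    import numpy as np

    def postprocess(boxes: np.ndarray, scores: np.ndarray,
                    conf_thresh=0.05, pre_nms_top=400, post_nms_top=200, iou_thresh=0.3):
        # Screening and merging of the raw detection boxes: confidence threshold 0.05,
        # keep the top 400 by confidence, greedy NMS, then keep the top 200.
        keep = scores > conf_thresh
        boxes, scores = boxes[keep], scores[keep]
        order = np.argsort(-scores)[:pre_nms_top]
        boxes, scores = boxes[order], scores[order]

        # Greedy non-maximum suppression on (x1, y1, x2, y2) boxes.
        selected = []
        idx = np.arange(len(scores))
        areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
        while idx.size > 0:
            i = idx[0]
            selected.append(i)
            xx1 = np.maximum(boxes[i, 0], boxes[idx[1:], 0])
            yy1 = np.maximum(boxes[i, 1], boxes[idx[1:], 1])
            xx2 = np.minimum(boxes[i, 2], boxes[idx[1:], 2])
            yy2 = np.minimum(boxes[i, 3], boxes[idx[1:], 3])
            inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
            iou = inter / (areas[i] + areas[idx[1:]] - inter + 1e-9)
            idx = idx[1:][iou <= iou_thresh]
        selected = np.array(selected[:post_nms_top], dtype=int)
        return boxes[selected], scores[selected]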
Finally, the knowledge-distillation-based training method proposed by the present invention can effectively improve the detection capability of a lightweight face detection model.
A face detection storage medium based on knowledge distillation stores a computer program that, when run, performs the following steps:
Step 102: input the same batch of training images into both a lightweight network and a complex network;
Step 104: filter the output results of the classification maps of the lightweight network and the complex network using a hard sample mining method;
Step 106: construct a combined loss function, where the combined loss function includes a knowledge distillation loss function or a label-based face detection loss function, and the knowledge distillation loss function is obtained from the output results of the classification maps of the lightweight network and the complex network;
Step 108: update the parameters of the lightweight network based on the loss function, without updating the parameters of the complex network;
Step 110: repeat the above steps until the lightweight network is trained to convergence.
Specifically, the hard sample mining filter works as follows:
Set a threshold T for judging whether a given probability in the classification map is sufficiently confident; T is a hyperparameter whose value ranges from 0 to 1. Traverse every index in the classification map: when the probability at an index in the lightweight network is greater than T while the probability in the complex network is less than T, add that index to a set S_m; alternatively, when the probability at an index in the lightweight network is less than T while the probability in the complex network is greater than T, also add that index to S_m.
Optionally, the knowledge distillation loss function is:
where p^(i) is the i-th probability score in the classification map of the complex network, and q^(i) is the i-th probability score in the classification map of the lightweight network.
Further,
the label-based face detection loss function is:
L_G = L_cls + L_reg
where L_cls is a two-class softmax loss function for classification, and L_reg is a robust regression loss function for bounding-box regression;
the combined loss function is the knowledge distillation loss function weighted together with the label-based face detection loss function:
L = L_G + c·L_KD
where c is a balance coefficient.
Specifically, when run, the computer program also performs the steps of building the lightweight network and the complex network:
build a face detection model based on a convolutional neural network as the complex network, and train it to convergence;
build a face detection model whose convolutional neural network has the same architecture as the complex network as the lightweight network, where the number of filters in every layer of the lightweight network's architecture is smaller than in the complex network.
It should be noted that, in this text, relational terms such as "first" and "second" are only used to distinguish one entity or operation from another, and do not necessarily require or imply that any actual relationship or order exists between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or terminal device that includes a list of elements includes not only those elements but also other elements not explicitly listed, or also includes elements inherent to such a process, method, article, or terminal device. In the absence of further limitations, an element defined by the phrase "including a ..." or "comprising a ..." does not exclude the existence of other elements in the process, method, article, or terminal device that includes the element. In addition, in this text, "greater than", "less than", "more than", and the like are understood to exclude the stated number, while "above", "below", "within", and the like are understood to include the stated number.
Those skilled in the art should understand that the above embodiments may be provided as a method, an apparatus, or a computer program product, and may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. All or part of the steps of the methods involved in the above embodiments may be completed by instructing the relevant hardware through a program, and the program may be stored in a storage medium readable by a computer device and used to execute all or part of the steps described in the methods of the above embodiments. The computer device includes, but is not limited to: personal computers, servers, general-purpose computers, special-purpose computers, network devices, embedded devices, programmable devices, intelligent mobile terminals, smart home devices, wearable smart devices, vehicle-mounted smart devices, etc. The storage medium includes, but is not limited to: RAM, ROM, magnetic disks, magnetic tapes, optical disks, flash memory, USB drives, removable hard disks, memory cards, memory sticks, web server storage, network cloud storage, etc.
The above embodiments are described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to the embodiments. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a computer device to produce a machine, so that the instructions executed by the processor of the computer device produce an apparatus for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be stored in a computer-device-readable memory capable of directing a computer device to operate in a particular manner, so that the instructions stored in that memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer device, so that a series of operating steps are executed on the computer device to produce computer-implemented processing, and the instructions executed on the computer device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
Although the above embodiments have been described, once a person skilled in the art learns the basic inventive concept, additional changes and modifications can be made to these embodiments. Therefore, the above are only embodiments of the present invention and are not intended to limit the patent protection scope of the present invention. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present invention.

Claims (10)

1. A face detection method, characterized in that the method includes the following steps:
Step 102: input the same batch of training images into both a lightweight network and a complex network;
Step 104: filter the output results of the classification maps of the lightweight network and the complex network using a hard sample mining method;
Step 106: construct a combined loss function, where the combined loss function includes a knowledge distillation loss function or a label-based face detection loss function, and the knowledge distillation loss function is obtained from the output results of the classification maps of the lightweight network and the complex network;
Step 108: update the parameters of the lightweight network based on the loss function, without updating the parameters of the complex network;
Step 110: repeat the above steps until the lightweight network is trained to convergence.
2. The face detection method according to claim 1, characterized in that the hard sample mining filter works as follows:
set a threshold T for judging whether a given probability in the classification map is sufficiently confident; T is a hyperparameter whose value ranges from 0 to 1; traverse every index in the classification map: when the probability at an index in the lightweight network is greater than T while the probability in the complex network is less than T, add that index to a set S_m; alternatively, when the probability at an index in the lightweight network is less than T while the probability in the complex network is greater than T, also add that index to S_m.
3. The face detection method according to claim 2, characterized in that
the knowledge distillation loss function is:
where p^(i) is the i-th probability score in the classification map of the complex network, and q^(i) is the i-th probability score in the classification map of the lightweight network.
4. The face detection method according to claim 2 or 3, characterized in that
the label-based face detection loss function is:
L_G = L_cls + L_reg
where L_cls is a two-class softmax loss function for classification, and L_reg is a robust regression loss function for bounding-box regression;
the combined loss function is the knowledge distillation loss function weighted together with the label-based face detection loss function:
L = L_G + c·L_KD
where c is a balance coefficient.
5. The face detection method according to claim 1, characterized in that it further includes the steps of building the lightweight network and the complex network:
build a face detection model based on a convolutional neural network as the complex network, and train it to convergence;
build a face detection model whose convolutional neural network has the same architecture as the complex network as the lightweight network, where the number of filters in every layer of the lightweight network's architecture is smaller than in the complex network.
6. A face detection storage medium, characterized in that it stores a computer program which, when run, performs the following steps:
Step 102: input the same batch of training images into both a lightweight network and a complex network;
Step 104: filter the output results of the classification maps of the lightweight network and the complex network using a hard sample mining method;
Step 106: construct a combined loss function, where the combined loss function includes a knowledge distillation loss function or a label-based face detection loss function, and the knowledge distillation loss function is obtained from the output results of the classification maps of the lightweight network and the complex network;
Step 108: update the parameters of the lightweight network based on the loss function, without updating the parameters of the complex network;
Step 110: repeat the above steps until the lightweight network is trained to convergence.
7. The face detection storage medium according to claim 6, characterized in that the hard sample mining filter works as follows:
set a threshold T for judging whether a given probability in the classification map is sufficiently confident; T is a hyperparameter whose value ranges from 0 to 1; traverse every index in the classification map: when the probability at an index in the lightweight network is greater than T while the probability in the complex network is less than T, add that index to a set S_m; alternatively, when the probability at an index in the lightweight network is less than T while the probability in the complex network is greater than T, also add that index to S_m.
8. The face detection storage medium according to claim 7, characterized in that
the knowledge distillation loss function is:
where p^(i) is the i-th probability score in the classification map of the complex network, and q^(i) is the i-th probability score in the classification map of the lightweight network.
9. The face detection storage medium according to claim 7 or 8, characterized in that
the label-based face detection loss function is:
L_G = L_cls + L_reg
where L_cls is a two-class softmax loss function for classification, and L_reg is a robust regression loss function for bounding-box regression;
the combined loss function is the knowledge distillation loss function weighted together with the label-based face detection loss function:
L = L_G + c·L_KD
where c is a balance coefficient.
10. The face detection storage medium according to claim 6, characterized in that, when run, the computer program also performs the steps of building the lightweight network and the complex network:
build a face detection model based on a convolutional neural network as the complex network, and train it to convergence;
build a face detection model whose convolutional neural network has the same architecture as the complex network as the lightweight network, where the number of filters in every layer of the lightweight network's architecture is smaller than in the complex network.
CN201810290187.1A 2018-04-03 2018-04-03 Face detection method and storage medium Active CN108664893B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810290187.1A CN108664893B (en) 2018-04-03 2018-04-03 Face detection method and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810290187.1A CN108664893B (en) 2018-04-03 2018-04-03 Face detection method and storage medium

Publications (2)

Publication Number Publication Date
CN108664893A true CN108664893A (en) 2018-10-16
CN108664893B CN108664893B (en) 2022-04-29

Family

ID=63782947

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810290187.1A Active CN108664893B (en) 2018-04-03 2018-04-03 Face detection method and storage medium

Country Status (1)

Country Link
CN (1) CN108664893B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080126311A1 (en) * 2006-11-29 2008-05-29 Red Hat, Inc. Automatic index creation based on unindexed search evaluation
EP2584381A1 (en) * 2011-10-21 2013-04-24 ENI S.p.A. Method for predicting the properties of crude oils by the application of neural networks
US20130245844A1 (en) * 2012-03-19 2013-09-19 Saudi Arabian Oil Company Methods For Simultaneous Process and Utility Systems Synthesis in Partially and Fully Decentralized Environments
EP3276540A2 (en) * 2016-07-28 2018-01-31 Samsung Electronics Co., Ltd. Neural network method and apparatus
CN107239736A (en) * 2017-04-28 2017-10-10 北京智慧眼科技股份有限公司 Face detection method and detection device based on a multi-task cascaded convolutional neural network
CN107220618A (en) * 2017-05-25 2017-09-29 中国科学院自动化研究所 Face detection method and device, computer-readable storage medium, and equipment
CN107247989A (en) * 2017-06-15 2017-10-13 北京图森未来科技有限公司 Neural network training method and device
CN107818314A (en) * 2017-11-22 2018-03-20 北京达佳互联信息技术有限公司 Face image processing method, device and server
CN110674880A (en) * 2019-09-27 2020-01-10 北京迈格威科技有限公司 Network training method, device, medium and electronic equipment for knowledge distillation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HINTON G et al.: "Distilling the knowledge in a neural network", Proc. of the Advances in Neural Information Processing Systems *
LEI Jie et al.: "A survey of deep network model compression" (深度网络模型压缩综述), Journal of Software (软件学报) *

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109300170A (en) * 2018-10-18 2019-02-01 云南大学 Portrait photo shadow transmission method
US11328180B2 (en) 2018-10-30 2022-05-10 Beijing Horizon Robotics Technology Research And Development Co., Ltd. Method for updating neural network and electronic device
CN109472360A (en) * 2018-10-30 2019-03-15 北京地平线机器人技术研发有限公司 Update method, updating device and the electronic equipment of neural network
WO2020143225A1 (en) * 2019-01-08 2020-07-16 南京人工智能高等研究院有限公司 Neural network training method and apparatus, and electronic device
CN110059747A (en) * 2019-04-18 2019-07-26 清华大学深圳研究生院 Network traffic classification method
CN110263731A (en) * 2019-06-24 2019-09-20 电子科技大学 Single-stage face detection system
CN110598603A (en) * 2019-09-02 2019-12-20 深圳力维智联技术有限公司 Face recognition model acquisition method, device, equipment and medium
CN110674714A (en) * 2019-09-13 2020-01-10 东南大学 Human face and human face key point joint detection method based on transfer learning
CN110674714B (en) * 2019-09-13 2022-06-14 东南大学 Human face and human face key point joint detection method based on transfer learning
CN110956255A (en) * 2019-11-26 2020-04-03 中国医学科学院肿瘤医院 Difficult sample mining method and device, electronic equipment and computer readable storage medium
CN110956255B (en) * 2019-11-26 2023-04-07 中国医学科学院肿瘤医院 Difficult sample mining method and device, electronic equipment and computer readable storage medium
CN111178370A (en) * 2019-12-16 2020-05-19 深圳市华尊科技股份有限公司 Vehicle retrieval method and related device
CN111178370B (en) * 2019-12-16 2023-10-17 深圳市华尊科技股份有限公司 Vehicle searching method and related device
CN111027551A (en) * 2019-12-17 2020-04-17 腾讯科技(深圳)有限公司 Image processing method, apparatus and medium
CN111027551B (en) * 2019-12-17 2023-07-07 腾讯科技(深圳)有限公司 Image processing method, apparatus and medium
CN111368634B (en) * 2020-02-05 2023-06-20 中国人民解放军国防科技大学 Human head detection method, system and storage medium based on neural network
CN111368634A (en) * 2020-02-05 2020-07-03 中国人民解放军国防科技大学 Human head detection method, system and storage medium based on neural network
CN111639744B (en) * 2020-04-15 2023-09-22 北京迈格威科技有限公司 Training method and device for student model and electronic equipment
CN111639744A (en) * 2020-04-15 2020-09-08 北京迈格威科技有限公司 Student model training method and device and electronic equipment
CN111553227A (en) * 2020-04-21 2020-08-18 东南大学 Lightweight face detection method based on task guidance
CN114078268A (en) * 2020-08-17 2022-02-22 珠海全志科技股份有限公司 Training method and device for lightweight face recognition model
CN112232397A (en) * 2020-09-30 2021-01-15 上海眼控科技股份有限公司 Knowledge distillation method and device of image classification model and computer equipment
CN112348167A (en) * 2020-10-20 2021-02-09 华东交通大学 Knowledge distillation-based ore sorting method and computer-readable storage medium
CN112348167B (en) * 2020-10-20 2022-10-11 华东交通大学 Knowledge distillation-based ore sorting method and computer-readable storage medium
CN112270379B (en) * 2020-11-13 2023-09-19 北京百度网讯科技有限公司 Training method of classification model, sample classification method, device and equipment
WO2022100045A1 (en) * 2020-11-13 2022-05-19 北京百度网讯科技有限公司 Training method for classification model, sample classification method and apparatus, and device
CN112270379A (en) * 2020-11-13 2021-01-26 北京百度网讯科技有限公司 Training method of classification model, sample classification method, device and equipment
CN112633406A (en) * 2020-12-31 2021-04-09 天津大学 Knowledge distillation-based few-sample target detection method
CN113723238A (en) * 2021-08-18 2021-11-30 北京深感科技有限公司 Human face lightweight network model construction method and human face recognition method
CN113723238B (en) * 2021-08-18 2024-02-09 厦门瑞为信息技术有限公司 Face lightweight network model construction method and face recognition method
CN113657411A (en) * 2021-08-23 2021-11-16 北京达佳互联信息技术有限公司 Neural network model training method, image feature extraction method and related device
CN117542085A (en) * 2024-01-10 2024-02-09 湖南工商大学 Park scene pedestrian detection method, device and equipment based on knowledge distillation
CN117542085B (en) * 2024-01-10 2024-05-03 湖南工商大学 Park scene pedestrian detection method, device and equipment based on knowledge distillation

Also Published As

Publication number Publication date
CN108664893B (en) 2022-04-29

Similar Documents

Publication Publication Date Title
CN108664893A (en) Face detection method and storage medium
CN109508360B (en) Geographical multivariate stream data spatio-temporal autocorrelation analysis method based on cellular automata
CN110472627A (en) End-to-end SAR image recognition method, device and storage medium
CN109376615A (en) Method, apparatus and storage medium for improving the prediction performance of a deep learning network
CN108090508A (en) Classification training method, apparatus and storage medium
CN110490177A (en) Face detector training method and device
CN107358157A (en) Face liveness detection method, device and electronic equipment
CN110020592A (en) Object detection model training method, device, computer equipment and storage medium
CN107688784A (en) Character recognition method and storage medium based on the fusion of deep and shallow features
CN108629326A (en) Action behavior recognition method and device for a target body
CN104933428B (en) Face recognition method and device based on tensor description
CN113642431B (en) Training method and device of target detection model, electronic equipment and storage medium
CN109359693A (en) Power quality disturbance classification method
CN110378222A (en) Transmission line vibration damper target detection and defect identification method and device
CN109272016A (en) Target detection method, device, terminal equipment and computer readable storage medium
CN107784312A (en) Machine learning model training method and device
CN110516677A (en) Neural network recognition model, target recognition method and system
CN109446889A (en) Object tracking method and device based on a Siamese matching network
CN109241846A (en) Spatio-temporal change estimation method, device and storage medium for remote sensing images
CN110222728A (en) Training method and system for an item discrimination model, and item discrimination method and device
CN106682702A (en) Deep learning method and system
CN111400452A (en) Text information classification processing method, electronic device and computer readable storage medium
CN106778910A (en) Deep learning system and method based on local training
CN110390673A (en) Automatic cigarette detection method based on deep learning in a surveillance scene
CN109840413A (en) Phishing website detection method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: 350003 21 floors, No. 1 Building, G District, 89 Software Avenue, Gulou District, Fuzhou City, Fujian Province

Applicant after: FUJIAN HAIJING TECHNOLOGY DEVELOPMENT Co.,Ltd.

Address before: 350003 1-2 floors of Building C, Building 10, Fuzhou Software Park B, 89 Software Avenue, Gulou District, Fuzhou City, Fujian Province

Applicant before: FUZHOU HAIJING SCIENCE & TECHNOLOGY DEVELOPMENT CO.,LTD.

CB03 Change of inventor or designer information
CB03 Change of inventor or designer information

Inventor after: Huang Haiqing

Inventor after: Wang Jinqiao

Inventor after: Chen Yingying

Inventor after: Liu Zhiyong

Inventor after: Zheng Suiwu

Inventor after: Yang Xu

Inventor after: Huang Zhiming

Inventor after: Xie Dekun

Inventor after: Tian Jian

Inventor before: Huang Haiqing

Inventor before: Liu Zhiyong

Inventor before: Zheng Suiwu

Inventor before: Yang Xu

Inventor before: Huang Zhiming

Inventor before: Xie Dekun

Inventor before: Tian Jian

GR01 Patent grant
GR01 Patent grant
PP01 Preservation of patent right
PP01 Preservation of patent right

Effective date of registration: 20231212

Granted publication date: 20220429