CN108399373B - The model training and its detection method and device of face key point - Google Patents
- Publication number: CN108399373B (application CN201810118211.3A)
- Authority
- CN
- China
- Legal status: Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
An embodiment of the invention provides a method and device for training a face key-point detection model and for detection with that model. The method includes: extracting face data from a training image; inputting the face data into a first-stage network for training, where the first-stage network outputs predicted coordinates of face key points; when training of the first-stage network is complete, generating target data from the face data based on the predicted coordinates; inputting the target data into a second-stage network for training, where the second-stage network outputs coordinate offset values for the face key points; and, when training of the second-stage network is complete, determining the cascade network to be the face key-point detection model. Because the two-stage network learns both predicted coordinates and coordinate offsets, accurate face key-point coordinates can still be obtained in complex scenes.
Description
Technical field
The present invention relates to the technical field of computer processing, and more particularly to a method and apparatus for training a face key-point detection model and for detecting face key points.
Background technique
Face key-point detection is one of the basic technologies in facial-image research. Its objective is to automatically estimate the coordinates of facial feature points in a face picture, for example face-contour coordinates and coordinates of the eyes, nose and mouth. It is widely used in face recognition, pose estimation, face filters, beautification and make-up, three-dimensional modeling, and so on.
Existing face key-point detection techniques include shape-constrained methods and cascaded-regression methods; classical models include Active Shape Models (ASM) and Cascaded Pose Regression (CPR).
However, these traditional methods are not robust: in complex scenes, the detection accuracy of face key points is low.
Summary of the invention
Embodiments of the present invention propose a method and device for training a face key-point detection model and for detection, to solve the problem of low detection accuracy of face key points in complex scenes.
According to one aspect of the present invention, a method for training a face key-point detection model based on a cascade network is provided. The cascade network includes a first-stage network and a second-stage network, and the method includes:
extracting face data from a training image;
inputting the face data into the first-stage network for training, where the first-stage network outputs predicted coordinates of face key points;
when training of the first-stage network is complete, generating target data from the face data based on the predicted coordinates;
inputting the target data into the second-stage network for training, where the second-stage network outputs coordinate offset values for the face key points;
when training of the second-stage network is complete, determining the cascade network to be the face key-point detection model.
Optionally, inputting the face data into the first-stage network for training includes:
inputting the face data into the first-stage network for processing, and outputting the predicted coordinates of the face key points;
calculating a first loss value using the predicted coordinates;
judging, according to the first loss value, whether the first-stage network has converged;
if so, determining that training of the first-stage network is complete;
if not, adjusting the first-stage network according to the first loss value, and returning to the step of inputting the face data into the first-stage network for processing.
Optionally, calculating the first loss value using the predicted coordinates includes:
calculating first distances between the predicted coordinates and the true coordinates;
taking the average of the first distances as the first loss value.
Optionally, generating the target data from the face data based on the predicted coordinates includes:
extracting partial image data from the face data based on the predicted coordinates;
combining the partial image data corresponding to the multiple face key points into a data matrix along the color dimension, as the target data.
Optionally, inputting the target data into the second-stage network for training includes:
inputting the target data into the second-stage network for processing, and outputting the coordinate offset values of the face key points;
calculating a second loss value using the coordinate offset values;
judging, according to the second loss value, whether the second-stage network has converged;
if so, determining that training of the second-stage network is complete;
if not, adjusting the second-stage network according to the second loss value, and returning to the step of inputting the target data into the second-stage network for processing.
Optionally, calculating the second loss value using the coordinate offset values includes:
calculating second distances between the predicted coordinate offsets and the true coordinate offsets;
taking the average of the second distances as the second loss value.
Optionally, the method further includes:
performing data-augmentation processing on the face data;
where the data augmentation includes at least one of the following:
adding noise data, cropping and restoring, translation, and increasing contrast.
According to another aspect of the present invention, a face key-point detection method based on a face key-point detection model is provided. The face key-point detection model includes a first-stage network and a second-stage network, and the method includes:
extracting face data from a target image;
inputting the face data into the first-stage network for processing, and outputting the predicted coordinates of the face key points;
generating target data from the face data based on the predicted coordinates;
inputting the target data into the second-stage network for processing, and outputting the coordinate offset values of the face key points;
adding the coordinate offset values to the predicted coordinates to obtain the target coordinates of the face key points.
According to another aspect of the present invention, a device for training a face key-point detection model based on a cascade network is provided. The cascade network includes a first-stage network and a second-stage network, and the device includes:
a face-data extraction module, configured to extract face data from a training image;
a first-stage network training module, configured to input the face data into the first-stage network for training, where the first-stage network outputs predicted coordinates of face key points;
a target-data generation module, configured to generate target data from the face data based on the predicted coordinates when training of the first-stage network is complete;
a second-stage network training module, configured to input the target data into the second-stage network for training, where the second-stage network outputs coordinate offset values for the face key points;
a model determination module, configured to determine the cascade network to be the face key-point detection model when training of the second-stage network is complete.
Optionally, the first-stage network training module includes:
a face-data input submodule, configured to input the face data into the first-stage network for processing and output the predicted coordinates of the face key points;
a first-loss-value calculation submodule, configured to calculate a first loss value using the predicted coordinates;
a first-stage network convergence judgment submodule, configured to judge, according to the first loss value, whether the first-stage network has converged; if so, to call the first-stage network completion submodule, and if not, to call the first-stage network adjustment submodule;
a first-stage network completion submodule, configured to determine that training of the first-stage network is complete;
a first-stage network adjustment submodule, configured to adjust the first-stage network according to the first loss value and return to the face-data input submodule.
Optionally, the first-loss-value calculation submodule includes:
a first-distance calculation unit, configured to calculate the first distances between the predicted coordinates and the true coordinates;
a first averaging unit, configured to take the average of the first distances as the first loss value.
Optionally, the target-data generation module includes:
a partial-image-data extraction submodule, configured to extract partial image data from the face data based on the predicted coordinates;
a matrix combination submodule, configured to combine the partial image data corresponding to the multiple face key points into a data matrix along the color dimension, as the target data.
Optionally, the second-stage network training module includes:
a target-data input submodule, configured to input the target data into the second-stage network for processing and output the coordinate offset values of the face key points;
a second-loss-value calculation submodule, configured to calculate a second loss value using the coordinate offset values;
a second-stage network convergence judgment submodule, configured to judge, according to the second loss value, whether the second-stage network has converged; if so, to call the second-stage network completion submodule, and if not, to call the second-stage network adjustment submodule;
a second-stage network completion submodule, configured to determine that training of the second-stage network is complete;
a second-stage network adjustment submodule, configured to adjust the second-stage network according to the second loss value and return to the target-data input submodule.
Optionally, the second-loss-value calculation submodule includes:
a second-distance calculation unit, configured to calculate the second distances between the predicted coordinate offsets and the true coordinate offsets;
a second averaging unit, configured to take the average of the second distances as the second loss value.
Optionally, the device further includes:
a data-augmentation processing module, configured to perform data-augmentation processing on the face data;
where the data augmentation includes at least one of the following:
adding noise data, cropping and restoring, translation, and increasing contrast.
According to another aspect of the present invention, a face key-point detection device based on a face key-point detection model is provided. The face key-point detection model includes a first-stage network and a second-stage network, and the device includes:
a face-data extraction module, configured to extract face data from a target image;
a first-stage network processing module, configured to input the face data into the first-stage network for processing and output the predicted coordinates of the face key points;
a target-data generation module, configured to generate target data from the face data based on the predicted coordinates;
a second-stage network processing module, configured to input the target data into the second-stage network for processing and output the coordinate offset values of the face key points;
a target-coordinate calculation module, configured to add the coordinate offset values to the predicted coordinates to obtain the target coordinates of the face key points.
Embodiments of the present invention include the following advantages:
In the embodiments of the present invention, the cascade network includes a first-stage network, which outputs the predicted coordinates of face key points, and a second-stage network, which outputs the coordinate offset values of the face key points. Face data is extracted from a training image and input into the first-stage network for training; when training of the first-stage network is complete, target data is generated from the face data based on the predicted coordinates and input into the second-stage network for training; when training of the second-stage network is complete, the cascade network is determined to be the face key-point detection model. Because the two-stage network learns both predicted coordinates and coordinate offsets, accurate face key-point coordinates can still be obtained in complex scenes. Moreover, the input size of the second-stage network is much smaller than that of the first-stage network, which reduces the time consumed by the second-stage network and makes the model suitable for face key-point detection on devices with limited resources, such as mobile terminals, improving its practicability.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of a method for training a face key-point detection model based on a cascade network according to an embodiment of the present invention;
Fig. 2 is a diagram of an example topology of a first-stage network and a second-stage network according to an embodiment of the present invention;
Fig. 3 is a flow chart of the steps of a face key-point detection method based on a face key-point detection model according to an embodiment of the present invention;
Fig. 4 is a structural block diagram of a device for training a face key-point detection model based on a cascade network according to an embodiment of the present invention;
Fig. 5 is a structural block diagram of a face key-point detection device based on a face key-point detection model according to an embodiment of the present invention.
Specific embodiment
To make the above objectives, features and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Referring to Fig. 1, a flow chart of the steps of a method for training a face key-point detection model based on a cascade network according to an embodiment of the present invention is shown. The method may specifically include the following steps:
Step 101: extract face data from a training image.
In the embodiments of the present invention, a cascade network can be used to train the face key-point detection model for detecting face key points.
The cascade network includes a first-stage network and a second-stage network. Each stage can run independently, and the input of the second-stage network depends on the output of the first-stage network.
Taking a CNN (Convolutional Neural Network) as an example of the first-stage and second-stage networks: a CNN has multiple layers, and the output of one layer serves as the input of the next.
Each layer of a CNN generally consists of multiple feature maps, and each map consists of multiple neural units. All neural units of the same map share one convolution kernel (i.e., weights). A convolution kernel often represents a feature; for example, if a certain kernel represents an arc segment, then as this kernel slides over the entire picture, regions with larger convolution values are likely to contain an arc.
As shown in Fig. 2, in this example the first-stage network (a CNN) has 5 convolutional layers followed by 2 fully connected layers, and the second-stage network (a CNN) has 3 convolutional layers followed by 2 fully connected layers.
A convolutional layer is essentially a feature-extraction layer. A hyperparameter F can be set to specify how many feature extractors (filters) the layer has. A given filter is equivalent to a moving window of size k*d that starts at the beginning of the input matrix and slides along it, where k and d are the window sizes specified for the filter. For the window at a given moment, a nonlinear transformation of the neural network converts the input values in the window into a feature value; as the window keeps moving, the feature values for this filter are generated one after another and form the filter's feature vector. This is the process by which a convolutional layer extracts features. Each filter operates in this way, forming a different feature extractor.
In the fully connected layers, n 1*1 convolution kernels convolve the features of the previous layer, and the convolved features are then reduced by mean pooling.
Of course, the structures of the first-stage and second-stage networks above are intended only as examples. When implementing the embodiments of the present invention, other structures for the first-stage and second-stage networks may be set according to the actual situation, and the embodiments of the present invention place no limitation on this. In addition, besides the structures above, those skilled in the art may also use other first-stage and second-stage network structures according to actual needs, and the embodiments of the present invention place no limitation on this either.
When training the face key-point detection model with the cascade network, pre-prepared training images can be extracted from a picture library; each training image contains image data of a face.
Face detection is performed on the training image to identify the region where the face is located, and the region is cropped to a pixel block of a specified size (such as 10*10), as the face data.
In a concrete implementation, face detection can be performed in one or more of the following ways:
1. Reference template method
One or several templates of standard faces are designed first; the degree of match between a test sample and the standard templates is then calculated, and a threshold determines whether a face is present.
2. Face rule method
Since a face has certain structural distribution features, the so-called face rule method extracts these features and generates corresponding rules to judge whether a test sample contains a face.
3. Sample learning method
This method applies artificial-neural-network methods from pattern recognition: a classifier is generated by learning from a set of face-image samples and a set of non-face-image samples.
4. Skin-color model method
This method detects faces according to the rule that facial skin color is distributed relatively compactly in color space.
5. Feature sub-face method
This method regards the set of all face images as a face-image subspace, and judges whether a face image is present based on the distance between a test sample and its projection in the subspace.
In scenes such as video and photography, the detected face key points often jitter considerably. The jitter of the key points may be related to jitter of the detected face frame, and also to illumination changes.
In the embodiments of the present invention, to reduce jitter, data-augmentation processing can be applied to the face data during training of the face key-point detection model. Augmenting the face data helps improve the robustness of the model.
In one example, the data augmentation includes at least one of the following:
1. Adding noise data
Random pixel values are added to the face data as noise (for example Gaussian noise).
2. Cropping and restoring
The face data is randomly cropped, generally to 80%~100% of the full size, and then stretched back to the full size (resized). At this point the true coordinates of the face key points must be transformed accordingly, to ensure that the positions of the face key points do not shift.
3. Translation
The pixel values of the face image are shifted as a whole, and the pixel values of the vacated region are filled with 0 or other values.
4. Increasing contrast
In the face data, the pixel values of regions with larger pixel values are increased, and the pixel values of regions with low pixel values are reduced.
Of course, the data-augmentation processing above is intended only as an example. When implementing the embodiments of the present invention, other data-augmentation processing may be set according to the actual situation, and the embodiments of the present invention are not limited to this. In addition, besides the above, those skilled in the art may also use other data-augmentation processing according to actual needs, and the embodiments of the present invention place no limitation on this either.
Step 102: input the face data into the first-stage network for training.
In the embodiments of the present invention, the first-stage network can be used to output the predicted coordinates of the face key points.
As shown in Fig. 2, the face data is input into the first-stage network and used as training samples to train the first-stage network.
In one embodiment of the present invention, step 102 may include the following sub-steps:
Sub-step S11: input the face data into the first-stage network for processing, and output the predicted coordinates of the face key points.
Sub-step S12: calculate a first loss value using the predicted coordinates.
Sub-step S13: judge, according to the first loss value, whether the first-stage network has converged; if so, execute sub-step S14; if not, execute sub-step S15.
Sub-step S14: determine that training of the first-stage network is complete.
Sub-step S15: adjust the first-stage network according to the first loss value, and return to sub-step S11.
In the embodiments of the present invention, the face data is provided in advance with the true coordinates of the face key points.
The face data is input into the first-stage network and processed according to the logic of the first-stage network; for example, as shown in Fig. 2, convolution is performed in the 5 convolutional layers in sequence, followed by processing in the 2 fully connected layers.
After the first-stage network finishes processing, the predicted coordinates of the face key points are output.
At this point, the predicted coordinates of the face key points are input as parameters into a preset loss function, and the first loss value is calculated.
In one example, the first distances between the predicted coordinates and the true coordinates can be calculated, and the average of the first distances taken as the first loss value.
Taking the Euclidean distance as an example, the first loss value is calculated by the following formula:

loss1 = (1/n) · Σ_{i=1..n} √( (x1_i − x̂_i)² + (y1_i − ŷ_i)² )

where the face data has a total of n face key points (n is a positive integer), (x1_i, y1_i) is the predicted coordinate of the i-th face key point (i is a positive integer, i ≤ n) output by the first-stage network, and (x̂_i, ŷ_i) is the true coordinate of the i-th face key point.
In each round of iteration, it is judged whether the first loss value meets a preset condition, for example is less than a set first threshold. If so, it is determined that training of the first-stage network is complete; otherwise, the parameters of the first-stage network are adjusted and training continues into the next iteration, so that the first-stage network, under the constraint of the loss function, gradually converges by means such as back-propagation until it stabilizes, at which point iteration stops.
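The first loss value above, the average Euclidean distance between predicted and true coordinates, can be written directly (the function name is ours):

```python
import numpy as np

def first_loss(pred, true):
    """Mean Euclidean distance between predicted and true key-point
    coordinates: loss1 = (1/n) * sum_i ||pred_i - true_i||.

    pred, true: arrays of shape (n, 2) holding (x, y) per key point.
    """
    return np.linalg.norm(pred - true, axis=1).mean()

# A training loop would stop once this value falls below the preset
# first threshold; otherwise it back-propagates and iterates again.
```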
Step 103: when training of the first-stage network is complete, generate target data from the face data based on the predicted coordinates.
When training the face key-point detection model, the first-stage network is trained first. After the first-stage network has converged and stabilized, its parameters are fixed, and the second-stage network is then trained.
As shown in Fig. 2, partial data can be extracted from the face data based on the face key points and combined into the target data, so as to reduce the data volume.
In one embodiment of the present invention, step 103 may include the following sub-steps:
Sub-step S21: extract partial image data from the face data based on the predicted coordinates.
Sub-step S22: combine the partial image data into a color matrix, as the target data.
In the embodiments of the present invention, the predicted coordinate of each face key point can be taken as a reference point, and the data within a certain neighboring range (such as 10*10) cropped out of the face data as partial image data of a specified size (such as 10*10*3).
The partial image data corresponding to the multiple face key points is combined along the color dimension into a data matrix (a matrix stacked along the color channels), as the target data.
For example, if the size of each piece of partial image data is 10*10*3, where 3 is the number of RGB color channels, then the partial image data corresponding to the n face key points is combined to form a 10*10*3n data matrix.
Step 104: input the target data into the second-stage network for training.
In the embodiments of the present invention, the second-stage network can be used to output the coordinate offset values of the face key points. A coordinate offset value refers to the degree to which a predicted coordinate deviates from the true coordinate.
As shown in Fig. 2, the target data is input into the second-stage network and used as training samples to train the second-stage network.
In one embodiment of the present invention, step 104 may include the following sub-steps:
Sub-step S31: input the target data into the second-stage network for processing, and output the coordinate offset values of the face key points.
Sub-step S32: calculate a second loss value using the coordinate offset values.
Sub-step S33: judge, according to the second loss value, whether the second-stage network has converged; if so, execute sub-step S34; if not, execute sub-step S35.
Sub-step S34: determine that training of the second-stage network is complete.
Sub-step S35: adjust the second-stage network according to the second loss value, and return to sub-step S31.
In the embodiments of the present invention, the face data is provided in advance with the true coordinates of the face key points.
The target data is input into the second-stage network and processed according to the logic of the second-stage network; for example, as shown in Fig. 2, convolution is performed in the 3 convolutional layers in sequence, followed by processing in the 2 fully connected layers.
After the second-stage network finishes processing, the residual between the predicted coordinate and the true coordinate of each face key point can be obtained as the coordinate offset value of that face key point.
At this point, the coordinate offset values of the face key points are input as parameters into a preset loss function, and the second loss value is calculated.
In one example, the second distances between the predicted coordinate offsets and the true coordinate offsets can be calculated, and the average of the second distances taken as the second loss value.
Taking the Euclidean distance as an example, the second loss value is calculated by the following formula:

loss2 = (1/n) · Σ_{i=1..n} √( (x2_i − Δx̂_i)² + (y2_i − Δŷ_i)² )

where the face data has a total of n face key points (n is a positive integer), (x2_i, y2_i) is the coordinate offset predicted by the second-stage network for the i-th face key point (i is a positive integer, i ≤ n), and (Δx̂_i, Δŷ_i) is the residual between the predicted coordinate and the true coordinate of the i-th face key point (i.e., the true coordinate offset value).
At this point, (Δx̂_i, Δŷ_i) = (x̂_i − x1_i, ŷ_i − y1_i), where (x1_i, y1_i) is the predicted coordinate of the i-th face key point output by the first-stage network and (x̂_i, ŷ_i) is its true coordinate.
In each round of iteration, it is judged whether the second loss value meets a preset condition, for example is less than a set second threshold. If so, it is determined that training of the second-stage network is complete; otherwise, the parameters of the second-stage network are adjusted and training continues into the next iteration, so that the second-stage network, under the constraint of the loss function, gradually converges by means such as back-propagation until it stabilizes, at which point iteration stops.
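The second loss value mirrors the first, but compares the predicted offsets against the true residuals (true coordinate minus first-stage prediction); the function name is ours:

```python
import numpy as np

def second_loss(pred_offset, pred_coords, true_coords):
    """Mean Euclidean distance between the offsets predicted by the
    second-stage network and the true residuals:
    loss2 = (1/n) * sum_i ||pred_offset_i - (true_i - pred_i)||.

    All arguments are arrays of shape (n, 2).
    """
    # The residual the second stage must learn: how far the first
    # stage's prediction fell from the true coordinate.
    true_offset = true_coords - pred_coords
    return np.linalg.norm(pred_offset - true_offset, axis=1).mean()
```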
Step 105: when training of the second-stage network is complete, determine the cascade network to be the face key-point detection model.
After the second-stage network has converged and stabilized, its parameters are fixed; at this point the cascade network is the face key-point detection model.
In the embodiments of the present invention, the cascade network includes a first-stage network, which outputs the predicted coordinates of face key points, and a second-stage network, which outputs the coordinate offset values of the face key points. Face data is extracted from a training image and input into the first-stage network for training; when training of the first-stage network is complete, target data is generated from the face data based on the predicted coordinates and input into the second-stage network for training; when training of the second-stage network is complete, the cascade network is determined to be the face key-point detection model. Because the two-stage network learns both predicted coordinates and coordinate offsets, accurate face key-point coordinates can still be obtained in complex scenes. Moreover, the input size of the second-stage network is much smaller than that of the first-stage network, which reduces the time consumed by the second-stage network and makes the model suitable for face key-point detection on devices with limited resources, such as mobile terminals, improving its practicability.
Referring to Fig. 3, a step flow chart of a face key point detection method based on the face key point detection model according to an embodiment of the present invention is shown. The method may specifically include the following steps:
Step 301: extract face data from a target image.
Step 302: input the face data to the first-level network for processing, and output the predicted coordinates of the face key points.
Step 303: in the face data, generate target data based on the predicted coordinates.
Step 304: input the target data to the second-level network for processing, and output the coordinate offset values of the face key points.
Step 305: add the coordinate offset values to the predicted coordinates to obtain the target coordinates of the face key points.
In practical applications, the face key point detection model can be deployed in systems such as access control, monitoring, payment, or camera applications. Face detection is performed on users according to the needs of the business, and the face key points therein, such as face contour coordinates and facial feature coordinates, are identified.
In the embodiments of the present invention, the face key point detection model includes a first-level network and a second-level network. Each level can run independently, and the input of the second-level network depends on the output of the first-level network.
When a target image in which face key points are to be detected is obtained, face detection can be performed on it to identify the region where the face is located, and that region is cropped into a pixel block of a specified size (such as 10*10) to serve as the face data.
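The crop-to-fixed-size step can be sketched as follows. The 10*10 block size comes from the text above; the nearest-neighbour resampling and the bounding-box format are my own assumptions (a real pipeline would typically use a detector's box plus a library resize):

```python
import numpy as np

def crop_face(image, box, out_size=10):
    """Crop the detected face region (box = x0, y0, x1, y1) and resample it
    to a fixed out_size square block, yielding the 'face data' input."""
    x0, y0, x1, y1 = box
    region = image[y0:y1, x0:x1]
    h, w = region.shape[:2]
    rows = np.arange(out_size) * h // out_size   # nearest-neighbour row picks
    cols = np.arange(out_size) * w // out_size   # nearest-neighbour col picks
    return region[rows][:, cols]

image = np.arange(40 * 40).reshape(40, 40)       # stand-in grayscale image
face_block = crop_face(image, (5, 8, 25, 32))    # box from a face detector
```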
The face data is input to the first-level network and processed according to its logic, and the predicted coordinates of the face key points are output.
For example, as shown in Fig. 2, convolution processing is performed in 5 convolutional layers in sequence, followed by processing in 2 fully connected layers.
At this point, partial data can be extracted from the face data based on the face key points and combined into the target data, so as to reduce the data volume.
In one embodiment, partial image data can be extracted from the face data based on the predicted coordinates, and the partial image data corresponding to the multiple face key points can be combined into a data matrix according to color, as the target data.
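One plausible reading of this assembly step is to cut a small patch around each predicted key point and stack the patches into one matrix. The patch size and the stacking layout below are illustrative assumptions; the patent only specifies that the partial image data is combined into a data matrix according to color:

```python
import numpy as np

def make_target_data(face, pred_coords, patch=4):
    """Extract a small pixel patch around each predicted key point and stack
    the patches along the last axis to form the target data matrix."""
    half = patch // 2
    h, w = face.shape
    patches = []
    for (x, y) in pred_coords:
        x, y = int(round(x)), int(round(y))
        x = min(max(x, half), w - half)    # keep the patch inside the image
        y = min(max(y, half), h - half)
        patches.append(face[y - half:y + half, x - half:x + half])
    return np.stack(patches, axis=-1)      # shape: (patch, patch, n_points)

face = np.random.default_rng(1).random((10, 10))   # stand-in face block
pred_coords = [(2.0, 3.0), (7.0, 6.0), (5.0, 5.0)] # stage-1 predictions
target = make_target_data(face, pred_coords)
```

Because each patch is only a few pixels wide, the resulting target data is far smaller than the full face block, which is what lets the second-level network run cheaply.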
Thereafter, the target data is input to the second-level network and processed according to its logic, and the coordinate offset values of the face key points are output.
For example, as shown in Fig. 2, convolution processing is performed in 3 convolutional layers in sequence, followed by processing in 2 fully connected layers.
At this point, the coordinate offset values are added to the predicted coordinates to obtain the target coordinates of the face key points:
(x + Δx, y + Δy)
where (x, y) is the predicted coordinate of a face key point and (Δx, Δy) is its coordinate offset value.
The target coordinates are the output of the face key point detection model, for use by other modules in the system.
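The whole two-level inference pass (steps 301 through 305) reduces to a prediction plus a learned correction. In this sketch the lambdas stand in for the trained networks and the target-data builder; their shapes and values are illustrative assumptions:

```python
import numpy as np

def detect_key_points(face, first_net, second_net, build_target):
    """Two-level inference: stage 1 predicts coordinates, stage 2 predicts
    per-point offsets, and the final output is their element-wise sum."""
    pred = first_net(face)                 # (n_points, 2) predicted (x, y)
    target = build_target(face, pred)      # target data around the predictions
    offset = second_net(target)            # (n_points, 2) offsets (dx, dy)
    return pred + offset                   # target coords: (x + dx, y + dy)

# Toy stand-ins for the trained networks, for illustration only.
face = np.zeros((10, 10))
first_net = lambda f: np.array([[2.0, 3.0], [7.0, 6.0]])
second_net = lambda t: np.array([[0.5, -0.25], [-1.0, 0.75]])
coords = detect_key_points(face, first_net, second_net,
                           lambda f, p: p)   # identity target builder
```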
In the embodiments of the present invention, the cascade network includes a first-level network and a second-level network. Face data is extracted from the target image and input to the first-level network for processing, and the predicted coordinates of the face key points are output. In the face data, target data is generated based on the predicted coordinates, input to the second-level network for processing, and the coordinate offset values of the face key points are output. The coordinate offset values are added to the predicted coordinates to obtain the target coordinates of the face key points. By learning the predicted coordinates and coordinate offset values through the two-level network, accurate coordinates of the face key points can still be obtained in complex scenes. Moreover, because the input size of the second-level network is much smaller than that of the first-level network, the time consumed by the second-level network is reduced, making the model suitable for face key point detection on devices with limited resources, such as mobile terminals, and thereby improving practicability.
It should be noted that, for simplicity of description, the method embodiments are described as a series of action combinations. However, those skilled in the art should understand that the embodiments of the present invention are not limited by the described sequence of actions, because according to the embodiments of the present invention some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
Referring to Fig. 4, a structural block diagram of a training apparatus for a face key point detection model based on a cascade network according to an embodiment of the present invention is shown. The cascade network includes a first-level network and a second-level network, and the apparatus may specifically include the following modules:
a face data extraction module 401, configured to extract face data from a training image;
a first-level network training module 402, configured to input the face data to the first-level network for training, the first-level network being used to output the predicted coordinates of face key points;
a target data generation module 403, configured to, when the training of the first-level network is complete, generate target data in the face data based on the predicted coordinates;
a second-level network training module 404, configured to input the target data to the second-level network for training, the second-level network being used to output the coordinate offset values of the face key points;
a model determination module 405, configured to, when the training of the second-level network is complete, determine that the cascade network is the face key point detection model.
In an embodiment of the present invention, the first-level network training module 402 includes:
a face data input submodule, configured to input the face data to the first-level network for processing and output the predicted coordinates of the face key points;
a first loss value calculation submodule, configured to calculate a first loss value using the predicted coordinates;
a first-level network convergence judgment submodule, configured to judge, according to the first loss value, whether the first-level network has converged; if so, a first-level network completion submodule is called, and if not, a first-level network adjustment submodule is called;
the first-level network completion submodule, configured to determine that the training of the first-level network is complete;
the first-level network adjustment submodule, configured to adjust the first-level network according to the first loss value and return to call the face data input submodule.
In one example of this embodiment of the present invention, the first loss value calculation submodule includes:
a first distance calculation unit, configured to calculate the first distances between the predicted coordinates and the true coordinates;
a first average calculation unit, configured to calculate the average value of the first distances as the first loss value.
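The first loss value described above, the per-point distance averaged over all key points, can be written directly. A Euclidean distance is assumed here; the patent does not name the specific metric:

```python
import numpy as np

def mean_point_distance(pred, truth):
    """First loss value: the distance between each predicted coordinate and
    its true coordinate, averaged over all face key points."""
    diff = np.asarray(pred, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.mean(np.linalg.norm(diff, axis=-1)))

pred = [[1.0, 2.0], [4.0, 6.0]]    # predicted (x, y) per key point
truth = [[1.0, 2.0], [1.0, 2.0]]   # true (x, y) per key point
loss = mean_point_distance(pred, truth)   # distances 0 and 5, mean 2.5
```

The second loss value follows the same pattern, with the second-level network's outputs in place of the first-level predictions.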
In an embodiment of the present invention, the target data generation module 403 includes:
a partial image data extraction submodule, configured to extract partial image data in the face data based on the predicted coordinates;
a matrix combination submodule, configured to combine the partial image data corresponding to the multiple face key points into a data matrix according to color, as the target data.
In an embodiment of the present invention, the second-level network training module 404 includes:
a target data input submodule, configured to input the target data to the second-level network for processing and output the coordinate offset values of the face key points;
a second loss value calculation submodule, configured to calculate a second loss value using the coordinate offset values;
a second-level network convergence judgment submodule, configured to judge, according to the second loss value, whether the second-level network has converged; if so, a second-level network completion submodule is called, and if not, a second-level network adjustment submodule is called;
the second-level network completion submodule, configured to determine that the training of the second-level network is complete;
the second-level network adjustment submodule, configured to adjust the second-level network according to the second loss value and return to call the target data input submodule.
In one example of this embodiment of the present invention, the second loss value calculation submodule includes:
a second distance calculation unit, configured to calculate the second distances between the predicted coordinates and the offset coordinates;
a second average calculation unit, configured to calculate the average value of the second distances as the second loss value.
In an embodiment of the present invention, the apparatus further includes:
a data enhancement processing module, configured to perform data enhancement processing on the face data;
wherein the data enhancement processing includes at least one of the following:
adding noise data, cutting and restoring, translation processing, and increasing contrast.
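Three of the four enhancements listed above (added noise, translation, increased contrast) can be sketched in a few lines; the "cutting and restoring" variant is omitted because the text does not specify how the cut region is restored. The noise scale, shift amounts, and contrast factor are illustrative assumptions:

```python
import numpy as np

def augment(face, rng):
    """Data enhancement examples: add noise, translate, and increase
    contrast. Each variant is a new array with the same shape as the input."""
    noisy = face + rng.normal(scale=0.05, size=face.shape)   # add noise data
    shifted = np.roll(face, shift=(1, 2), axis=(0, 1))       # translation
    contrast = np.clip((face - 0.5) * 1.5 + 0.5, 0.0, 1.0)   # more contrast
    return noisy, shifted, contrast

rng = np.random.default_rng(0)
face = rng.random((10, 10))          # stand-in normalized face block
noisy, shifted, contrast = augment(face, rng)
```

In training, each variant would be fed to the first-level network alongside the original, enlarging the effective training set without collecting new images.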
Referring to Fig. 5, a structural block diagram of a face key point detection apparatus based on the face key point detection model according to an embodiment of the present invention is shown. The face key point detection model includes a first-level network and a second-level network, and the apparatus may specifically include the following modules:
a face data extraction module 501, configured to extract face data from a target image;
a first-level network processing module 502, configured to input the face data to the first-level network for processing and output the predicted coordinates of face key points;
a target data generation module 503, configured to generate, in the face data, target data based on the predicted coordinates;
a second-level network processing module 504, configured to input the target data to the second-level network for processing and output the coordinate offset values of the face key points;
a target coordinate calculation module 505, configured to add the coordinate offset values to the predicted coordinates to obtain the target coordinates of the face key points.
In an embodiment of the present invention, the target data generation module 503 includes:
a partial image data extraction submodule, configured to extract partial image data in the face data based on the predicted coordinates;
a matrix combination submodule, configured to combine the partial image data corresponding to the multiple face key points into a data matrix according to color, as the target data.
As the apparatus embodiments are substantially similar to the method embodiments, their description is relatively simple; for relevant details, refer to the description of the method embodiments.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can refer to each other.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, an apparatus, or a computer program product. Therefore, the embodiments of the present invention may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical memory) containing computer-usable program code.
The embodiments of the present invention are described with reference to flowcharts and/or block diagrams of the methods, terminal devices (systems), and computer program products according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing terminal devices to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing terminal devices produce an apparatus for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal devices to operate in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured article including an instruction apparatus, which realizes the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal devices, so that a series of operation steps are executed on the computer or other programmable terminal devices to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable terminal devices provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although the preferred embodiments of the present invention have been described, those skilled in the art, once aware of the basic creative concept, can make additional changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present invention.
Finally, it should be noted that, herein, relational terms such as first and second are used merely to distinguish one entity or operation from another entity or operation, without necessarily requiring or implying any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal device that includes a series of elements includes not only those elements, but also other elements that are not explicitly listed, or elements intrinsic to that process, method, article, or terminal device. In the absence of further restrictions, an element limited by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or terminal device that includes the element.
A training method for a face key point detection model based on a cascade network, a face key point detection method based on the face key point detection model, a training apparatus for a face key point detection model based on a cascade network, and a face key point detection apparatus based on the face key point detection model provided by the present invention have been introduced in detail above. Specific examples are used herein to illustrate the principles and implementations of the present invention, and the description of the above embodiments is only intended to help understand the methods of the present invention and their core ideas. At the same time, for those skilled in the art, there will be changes in the specific implementations and scope of application according to the ideas of the present invention. In conclusion, the content of this specification should not be understood as a limitation of the present invention.
Claims (14)
1. A training method for a face key point detection model based on a cascade network, wherein the cascade network includes a first-level network and a second-level network, the method comprising:
extracting face data from a training image;
inputting the face data to the first-level network for training, the first-level network being used to output predicted coordinates of face key points;
when the training of the first-level network is complete, generating, in the face data, target data based on the predicted coordinates;
inputting the target data to the second-level network for training, the second-level network being used to output coordinate offset values of the face key points;
when the training of the second-level network is complete, determining that the cascade network is the face key point detection model;
wherein generating, in the face data, the target data based on the predicted coordinates comprises: extracting partial image data in the face data based on the predicted coordinates; and combining the partial image data corresponding to the multiple face key points into a data matrix according to color, as the target data;
the face data being provided in advance with the true coordinates of the face key points.
2. The method according to claim 1, wherein inputting the face data to the first-level network for training comprises:
inputting the face data to the first-level network for processing and outputting the predicted coordinates of the face key points;
calculating a first loss value using the predicted coordinates;
judging, according to the first loss value, whether the first-level network has converged;
if so, determining that the training of the first-level network is complete;
if not, adjusting the first-level network according to the first loss value, and returning to the step of inputting the face data to the first-level network for processing and outputting the predicted coordinates of the face key points.
3. The method according to claim 2, wherein calculating the first loss value using the predicted coordinates comprises:
calculating the first distances between the predicted coordinates and the true coordinates;
calculating the average value of the first distances as the first loss value.
4. The method according to claim 1, wherein inputting the target data to the second-level network for training comprises:
inputting the target data to the second-level network for processing and outputting the coordinate offset values of the face key points;
calculating a second loss value using the coordinate offset values;
judging, according to the second loss value, whether the second-level network has converged;
if so, determining that the training of the second-level network is complete;
if not, adjusting the second-level network according to the second loss value, and returning to the step of inputting the target data to the second-level network for processing and outputting the coordinate offset values of the face key points.
5. The method according to claim 4, wherein calculating the second loss value using the coordinate offset values comprises:
calculating the second distances between the predicted coordinates and the coordinate offset values;
calculating the average value of the second distances as the second loss value.
6. The method according to any one of claims 1-5, further comprising:
performing data enhancement processing on the face data;
wherein the data enhancement processing includes at least one of the following:
adding noise data, cutting and restoring, translation processing, and increasing contrast.
7. A face key point detection method based on a face key point detection model, wherein the face key point detection model includes a first-level network and a second-level network, the method comprising:
extracting face data from a target image;
inputting the face data to the first-level network for processing and outputting predicted coordinates of face key points;
generating, in the face data, target data based on the predicted coordinates;
inputting the target data to the second-level network for processing and outputting coordinate offset values of the face key points;
adding the coordinate offset values to the predicted coordinates to obtain target coordinates of the face key points;
wherein generating, in the face data, the target data based on the predicted coordinates comprises: extracting partial image data based on the predicted coordinates; and combining the partial image data corresponding to the multiple face key points into a data matrix according to color, as the target data;
the face data being provided in advance with the true coordinates of the face key points.
8. A training apparatus for a face key point detection model based on a cascade network, wherein the cascade network includes a first-level network and a second-level network, the apparatus comprising:
a face data extraction module, configured to extract face data from a training image;
a first-level network training module, configured to input the face data to the first-level network for training, the first-level network being used to output predicted coordinates of face key points;
a target data generation module, configured to, when the training of the first-level network is complete, generate, in the face data, target data based on the predicted coordinates;
a second-level network training module, configured to input the target data to the second-level network for training, the second-level network being used to output coordinate offset values of the face key points;
a model determination module, configured to, when the training of the second-level network is complete, determine that the cascade network is the face key point detection model;
wherein the target data generation module includes:
a partial image data extraction submodule, configured to extract partial image data in the face data based on the predicted coordinates;
a matrix combination submodule, configured to combine the partial image data corresponding to the multiple face key points into a data matrix according to color, as the target data;
the face data being provided in advance with the true coordinates of the face key points.
9. The apparatus according to claim 8, wherein the first-level network training module includes:
a face data input submodule, configured to input the face data to the first-level network for processing and output the predicted coordinates of the face key points;
a first loss value calculation submodule, configured to calculate a first loss value using the predicted coordinates;
a first-level network convergence judgment submodule, configured to judge, according to the first loss value, whether the first-level network has converged; if so, a first-level network completion submodule is called, and if not, a first-level network adjustment submodule is called;
the first-level network completion submodule, configured to determine that the training of the first-level network is complete;
the first-level network adjustment submodule, configured to adjust the first-level network according to the first loss value and return to call the face data input submodule.
10. The apparatus according to claim 9, wherein the first loss value calculation submodule includes:
a first distance calculation unit, configured to calculate the first distances between the predicted coordinates and the true coordinates;
a first average calculation unit, configured to calculate the average value of the first distances as the first loss value.
11. The apparatus according to claim 8, wherein the second-level network training module includes:
a target data input submodule, configured to input the target data to the second-level network for processing and output the coordinate offset values of the face key points;
a second loss value calculation submodule, configured to calculate a second loss value using the coordinate offset values;
a second-level network convergence judgment submodule, configured to judge, according to the second loss value, whether the second-level network has converged; if so, a second-level network completion submodule is called, and if not, a second-level network adjustment submodule is called;
the second-level network completion submodule, configured to determine that the training of the second-level network is complete;
the second-level network adjustment submodule, configured to adjust the second-level network according to the second loss value and return to call the target data input submodule.
12. The apparatus according to claim 11, wherein the second loss value calculation submodule includes:
a second distance calculation unit, configured to calculate the second distances between the predicted coordinates and the coordinate offset values;
a second average calculation unit, configured to calculate the average value of the second distances as the second loss value.
13. The apparatus according to any one of claims 8-12, further comprising:
a data enhancement processing module, configured to perform data enhancement processing on the face data;
wherein the data enhancement processing includes at least one of the following:
adding noise data, cutting and restoring, translation processing, and increasing contrast.
14. A face key point detection apparatus based on a face key point detection model, wherein the face key point detection model includes a first-level network and a second-level network, the apparatus comprising:
a face data extraction module, configured to extract face data from a target image;
a first-level network processing module, configured to input the face data to the first-level network for processing and output predicted coordinates of face key points;
a target data generation module, configured to generate, in the face data, target data based on the predicted coordinates;
a second-level network processing module, configured to input the target data to the second-level network for processing and output coordinate offset values of the face key points;
a target coordinate calculation module, configured to add the coordinate offset values to the predicted coordinates to obtain target coordinates of the face key points;
wherein the target data generation module includes:
a partial image data extraction submodule, configured to extract partial image data in the face data based on the predicted coordinates;
a matrix combination submodule, configured to combine the partial image data corresponding to the multiple face key points into a data matrix according to color, as the target data;
the face data being provided in advance with the true coordinates of the face key points.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810118211.3A CN108399373B (en) | 2018-02-06 | 2018-02-06 | The model training and its detection method and device of face key point |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108399373A CN108399373A (en) | 2018-08-14 |
CN108399373B true CN108399373B (en) | 2019-05-10 |
Family
ID=63095216
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810118211.3A Active CN108399373B (en) | 2018-02-06 | 2018-02-06 | The model training and its detection method and device of face key point |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108399373B (en) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109376593B (en) * | 2018-09-10 | 2020-12-29 | 杭州格像科技有限公司 | Face feature point positioning method and system |
CN109597123B (en) * | 2018-10-26 | 2021-02-19 | 长江大学 | Effective signal detection method and system |
CN109558837B (en) * | 2018-11-28 | 2024-03-22 | 北京达佳互联信息技术有限公司 | Face key point detection method, device and storage medium |
CN109800648B (en) * | 2018-12-18 | 2021-09-28 | 北京英索科技发展有限公司 | Face detection and recognition method and device based on face key point correction |
CN109685023A (en) * | 2018-12-27 | 2019-04-26 | 深圳开立生物医疗科技股份有限公司 | A kind of facial critical point detection method and relevant apparatus of ultrasound image |
CN109858466A (en) * | 2019-03-01 | 2019-06-07 | 北京视甄智能科技有限公司 | A kind of face critical point detection method and device based on convolutional neural networks |
CN110021150B (en) * | 2019-03-27 | 2021-03-19 | 创新先进技术有限公司 | Data processing method, device and equipment |
CN109961103B (en) * | 2019-04-02 | 2020-10-27 | 北京迈格威科技有限公司 | Training method of feature extraction model, and image feature extraction method and device |
CN110309706B (en) * | 2019-05-06 | 2023-05-12 | 深圳华付技术股份有限公司 | Face key point detection method and device, computer equipment and storage medium |
CN112101342A (en) * | 2019-06-17 | 2020-12-18 | 顺丰科技有限公司 | Box key point detection method and device, computing equipment and computer readable storage medium |
CN110969100B (en) * | 2019-11-20 | 2022-10-25 | 北京奇艺世纪科技有限公司 | Human body key point identification method and device and electronic equipment |
CN110909664A (en) * | 2019-11-20 | 2020-03-24 | 北京奇艺世纪科技有限公司 | Human body key point identification method and device and electronic equipment |
CN111028212B (en) * | 2019-12-02 | 2024-02-27 | 上海联影智能医疗科技有限公司 | Key point detection method, device, computer equipment and storage medium |
CN111783948A (en) * | 2020-06-24 | 2020-10-16 | 北京百度网讯科技有限公司 | Model training method and device, electronic equipment and storage medium |
CN112115845B (en) * | 2020-09-15 | 2023-12-29 | 中山大学 | Active shape model parameterization method for face key point detection |
CN112861689A (en) * | 2021-02-01 | 2021-05-28 | 上海依图网络科技有限公司 | Searching method and device of coordinate recognition model based on NAS technology |
CN112949492A (en) * | 2021-03-03 | 2021-06-11 | 南京视察者智能科技有限公司 | Model series training method and device for face detection and key point detection and terminal equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103824049A (en) * | 2014-02-17 | 2014-05-28 | 北京旷视科技有限公司 | Cascaded neural network-based face key point detection method |
CN106575367A (en) * | 2014-08-21 | 2017-04-19 | 北京市商汤科技开发有限公司 | A method and a system for facial landmark detection based on multi-task |
CN106874898A (en) * | 2017-04-08 | 2017-06-20 | 复旦大学 | Extensive face identification method based on depth convolutional neural networks model |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1452997B1 (en) * | 2003-02-25 | 2010-09-15 | Canon Kabushiki Kaisha | Apparatus and method for managing articles |
US7118026B2 (en) * | 2003-06-26 | 2006-10-10 | International Business Machines Corporation | Apparatus, method, and system for positively identifying an item |
2018-02-06: application CN201810118211.3A filed; CN108399373B granted, status Active.
Non-Patent Citations (3)
Title |
---|
Research on Face Alignment Algorithms Based on a Two-Stage Localization Model; Wang Feng; China Master's Theses Full-text Database, Information Science and Technology; 2018-01-15 (No. 01); pp. 11-53 |
Face Key Point Detection Algorithm Based on Cascaded Convolutional Neural Networks; Jin Yifan; China Master's Theses Full-text Database, Information Science and Technology; 2016-02-15 (No. 02); I138-1621 |
Research on Facial Feature Point Localization Methods Combined with Face Detection; Dong Ruixia; China Master's Theses Full-text Database, Information Science and Technology; 2018-01-15 (No. 01); I138-967 |
Also Published As
Publication number | Publication date |
---|---|
CN108399373A (en) | 2018-08-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108399373B (en) | Model training and detection method and device for face key points | |
Fischer et al. | FlowNet: Learning optical flow with convolutional networks | |
CN104408742B (en) | A moving target detection method based on joint space-time-frequency spectrum analysis | |
US20100067863A1 (en) | Video editing methods and systems | |
CN107292247A (en) | A human behavior recognition method and device based on residual networks | |
CN110956646B (en) | Target tracking method, device, equipment and storage medium | |
CN108198201A (en) | A multi-object tracking method, terminal device and storage medium | |
CN111881804B (en) | Posture estimation model training method, system, medium and terminal based on joint training | |
CN109919059B (en) | Salient object detection method based on deep network layering and multi-task training | |
CN108121931A (en) | two-dimensional code data processing method, device and mobile terminal | |
CN107909026A (en) | Age and gender estimation based on small-scale convolutional neural networks for embedded systems | |
CN113095254B (en) | Method and system for positioning key points of human body part | |
CN109410211A (en) | A method and device for segmenting a target object in an image | |
CN111008631B (en) | Image association method and device, storage medium and electronic device | |
CN111723707A (en) | Method and device for estimating fixation point based on visual saliency | |
CN106952304A (en) | A depth image computation method using inter-frame correlation of video sequences | |
CN113177470A (en) | Pedestrian trajectory prediction method, device, equipment and storage medium | |
CN107948586A (en) | Cross-region moving target detection method and device based on video stitching | |
CN110969110A (en) | Face tracking method and system based on deep learning | |
CN101587590A (en) | Selective visual attention computation model based on pulse cosine transform | |
CN111260687B (en) | Aerial video target tracking method based on semantic perception network and related filtering | |
CN114021704B (en) | AI neural network model training method and related device | |
CN117095300B (en) | Building image processing method, device, computer equipment and storage medium | |
CN109325405A (en) | A shot-type annotation method, device and equipment | |
CN112257492A (en) | Real-time intrusion detection and tracking method for multiple cameras |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||