CN108399373B - The model training and its detection method and device of face key point - Google Patents
- Publication number: CN108399373B (application CN201810118211.3A)
- Authority
- CN
- China
- Legal status: Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
An embodiment of the invention provides a method and device for training a face key-point detection model and for detection with that model. The method includes: extracting face data from a training image; inputting the face data into a first-stage network for training, where the first-stage network outputs predicted coordinates of face key points; when training of the first-stage network is complete, generating target data from the face data based on the predicted coordinates; inputting the target data into a second-stage network for training, where the second-stage network outputs coordinate offset values for the face key points; and, when training of the second-stage network is complete, determining the cascade network to be the face key-point detection model. Because the two-stage network learns both predicted coordinates and coordinate offsets, accurate face key-point coordinates can still be obtained in complex scenes.
Description
Technical field
The present invention relates to the technical field of computer processing, and more particularly to a method and apparatus for training a face key-point detection model and for detecting face key points.
Background technique
Face key-point detection is one of the basic technologies in facial-image research. Its objective is to automatically estimate the coordinates of facial feature points in a face picture, for example face-contour coordinates and coordinates of the eyes, nose and mouth. It is widely used in face recognition, pose estimation, face filters, beautification and make-up, three-dimensional modeling, and so on.
Existing face key-point detection techniques include shape-constrained methods and cascaded-regression methods; classical models include Active Shape Models (ASM) and Cascaded Pose Regression (CPR).
However, these traditional methods are not robust: in complex scenes, the detection accuracy of face key points is low.
Summary of the invention
Embodiments of the present invention propose a method and device for training a face key-point detection model and for detection, to solve the problem of low detection accuracy of face key points in complex scenes.
According to one aspect of the present invention, a method for training a face key-point detection model based on a cascade network is provided. The cascade network includes a first-stage network and a second-stage network, and the method includes:
extracting face data from a training image;
inputting the face data into the first-stage network for training, where the first-stage network outputs predicted coordinates of face key points;
when training of the first-stage network is complete, generating target data from the face data based on the predicted coordinates;
inputting the target data into the second-stage network for training, where the second-stage network outputs coordinate offset values for the face key points;
when training of the second-stage network is complete, determining the cascade network to be the face key-point detection model.
Optionally, inputting the face data into the first-stage network for training includes:
inputting the face data into the first-stage network for processing, and outputting the predicted coordinates of the face key points;
calculating a first loss value using the predicted coordinates;
judging, according to the first loss value, whether the first-stage network has converged;
if so, determining that training of the first-stage network is complete;
if not, adjusting the first-stage network according to the first loss value, and returning to the step of inputting the face data into the first-stage network for processing.
Optionally, calculating the first loss value using the predicted coordinates includes:
calculating first distances between the predicted coordinates and the true coordinates;
taking the average of the first distances as the first loss value.
Optionally, generating the target data from the face data based on the predicted coordinates includes:
extracting partial image data from the face data based on the predicted coordinates;
combining the partial image data corresponding to the multiple face key points into a data matrix along the color dimension, as the target data.
Optionally, inputting the target data into the second-stage network for training includes:
inputting the target data into the second-stage network for processing, and outputting the coordinate offset values of the face key points;
calculating a second loss value using the coordinate offset values;
judging, according to the second loss value, whether the second-stage network has converged;
if so, determining that training of the second-stage network is complete;
if not, adjusting the second-stage network according to the second loss value, and returning to the step of inputting the target data into the second-stage network for processing.
Optionally, calculating the second loss value using the coordinate offset values includes:
calculating second distances between the predicted coordinate offsets and the true coordinate offsets;
taking the average of the second distances as the second loss value.
Optionally, the method further includes:
performing data-augmentation processing on the face data;
where the data augmentation includes at least one of the following:
adding noise data, cropping and restoring, translation, and increasing contrast.
According to another aspect of the present invention, a face key-point detection method based on a face key-point detection model is provided. The face key-point detection model includes a first-stage network and a second-stage network, and the method includes:
extracting face data from a target image;
inputting the face data into the first-stage network for processing, and outputting the predicted coordinates of the face key points;
generating target data from the face data based on the predicted coordinates;
inputting the target data into the second-stage network for processing, and outputting the coordinate offset values of the face key points;
adding the coordinate offset values to the predicted coordinates to obtain the target coordinates of the face key points.
According to another aspect of the present invention, a device for training a face key-point detection model based on a cascade network is provided. The cascade network includes a first-stage network and a second-stage network, and the device includes:
a face-data extraction module, configured to extract face data from a training image;
a first-stage network training module, configured to input the face data into the first-stage network for training, where the first-stage network outputs predicted coordinates of face key points;
a target-data generation module, configured to generate target data from the face data based on the predicted coordinates when training of the first-stage network is complete;
a second-stage network training module, configured to input the target data into the second-stage network for training, where the second-stage network outputs coordinate offset values for the face key points;
a model determination module, configured to determine the cascade network to be the face key-point detection model when training of the second-stage network is complete.
Optionally, the first-stage network training module includes:
a face-data input submodule, configured to input the face data into the first-stage network for processing and output the predicted coordinates of the face key points;
a first-loss-value calculation submodule, configured to calculate a first loss value using the predicted coordinates;
a first-stage network convergence judgment submodule, configured to judge, according to the first loss value, whether the first-stage network has converged; if so, to call the first-stage network completion submodule, and if not, to call the first-stage network adjustment submodule;
a first-stage network completion submodule, configured to determine that training of the first-stage network is complete;
a first-stage network adjustment submodule, configured to adjust the first-stage network according to the first loss value and return to the face-data input submodule.
Optionally, the first-loss-value calculation submodule includes:
a first-distance calculation unit, configured to calculate the first distances between the predicted coordinates and the true coordinates;
a first averaging unit, configured to take the average of the first distances as the first loss value.
Optionally, the target-data generation module includes:
a partial-image-data extraction submodule, configured to extract partial image data from the face data based on the predicted coordinates;
a matrix combination submodule, configured to combine the partial image data corresponding to the multiple face key points into a data matrix along the color dimension, as the target data.
Optionally, the second-stage network training module includes:
a target-data input submodule, configured to input the target data into the second-stage network for processing and output the coordinate offset values of the face key points;
a second-loss-value calculation submodule, configured to calculate a second loss value using the coordinate offset values;
a second-stage network convergence judgment submodule, configured to judge, according to the second loss value, whether the second-stage network has converged; if so, to call the second-stage network completion submodule, and if not, to call the second-stage network adjustment submodule;
a second-stage network completion submodule, configured to determine that training of the second-stage network is complete;
a second-stage network adjustment submodule, configured to adjust the second-stage network according to the second loss value and return to the target-data input submodule.
Optionally, the second-loss-value calculation submodule includes:
a second-distance calculation unit, configured to calculate the second distances between the predicted coordinate offsets and the true coordinate offsets;
a second averaging unit, configured to take the average of the second distances as the second loss value.
Optionally, the device further includes:
a data-augmentation processing module, configured to perform data-augmentation processing on the face data;
where the data augmentation includes at least one of the following:
adding noise data, cropping and restoring, translation, and increasing contrast.
According to another aspect of the present invention, a face key-point detection device based on a face key-point detection model is provided. The face key-point detection model includes a first-stage network and a second-stage network, and the device includes:
a face-data extraction module, configured to extract face data from a target image;
a first-stage network processing module, configured to input the face data into the first-stage network for processing and output the predicted coordinates of the face key points;
a target-data generation module, configured to generate target data from the face data based on the predicted coordinates;
a second-stage network processing module, configured to input the target data into the second-stage network for processing and output the coordinate offset values of the face key points;
a target-coordinate calculation module, configured to add the coordinate offset values to the predicted coordinates to obtain the target coordinates of the face key points.
Embodiments of the present invention include the following advantages:
In the embodiments of the present invention, the cascade network includes a first-stage network, which outputs the predicted coordinates of face key points, and a second-stage network, which outputs the coordinate offset values of the face key points. Face data is extracted from a training image and input into the first-stage network for training; when training of the first-stage network is complete, target data is generated from the face data based on the predicted coordinates and input into the second-stage network for training; when training of the second-stage network is complete, the cascade network is determined to be the face key-point detection model. Because the two-stage network learns both predicted coordinates and coordinate offsets, accurate face key-point coordinates can still be obtained in complex scenes. Moreover, the input size of the second-stage network is much smaller than that of the first-stage network, which reduces the time consumed by the second-stage network and makes the model suitable for face key-point detection on devices with limited resources, such as mobile terminals, improving its practicability.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of a method for training a face key-point detection model based on a cascade network according to an embodiment of the present invention;
Fig. 2 is a diagram of an example topology of a first-stage network and a second-stage network according to an embodiment of the present invention;
Fig. 3 is a flow chart of the steps of a face key-point detection method based on a face key-point detection model according to an embodiment of the present invention;
Fig. 4 is a structural block diagram of a device for training a face key-point detection model based on a cascade network according to an embodiment of the present invention;
Fig. 5 is a structural block diagram of a face key-point detection device based on a face key-point detection model according to an embodiment of the present invention.
Specific embodiment
To make the above objectives, features and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Referring to Fig. 1, a flow chart of the steps of a method for training a face key-point detection model based on a cascade network according to an embodiment of the present invention is shown. The method may specifically include the following steps:
Step 101: extract face data from a training image.
In the embodiments of the present invention, a cascade network can be used to train the face key-point detection model for detecting face key points.
The cascade network includes a first-stage network and a second-stage network. Each stage can run independently, and the input of the second-stage network depends on the output of the first-stage network.
Taking a CNN (Convolutional Neural Network) as an example of the first-stage and second-stage networks: a CNN has multiple layers, and the output of one layer serves as the input of the next.
Each layer of a CNN generally consists of multiple feature maps, and each map consists of multiple neural units. All neural units of the same map share one convolution kernel (i.e., weights). A convolution kernel often represents a feature; for example, if a certain kernel represents an arc segment, then as this kernel slides over the entire picture, regions with larger convolution values are likely to contain an arc.
As shown in Fig. 2, in this example the first-stage network (a CNN) has 5 convolutional layers followed by 2 fully connected layers, and the second-stage network (a CNN) has 3 convolutional layers followed by 2 fully connected layers.
A convolutional layer is essentially a feature-extraction layer. A hyperparameter F can be set to specify how many feature extractors (filters) the layer has. A given filter is equivalent to a moving window of size k*d that starts at the beginning of the input matrix and slides along it, where k and d are the window sizes specified for the filter. For the window at a given moment, a nonlinear transformation of the neural network converts the input values in the window into a feature value; as the window keeps moving, the feature values for this filter are generated one after another and form the filter's feature vector. This is the process by which a convolutional layer extracts features. Each filter operates in this way, forming a different feature extractor.
In the fully connected layers, n 1*1 convolution kernels convolve the features of the previous layer, and the convolved features are then reduced by mean pooling.
Of course, the structures of the first-stage and second-stage networks above are intended only as examples. When implementing the embodiments of the present invention, other structures for the first-stage and second-stage networks may be set according to the actual situation, and the embodiments of the present invention place no limitation on this. In addition, besides the structures above, those skilled in the art may also use other first-stage and second-stage network structures according to actual needs, and the embodiments of the present invention place no limitation on this either.
When training the face key-point detection model with the cascade network, pre-prepared training images can be extracted from a picture library; each training image contains image data of a face.
Face detection is performed on the training image to identify the region where the face is located, and the region is cropped to a pixel block of a specified size (such as 10*10), as the face data.
In a concrete implementation, face detection can be performed in one or more of the following ways:
1. Reference template method
One or several templates of standard faces are designed first; the degree of match between a test sample and the standard templates is then calculated, and a threshold determines whether a face is present.
2. Face rule method
Since a face has certain structural distribution features, the so-called face rule method extracts these features and generates corresponding rules to judge whether a test sample contains a face.
3. Sample learning method
This method applies artificial-neural-network methods from pattern recognition: a classifier is generated by learning from a set of face-image samples and a set of non-face-image samples.
4. Skin-color model method
This method detects faces according to the rule that facial skin color is distributed relatively compactly in color space.
5. Feature sub-face method
This method regards the set of all face images as a face-image subspace, and judges whether a face image is present based on the distance between a test sample and its projection in the subspace.
In scenes such as video and photography, the detected face key points often jitter considerably. The jitter of the key points may be related to jitter of the detected face frame, and also to illumination changes.
In the embodiments of the present invention, to reduce jitter, data-augmentation processing can be applied to the face data during training of the face key-point detection model. Augmenting the face data helps improve the robustness of the model.
In one example, the data augmentation includes at least one of the following:
1. Adding noise data
Random pixel values are added to the face data as noise (for example Gaussian noise).
2. Cropping and restoring
The face data is randomly cropped, generally to 80%~100% of the full size, and then stretched back to the full size (resized). At this point the true coordinates of the face key points must be transformed accordingly, to ensure that the positions of the face key points do not shift.
3. Translation
The pixel values of the face image are shifted as a whole, and the pixel values of the vacated region are filled with 0 or other values.
4. Increasing contrast
In the face data, the pixel values of regions with larger pixel values are increased, and the pixel values of regions with low pixel values are reduced.
Of course, the data-augmentation processing above is intended only as an example. When implementing the embodiments of the present invention, other data-augmentation processing may be set according to the actual situation, and the embodiments of the present invention are not limited to this. In addition, besides the above, those skilled in the art may also use other data-augmentation processing according to actual needs, and the embodiments of the present invention place no limitation on this either.
Step 102: input the face data into the first-stage network for training.
In the embodiments of the present invention, the first-stage network can be used to output the predicted coordinates of the face key points.
As shown in Fig. 2, the face data is input into the first-stage network and used as training samples to train the first-stage network.
In one embodiment of the present invention, step 102 may include the following sub-steps:
Sub-step S11: input the face data into the first-stage network for processing, and output the predicted coordinates of the face key points.
Sub-step S12: calculate a first loss value using the predicted coordinates.
Sub-step S13: judge, according to the first loss value, whether the first-stage network has converged; if so, execute sub-step S14; if not, execute sub-step S15.
Sub-step S14: determine that training of the first-stage network is complete.
Sub-step S15: adjust the first-stage network according to the first loss value, and return to sub-step S11.
In the embodiments of the present invention, the face data is provided in advance with the true coordinates of the face key points.
The face data is input into the first-stage network and processed according to the logic of the first-stage network; for example, as shown in Fig. 2, convolution is performed in the 5 convolutional layers in sequence, followed by processing in the 2 fully connected layers.
After the first-stage network finishes processing, the predicted coordinates of the face key points are output.
At this point, the predicted coordinates of the face key points are input as parameters into a preset loss function, and the first loss value is calculated.
In one example, the first distances between the predicted coordinates and the true coordinates can be calculated, and the average of the first distances taken as the first loss value.
Taking the Euclidean distance as an example, the first loss value is calculated by the following formula:

loss1 = (1/n) · Σ_{i=1..n} √( (x1_i − x̂_i)² + (y1_i − ŷ_i)² )

where the face data has a total of n face key points (n is a positive integer), (x1_i, y1_i) is the predicted coordinate of the i-th face key point (i is a positive integer, i ≤ n) output by the first-stage network, and (x̂_i, ŷ_i) is the true coordinate of the i-th face key point.
In each round of iteration, it is judged whether the first loss value meets a preset condition, for example is less than a set first threshold. If so, it is determined that training of the first-stage network is complete; otherwise, the parameters of the first-stage network are adjusted and training continues into the next iteration, so that the first-stage network, under the constraint of the loss function, gradually converges by means such as back-propagation until it stabilizes, at which point iteration stops.
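The first loss value above, the average Euclidean distance between predicted and true coordinates, can be written directly (the function name is ours):

```python
import numpy as np

def first_loss(pred, true):
    """Mean Euclidean distance between predicted and true key-point
    coordinates: loss1 = (1/n) * sum_i ||pred_i - true_i||.

    pred, true: arrays of shape (n, 2) holding (x, y) per key point.
    """
    return np.linalg.norm(pred - true, axis=1).mean()

# A training loop would stop once this value falls below the preset
# first threshold; otherwise it back-propagates and iterates again.
```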
Step 103: when training of the first-stage network is complete, generate target data from the face data based on the predicted coordinates.
When training the face key-point detection model, the first-stage network is trained first. After the first-stage network has converged and stabilized, its parameters are fixed, and the second-stage network is then trained.
As shown in Fig. 2, partial data can be extracted from the face data based on the face key points and combined into the target data, so as to reduce the data volume.
In one embodiment of the present invention, step 103 may include the following sub-steps:
Sub-step S21: extract partial image data from the face data based on the predicted coordinates.
Sub-step S22: combine the partial image data into a color matrix, as the target data.
In the embodiments of the present invention, the predicted coordinate of each face key point can be taken as a reference point, and the data within a certain neighboring range (such as 10*10) cropped out of the face data as partial image data of a specified size (such as 10*10*3).
The partial image data corresponding to the multiple face key points is combined along the color dimension into a data matrix (a matrix stacked along the color channels), as the target data.
For example, if the size of each piece of partial image data is 10*10*3, where 3 is the number of RGB color channels, then the partial image data corresponding to the n face key points is combined to form a 10*10*3n data matrix.
Step 104: input the target data into the second-stage network for training.
In the embodiments of the present invention, the second-stage network can be used to output the coordinate offset values of the face key points. A coordinate offset value refers to the degree to which a predicted coordinate deviates from the true coordinate.
As shown in Fig. 2, the target data is input into the second-stage network and used as training samples to train the second-stage network.
In one embodiment of the present invention, step 104 may include the following sub-steps:
Sub-step S31: input the target data into the second-stage network for processing, and output the coordinate offset values of the face key points.
Sub-step S32: calculate a second loss value using the coordinate offset values.
Sub-step S33: judge, according to the second loss value, whether the second-stage network has converged; if so, execute sub-step S34; if not, execute sub-step S35.
Sub-step S34: determine that training of the second-stage network is complete.
Sub-step S35: adjust the second-stage network according to the second loss value, and return to sub-step S31.
In the embodiments of the present invention, the face data is provided in advance with the true coordinates of the face key points.
The target data is input into the second-stage network and processed according to the logic of the second-stage network; for example, as shown in Fig. 2, convolution is performed in the 3 convolutional layers in sequence, followed by processing in the 2 fully connected layers.
After the second-stage network finishes processing, the residual between the predicted coordinate and the true coordinate of each face key point can be obtained as the coordinate offset value of that face key point.
At this point, the coordinate offset values of the face key points are input as parameters into a preset loss function, and the second loss value is calculated.
In one example, the second distances between the predicted coordinate offsets and the true coordinate offsets can be calculated, and the average of the second distances taken as the second loss value.
Taking the Euclidean distance as an example, the second loss value is calculated by the following formula:

loss2 = (1/n) · Σ_{i=1..n} √( (x2_i − Δx̂_i)² + (y2_i − Δŷ_i)² )

where the face data has a total of n face key points (n is a positive integer), (x2_i, y2_i) is the coordinate offset predicted by the second-stage network for the i-th face key point (i is a positive integer, i ≤ n), and (Δx̂_i, Δŷ_i) is the residual between the predicted coordinate and the true coordinate of the i-th face key point (i.e., the true coordinate offset value).
At this point, (Δx̂_i, Δŷ_i) = (x̂_i − x1_i, ŷ_i − y1_i), where (x1_i, y1_i) is the predicted coordinate of the i-th face key point output by the first-stage network and (x̂_i, ŷ_i) is its true coordinate.
In each round of iteration, it is judged whether the second loss value meets a preset condition, for example is less than a set second threshold. If so, it is determined that training of the second-stage network is complete; otherwise, the parameters of the second-stage network are adjusted and training continues into the next iteration, so that the second-stage network, under the constraint of the loss function, gradually converges by means such as back-propagation until it stabilizes, at which point iteration stops.
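The second loss value mirrors the first, but compares the predicted offsets against the true residuals (true coordinate minus first-stage prediction); the function name is ours:

```python
import numpy as np

def second_loss(pred_offset, pred_coords, true_coords):
    """Mean Euclidean distance between the offsets predicted by the
    second-stage network and the true residuals:
    loss2 = (1/n) * sum_i ||pred_offset_i - (true_i - pred_i)||.

    All arguments are arrays of shape (n, 2).
    """
    # The residual the second stage must learn: how far the first
    # stage's prediction fell from the true coordinate.
    true_offset = true_coords - pred_coords
    return np.linalg.norm(pred_offset - true_offset, axis=1).mean()
```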
Step 105: when training of the second-stage network is complete, determine the cascade network to be the face key-point detection model.
After the second-stage network has converged and stabilized, its parameters are fixed; at this point the cascade network is the face key-point detection model.
In the embodiments of the present invention, the cascade network includes a first-stage network, which outputs the predicted coordinates of face key points, and a second-stage network, which outputs the coordinate offset values of the face key points. Face data is extracted from a training image and input into the first-stage network for training; when training of the first-stage network is complete, target data is generated from the face data based on the predicted coordinates and input into the second-stage network for training; when training of the second-stage network is complete, the cascade network is determined to be the face key-point detection model. Because the two-stage network learns both predicted coordinates and coordinate offsets, accurate face key-point coordinates can still be obtained in complex scenes. Moreover, the input size of the second-stage network is much smaller than that of the first-stage network, which reduces the time consumed by the second-stage network and makes the model suitable for face key-point detection on devices with limited resources, such as mobile terminals, improving its practicability.
Referring to Fig. 3, a step flow chart of a face key point detection method based on the face key point detection model according to an embodiment of the present invention is shown. The method may specifically include the following steps:
Step 301: extract face data from a target image.
Step 302: input the face data to the first-level network for processing, and output the predicted coordinates of the face key points.
Step 303: in the face data, generate target data based on the predicted coordinates.
Step 304: input the target data to the second-level network for processing, and output the coordinate offset values of the face key points.
Step 305: add the coordinate offset values to the predicted coordinates to obtain the target coordinates of the face key points.
In practical applications, the face key point detection model can be deployed in systems such as access control, monitoring, payment, or camera applications. Face detection is performed on users according to the needs of the business, and the face key points therein, such as face contour coordinates and facial feature coordinates, are identified.
In the embodiments of the present invention, the face key point detection model includes a first-level network and a second-level network. Each level can run independently, and the input of the second-level network depends on the output of the first-level network.
When a target image in which face key points are to be detected is obtained, face detection can be performed on it to identify the region where the face is located, and that region is cropped into a pixel block of a specified size (such as 10*10) to serve as the face data.
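The crop-to-fixed-size step can be sketched as follows. The 10*10 block size comes from the text above; the nearest-neighbour resampling and the bounding-box format are my own assumptions (a real pipeline would typically use a detector's box plus a library resize):

```python
import numpy as np

def crop_face(image, box, out_size=10):
    """Crop the detected face region (box = x0, y0, x1, y1) and resample it
    to a fixed out_size square block, yielding the 'face data' input."""
    x0, y0, x1, y1 = box
    region = image[y0:y1, x0:x1]
    h, w = region.shape[:2]
    rows = np.arange(out_size) * h // out_size   # nearest-neighbour row picks
    cols = np.arange(out_size) * w // out_size   # nearest-neighbour col picks
    return region[rows][:, cols]

image = np.arange(40 * 40).reshape(40, 40)       # stand-in grayscale image
face_block = crop_face(image, (5, 8, 25, 32))    # box from a face detector
```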
The face data is input to the first-level network and processed according to its logic, and the predicted coordinates of the face key points are output.
For example, as shown in Fig. 2, convolution processing is performed in 5 convolutional layers in sequence, followed by processing in 2 fully connected layers.
At this point, partial data can be extracted from the face data based on the face key points and combined into the target data, so as to reduce the data volume.
In one embodiment, partial image data can be extracted from the face data based on the predicted coordinates, and the partial image data corresponding to the multiple face key points can be combined into a data matrix according to color, as the target data.
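One plausible reading of this assembly step is to cut a small patch around each predicted key point and stack the patches into one matrix. The patch size and the stacking layout below are illustrative assumptions; the patent only specifies that the partial image data is combined into a data matrix according to color:

```python
import numpy as np

def make_target_data(face, pred_coords, patch=4):
    """Extract a small pixel patch around each predicted key point and stack
    the patches along the last axis to form the target data matrix."""
    half = patch // 2
    h, w = face.shape
    patches = []
    for (x, y) in pred_coords:
        x, y = int(round(x)), int(round(y))
        x = min(max(x, half), w - half)    # keep the patch inside the image
        y = min(max(y, half), h - half)
        patches.append(face[y - half:y + half, x - half:x + half])
    return np.stack(patches, axis=-1)      # shape: (patch, patch, n_points)

face = np.random.default_rng(1).random((10, 10))   # stand-in face block
pred_coords = [(2.0, 3.0), (7.0, 6.0), (5.0, 5.0)] # stage-1 predictions
target = make_target_data(face, pred_coords)
```

Because each patch is only a few pixels wide, the resulting target data is far smaller than the full face block, which is what lets the second-level network run cheaply.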
Thereafter, the target data is input to the second-level network and processed according to its logic, and the coordinate offset values of the face key points are output.
For example, as shown in Fig. 2, convolution processing is performed in 3 convolutional layers in sequence, followed by processing in 2 fully connected layers.
At this point, the coordinate offset values are added to the predicted coordinates to obtain the target coordinates of the face key points:
(x + Δx, y + Δy)
where (x, y) is the predicted coordinate of a face key point and (Δx, Δy) is its coordinate offset value.
The target coordinates are the output of the face key point detection model, for use by other modules in the system.
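The whole two-level inference pass (steps 301 through 305) reduces to a prediction plus a learned correction. In this sketch the lambdas stand in for the trained networks and the target-data builder; their shapes and values are illustrative assumptions:

```python
import numpy as np

def detect_key_points(face, first_net, second_net, build_target):
    """Two-level inference: stage 1 predicts coordinates, stage 2 predicts
    per-point offsets, and the final output is their element-wise sum."""
    pred = first_net(face)                 # (n_points, 2) predicted (x, y)
    target = build_target(face, pred)      # target data around the predictions
    offset = second_net(target)            # (n_points, 2) offsets (dx, dy)
    return pred + offset                   # target coords: (x + dx, y + dy)

# Toy stand-ins for the trained networks, for illustration only.
face = np.zeros((10, 10))
first_net = lambda f: np.array([[2.0, 3.0], [7.0, 6.0]])
second_net = lambda t: np.array([[0.5, -0.25], [-1.0, 0.75]])
coords = detect_key_points(face, first_net, second_net,
                           lambda f, p: p)   # identity target builder
```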
In the embodiments of the present invention, the cascade network includes a first-level network and a second-level network. Face data is extracted from the target image and input to the first-level network for processing, and the predicted coordinates of the face key points are output. In the face data, target data is generated based on the predicted coordinates, input to the second-level network for processing, and the coordinate offset values of the face key points are output. The coordinate offset values are added to the predicted coordinates to obtain the target coordinates of the face key points. By learning the predicted coordinates and coordinate offset values through the two-level network, accurate coordinates of the face key points can still be obtained in complex scenes. Moreover, because the input size of the second-level network is much smaller than that of the first-level network, the time consumed by the second-level network is reduced, making the model suitable for face key point detection on devices with limited resources, such as mobile terminals, and thereby improving practicability.
It should be noted that, for simplicity of description, the method embodiments are described as a series of action combinations. However, those skilled in the art should understand that the embodiments of the present invention are not limited by the described sequence of actions, because according to the embodiments of the present invention some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
Referring to Fig. 4, a structural block diagram of a training apparatus for a face key point detection model based on a cascade network according to an embodiment of the present invention is shown. The cascade network includes a first-level network and a second-level network, and the apparatus may specifically include the following modules:
a face data extraction module 401, configured to extract face data from a training image;
a first-level network training module 402, configured to input the face data to the first-level network for training, the first-level network being used to output the predicted coordinates of face key points;
a target data generation module 403, configured to, when the training of the first-level network is complete, generate target data in the face data based on the predicted coordinates;
a second-level network training module 404, configured to input the target data to the second-level network for training, the second-level network being used to output the coordinate offset values of the face key points;
a model determination module 405, configured to, when the training of the second-level network is complete, determine that the cascade network is the face key point detection model.
In an embodiment of the present invention, the first-level network training module 402 includes:
a face data input submodule, configured to input the face data to the first-level network for processing and output the predicted coordinates of the face key points;
a first loss value calculation submodule, configured to calculate a first loss value using the predicted coordinates;
a first-level network convergence judgment submodule, configured to judge, according to the first loss value, whether the first-level network has converged; if so, a first-level network completion submodule is called, and if not, a first-level network adjustment submodule is called;
the first-level network completion submodule, configured to determine that the training of the first-level network is complete;
the first-level network adjustment submodule, configured to adjust the first-level network according to the first loss value and return to call the face data input submodule.
In one example of this embodiment of the present invention, the first loss value calculation submodule includes:
a first distance calculation unit, configured to calculate the first distances between the predicted coordinates and the true coordinates;
a first average calculation unit, configured to calculate the average value of the first distances as the first loss value.
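The first loss value described above, the per-point distance averaged over all key points, can be written directly. A Euclidean distance is assumed here; the patent does not name the specific metric:

```python
import numpy as np

def mean_point_distance(pred, truth):
    """First loss value: the distance between each predicted coordinate and
    its true coordinate, averaged over all face key points."""
    diff = np.asarray(pred, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.mean(np.linalg.norm(diff, axis=-1)))

pred = [[1.0, 2.0], [4.0, 6.0]]    # predicted (x, y) per key point
truth = [[1.0, 2.0], [1.0, 2.0]]   # true (x, y) per key point
loss = mean_point_distance(pred, truth)   # distances 0 and 5, mean 2.5
```

The second loss value follows the same pattern, with the second-level network's outputs in place of the first-level predictions.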
In an embodiment of the present invention, the target data generation module 403 includes:
a partial image data extraction submodule, configured to extract partial image data in the face data based on the predicted coordinates;
a matrix combination submodule, configured to combine the partial image data corresponding to the multiple face key points into a data matrix according to color, as the target data.
In an embodiment of the present invention, the second-level network training module 404 includes:
a target data input submodule, configured to input the target data to the second-level network for processing and output the coordinate offset values of the face key points;
a second loss value calculation submodule, configured to calculate a second loss value using the coordinate offset values;
a second-level network convergence judgment submodule, configured to judge, according to the second loss value, whether the second-level network has converged; if so, a second-level network completion submodule is called, and if not, a second-level network adjustment submodule is called;
the second-level network completion submodule, configured to determine that the training of the second-level network is complete;
the second-level network adjustment submodule, configured to adjust the second-level network according to the second loss value and return to call the target data input submodule.
In one example of this embodiment of the present invention, the second loss value calculation submodule includes:
a second distance calculation unit, configured to calculate the second distances between the predicted coordinates and the offset coordinates;
a second average calculation unit, configured to calculate the average value of the second distances as the second loss value.
In an embodiment of the present invention, the apparatus further includes:
a data enhancement processing module, configured to perform data enhancement processing on the face data;
wherein the data enhancement processing includes at least one of the following:
adding noise data, cutting and restoring, translation processing, and increasing contrast.
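Three of the four enhancements listed above (added noise, translation, increased contrast) can be sketched in a few lines; the "cutting and restoring" variant is omitted because the text does not specify how the cut region is restored. The noise scale, shift amounts, and contrast factor are illustrative assumptions:

```python
import numpy as np

def augment(face, rng):
    """Data enhancement examples: add noise, translate, and increase
    contrast. Each variant is a new array with the same shape as the input."""
    noisy = face + rng.normal(scale=0.05, size=face.shape)   # add noise data
    shifted = np.roll(face, shift=(1, 2), axis=(0, 1))       # translation
    contrast = np.clip((face - 0.5) * 1.5 + 0.5, 0.0, 1.0)   # more contrast
    return noisy, shifted, contrast

rng = np.random.default_rng(0)
face = rng.random((10, 10))          # stand-in normalized face block
noisy, shifted, contrast = augment(face, rng)
```

In training, each variant would be fed to the first-level network alongside the original, enlarging the effective training set without collecting new images.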
Referring to Fig. 5, a structural block diagram of a face key point detection apparatus based on the face key point detection model according to an embodiment of the present invention is shown. The face key point detection model includes a first-level network and a second-level network, and the apparatus may specifically include the following modules:
a face data extraction module 501, configured to extract face data from a target image;
a first-level network processing module 502, configured to input the face data to the first-level network for processing and output the predicted coordinates of face key points;
a target data generation module 503, configured to generate, in the face data, target data based on the predicted coordinates;
a second-level network processing module 504, configured to input the target data to the second-level network for processing and output the coordinate offset values of the face key points;
a target coordinate calculation module 505, configured to add the coordinate offset values to the predicted coordinates to obtain the target coordinates of the face key points.
In an embodiment of the present invention, the target data generation module 503 includes:
a partial image data extraction submodule, configured to extract partial image data in the face data based on the predicted coordinates;
a matrix combination submodule, configured to combine the partial image data corresponding to the multiple face key points into a data matrix according to color, as the target data.
As the apparatus embodiments are substantially similar to the method embodiments, their description is relatively simple; for relevant details, refer to the description of the method embodiments.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can refer to each other.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, an apparatus, or a computer program product. Therefore, the embodiments of the present invention may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical memory) containing computer-usable program code.
The embodiments of the present invention are described with reference to flowcharts and/or block diagrams of the methods, terminal devices (systems), and computer program products according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing terminal devices to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing terminal devices produce an apparatus for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal devices to operate in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured article including an instruction apparatus, which realizes the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal devices, so that a series of operation steps are executed on the computer or other programmable terminal devices to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable terminal devices provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although the preferred embodiments of the present invention have been described, those skilled in the art, once aware of the basic creative concept, can make additional changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present invention.
Finally, it should be noted that, herein, relational terms such as first and second are used merely to distinguish one entity or operation from another entity or operation, without necessarily requiring or implying any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal device that includes a series of elements includes not only those elements, but also other elements that are not explicitly listed, or elements intrinsic to that process, method, article, or terminal device. In the absence of further restrictions, an element limited by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or terminal device that includes the element.
A training method for a face key point detection model based on a cascade network, a face key point detection method based on the face key point detection model, a training apparatus for a face key point detection model based on a cascade network, and a face key point detection apparatus based on the face key point detection model provided by the present invention have been introduced in detail above. Specific examples are used herein to illustrate the principles and implementations of the present invention, and the description of the above embodiments is only intended to help understand the methods of the present invention and their core ideas. At the same time, for those skilled in the art, there will be changes in the specific implementations and scope of application according to the ideas of the present invention. In conclusion, the content of this specification should not be understood as a limitation of the present invention.
Claims (14)
1. A training method for a face key point detection model based on a cascade network, wherein the cascade network includes a first-level network and a second-level network, the method comprising:
extracting face data from a training image;
inputting the face data to the first-level network for training, the first-level network being used to output predicted coordinates of face key points;
when the training of the first-level network is complete, generating, in the face data, target data based on the predicted coordinates;
inputting the target data to the second-level network for training, the second-level network being used to output coordinate offset values of the face key points;
when the training of the second-level network is complete, determining that the cascade network is the face key point detection model;
wherein generating, in the face data, the target data based on the predicted coordinates comprises: extracting partial image data in the face data based on the predicted coordinates; and combining the partial image data corresponding to the multiple face key points into a data matrix according to color, as the target data;
the face data being provided in advance with the true coordinates of the face key points.
2. The method according to claim 1, wherein inputting the face data to the first-level network for training comprises:
inputting the face data to the first-level network for processing and outputting the predicted coordinates of the face key points;
calculating a first loss value using the predicted coordinates;
judging, according to the first loss value, whether the first-level network has converged;
if so, determining that the training of the first-level network is complete;
if not, adjusting the first-level network according to the first loss value, and returning to the step of inputting the face data to the first-level network for processing and outputting the predicted coordinates of the face key points.
3. The method according to claim 2, wherein calculating the first loss value using the predicted coordinates comprises:
calculating the first distances between the predicted coordinates and the true coordinates;
calculating the average value of the first distances as the first loss value.
4. The method according to claim 1, wherein inputting the target data to the second-level network for training comprises:
inputting the target data to the second-level network for processing and outputting the coordinate offset values of the face key points;
calculating a second loss value using the coordinate offset values;
judging, according to the second loss value, whether the second-level network has converged;
if so, determining that the training of the second-level network is complete;
if not, adjusting the second-level network according to the second loss value, and returning to the step of inputting the target data to the second-level network for processing and outputting the coordinate offset values of the face key points.
5. The method according to claim 4, wherein calculating the second loss value using the coordinate offset values comprises:
calculating the second distances between the predicted coordinates and the coordinate offset values;
calculating the average value of the second distances as the second loss value.
6. The method according to any one of claims 1-5, further comprising:
performing data enhancement processing on the face data;
wherein the data enhancement processing includes at least one of the following:
adding noise data, cutting and restoring, translation processing, and increasing contrast.
7. A face key point detection method based on a face key point detection model, wherein the face key point detection model includes a first-level network and a second-level network, the method comprising:
extracting face data from a target image;
inputting the face data to the first-level network for processing and outputting predicted coordinates of face key points;
generating, in the face data, target data based on the predicted coordinates;
inputting the target data to the second-level network for processing and outputting coordinate offset values of the face key points;
adding the coordinate offset values to the predicted coordinates to obtain target coordinates of the face key points;
wherein generating, in the face data, the target data based on the predicted coordinates comprises: extracting partial image data based on the predicted coordinates; and combining the partial image data corresponding to the multiple face key points into a data matrix according to color, as the target data;
the face data being provided in advance with the true coordinates of the face key points.
8. A training apparatus for a face key point detection model based on a cascade network, wherein the cascade network includes a first-level network and a second-level network, the apparatus comprising:
a face data extraction module, configured to extract face data from a training image;
a first-level network training module, configured to input the face data to the first-level network for training, the first-level network being used to output predicted coordinates of face key points;
a target data generation module, configured to, when the training of the first-level network is complete, generate, in the face data, target data based on the predicted coordinates;
a second-level network training module, configured to input the target data to the second-level network for training, the second-level network being used to output coordinate offset values of the face key points;
a model determination module, configured to, when the training of the second-level network is complete, determine that the cascade network is the face key point detection model;
wherein the target data generation module includes:
a partial image data extraction submodule, configured to extract partial image data in the face data based on the predicted coordinates;
a matrix combination submodule, configured to combine the partial image data corresponding to the multiple face key points into a data matrix according to color, as the target data;
the face data being provided in advance with the true coordinates of the face key points.
9. The apparatus according to claim 8, wherein the first-level network training module includes:
a face data input submodule, configured to input the face data to the first-level network for processing and output the predicted coordinates of the face key points;
a first loss value calculation submodule, configured to calculate a first loss value using the predicted coordinates;
a first-level network convergence judgment submodule, configured to judge, according to the first loss value, whether the first-level network has converged; if so, a first-level network completion submodule is called, and if not, a first-level network adjustment submodule is called;
the first-level network completion submodule, configured to determine that the training of the first-level network is complete;
the first-level network adjustment submodule, configured to adjust the first-level network according to the first loss value and return to call the face data input submodule.
10. The apparatus according to claim 9, wherein the first loss value calculation submodule includes:
a first distance calculation unit, configured to calculate the first distances between the predicted coordinates and the true coordinates;
a first average calculation unit, configured to calculate the average value of the first distances as the first loss value.
11. The apparatus according to claim 8, wherein the second-level network training module includes:
a target data input submodule, configured to input the target data to the second-level network for processing and output the coordinate offset values of the face key points;
a second loss value calculation submodule, configured to calculate a second loss value using the coordinate offset values;
a second-level network convergence judgment submodule, configured to judge, according to the second loss value, whether the second-level network has converged; if so, a second-level network completion submodule is called, and if not, a second-level network adjustment submodule is called;
the second-level network completion submodule, configured to determine that the training of the second-level network is complete;
the second-level network adjustment submodule, configured to adjust the second-level network according to the second loss value and return to call the target data input submodule.
12. The apparatus according to claim 11, wherein the second loss value calculation submodule includes:
a second distance calculation unit, configured to calculate the second distances between the predicted coordinates and the coordinate offset values;
a second average calculation unit, configured to calculate the average value of the second distances as the second loss value.
13. The apparatus according to any one of claims 8-12, further comprising:
a data enhancement processing module, configured to perform data enhancement processing on the face data;
wherein the data enhancement processing includes at least one of the following:
adding noise data, cutting and restoring, translation processing, and increasing contrast.
14. A face key point detection apparatus based on a face key point detection model, wherein the face key point detection model includes a first-level network and a second-level network, the apparatus comprising:
a face data extraction module, configured to extract face data from a target image;
a first-level network processing module, configured to input the face data to the first-level network for processing and output predicted coordinates of face key points;
a target data generation module, configured to generate, in the face data, target data based on the predicted coordinates;
a second-level network processing module, configured to input the target data to the second-level network for processing and output coordinate offset values of the face key points;
a target coordinate calculation module, configured to add the coordinate offset values to the predicted coordinates to obtain target coordinates of the face key points;
wherein the target data generation module includes:
a partial image data extraction submodule, configured to extract partial image data in the face data based on the predicted coordinates;
a matrix combination submodule, configured to combine the partial image data corresponding to the multiple face key points into a data matrix according to color, as the target data;
the face data being provided in advance with the true coordinates of the face key points.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810118211.3A CN108399373B (en) | 2018-02-06 | 2018-02-06 | The model training and its detection method and device of face key point |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108399373A CN108399373A (en) | 2018-08-14 |
CN108399373B true CN108399373B (en) | 2019-05-10 |
Family
ID=63095216
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810118211.3A Active CN108399373B (en) | 2018-02-06 | 2018-02-06 | The model training and its detection method and device of face key point |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108399373B (en) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109376593B (en) * | 2018-09-10 | 2020-12-29 | 杭州格像科技有限公司 | Face feature point positioning method and system |
CN109597123B (en) * | 2018-10-26 | 2021-02-19 | 长江大学 | Effective signal detection method and system |
CN109558837B (en) * | 2018-11-28 | 2024-03-22 | 北京达佳互联信息技术有限公司 | Face key point detection method, device and storage medium |
CN109800648B (en) * | 2018-12-18 | 2021-09-28 | 北京英索科技发展有限公司 | Face detection and recognition method and device based on face key point correction |
CN109685023A (en) * | 2018-12-27 | 2019-04-26 | 深圳开立生物医疗科技股份有限公司 | A kind of facial critical point detection method and relevant apparatus of ultrasound image |
CN109858466A (en) * | 2019-03-01 | 2019-06-07 | 北京视甄智能科技有限公司 | A kind of face critical point detection method and device based on convolutional neural networks |
CN110021150B (en) * | 2019-03-27 | 2021-03-19 | 创新先进技术有限公司 | Data processing method, device and equipment |
CN109961103B (en) * | 2019-04-02 | 2020-10-27 | 北京迈格威科技有限公司 | Training method of feature extraction model, and image feature extraction method and device |
CN110309706B (en) * | 2019-05-06 | 2023-05-12 | 深圳华付技术股份有限公司 | Face key point detection method and device, computer equipment and storage medium |
CN112101342A (en) * | 2019-06-17 | 2020-12-18 | 顺丰科技有限公司 | Box key point detection method and device, computing equipment and computer readable storage medium |
CN110969100B (en) * | 2019-11-20 | 2022-10-25 | 北京奇艺世纪科技有限公司 | Human body key point identification method and device and electronic equipment |
CN110909664A (en) * | 2019-11-20 | 2020-03-24 | 北京奇艺世纪科技有限公司 | Human body key point identification method and device and electronic equipment |
CN111028212B (en) * | 2019-12-02 | 2024-02-27 | 上海联影智能医疗科技有限公司 | Key point detection method, device, computer equipment and storage medium |
CN111783948A (en) * | 2020-06-24 | 2020-10-16 | 北京百度网讯科技有限公司 | Model training method and device, electronic equipment and storage medium |
CN112115845B (en) * | 2020-09-15 | 2023-12-29 | 中山大学 | Active shape model parameterization method for face key point detection |
CN112861689A (en) * | 2021-02-01 | 2021-05-28 | 上海依图网络科技有限公司 | Searching method and device of coordinate recognition model based on NAS technology |
CN112949492A (en) * | 2021-03-03 | 2021-06-11 | 南京视察者智能科技有限公司 | Model series training method and device for face detection and key point detection and terminal equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103824049A (en) * | 2014-02-17 | 2014-05-28 | 北京旷视科技有限公司 | Cascaded neural network-based face key point detection method |
CN106575367A (en) * | 2014-08-21 | 2017-04-19 | 北京市商汤科技开发有限公司 | A method and a system for facial landmark detection based on multi-task |
CN106874898A (en) * | 2017-04-08 | 2017-06-20 | 复旦大学 | Extensive face identification method based on depth convolutional neural networks model |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1452997B1 (en) * | 2003-02-25 | 2010-09-15 | Canon Kabushiki Kaisha | Apparatus and method for managing articles |
US7118026B2 (en) * | 2003-06-26 | 2006-10-10 | International Business Machines Corporation | Apparatus, method, and system for positively identifying an item |
2018-02-06: application CN201810118211.3A filed; CN108399373B granted, status Active.
Non-Patent Citations (3)
Title |
---|
Research on Face Alignment Algorithms Based on a Two-Stage Localization Model; Wang Feng; China Master's Theses Full-text Database, Information Science and Technology; 2018-01-15 (No. 01); pp. 11-53 |
Face Key Point Detection Algorithm Based on Cascaded Convolutional Neural Networks; Jin Yifan; China Master's Theses Full-text Database, Information Science and Technology; 2016-02-15 (No. 02); I138-1621 |
Research on Facial Feature Point Localization Methods Combined with Face Detection; Dong Ruixia; China Master's Theses Full-text Database, Information Science and Technology; 2018-01-15 (No. 01); I138-967 |
Also Published As
Publication number | Publication date |
---|---|
CN108399373A (en) | 2018-08-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108399373B (en) | Model training and detection method and device for face key points | |
Fischer et al. | FlowNet: Learning optical flow with convolutional networks | |
CN104408742B (en) | A moving target detection method based on joint space-time-frequency spectrum analysis | |
US20100067863A1 (en) | Video editing methods and systems | |
CN107292247A (en) | A human behavior recognition method and device based on residual networks | |
CN110956646B (en) | Target tracking method, device, equipment and storage medium | |
CN108198201A (en) | A multi-object tracking method, terminal device and storage medium | |
CN111881804B (en) | Posture estimation model training method, system, medium and terminal based on joint training | |
CN109919059B (en) | Salient object detection method based on deep network layering and multi-task training | |
CN108121931A (en) | two-dimensional code data processing method, device and mobile terminal | |
CN107909026A (en) | Age and gender estimation based on small-scale convolutional neural networks for embedded systems | |
CN113095254B (en) | Method and system for positioning key points of human body part | |
CN109410211A (en) | A method and device for segmenting a target object in an image | |
CN111008631B (en) | Image association method and device, storage medium and electronic device | |
CN111723707A (en) | Method and device for estimating fixation point based on visual saliency | |
CN106952304A (en) | A depth image computation method using inter-frame correlation of video sequences | |
CN113177470A (en) | Pedestrian trajectory prediction method, device, equipment and storage medium | |
CN107948586A (en) | Cross-region moving target detection method and device based on video stitching | |
CN110969110A (en) | Face tracking method and system based on deep learning | |
CN101587590A (en) | Selective visual attention computation model based on pulse cosine transform | |
CN111260687B (en) | Aerial video target tracking method based on semantic perception network and related filtering | |
CN114021704B (en) | AI neural network model training method and related device | |
CN117095300B (en) | Building image processing method, device, computer equipment and storage medium | |
CN109325405A (en) | A shot-type annotation method, device and equipment | |
CN112257492A (en) | Real-time intrusion detection and tracking method for multiple cameras |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||