WO2016026063A1 - A method and a system for facial landmark detection based on multi-task - Google Patents
- Publication number
- WO2016026063A1 (PCT/CN2014/000769)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- facial
- training
- task
- landmark
- error
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
- G06V40/165—Detection; Localisation; Normalisation using facial parts and geometric relationships
Definitions
- the present application relates to face alignment, in particular, to a method and a system for facial landmark detection.
- Facial landmark detection is a fundamental component in many face analysis tasks, such as facial attribute inference, face verification, and face recognition, but has long been impeded by problems of occlusion and pose variation.
- Accurate facial landmark detection can be performed using a cascaded CNN (Convolutional Neural Network), in which faces are divided into different parts by pre-partition, each of which is processed by separate deep CNNs. The resulting outputs are subsequently averaged and channeled to separate cascaded layers to process each facial landmark individually.
- the facial landmark detection is not a standalone problem, and its estimation may be influenced by a number of heterogeneous and subtly correlated factors. For example, when a kid is smiling, his/her mouth is wide open. Effectively discovering and exploiting such an intrinsically correlated facial attribute would help in detecting the mouth corners more accurately. Also, the inter-ocular distance is smaller in faces with large yaw rotation. Such pose information may be leveraged as an additional source of information to constrain the solution space of landmark estimation. Given the rich set of plausible related tasks, treating facial landmark detection in isolation is counterproductive.
- a method for detecting facial landmarks of a face image may comprise extracting multiple feature maps from at least one facial region of the face image; generating a shared facial feature vector from the extracted multiple feature maps; and predicting facial landmark locations of the face image from the generated shared facial feature vector.
- a system for detecting facial landmarks of a face image may comprise a feature extractor and a predictor.
- the feature extractor may extract multiple feature maps from at least one facial region of the face image and generate a shared facial feature vector from the extracted multiple feature maps.
- the predictor may predict facial landmark locations of the face image from the shared facial feature vector generated by the feature extractor.
- the method may comprise 1) sampling a training face image, its ground-truth landmark locations and its ground-truth target for each auxiliary task from the predetermined training set; 2) comparing dissimilarity between the predicted facial landmark locations and the ground-truth landmark locations to generate a landmark error; 3) comparing dissimilarities between the target predictions and the ground-truth target for each auxiliary task, respectively, to generate at least one training task error; 4) back-propagating the generated landmark error and all the training task errors through the convolutional neural network to adjust weights on connections between neurons of the convolutional neural network; 5) sampling a validating face image and its ground-truth target for each auxiliary task from a predetermined validation set; 6) comparing dissimilarities between the target prediction and the ground-truth target to generate a validating task error; and 7) determining if the generated training task error is less than a first predetermined value and the generated validating task error is less than a second predetermined value; if yes, the training is terminated, otherwise, steps 1)-7) are repeated.
- the facial landmark detection can be optimized together with heterogeneous but subtly correlated auxiliary tasks, so that the detection robustness can be improved through multi-task learning, especially in dealing with faces with severe occlusion and pose variation.
- the training process of the CNN is conducted with an "early stopping" scheme that stops the related tasks which begin to over-fit the training set and thus harm the main task, so as to facilitate learning convergence.
- Fig. 1 is a schematic diagram illustrating a system for facial landmark detection consistent with some disclosed embodiments.
- Fig.2 is a schematic diagram illustrating a training unit as shown in Fig. 1 consistent with some disclosed embodiments.
- Fig. 3 is a schematic diagram illustrating an example of a system for facial landmark detection consistent with some disclosed embodiments, in which an example of a convolutional neural network is shown.
- Fig. 4 is a schematic diagram illustrating a system for facial landmark detection when it is implemented in software consistent with some disclosed embodiments.
- Fig. 5 is a schematic flowchart illustrating a method for facial landmark detection consistent with some disclosed embodiments.
- Fig. 6 is a schematic flowchart illustrating a training process of the multi-task convolutional neural network consistent with some disclosed embodiments.
- Fig. 1 is a schematic diagram illustrating an exemplary system 1000 for facial landmark detection consistent with some disclosed embodiments.
- facial landmark detection (hereinafter, also referred to as main task) is optimized jointly with at least one related/auxiliary task.
- facial landmark detection refers to detecting 2D locations, i.e., 2D coordinates (x and y), of facial landmarks within the facial region of a face image.
- Examples of facial landmarks may include, but are not limited to, the left and right centers of the eyes, the nose, and the left and right corners of the mouth of a face image.
- Examples of the auxiliary task may include, but are not limited to, head pose estimation, demographic estimation such as gender classification, age estimation, facial expression recognition such as smiling, or facial attribute inference such as wearing glasses. It shall be appreciated that the number and type of the auxiliary tasks are not limited to those mentioned herein.
- the system 1000 may comprise a feature extractor 100, a training unit 200 and a predictor 300.
- the feature extractor 100 may extract multiple feature maps from at least one facial region of the face image and/or the whole face image. Then, a shared facial feature vector may be generated by the feature extractor 100 from the extracted multiple feature maps.
- the predictor 300 may predict facial landmark locations of the face image from the shared facial feature vector extracted by the feature extractor 100. Simultaneously, the predictor 300 may further, from the shared facial feature vector, predict corresponding target of at least one auxiliary task associated with the facial landmark detection. According to the system 1000, the facial landmark detection can be optimized jointly with the auxiliary tasks.
- the feature extractor 100 may comprise a convolutional neural network.
- the network may comprise a plurality of convolution-pooling layers and a fully connected layer.
- each of the plurality of convolution-pooling layers may perform convolution and max-pooling operations, and the feature maps extracted by a previous layer of the convolution-pooling layers are inputted into a next layer of the convolution-pooling layers to extract feature maps different from the previously extracted feature maps.
- the fully connected layer may generate the shared facial feature vector from all the extracted multiple feature maps.
- the convolutional neural network comprises an input layer, a plurality of (for example, three) convolution-pooling layers comprising one or more (for example, three) convolutional layers and one or more (for example, three) pooling layers, one convolutional layer and one fully connected layer.
- the network is shown as an example, and the convolutional neural network in the feature extractor is not limited to it.
- a 40 × 40 (for example) gray-scale face image is inputted in the input layer.
- the first convolution-pooling layer extracts feature maps from the inputted image.
- the second convolution-pooling layer takes the output of the first layer as input, to generate different feature maps.
- the multiple layers of feature maps are used by the fully connected layer to generate the shared facial feature vector. That is, the shared facial feature vector is generated by performing convolution and max-pooling operations multiple times.
- Each layer contains a plurality of neurons with local or global receptive fields, and the weights on connections between the neurons of the convolutional neural network may be adjusted, so that the network is trained accordingly.
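As a rough illustration of this forward pass, the following is a minimal single-channel sketch, not the patent's actual network: the filter sizes, layer counts, tanh activation and output dimension here are all assumptions.

```python
import numpy as np

def conv2d(x, w):
    """Valid 2-D convolution of a single-channel image x with filter w."""
    fh, fw = w.shape
    oh, ow = x.shape[0] - fh + 1, x.shape[1] - fw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i+fh, j:j+fw] * w)
    return out

def max_pool(x, s=2):
    """Non-overlapping s x s max-pooling."""
    h, w = x.shape[0] // s, x.shape[1] // s
    return x[:h*s, :w*s].reshape(h, s, w, s).max(axis=(1, 3))

def extract_shared_vector(img, filters, w_fc):
    """Stacked convolution + max-pooling layers followed by a fully
    connected layer that yields the shared facial feature vector."""
    a = img
    for w in filters:                  # convolution-pooling layers
        a = np.tanh(conv2d(a, w))      # convolution + non-linear activation
        a = max_pool(a)                # max-pooling
    return w_fc @ a.ravel()            # fully connected layer

rng = np.random.default_rng(0)
img = rng.standard_normal((40, 40))    # 40 x 40 gray-scale face image
filters = [rng.standard_normal((5, 5)) * 0.1 for _ in range(2)]
# after two conv(5x5) + pool(2) stages: 40 -> 36 -> 18 -> 14 -> 7, i.e. 49 inputs
w_fc = rng.standard_normal((100, 49)) * 0.1
shared = extract_shared_vector(img, filters, w_fc)
print(shared.shape)                    # (100,)
```

Each conv-pool stage shrinks the spatial map while producing new features; the fully connected layer then flattens the last maps into the single shared vector that all tasks consume.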
- the system 1000 may further comprise a training unit 200.
- the training unit 200 may train, with a predetermined training set, the feature extractor so as to adjust the weights on connections between the neurons of the convolutional neural network such that the trained feature extractor is capable of extracting the shared facial feature vector.
- the training unit 200 may comprise a sampler 201, a comparator 202 and a back-propagator 203.
- the sampler 201 may sample a training face image, its ground-truth landmark locations and its ground-truth target for each auxiliary task from the predetermined training set.
- five ground-truth landmarks, that is, the centers of the eyes, the nose tip, and the corners of the mouth, may be annotated directly on each training face image.
- the ground-truth target for each auxiliary task may be labeled manually.
- the ground-truth target may be labeled as female (F) or male (M).
- the ground-truth target may be labeled as wearing (Y) or not wearing (N).
- For head pose estimation, the poses (0°, ±30°, ±60°) may be labeled, and for expression recognition, such as smiling, yes/no may be labeled accordingly.
- the comparator 202 may compare dissimilarity between the predicted facial landmark locations and the ground-truth landmark locations to generate a landmark error.
- the landmark error may be obtained by using, for example, the least squares method.
- the comparator 202 may further compare dissimilarities between the target predictions and the ground-truth target for each auxiliary task, respectively, to generate at least one training task error.
- the training task error may be obtained by using, for example, the cross-entropy method.
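The two error measures named above can be sketched as follows. The helper names and the example numbers are hypothetical; the sketch only shows the shape of a least-squares landmark error and a cross-entropy error for one classification-type auxiliary task.

```python
import numpy as np

def landmark_error(pred, truth):
    """Least-squares landmark error between predicted and ground-truth
    (x, y) coordinates (a sketch; the exact weighting is not fixed here)."""
    return 0.5 * np.sum((np.asarray(pred) - np.asarray(truth)) ** 2)

def task_error(probs, label):
    """Cross-entropy error for one auxiliary classification task,
    e.g. gender with label 0 = female, 1 = male."""
    probs = np.clip(np.asarray(probs), 1e-12, 1.0)
    return -np.log(probs[label])

pred  = [18.2, 20.1, 22.0, 20.3, 20.0, 24.9]  # predicted (x, y) pairs
truth = [18.0, 20.0, 22.0, 20.0, 20.0, 25.0]  # ground-truth landmarks
print(round(float(landmark_error(pred, truth)), 3))  # 0.075
print(round(float(task_error([0.7, 0.3], 0)), 3))    # 0.357
```

Both errors are later back-propagated through the shared network, which is what couples the main task and the auxiliary tasks.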
- the back-propagator 203 may back-propagate the generated landmark error and all the training task errors through the convolutional neural network to adjust weights on connections between the neurons of the convolutional neural network.
- the training unit 200 may further comprise a determiner 204.
- the determiner 204 may determine whether the training process of the facial landmark detection is converged.
- the determiner 204 may further determine whether the training process of each task is converged, which will be discussed later.
- T tasks are trained jointly by the training unit 200, with the main task denoted as r and each auxiliary task denoted as a.
- the training data comprises N samples, each consisting of a training face image together with its ground-truth landmark locations and its ground-truth target for each auxiliary task.
- in one example, four tasks are trained jointly.
- the weights of all the tasks may be updated accordingly, and the weight matrix of each auxiliary task a may be calculated in a similar manner as that of the main task.
- the generated landmark error and the training task errors may be back-propagated layer by layer, down to the lowest layer, by the back-propagator 203 through the convolutional neural network to adjust the weights on connections between neurons of the convolutional neural network.
- the error may be propagated back through the network following a back-propagation strategy in which, following Eq.(3), the error of each lower layer is computed from the error of the layer above it, mapped through the connection weights and scaled by the gradient of the activation function of that layer.
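A generic sketch of that layer-by-layer propagation is given below. It assumes a tanh activation and small dense layers purely for illustration; the patent's Eq.(3) itself may differ in detail.

```python
import numpy as np

def backprop_deltas(weights, activs, top_delta):
    """Propagate an output error back layer by layer: each lower layer's
    delta is the layer above's delta mapped through the transposed weights
    and scaled by the gradient of the activation function (here tanh)."""
    deltas = [top_delta]
    for W, a in zip(reversed(weights), reversed(activs)):
        grad = 1.0 - a ** 2              # tanh'(z) expressed via a = tanh(z)
        deltas.append((W.T @ deltas[-1]) * grad)
    return deltas[::-1]                  # lowest layer first

rng = np.random.default_rng(1)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
activs = [np.tanh(rng.standard_normal(3)), np.tanh(rng.standard_normal(4))]
deltas = backprop_deltas(weights, activs, np.array([0.5, -0.2]))
print([d.shape for d in deltas])         # [(3,), (4,), (2,)]
```

The weight adjustment then subtracts each delta's outer product with the layer input, scaled by a learning rate, which is the "adjust weights on connections" step above.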
- the above training process is repeated until the training process of the facial landmark detection is determined by the determiner 204 to be converged. In other words, if the error is less than a predetermined value, the training process will be determined to be converged.
- the feature extractor 100 is capable of extracting the shared facial feature vector from a given face image.
- the determiner 204 may further determine whether the training process of the auxiliary tasks is converged.
- t represents the current iteration
- k represents a training length
- med denotes the function for calculating the median value.
- the first term in Eq.(4) represents the tendency of the training task error of the task a. If the training error drops rapidly within a period of length k, the value of the first term is small, which indicates that the training of the task can be continued as the task is still valuable; otherwise, the first term is large and the task is more likely to be stopped. In this way, an auxiliary task can be switched off, i.e., "early stopped", during the training process before it begins to over-fit the training set and thus harm the main task.
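An illustrative sketch of such a stopping test is shown below. This is an assumption-laden stand-in, not the patent's Eq.(4): the window, threshold, and both terms are hypothetical choices that merely follow the tendencies described above.

```python
import numpy as np

def should_stop(train_err, val_err, k=5, threshold=1.1):
    """Illustrative early-stopping test for one auxiliary task.
    tendency: small while the training error is still dropping fast
    within the last k iterations (cf. the first term of Eq.(4));
    generalization: grows once the validation error rises above its
    minimum. Stop the task when their product exceeds a threshold."""
    if len(train_err) < k:
        return False
    recent = np.asarray(train_err[-k:], dtype=float)
    tendency = recent[-1] / np.median(recent)      # ~1 when plateaued
    generalization = val_err[-1] / min(val_err)    # >1 when over-fitting
    return bool(tendency * generalization > threshold)

# still improving on both sets: keep training the task
print(should_stop([5, 4, 3, 2, 1], [5, 4, 3, 2, 1]))            # False
# training plateaued while validation error climbs: switch the task off
print(should_stop([1, 1, 1, 1, 1], [1.0, 1.1, 1.2, 1.3, 1.4]))  # True
```

The point of combining the two terms is that a task is stopped only when it has both stopped improving on the training set and started degrading on the validation set.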
- the feature extractor 100 is capable of extracting a shared facial feature vector from any face image.
- a face image x° is inputted in the input layer of the convolutional neural network as for example shown in Fig. 3.
- φ(·) and W_l represent the non-linear activation function applied to the face image and the filters to be learned in layer l of the CNN, respectively.
- the shared facial feature vector can thus be obtained.
- system 1000 may be implemented using certain hardware, software, or a combination thereof.
- embodiments of the present invention may be adapted to a computer program product embodied on one or more computer readable storage media (comprising but not limited to disk storage, CD-ROM, optical memory and the like) containing computer program codes.
- the system 1000 may include a general purpose computer, a computer cluster, a mainstream computer, a computing device dedicated for providing online contents, or a computer network comprising a group of computers operating in a centralized or distributed fashion.
- the system 1000 may include one or more processors (processors 102, 104, 106 etc.), a memory 112, a storage device 116, and a bus to facilitate information exchange among various components of system 1000.
- processors 102-106 may include a central processing unit (“CPU"), a graphic processing unit (“GPU”), or other suitable information processing devices.
- processors 102-106 can include one or more printed circuit boards, and/or one or more microprocessor chips. Processors 102-106 can execute sequences of computer program instructions to perform various methods that will be explained in greater detail below.
- Memory 112 can include, among other things, a random access memory (“RAM”) and a read-only memory (“ROM”). Computer program instructions can be stored, accessed, and read from memory 112 for execution by one or more of processors 102-106. For example, memory 112 may store one or more software applications. Further, memory 112 may store an entire software application or only a part of a software application that is executable by one or more of processors 102-106. It is noted that although only one block is shown in Fig. 1, memory 112 may include multiple physical devices installed on a central computing device or on different computing devices.
- Fig. 5 shows a schematic flowchart of the method for facial landmark detection
- Fig. 6 shows a schematic flowchart of a training process of the multi-task convolutional neural network by the training unit 200.
- methods 500 and 600 comprise a series of steps that may be performed by one or more of processors 102-106 or each module/unit of the system 1000 to implement a data processing operation.
- the following discussion is made with reference to the situation where each module/unit of the system 1000 is implemented in hardware or in a combination of hardware and software.
- those skilled in the art will appreciate that other suitable devices or systems are also applicable to carry out the following processes; the system 1000 is merely used as an illustration.
- multiple feature maps are extracted by the feature extractor 100 from at least one facial region of the face image in step S501.
- the multiple feature maps may be extracted from the whole face image in step S501.
- a shared facial feature vector is generated from the multiple feature maps extracted in step S501.
- facial landmark locations of the face image are predicted from the shared facial feature vector generated in step S502.
- the shared facial feature vector may be used to predict corresponding target of at least one auxiliary task associated with the facial landmark detection. Then, the target predictions of all the auxiliary tasks are obtained simultaneously.
- the feature extractor comprises a convolutional neural network comprising a plurality of convolution-pooling layers and a fully connected layer.
- Each of the convolution-pooling layers is configured to perform convolution and max-pooling operations.
- the multiple feature maps may be extracted by the plurality of convolution-pooling layers consecutively, wherein the feature maps extracted by a previous layer of the convolution-pooling layers are inputted into a next layer of the convolution-pooling layers to extract feature maps different from the previously extracted feature maps.
- the shared facial feature vector may be generated by the fully connected layer from all the multiple feature maps extracted in step S501.
- the method 500 further comprises a training step (not shown in Fig. 5), which will be discussed with reference to Fig. 6.
- step S601 a training face image, its ground-truth landmark locations and its ground-truth target for each auxiliary task are sampled from the predetermined training set.
- For the training face image, its facial landmark prediction and the target predictions of all the auxiliary tasks may be obtained from the predictor 300 accordingly in step S602.
- the dissimilarity between the predicted facial landmark locations and the ground-truth landmark locations is compared to generate a landmark error in step S603.
- step S604 dissimilarities between the target predictions and the ground-truth target for each auxiliary task are compared respectively to generate at least one training task error.
- in step S606, it is determined whether one of the auxiliary tasks is converged. If no, the process 600 proceeds to step S608. If yes, the training process of that task is stopped in step S607 and the process proceeds to step S608. In step S608, it is determined whether the training process of the facial landmark detection is converged. If yes, the process 600 ends. If no, the process 600 turns back to step S601.
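The control flow of steps S601-S608 can be sketched as follows. Here `step_fn` and `converged_fn` are hypothetical stand-ins for the sampling/back-propagation work and the convergence checks; the simulated convergence iterations in the usage are made up.

```python
def train_multitask(step_fn, converged_fn, tasks, max_iters=1000):
    """Sketch of the S601-S608 flow: run joint updates, switch off each
    auxiliary task once it converges, stop when the main task converges."""
    active = set(tasks)                    # auxiliary tasks still training
    for t in range(max_iters):
        step_fn(t, active)                 # S601-S605: sample, predict,
                                           # compute errors, back-propagate
        for task in list(active):          # S606-S607: stop converged tasks
            if converged_fn(task, t):
                active.discard(task)
        if converged_fn("landmarks", t):   # S608: main-task convergence
            return t, active
    return max_iters, active

# simulate: 'pose' converges at iteration 3, 'gender' at 5, the main task at 8
log = []
conv_at = {"pose": 3, "gender": 5, "landmarks": 8}
iters, remaining = train_multitask(
    lambda t, active: log.append((t, sorted(active))),
    lambda task, t: t >= conv_at[task],
    ["pose", "gender"])
print(iters, remaining)                    # 8 set()
```

By the time the main task converges, all auxiliary tasks have already been switched off, matching the "early stopping" behavior described above.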
- the facial landmark detection can be optimized together with heterogeneous but subtly related tasks.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Geometry (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The present application discloses a method and a system for detecting facial landmarks of a face image. The method may comprise extracting multiple feature maps from at least one facial region of the face image and/or the whole face image; generating a shared facial feature vector from the extracted multiple feature maps; and predicting facial landmark locations of the face image from the generated shared facial feature vector. With the present method and system, the facial landmark detection can be optimized together with heterogeneous but subtly related tasks, so that the detection robustness can be improved through multi-task learning.
Description
A METHOD AND A SYSTEM FOR FACIAL LANDMARK DETECTION
BASED ON MULTI-TASK
Technical Field
[0001] The present application relates to face alignment, in particular, to a method and a system for facial landmark detection.
Background
[0002] Facial landmark detection is a fundamental component in many face analysis tasks, such as facial attribute inference, face verification, and face recognition, but has long been impeded by problems of occlusion and pose variation.
[0003] Accurate facial landmark detection can be performed using a cascaded CNN (Convolutional Neural Network), in which faces are divided into different parts by pre-partition, each of which is processed by separate deep CNNs. The resulting outputs are subsequently averaged and channeled to separate cascaded layers to process each facial landmark individually.
[0004] In addition, the facial landmark detection is not a standalone problem, and its estimation may be influenced by a number of heterogeneous and subtly correlated factors. For example, when a kid is smiling, his/her mouth is wide open. Effectively discovering and exploiting such an intrinsically correlated facial attribute would help in detecting the mouth corners more accurately. Also, the inter-ocular distance is smaller in faces with large yaw rotation. Such pose information may be leveraged as an additional source of information to constrain the solution space of landmark estimation. Given the rich set of plausible related tasks, treating facial landmark detection in isolation is counterproductive.
[0005] However, different tasks are inherently different in learning difficulty and have different convergence rates. Further, certain tasks are likely to over-fit earlier than the others when learned simultaneously, which jeopardizes the learning convergence of the whole model.
Summary
[0006] In one aspect of the present application, disclosed is a method for detecting facial landmarks of a face image. The method may comprise extracting multiple feature maps from at least one facial region of the face image; generating a shared facial feature vector from the extracted multiple feature maps; and predicting facial landmark locations of the face image from the generated shared facial feature vector.
[0007] In another aspect of the present application, disclosed is a system for detecting facial landmarks of a face image. The system may comprise a feature extractor and a predictor. The feature extractor may extract multiple feature maps from at least one facial region of the face image and generate a shared facial feature vector from the extracted multiple feature maps. The predictor may predict facial landmark locations of the face image from the shared facial feature vector generated by the feature extractor.
[0008] According to the present application, there is a method for training a convolutional neural network for simultaneously performing facial landmark detection and at least one associated auxiliary task. The method may comprise 1) sampling a training face image, its ground-truth landmark locations and its ground-truth target for each auxiliary task from the predetermined training set; 2) comparing dissimilarity between the predicted facial landmark locations and the ground-truth landmark locations to generate a landmark error; 3) comparing dissimilarities between the target predictions and the ground-truth target for each auxiliary task, respectively, to generate at least one training task error; 4) back-propagating the generated landmark error and all the training task errors through the convolutional neural network to adjust weights on connections between neurons of the convolutional neural network; 5) sampling a validating face image and its ground-truth target for each auxiliary task from a predetermined validation set; 6) comparing dissimilarities between the target prediction and the ground-truth target to generate a validating task error; and 7) determining if the generated training task error is less than a first predetermined value and the generated validating task error is less
than a second predetermined value. If yes, the method for training the convolutional neural network will be terminated, otherwise, the steps l)-7) will be repeated.
[0009] According to the present application, there is further provided a computer-readable medium for storing instructions executable by one or more processors to implement the steps of the above method.
[0010] In contrast to existing methods, the facial landmark detection can be optimized together with heterogeneous but subtly correlated auxiliary tasks, so that the detection robustness can be improved through multi-task learning, especially in dealing with faces with severe occlusion and pose variation.
[0011] According to the present application, only one single CNN is used, and thus complexity of the required system/device can be reduced. Neither pre-partition of faces nor cascaded convolutional neural layers are required, leading to drastic reduction in model complexity, whilst still achieving comparable or even better accuracy.
[0012] As training proceeds, certain related tasks are no longer beneficial to the main task once they reach their peak performance, and thus their training process can be halted. According to the present application, the training process of the CNN is conducted with an "early stopping" scheme that stops the related tasks which begin to over-fit the training set and thus harm the main task, so as to facilitate learning convergence.
Brief Description of the Drawing
[0013] Exemplary non-limiting embodiments of the present invention are described below with reference to the attached drawings. The drawings are illustrative and generally not to an exact scale. The same or similar elements on different figures are referenced with the same reference numbers.
[0014] Fig. 1 is a schematic diagram illustrating a system for facial landmark detection consistent with some disclosed embodiments.
[0015] Fig.2 is a schematic diagram illustrating a training unit as shown in Fig. 1 consistent with some disclosed embodiments.
[0016] Fig. 3 is a schematic diagram illustrating an example of a system for facial landmark detection consistent with some disclosed embodiments, in which an example of a convolutional neural network is shown.
[0017] Fig. 4 is a schematic diagram illustrating a system for facial landmark detection when it is implemented in software consistent with some disclosed embodiments.
[0018] Fig. 5 is a schematic flowchart illustrating a method for facial landmark detection consistent with some disclosed embodiments.
[0019] Fig. 6 is a schematic flowchart illustrating a training process of the multi-task convolutional neural network consistent with some disclosed embodiments.
Detailed Description
[0020] Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When appropriate, the same reference numbers are used throughout the drawings to refer to the same or like parts.
[0021] Fig. 1 is a schematic diagram illustrating an exemplary system 1000 for facial landmark detection consistent with some disclosed embodiments. According to the system 1000, facial landmark detection (hereinafter, also referred to as the main task) is optimized jointly with at least one related/auxiliary task. Facial landmark detection refers to detecting 2D locations, i.e., 2D coordinates (x and y), of facial landmarks within the facial region of a face image. Examples of facial landmarks may include, but are not limited to, the left and right centers of the eyes, the nose, and the left and right corners of the mouth of a face image. Examples of the auxiliary task may include, but are not limited to, head pose estimation, demographic estimation such as gender classification, age estimation, facial expression recognition such as smiling, or facial attribute inference such as wearing glasses. It shall be appreciated that the number and type of the auxiliary tasks are not limited to those mentioned herein.
[0022] Referring to Fig. 1 again, where the system 1000 is implemented in hardware, it may comprise a feature extractor 100, a training unit 200 and a predictor 300. The feature extractor 100 may extract multiple feature maps from at least one facial region of the face image and/or the whole face image. Then, a shared facial feature vector may be generated by the feature extractor 100 from the extracted multiple feature maps.
[0023] The predictor 300 may predict facial landmark locations of the face image from the shared facial feature vector extracted by the feature extractor 100. Simultaneously, the predictor 300 may further, from the shared facial feature vector, predict corresponding target of at least one auxiliary task associated with the facial landmark detection. According to the system 1000, the facial landmark detection can be optimized jointly with the auxiliary tasks.
[0024] According to an embodiment, the feature extractor 100 may comprise a convolutional neural network. The network may comprise a plurality of convolution-pooling layers and a fully connected layer. In the network, each of the plurality of convolution-pooling layers may perform convolution and max-pooling operations, and the feature maps extracted by a previous layer of the convolution-pooling layers are inputted into a next layer of the convolution-pooling layers to extract feature maps different from the previously extracted feature maps. The fully connected layer may generate the shared facial feature vector from all the extracted multiple feature maps.
[0025] An example of the network is shown in Fig. 3, in which the convolutional neural network comprises an input layer, a plurality of (for example, three) convolution-pooling layers comprising one or more (for example, three) convolutional layers and one or more (for example, three) pooling layers, one convolutional layer and one fully connected layer. It is noted that the network is shown as an example, and the convolutional neural network in the feature extractor is not limited to it. As shown in Fig. 3, a 40 × 40 (for example) gray-scale face image is inputted in the input layer. The first convolution-pooling layer extracts feature maps from the inputted image. Then, the second convolution-pooling layer takes the output of the first layer as input, to generate different feature maps. This process is continued by using all three convolution-pooling layers. At the end, the multiple layers of feature
maps are used by the fully connected layer to generate the shared facial feature vector. That is, the shared facial feature vector is generated by performing multiple times of convolution and max pooling operations. Each layer contains a plurality of neurons with local or global receptive fields, and the weights on connection between the neurons of the convolutional neural network may be adjusted, so that the network is trained accordingly.
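As a rough illustration of how one convolution-pooling stage turns a 40 × 40 gray-scale image into pooled feature maps that feed the fully connected layer, the following Python/numpy sketch applies four hypothetical 5 × 5 filters with a tanh activation and 2 × 2 max pooling. The filter count, filter size, and activation function are illustrative assumptions, not values taken from the patent:

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Valid 2-D convolution of a single-channel image with one filter."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2(fmap):
    """Non-overlapping 2x2 max pooling."""
    h, w = fmap.shape
    fmap = fmap[:h - h % 2, :w - w % 2]
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.standard_normal((40, 40))      # 40x40 gray-scale input
filters = rng.standard_normal((4, 5, 5))   # 4 hypothetical 5x5 filters

# one convolution-pooling stage: convolution -> tanh -> max pooling
maps = [max_pool2(np.tanh(conv2d_valid(image, k))) for k in filters]

# the pooled maps are flattened into the input of the fully connected layer
shared_vector = np.concatenate([m.ravel() for m in maps])
print(shared_vector.shape)  # (1296,) = 4 filters * 18 * 18
```

A real implementation would stack several such stages, each consuming the previous stage's maps, before the fully connected layer produces the shared facial feature vector.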
[0026] According to an embodiment, the system 1000 may further comprise a training unit 200. The training unit 200 may train, with a predetermined training set, the feature extractor so as to adjust the weights on connections between the neurons of the convolutional neural network such that the trained feature extractor is capable of extracting the shared facial feature vector. According to an embodiment of the present application shown in Fig. 2, the training unit 200 may comprise a sampler 201, a comparator 202 and a back-propagator 203.
[0027] As shown in Fig. 2, the sampler 201 may sample a training face image, its ground-truth landmark locations and its ground-truth target for each auxiliary task from the predetermined training set. According to an embodiment, five ground-truth landmarks, that is, the centers of the eyes, the nose tip and the corners of the mouth, may be annotated directly on each training face image. According to another embodiment, the ground-truth target for each auxiliary task may be labeled manually. For example, for gender classification, the ground-truth target may be labeled as female (F) or male (M). For facial attribute inference, such as wearing glasses, the ground-truth target may be labeled as wearing (Y) or not wearing (N). For head pose estimation, one of five poses (0°, ±30°, ±60°) may be labeled, and for expression recognition, such as smiling, yes/no may be labeled accordingly.
[0028] The comparator 202 may compare the dissimilarity between the predicted facial landmark locations and the ground-truth landmark locations to generate a landmark error. The landmark error may be obtained by using, for example, a least-squares method. The comparator 202 may further compare the dissimilarities between the target predictions and the ground-truth target for each auxiliary task, respectively, to generate at least one training task error. According to another embodiment, the training task error may be obtained by using, for example, a cross-entropy method.
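The two error types can be illustrated with a minimal numpy sketch. The landmark values and class posteriors below are made-up numbers for one training image; only the least-squares and cross-entropy forms themselves come from the description:

```python
import numpy as np

# Hypothetical predictions for one training image (x, y for five landmarks).
pred_landmarks = np.array([0.31, 0.40, 0.68, 0.41, 0.50,
                           0.55, 0.36, 0.72, 0.64, 0.71])
true_landmarks = np.array([0.30, 0.42, 0.70, 0.40, 0.50,
                           0.56, 0.35, 0.70, 0.65, 0.70])

# Landmark error: least squares between predicted and ground-truth locations.
landmark_error = 0.5 * np.sum((true_landmarks - pred_landmarks) ** 2)

# Training task error for one auxiliary task (e.g. gender): cross-entropy
# between the predicted class posterior and the one-hot ground truth.
pred_posterior = np.array([0.8, 0.2])   # P(male), P(female) -- illustrative
true_onehot = np.array([1.0, 0.0])      # ground truth: male
task_error = -np.sum(true_onehot * np.log(pred_posterior))

print(round(landmark_error, 6), round(task_error, 4))  # 0.0009 0.2231
```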
[0029] The back-propagator 203 may back-propagate the generated landmark error and all the training task errors through the convolutional neural network to adjust weights on connections between the neurons of the convolutional neural network.
[0030] According to an embodiment, the training unit 200 may further comprise a determiner 204. The determiner 204 may determine whether the training process of the facial landmark detection is converged. According to another embodiment, the determiner 204 may further determine whether the training process of each task is converged, which will be discussed later.
[0031] Hereinafter, the components in the training unit 200 mentioned above will be discussed in detail. For purposes of illustration, we will describe an embodiment in which T tasks are trained jointly by the training unit 200. Among the T tasks, the facial landmark detection, i.e., the main task, is denoted as r, and each of the at least one related/auxiliary task is denoted as a, where a ∈ {1, ..., T−1}.
[0032] For each of the tasks, the training data is denoted as {(x_i, y_i^t)}_{i=1}^{N}, where t ∈ {r, a}. In particular, for the facial landmark detection r, the training data is denoted as {(x_i, y_i^r)}_{i=1}^{N}, where y_i^r contains the coordinates of the five landmarks. For the auxiliary tasks shown, y_i^pose, y_i^gender, y_i^glasses and y_i^smile represent inferences of 'pose', 'gender', 'wear glasses', and 'smiling', respectively. Thus, y_i^pose ∈ {0°, ±30°, ±60°} represents five different poses, while y_i^gender, y_i^glasses and y_i^smile represent female/male, not wearing/wearing glasses and not smiling/smiling, respectively. Different weights are assigned to the main task r and each auxiliary task a, and are denoted as W^r and W^a, respectively.
[0033] Then, an objective function over all the tasks is formulated as below to optimize the main task r and the auxiliary tasks a jointly:

arg min_{W^r, {W^a}} Σ_{i=1}^{N} l^r(y_i^r, f(x_i; W^r)) + Σ_a λ^a Σ_{i=1}^{N} l^a(y_i^a, f(x_i; W^a))    (1)

where f(x_i; W^t) = (W^t)^T x_i is a linear function of the shared facial feature vector x_i and a weight vector W^t; l(·) represents a loss function; λ^a represents the importance coefficient of the a-th task's error; and x_i represents the shared facial feature vector.
[0034] According to an embodiment, least-squares and cross-entropy functions are used as the loss function l(·) for the main task r and the auxiliary tasks a, respectively, to generate the corresponding landmark error and training task errors. Therefore, the above objective function can be rewritten as below:

arg min_{W^r, {W^a}} (1/2) Σ_{i=1}^{N} ||y_i^r − (W^r)^T x_i||² − Σ_a λ^a Σ_{i=1}^{N} y_i^a log p(y_i^a | x_i; W^a) + Σ_t ||W^t||²₂    (2)

[0035] In Eq.(2), (W^r)^T x_i in the first term is a linear function of the shared facial feature vector. The second term is a cross-entropy over the posterior probability p(y_i^a | x_i; W^a) of the a-th auxiliary task. The last term penalizes large weights.
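The joint objective can be evaluated numerically as in the sketch below. The dimensions, the random data, and the use of a softmax to model the auxiliary-task posterior are illustrative assumptions; only the overall least-squares + weighted cross-entropy + weight-decay structure follows the description:

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 8, 16                                 # toy sizes: 8 samples, 16-dim features
X = rng.standard_normal((N, D))              # shared facial feature vectors
Y_r = rng.standard_normal((N, 10))           # ground-truth landmark vectors
y_a = rng.integers(0, 2, size=N)             # one binary auxiliary task
W_r = 0.1 * rng.standard_normal((D, 10))     # main-task weights
W_a = 0.1 * rng.standard_normal((D, 2))      # auxiliary-task weights
lam_a = 0.5                                  # importance coefficient lambda^a

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)     # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

least_sq = 0.5 * np.sum((Y_r - X @ W_r) ** 2)          # first term of Eq.(2)
post = softmax(X @ W_a)
cross_ent = -np.sum(np.log(post[np.arange(N), y_a]))   # second term (one task)
decay = np.sum(W_r ** 2) + np.sum(W_a ** 2)            # weight-decay term

objective = least_sq + lam_a * cross_ent + decay
```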
[0036] According to an embodiment, the weights of all the tasks may be updated accordingly. In particular, the weight matrix of the facial landmark detection is updated by W^r := W^r − η ∂E^r/∂W^r, where η represents the learning rate (such as η = 0.003) and E^r denotes the landmark error. Also, the weight matrix of each auxiliary task a may be calculated in a similar manner as W^a := W^a − η λ^a ∂E^a/∂W^a, where E^a denotes the training task error of the task a.
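A single such update for the main task can be sketched as below. The data sizes are toy values; the gradient is the analytic one for the least-squares landmark error E^r = (1/2) Σ ||y − (W^r)^T x||², and the step uses the η = 0.003 mentioned in the description:

```python
import numpy as np

rng = np.random.default_rng(2)
N, D = 8, 16
X = rng.standard_normal((N, D))       # toy shared feature vectors
Y_r = rng.standard_normal((N, 10))    # toy ground-truth landmarks
W_r = np.zeros((D, 10))
eta = 0.003                           # learning rate from the description

# One gradient-descent update of the main-task weights.
grad = -X.T @ (Y_r - X @ W_r)         # dE^r / dW^r for the least-squares error
E_before = 0.5 * np.sum((Y_r - X @ W_r) ** 2)
W_r = W_r - eta * grad
E_after = 0.5 * np.sum((Y_r - X @ W_r) ** 2)
assert E_after < E_before             # a small step reduces the landmark error
```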
[0037] Then, the generated landmark error and the training task errors may be back-propagated by the back-propagator 203 through the convolutional neural network, layer by layer down to the lowest layer, to adjust the weights on the connections between the neurons of the convolutional neural network.
[0038] According to an embodiment, the error may be propagated back through the network following a back-propagation strategy as below:

ε^l = σ'(u^l) ∘ (W^{l+1} ε^{l+1})    (3)

[0039] In Eq.(3), ε^l, with l = 1, ..., L, represents the error in the l-th layer, u^l represents the input of the activation function in the l-th layer, and ∘ denotes the element-wise product. For example, ε^1 represents the error of the lowest layer, and ε^2 represents the error of the second lowest layer. The errors of the lower layers are computed from those of the higher layers following Eq.(3). For instance, ε^1 = σ'(u^1) ∘ (W^2 ε^2), where σ'(·) is the gradient of the activation function of the network.
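A toy numpy sketch of this layer-wise propagation is given below; it assumes the standard recursion ε^l = σ'(u^l) ∘ (W^{l+1} ε^{l+1}) with a tanh activation and toy layer widths, and verifies the resulting weight gradient against a finite-difference check:

```python
import numpy as np

rng = np.random.default_rng(3)
sizes = [20, 12, 6]                        # toy layer widths (illustrative)
Ws = [0.2 * rng.standard_normal((sizes[i], sizes[i + 1])) for i in range(2)]
target = 0.1 * np.ones(sizes[-1])          # toy regression target

sigma = np.tanh
dsigma = lambda u: 1.0 - np.tanh(u) ** 2   # gradient of the activation

# forward pass x^l = sigma((W^l)^T x^{l-1}), keeping pre-activations u^l
x0 = rng.standard_normal(sizes[0])
us, xs = [], [x0]
for W in Ws:
    us.append(W.T @ xs[-1])
    xs.append(sigma(us[-1]))

# backward pass following the Eq.(3)-style recursion
eps_top = (xs[-1] - target) * dsigma(us[1])    # error at the highest layer
eps_low = dsigma(us[0]) * (Ws[1] @ eps_top)    # error one layer below

# gradient of the loss 0.5*||x^2 - target||^2 w.r.t. W^1, checked numerically
gradW1 = np.outer(xs[0], eps_low)
h = 1e-6
Wp = [W.copy() for W in Ws]
Wp[0][3, 2] += h
xp = x0
for W in Wp:
    xp = sigma(W.T @ xp)
numeric = (0.5 * np.sum((xp - target) ** 2)
           - 0.5 * np.sum((xs[-1] - target) ** 2)) / h
assert abs(numeric - gradW1[3, 2]) < 1e-5
```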
[0040] The above training process is repeated until the training process of the facial landmark detection is determined by the determiner 204 to be converged. In other words, if the error is less than a predetermined value, the training process will be determined to be converged. With the above training process, the feature extractor 100 is capable of extracting the shared facial feature vector from a given face image.
[0041] During the above training process, at least one auxiliary task is trained simultaneously. However, different tasks have different loss functions and learning
difficulties, and thus have different convergence rates. According to another embodiment, the determiner 204 may further determine whether the training process of the auxiliary tasks is converged.
[0042] In particular, the error of each auxiliary task a may be measured on a validation set and the training set, respectively; if one task's measure exceeds a threshold ε as below, the task will be stopped:

[ k · med_{j=t−k..t} E_tr^a(j) / ( Σ_{j=t−k..t} E_tr^a(j) − k · min_{j=t−k..t} E_tr^a(j) ) ] × [ ( E_val^a(t) − min_{j≤t} E_val^a(j) ) / ( λ^a · min_{j≤t} E_val^a(j) ) ] > ε    (4)

[0043] In Eq.(4), t represents the current iteration, k represents a training length, E_tr^a and E_val^a represent the training error and the validation error of the task a, respectively, and λ^a represents the importance coefficient of the a-th task's error. The 'med' denotes the function for calculating a median value. The first term in Eq.(4) represents the tendency of the training task error of the task a. If the training error drops rapidly within a period of length k, the value of the first term is small, which indicates that the training of the task can be continued as the task is still valuable. Otherwise, the first term is large, and the task is more likely to be stopped. The second term measures the generalization error of the task on the validation set relative to its best value so far. From this, an auxiliary task can be switched off during the training process, i.e., "early stopped", before it begins to over-fit the training set and thus harm the main task.
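The stopping measure can be sketched as below. Note that the exact form of Eq.(4) is reconstructed from the surrounding description (a training-error tendency term multiplied by a validation-based generalization term), so this is a sketch under that assumption; the error histories are made-up numbers:

```python
import numpy as np

def stop_auxiliary_task(E_tr, E_val, k, lam, eps):
    """Task-wise early-stopping measure, as reconstructed from Eq.(4).

    E_tr, E_val: training/validation error histories of task a (lists),
    k: training length (window size), lam: importance coefficient lambda^a,
    eps: stopping threshold.
    """
    win_tr = np.array(E_tr[len(E_tr) - k:])
    # tendency of the training error: small while the error still drops fast
    tendency = (k * np.median(win_tr)) / (win_tr.sum() - k * win_tr.min() + 1e-12)
    # generalization term: grows when validation error moves above its best value
    general = (E_val[-1] - min(E_val)) / (lam * min(E_val))
    return tendency * general > eps

# A task whose training error has flattened while validation error rises:
E_tr  = [1.0, 0.6, 0.4, 0.30, 0.29, 0.29, 0.29]
E_val = [1.1, 0.7, 0.5, 0.45, 0.47, 0.52, 0.60]
print(stop_auxiliary_task(E_tr, E_val, k=4, lam=0.5, eps=2.0))  # True
```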
[0044] With the above training process, the feature extractor 100 is capable of extracting a shared facial feature vector from any face image. For example, a face image x^0 is inputted in the input layer of the convolutional neural network as shown in Fig. 3. In each convolutional layer of the CNN, multiple sets of convolutional filters plus an activation function are applied to the input, and the layers are applied sequentially to project the face image to higher layers. That is, the face image is projected to higher layers gradually by learning a sequence of non-linear mappings as below to obtain the shared facial feature vector x^s:

x^l = σ((W^l)^T x^{l−1}), l = 1, ..., s

[0045] Here, σ(·) and W^l represent the non-linear activation function applied in the layer l of the CNN and the filters needed to be learned in that layer, respectively. Referring to Fig. 3 again, the shared facial feature vector can be used for landmark detection and the auxiliary/related tasks simultaneously in the estimation stage.
[0046] It shall be appreciated that the system 1000 may be implemented using certain hardware, software, or a combination thereof. In addition, the embodiments of the present invention may be adapted to a computer program product embodied on one or more computer readable storage media (comprising but not limited to disk storage, CD-ROM, optical memory and the like) containing computer program codes.
[0047] In the case that the system 1000 is implemented with software, the system 1000 may include a general purpose computer, a computer cluster, a mainstream computer, a computing device dedicated for providing online contents, or a computer network comprising a group of computers operating in a centralized or distributed fashion. As shown in Fig. 4, the system 1000 may include one or more processors (processors 102, 104, 106 etc.), a memory 112, a storage device 116, and a bus to facilitate information exchange among various components of system 1000. Processors 102-106 may include a central processing unit ("CPU"), a graphic processing unit ("GPU"), or other suitable information processing devices. Depending on the type of hardware being used, processors 102-106 can include one or more printed circuit boards, and/or one or more microprocessor chips. Processors 102-106 can execute sequences of computer program instructions to perform various methods that will be explained in greater detail below.
[0048] Memory 112 can include, among other things, a random access memory ("RAM") and a read-only memory ("ROM"). Computer program instructions can be stored, accessed, and read from memory 112 for execution by one or more of processors 102-106. For example, memory 112 may store one or more software applications. Further, memory 112 may store an entire software application or only a part of a software application that is executable by one or more of processors 102-106.
It is noted that although only one block is shown in Fig. 1, memory 112 may include multiple physical devices installed on a central computing device or on different computing devices.
[0049] The system for facial landmark detection is described in the above. The following will describe a method for facial landmark detection with reference to Figs. 5 and 6.
[0050] Fig. 5 shows a schematic flowchart of the method for facial landmark detection and Fig. 6 shows a schematic flowchart of a training process of the multi-task convolutional neural network by the training unit 200.
[0051] In Figs. 5 and 6, methods 500 and 600 comprise a series of steps that may be performed by one or more of processors 102-106 or by each module/unit of the system 1000 to implement a data processing operation. For purposes of description, the following discussion is made in reference to the situation where each module/unit of the system 1000 is made in hardware or in a combination of hardware and software. Those skilled in the art shall appreciate that other suitable devices or systems are also applicable to carry out the following process; the system 1000 is just used as an illustration.
[0052] As shown in Fig. 5, multiple feature maps are extracted by the feature extractor 100 from at least one facial region of the face image in step S501. In another embodiment, the multiple feature maps may be extracted from the whole face image in step S501. Then, in step S502, a shared facial feature vector is generated from the multiple feature maps extracted in step S501. In step S503, facial landmark locations of the face image are predicted from the shared facial feature vector generated in step S502. According to another embodiment, the shared facial feature vector may be used to predict the corresponding target of at least one auxiliary task associated with the facial landmark detection. Then, the target predictions of all the auxiliary tasks are obtained simultaneously.
[0053] According to an embodiment, the feature extractor comprises a convolutional neural network comprising a plurality of convolution-pooling layers and a fully connected layer. Each of the convolution-pooling layers is configured to perform convolution and max-pooling operations. In the embodiment, in step S501, the multiple feature maps may be extracted by the plurality of convolution-pooling layers consecutively, wherein the feature maps extracted by a previous layer of the convolution-pooling layers are inputted into a next layer of the convolution-pooling layers to extract feature maps different from the previously extracted feature maps. In step S502, the shared facial feature vector may be generated by the fully connected layer from all the multiple feature maps extracted in step S501.
[0054] In the embodiment, the method 500 further comprises a training step (not shown in Fig. 5), which will be discussed with reference to Fig. 6.
[0055] As shown in Fig. 6, in step S601, a training face image, its ground-truth landmark locations and its ground-truth target for each auxiliary task are sampled from the predetermined training set. For the training face image, its facial landmark prediction and the target predictions of all the auxiliary tasks may be obtained from the predictor 300 accordingly in step S602. Then, the dissimilarity between the predicted facial landmark locations and the ground-truth landmark locations is compared to generate a landmark error in step S603. In step S604, dissimilarities between the target predictions and the ground-truth target for each auxiliary task are compared respectively to generate at least one training task error. Then, the generated landmark error and all the training task errors are back-propagated through the convolutional neural network to adjust the weights on the connections between the neurons of the convolutional neural network in step S605. In step S606, it is determined whether one of the auxiliary tasks is converged. If no, the process 600 proceeds to step S608. If yes, the training process of that task is stopped in step S607 before proceeding to step S608. In step S608, it is determined whether the training process of the facial landmark detection is converged. If yes, the process 600 ends. If no, the process 600 turns back to step S601.
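The overall flow of steps S601-S608 can be sketched as a Python training loop. All callables and names here are illustrative placeholders standing in for the sampler, predictor, comparator, back-propagator and determiner described above:

```python
def train(sample_batch, predict, errors, backprop,
          landmarks_converged, task_converged, tasks, max_iters=1000):
    """Sketch of the training flow S601-S608; all callables are placeholders."""
    active = set(tasks)
    for _ in range(max_iters):
        batch = sample_batch()                            # S601
        preds = predict(batch)                            # S602
        lm_err, task_errs = errors(batch, preds, active)  # S603-S604
        backprop(lm_err, task_errs)                       # S605
        for a in list(active):                            # S606-S607
            if task_converged(a):
                active.discard(a)                         # early-stop task a
        if landmarks_converged(lm_err):                   # S608
            break

# toy demonstration with stub callables (all names are illustrative)
calls = {"n": 0}
def sample_batch(): return None
def predict(b): return None
def errors(b, p, active):
    calls["n"] += 1
    return 1.0 / calls["n"], {a: 0.0 for a in active}  # toy decreasing error
def backprop(e, te): pass
def landmarks_converged(e): return e < 0.1
def task_converged(a): return calls["n"] > 3           # auxiliary tasks stop early

train(sample_batch, predict, errors, backprop,
      landmarks_converged, task_converged, tasks=["pose", "gender"])
print(calls["n"])  # 11: stops once the toy landmark error falls below 0.1
```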
[0056] From this, the facial landmark detection can be optimized together with heterogeneous but subtly related tasks.
[0057] Although the preferred examples of the present invention have been described, those skilled in the art can make variations or modifications to these examples upon knowing the basic inventive concept. The appended claims are intended to be construed as covering the preferred examples and all variations or modifications falling within the scope of the present invention.
[0058] Obviously, those skilled in the art can make variations or modifications to the present invention without departing from the spirit and scope of the present invention. As such, if these variations or modifications belong to the scope of the claims and equivalent techniques, they may also fall within the scope of the present invention.
Claims
1. A method for detecting facial landmarks of a face image, comprising:
extracting multiple feature maps from at least one facial region of the face image;
generating a shared facial feature vector from the extracted multiple feature maps; and
predicting facial landmark locations of the face image from the generated shared facial feature vector.
2. A method of claim 1, wherein the facial landmarks comprise at least one selected from a group consisting of centers of the eyes, the nose, and corners of the mouth of a face image.
3. A method of claim 1, wherein in the step of predicting, the shared facial feature vector is used to predict corresponding target of at least one auxiliary task associated with the facial landmark detection, so as to obtain target predictions of all the auxiliary tasks simultaneously.
4. A method of claim 3, wherein the auxiliary tasks comprise at least one selected from a group consisting of head pose estimation, gender classification, age estimation, facial expression recognition or facial attribute inference.
5. A method of claim 4, wherein the step of extracting and generating are performed by a convolutional neural network comprising a plurality of convolution-pooling layers, each of which is configured to perform convolution and max-pooling operations, and
wherein the step of extracting further comprises:
extracting the multiple feature maps by the plurality of convolution-pooling layers consecutively, wherein the feature maps extracted by a previous layer of the convolution-pooling layers are inputted into a next layer of the convolution-pooling
layers to extract feature maps different from the previously extracted feature maps.
6. A method of claim 5, wherein the convolutional neural network further comprises a fully connected layer, and in the step of generating, the shared facial feature vector is generated by the fully connected layer from all the extracted multiple feature maps.
7. A method of claim 6, wherein each layer of the convolutional neural network has a plurality of neurons, and wherein the method further comprises:
training, with a predetermined training set, the network so as to adjust each weight on connections between the neurons of the network such that the shared facial feature vector is generated by the network with the adjusted weight.
8. A method of claim 7, wherein the step of training further comprises:
sampling a training face image, its ground-truth landmark locations and its ground-truth target for each auxiliary task from the predetermined training set;
comparing dissimilarities between the predicted facial landmark locations and the ground-truth landmark locations to generate a landmark error;
comparing dissimilarities between the target predictions and the ground-truth target for each auxiliary task, respectively, to generate at least one training task error; and
back-propagating the generated landmark error and the generated training task error through the convolutional neural network to adjust weights on connections between the neurons of the convolutional neural network;
repeating the steps of sampling, comparing and back-propagating until the generated landmark error is less than a first predetermined value and the generated training task error is less than a second predetermined value.
9. A method of claim 8, wherein the comparison to generate a landmark error is performed by rule of a least square process and the comparison to generate a training
task error is performed by rule of a cross-entropy process.
10. A method of claim 8, wherein, for each auxiliary task, the step of training further comprising:
sampling a validating face image and its ground-truth target for each auxiliary tasks from a predetermined validation set;
comparing dissimilarity between the target prediction and the ground-truth target to generate a validating task error;
repeating the sampling and the comparing until the generated training task error is less than a third predetermined value and the generated validating task error is less than a fourth predetermined value.
11. A method of claim 1, wherein in the step of predicting, the predicted facial landmark locations of the face image is determined by rule of
12. A system for detecting facial landmarks of a face image, comprising:
a feature extractor configured to,
extract multiple feature maps from at least one facial region of the face image; and
generate a shared facial feature vector from the extracted multiple feature maps; and
a predictor configured to predict facial landmark locations of the face image from the shared facial feature vector generated by the feature extractor.
13. A system of claim 12, wherein the predictor is further configured to obtain target predictions of at least one auxiliary task associated with the facial landmark detection by using the shared facial feature vector simultaneously.
14. A system of claim 12, wherein the feature extractor further comprises a convolutional neural network, wherein the convolutional neural network comprises:
a plurality of convolution-pooling layers configured to perform convolution and max-pooling operations, and wherein the feature maps extracted by a previous layer of the convolution-pooling layers are inputted into a next layer of the convolution-pooling layers to extract feature maps different from the previously extracted feature maps; and
a fully connected layer configured to generate the shared facial feature vector from all the extracted multiple feature maps.
15. A system of claim 13, wherein each layer of the convolutional neural network has a plurality of neurons, and wherein the system further comprises:
a training unit configured to train, with a predetermined training set, the network so as to adjust the weights on connections between the neurons of the network such that the trained network is capable of extracting the shared facial feature vector.
16. A system of claim 15, wherein the training unit further comprises:
a sampler configured to sample a training face image, its ground-truth landmark locations and its ground-truth target for each auxiliary task from the predetermined training set;
a comparator configured to compare dissimilarity between the predicted facial landmark locations and the ground-truth landmark locations to generate a landmark error and to compare dissimilarities between the target predictions and the ground-truth target for each auxiliary task, respectively, to generate at least one training task error; and
a back-propagator configured to back-propagate the generated landmark error and the training task errors through the convolutional neural network to adjust weights on connections between the neurons of the convolutional neural network.
17. A system of claim 15, wherein the training unit further comprises:
a determiner configured to determine whether training process of the facial landmark detection is converged and whether training process of each task is converged.
18. A system of claim 12, wherein the facial landmarks comprise at least one selected from a group consisting of centers of the eyes, the nose, and corners of the mouth of a face image.
19. A system of claim 13, wherein the auxiliary tasks comprise at least one selected from a group consisting of head pose estimation, gender classification, age estimation, facial expression recognition or facial attribute inference.
20. A method for training a convolutional neural network for performing simultaneously facial landmark detection and at least one associated auxiliary task, comprising:
1) sampling a training face image, its ground-truth landmark locations and its ground-truth target for each auxiliary task from the predetermined training set;
2) comparing dissimilarity between the predicted facial landmark locations and the ground-truth landmark locations to generate a landmark error;
3) comparing dissimilarities between the target predictions and the ground-truth target for each auxiliary task, respectively, to generate at least one training task error;
4) back-propagating the generated landmark error and all the training task errors through the convolutional neural network to adjust weights on connections between neurons of the convolutional neural network;
5) sampling a validating face image and its ground-truth target for each auxiliary task from a predetermined validation set;
6) comparing dissimilarity between the target prediction and the ground-truth target to generate a validating task error;
7) determining if the generated training task error is less than a first
predetermined value and the generated validating task error is less than a second predetermined value; and
if yes, the method for training the convolutional neural network will be terminated; otherwise, the steps 1)-7) will be repeated.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2014/000769 WO2016026063A1 (en) | 2014-08-21 | 2014-08-21 | A method and a system for facial landmark detection based on multi-task |
CN201480081241.1A CN106575367B (en) | 2014-08-21 | 2014-08-21 | Method and system for the face critical point detection based on multitask |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2014/000769 WO2016026063A1 (en) | 2014-08-21 | 2014-08-21 | A method and a system for facial landmark detection based on multi-task |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016026063A1 true WO2016026063A1 (en) | 2016-02-25 |
Family
ID=55350056
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2014/000769 WO2016026063A1 (en) | 2014-08-21 | 2014-08-21 | A method and a system for facial landmark detection based on multi-task |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN106575367B (en) |
WO (1) | WO2016026063A1 (en) |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105957095A (en) * | 2016-06-15 | 2016-09-21 | 电子科技大学 | Gray-scale image based Spiking angular point detection method |
CN106951840A (en) * | 2017-03-09 | 2017-07-14 | 北京工业大学 | A kind of facial feature points detection method |
JP2017211799A (en) * | 2016-05-25 | 2017-11-30 | キヤノン株式会社 | Information processing device and information processing method |
JP2018055377A (en) * | 2016-09-28 | 2018-04-05 | 日本電信電話株式会社 | Multitask processing device, multitask model learning device, and program |
WO2018090905A1 (en) * | 2016-11-15 | 2018-05-24 | Huawei Technologies Co., Ltd. | Automatic identity detection |
CN108073910A (en) * | 2017-12-29 | 2018-05-25 | 百度在线网络技术(北京)有限公司 | For generating the method and apparatus of face characteristic |
CN108292363A (en) * | 2016-07-22 | 2018-07-17 | 日电实验室美国公司 | In vivo detection for anti-fraud face recognition |
CN109145798A (en) * | 2018-08-13 | 2019-01-04 | 浙江零跑科技有限公司 | A kind of Driving Scene target identification and travelable region segmentation integrated approach |
CN109635750A (en) * | 2018-12-14 | 2019-04-16 | 广西师范大学 | A kind of compound convolutional neural networks images of gestures recognition methods under complex background |
CN110163098A (en) * | 2019-04-17 | 2019-08-23 | 西北大学 | Based on the facial expression recognition model construction of depth of seam division network and recognition methods |
EP3529747A4 (en) * | 2016-10-19 | 2019-10-09 | Snap Inc. | Neural networks for facial modeling |
US10467459B2 (en) | 2016-09-09 | 2019-11-05 | Microsoft Technology Licensing, Llc | Object detection based on joint feature extraction |
WO2019221739A1 (en) * | 2018-05-17 | 2019-11-21 | Hewlett-Packard Development Company, L.P. | Image location identification |
CN111191675A (en) * | 2019-12-03 | 2020-05-22 | 深圳市华尊科技股份有限公司 | Pedestrian attribute recognition model implementation method and related device |
WO2020199931A1 (en) * | 2019-04-02 | 2020-10-08 | 腾讯科技(深圳)有限公司 | Face key point detection method and apparatus, and storage medium and electronic device |
WO2021036726A1 (en) * | 2019-08-29 | 2021-03-04 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Method, system, and computer-readable medium for using face alignment model based on multi-task convolutional neural network-obtained data |
CN112820382A (en) * | 2021-02-04 | 2021-05-18 | 上海小芃科技有限公司 | Breast cancer postoperative intelligent rehabilitation training method, device, equipment and storage medium |
CN107871106B (en) * | 2016-09-26 | 2021-07-06 | 北京眼神科技有限公司 | Face detection method and device |
WO2022003982A1 (en) * | 2020-07-03 | 2022-01-06 | 日本電気株式会社 | Detection device, learning device, detection method, and storage medium |
US11776323B2 (en) | 2022-02-15 | 2023-10-03 | Ford Global Technologies, Llc | Biometric task network |
US11954881B2 (en) | 2018-08-28 | 2024-04-09 | Apple Inc. | Semi-supervised learning using clustering as an additional constraint |
Families Citing this family (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107145857B (en) * | 2017-04-29 | 2021-05-04 | 深圳市深网视界科技有限公司 | Face attribute recognition method and device and model establishment method |
CN107038429A (en) * | 2017-05-03 | 2017-08-11 | 四川云图睿视科技有限公司 | A kind of multitask cascade face alignment method based on deep learning |
CN106951888B (en) * | 2017-05-09 | 2020-12-01 | 安徽大学 | Relative coordinate constraint method and positioning method of human face characteristic point |
CN107358149B (en) * | 2017-05-27 | 2020-09-22 | 深圳市深网视界科技有限公司 | Human body posture detection method and device |
CN107578055B (en) * | 2017-06-20 | 2020-04-14 | 北京陌上花科技有限公司 | Image prediction method and device |
CN108229288B (en) * | 2017-06-23 | 2020-08-11 | 北京市商汤科技开发有限公司 | Neural network training and clothes color detection method and device, storage medium and electronic equipment |
CN107563279B (en) * | 2017-07-22 | 2020-12-22 | 复旦大学 | Model training method for adaptive weight adjustment aiming at human body attribute classification |
US11341631B2 (en) | 2017-08-09 | 2022-05-24 | Shenzhen Keya Medical Technology Corporation | System and method for automatically detecting a physiological condition from a medical image of a patient |
CN107423727B (en) * | 2017-08-14 | 2018-07-10 | 河南工程学院 | Face complex expression recognition methods based on neural network |
CN107704848A (en) * | 2017-10-27 | 2018-02-16 | 深圳市唯特视科技有限公司 | A kind of intensive face alignment method based on multi-constraint condition convolutional neural networks |
CN108196535B (en) * | 2017-12-12 | 2021-09-07 | 清华大学苏州汽车研究院(吴江) | Automatic driving system based on reinforcement learning and multi-sensor fusion |
CN107992864A (en) * | 2018-01-15 | 2018-05-04 | 武汉神目信息技术有限公司 | A kind of vivo identification method and device based on image texture |
CN110060296A (en) * | 2018-01-18 | 2019-07-26 | 北京三星通信技术研究有限公司 | Estimate method, electronic equipment and the method and apparatus for showing virtual objects of posture |
CN108399373B (en) * | 2018-02-06 | 2019-05-10 | 北京达佳互联信息技术有限公司 | The model training and its detection method and device of face key point |
US10990820B2 (en) * | 2018-03-06 | 2021-04-27 | Dus Operating Inc. | Heterogeneous convolutional neural network for multi-problem solving |
CN108416314B (en) * | 2018-03-16 | 2022-03-08 | 中山大学 | Picture important face detection method |
CN108615016B (en) * | 2018-04-28 | 2020-06-19 | 北京华捷艾米科技有限公司 | Face key point detection method and face key point detection device |
CN109147940B (en) * | 2018-07-05 | 2021-05-25 | 科亚医疗科技股份有限公司 | Apparatus and system for automatically predicting physiological condition from medical image of patient |
CN109522910B (en) * | 2018-12-25 | 2020-12-11 | 浙江商汤科技开发有限公司 | Key point detection method and device, electronic equipment and storage medium |
CN109829431B (en) * | 2019-01-31 | 2021-02-12 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating information |
CN111563397B (en) * | 2019-02-13 | 2023-04-18 | 阿里巴巴集团控股有限公司 | Detection method, detection device, intelligent equipment and computer storage medium |
CN109902641B (en) * | 2019-03-06 | 2021-03-02 | 中国科学院自动化研究所 | Semantic alignment-based face key point detection method, system and device |
CN110136828A (en) * | 2019-05-16 | 2019-08-16 | 杭州健培科技有限公司 | A method of medical image multitask auxiliary diagnosis is realized based on deep learning |
CN110705419A (en) * | 2019-09-24 | 2020-01-17 | 新华三大数据技术有限公司 | Emotion recognition method, early warning method, model training method and related device |
CN111339813B (en) * | 2019-09-30 | 2022-09-27 | 深圳市商汤科技有限公司 | Face attribute recognition method and device, electronic equipment and storage medium |
CN111860101A (en) * | 2020-04-24 | 2020-10-30 | 北京嘀嘀无限科技发展有限公司 | Training method and device for face key point detection model |
KR102538804B1 (en) * | 2020-11-16 | 2023-06-01 | 상명대학교 산학협력단 | Device and method for landmark detection using artificial intelligence |
CN112488003A (en) * | 2020-12-03 | 2021-03-12 | 深圳市捷顺科技实业股份有限公司 | Face detection method, model creation method, device, equipment and medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1352436A (en) * | 2000-11-15 | 2002-06-05 | 星创科技股份有限公司 | Real-time face identification system |
CN101673340A (en) * | 2009-08-13 | 2010-03-17 | 重庆大学 | Human ear identification method combining multi-direction and multi-dimension analysis with a BP neural network |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4778158B2 (en) * | 2001-05-31 | 2011-09-21 | オリンパス株式会社 | Image selection support device |
CN102831382A (en) * | 2011-06-15 | 2012-12-19 | 北京三星通信技术研究有限公司 | Face tracking apparatus and method |
CN103824054B (en) * | 2014-02-17 | 2018-08-07 | 北京旷视科技有限公司 | A facial attribute recognition method based on cascaded deep neural networks |
2014
- 2014-08-21 WO PCT/CN2014/000769 patent/WO2016026063A1/en active Application Filing
- 2014-08-21 CN CN201480081241.1A patent/CN106575367B/en active Active
Cited By (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10909455B2 (en) | 2016-05-25 | 2021-02-02 | Canon Kabushiki Kaisha | Information processing apparatus using multi-layer neural network and method therefor |
JP2017211799A (en) * | 2016-05-25 | 2017-11-30 | キヤノン株式会社 | Information processing device and information processing method |
CN105957095A (en) * | 2016-06-15 | 2016-09-21 | 电子科技大学 | Spiking corner detection method based on gray-scale images |
CN108292363B (en) * | 2016-07-22 | 2022-05-24 | 日电实验室美国公司 | Living body detection for spoof-proof facial recognition |
CN108292363A (en) * | 2016-07-22 | 2018-07-17 | 日电实验室美国公司 | Liveness detection for anti-spoofing face recognition |
JP2019508801A (en) * | 2016-07-22 | 2019-03-28 | NEC Laboratories America, Inc. | Liveness detection for anti-spoofing face recognition |
US10467459B2 (en) | 2016-09-09 | 2019-11-05 | Microsoft Technology Licensing, Llc | Object detection based on joint feature extraction |
CN107871106B (en) * | 2016-09-26 | 2021-07-06 | 北京眼神科技有限公司 | Face detection method and device |
JP2018055377A (en) * | 2016-09-28 | 2018-04-05 | 日本電信電話株式会社 | Multitask processing device, multitask model learning device, and program |
EP4266249A3 (en) * | 2016-10-19 | 2024-01-17 | Snap Inc. | Neural networks for facial modeling |
US11100311B2 (en) | 2016-10-19 | 2021-08-24 | Snap Inc. | Neural networks for facial modeling |
EP3529747A4 (en) * | 2016-10-19 | 2019-10-09 | Snap Inc. | Neural networks for facial modeling |
US10460153B2 (en) | 2016-11-15 | 2019-10-29 | Futurewei Technologies, Inc. | Automatic identity detection |
WO2018090905A1 (en) * | 2016-11-15 | 2018-05-24 | Huawei Technologies Co., Ltd. | Automatic identity detection |
CN106951840A (en) * | 2017-03-09 | 2017-07-14 | 北京工业大学 | A facial feature point detection method |
CN108073910A (en) * | 2017-12-29 | 2018-05-25 | 百度在线网络技术(北京)有限公司 | Method and apparatus for generating facial features |
WO2019221739A1 (en) * | 2018-05-17 | 2019-11-21 | Hewlett-Packard Development Company, L.P. | Image location identification |
CN109145798A (en) * | 2018-08-13 | 2019-01-04 | 浙江零跑科技有限公司 | An integrated method for driving-scene target recognition and drivable-region segmentation |
CN109145798B (en) * | 2018-08-13 | 2021-10-22 | 浙江零跑科技股份有限公司 | Driving scene target identification and travelable region segmentation integration method |
US11954881B2 (en) | 2018-08-28 | 2024-04-09 | Apple Inc. | Semi-supervised learning using clustering as an additional constraint |
CN109635750A (en) * | 2018-12-14 | 2019-04-16 | 广西师范大学 | A gesture image recognition method based on composite convolutional neural networks under complex backgrounds |
US11734851B2 (en) | 2019-04-02 | 2023-08-22 | Tencent Technology (Shenzhen) Company Limited | Face key point detection method and apparatus, storage medium, and electronic device |
WO2020199931A1 (en) * | 2019-04-02 | 2020-10-08 | 腾讯科技(深圳)有限公司 | Face key point detection method and apparatus, and storage medium and electronic device |
CN110163098A (en) * | 2019-04-17 | 2019-08-23 | 西北大学 | Facial expression recognition model construction and recognition method based on a deep multi-task network |
WO2021036726A1 (en) * | 2019-08-29 | 2021-03-04 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Method, system, and computer-readable medium for using face alignment model based on multi-task convolutional neural network-obtained data |
US12033364B2 (en) | 2019-08-29 | 2024-07-09 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Method, system, and computer-readable medium for using face alignment model based on multi-task convolutional neural network-obtained data |
CN111191675B (en) * | 2019-12-03 | 2023-10-24 | 深圳市华尊科技股份有限公司 | Pedestrian attribute identification model realization method and related device |
CN111191675A (en) * | 2019-12-03 | 2020-05-22 | 深圳市华尊科技股份有限公司 | Pedestrian attribute recognition model implementation method and related device |
WO2022003982A1 (en) * | 2020-07-03 | 2022-01-06 | 日本電気株式会社 | Detection device, learning device, detection method, and storage medium |
JP7513094B2 | 2020-07-03 | 2024-07-09 | 日本電気株式会社 | Detection apparatus, learning apparatus, detection method, and program |
CN112820382A (en) * | 2021-02-04 | 2021-05-18 | 上海小芃科技有限公司 | Breast cancer postoperative intelligent rehabilitation training method, device, equipment and storage medium |
US11776323B2 (en) | 2022-02-15 | 2023-10-03 | Ford Global Technologies, Llc | Biometric task network |
Also Published As
Publication number | Publication date |
---|---|
CN106575367A (en) | 2017-04-19 |
CN106575367B (en) | 2018-11-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2016026063A1 (en) | A method and a system for facial landmark detection based on multi-task | |
US20220392234A1 (en) | Training neural networks for vehicle re-identification | |
US9811718B2 (en) | Method and a system for face verification | |
CN106415594B (en) | Method and system for face verification | |
US11288835B2 (en) | Lighttrack: system and method for online top-down human pose tracking | |
EP3074918B1 (en) | Method and system for face image recognition | |
US11836931B2 (en) | Target detection method, apparatus and device for continuous images, and storage medium | |
CN109271958B (en) | Face age identification method and device | |
JP2023134499A (en) | Robust training in presence of label noise | |
Glauner | Deep convolutional neural networks for smile recognition | |
Gong et al. | Model-based oversampling for imbalanced sequence classification | |
US11488309B2 (en) | Robust machine learning for imperfect labeled image segmentation | |
US20120243779A1 (en) | Recognition device, recognition method, and computer program product | |
US10592786B2 (en) | Generating labeled data for deep object tracking | |
CN111914878B (en) | Feature point tracking training method and device, electronic equipment and storage medium | |
Dong et al. | Adaptive cascade deep convolutional neural networks for face alignment | |
CN113196303A (en) | Inappropriate neural network input detection and processing | |
US11625589B2 (en) | Residual semi-recurrent neural networks | |
CN111223128A (en) | Target tracking method, device, equipment and storage medium | |
Zhai et al. | Face verification across aging based on deep convolutional networks and local binary patterns | |
CN114998592A | Method, apparatus, device and storage medium for instance segmentation | |
CN112836753A (en) | Methods, apparatus, devices, media and products for domain adaptive learning | |
Boursinos et al. | Improving prediction confidence in learning-enabled autonomous systems | |
WO2017079972A1 (en) | A method and a system for classifying objects in images | |
Jo et al. | RANSAC versus CS-RANSAC |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 14900141 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 14900141 Country of ref document: EP Kind code of ref document: A1 |