CN112257578A - Face key point detection method and device, electronic equipment and storage medium


Info

Publication number
CN112257578A
Authority
CN
China
Prior art keywords
network
face
inputting
feature map
matrix
Prior art date
Legal status
Granted
Application number
CN202011133910.9A
Other languages
Chinese (zh)
Other versions
CN112257578B (en)
Inventor
陈嘉莉
周超勇
刘玉宇
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202011133910.9A
Publication of CN112257578A
Application granted
Publication of CN112257578B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Abstract

The invention relates to the technical field of artificial intelligence, and provides a face key point detection method, a face key point detection device, an electronic device and a storage medium. The face key point detection method comprises the following steps: inputting face picture training data into a first residual error network to obtain a first feature map; inputting the first feature map into a geometric perception network to obtain a first geometric relation matrix; inputting the first feature map into an attention model to obtain a first weighted feature map matrix; obtaining first input data according to the first geometric relation matrix and the first weighted feature map matrix; inputting the first input data into a first low-rank learning network and training the first low-rank learning network to predict face key points, thereby obtaining a second low-rank learning network; and predicting face key points in face picture test data using the first residual error network, the geometric perception network, the attention model and the second low-rank learning network. The method can effectively extract face key points from a face picture with occlusion.

Description

Face key point detection method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of face recognition in artificial intelligence, in particular to a face key point detection method and device, electronic equipment and a storage medium.
Background
In the prior art, the detection of the key points of the face mainly depends on neural network models such as a residual error network, and the key points of the face cannot be well detected when a blocked face image is processed.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a method, an apparatus, an electronic device, and a storage medium for detecting face key points, so as to achieve fast extraction of face key points of a face image with occlusion.
A first aspect of the present application provides a method for detecting a face key point, where the method for detecting a face key point includes:
inputting face picture training data into a first residual error network, and processing the face picture training data by the first residual error network to obtain a first feature map, wherein the face picture training data comprises a face image with an occlusion flaw, and the first residual error network comprises a convolution layer, a maximum pooling layer and a residual error calculation module consisting of at least one residual error unit and is used for acquiring face image features from the face image;
inputting the first feature map into a geometric perception network, and obtaining a first geometric relation matrix through processing by the geometric perception network;
inputting the first feature map into an attention model, and processing the first feature map by the attention model to obtain a first weighted feature map matrix;
obtaining first input data according to the first geometric relation matrix and the first weighted feature map matrix;
inputting the first input data into a first low-rank learning network, training the first low-rank learning network to predict human face key points in the human face picture training data, and obtaining a trained second low-rank learning network;
inputting face picture test data into the first residual error network, processing the face picture test data by the first residual error network to obtain a second feature map, inputting the second feature map into the geometric perception network, processing the second feature map by the geometric perception network to obtain a second geometric relation matrix, inputting the second feature map into the attention model, processing the second feature map by the attention model to obtain a second weighted feature map matrix, and obtaining second input data according to the second geometric relation matrix and the second weighted feature map matrix;
inputting the second input data to the second low-rank learning network, wherein the second low-rank learning network predicts human face key points in the human face picture test data;
and outputting the face key points in the face picture test data.
Preferably, the inputting the face picture training data into a first residual error network, and obtaining a first feature map by processing through the first residual error network includes:
inputting the face picture training data into the convolutional layer in the first residual error network, and obtaining a first calculation result through calculation of the convolutional layer;
inputting the first calculation result into the maximum pooling layer of the first residual error network, and calculating by the maximum pooling layer to obtain a second calculation result;
and inputting the second calculation result into the residual calculation module of the first residual network, and calculating by the residual calculation module to obtain the first feature map.
Preferably, inputting the first feature map into a geometric perception network, and obtaining a first geometric relation matrix through processing by the geometric perception network includes:
inputting the first feature map into a first convolution neural network in the geometric perception network, and obtaining a first matrix through processing of the first convolution neural network, wherein the first convolution neural network is used for obtaining a long-distance geometric relationship between face parts in a face image;
inputting the first feature map into a second convolutional neural network in the geometric perception network, and processing the first feature map by the second convolutional neural network to obtain a second matrix, wherein the second convolutional neural network is used for acquiring a local geometric relationship between face parts in a face image;
and calculating the outer product of the first matrix and the second matrix to obtain the first geometric relation matrix.
Preferably, the inputting the first feature map into an attention model and the obtaining a first weighted feature map matrix by the attention model processing include:
inputting the first feature map into a second residual error network in the attention model, and processing the first feature map by the second residual error network to obtain a feature vector, wherein the second residual error network comprises a residual error unit used for further extracting the features of the face image;
inputting the first feature map into a third convolutional neural network in the attention model, and processing the first feature map by the third convolutional neural network to obtain a single-channel feature vector, wherein the third convolutional neural network is used for extracting the weight of features in a face image;
calculating the single-channel feature vector by using a sigmoid function to obtain a probability distribution vector;
and performing element-by-element multiplication calculation on the feature vector and the probability distribution vector to obtain the first weighted feature map matrix.
Preferably, obtaining the first input data according to the first geometric relationship matrix and the first weighted feature map matrix includes:
and splicing the first geometric relation matrix and the first weighted feature map matrix to obtain the first input data.
Preferably, inputting the first input data into a first low-rank learning network, training the first low-rank learning network to predict face key points in the face picture training data, and obtaining a trained second low-rank learning network includes:
inputting the first input data to a fully connected layer of the first low-rank learning network;
training the first low-rank learning network to predict the face key points in the face picture training data by taking the first input data as the input of the first low-rank learning network and taking the face key points in the face picture training data as the output;
and optimizing the first low-rank learning network according to a preset loss function to obtain a trained second low-rank learning network, wherein the second low-rank learning network can predict face key points in a face picture.
Preferably, optimizing the first low-rank learning network according to a preset loss function to obtain a trained second low-rank learning network includes:
according to the formula

$$\min_{W,M}\ \frac{1}{N}\sum_{i=1}^{N}\left\|\hat{S}_i-S_i\right\|_F^2+\beta\,\mathrm{rank}(M)$$

the weights of the fully connected layer of the first low-rank learning network are optimized to obtain the trained second low-rank learning network, wherein $N$ is the number of samples of the face picture training data, $\hat{S}_i$ denotes the face key points predicted by the first low-rank learning network, with $\hat{S}_i = W^T M^T X_i$; $W^T$ is the transpose of the weight matrix of the fully connected layer of the first low-rank learning network, $M^T$ is the transpose of the structure matrix, $X_i$ is the input data, $S_i = \{S_1, S_2, \ldots, S_L\}$ are the face key points in the face picture training data, $L$ is the number of face key points in the face picture training data, $\|\cdot\|_F^2$ is the square of the F norm, $\beta$ is the regularization parameter for the rank of the structure matrix, and $\mathrm{rank}(M)$ is the rank of the structure matrix.
A second aspect of the present application provides a face keypoint detection apparatus, comprising:
the residual error network computing module is used for inputting face picture training data into a first residual error network and obtaining a first feature map through processing of the first residual error network, wherein the face picture training data comprises a face image with an occlusion flaw, the first residual error network comprises a convolution layer, a maximum pooling layer and a residual error computing module consisting of at least one residual error unit and is used for obtaining face image features from the face image;
the geometric perception network computing module is used for inputting the first characteristic diagram into a geometric perception network and obtaining a first geometric relation matrix through processing of the geometric perception network;
the attention model calculation module is used for inputting the first feature map into an attention model and obtaining a first weighted feature map matrix through the attention model processing;
the splicing module is used for obtaining first input data according to the first geometric relation matrix and the first weighted feature map matrix;
the low-rank learning network training module is used for inputting the first input data into a first low-rank learning network, training the first low-rank learning network to predict face key points in the face picture training data, and obtaining a trained second low-rank learning network;
the test input construction module is used for inputting the face picture test data into the first residual error network, obtaining a second feature map through the processing of the first residual error network, inputting the second feature map into the geometric perception network, obtaining a second geometric relation matrix through the processing of the geometric perception network, inputting the second feature map into the attention model, obtaining a second weighted feature map matrix through the processing of the attention model, and obtaining second input data according to the second geometric relation matrix and the second weighted feature map matrix;
the low-rank learning network prediction module is used for inputting the second input data into the second low-rank learning network, and the second low-rank learning network predicts the human face key points in the human face picture test data;
and the output module is used for outputting the human face key points in the human face picture test data.
A third aspect of the present application provides an electronic device, comprising:
a memory storing at least one instruction; and
a processor, which executes the instructions stored in the memory to implement the face key point detection method.
A fourth aspect of the present application provides a computer storage medium having computer readable instructions stored thereon, which, when executed by a processor, implement the face key point detection method.
In the invention, face picture training data is input into a first residual error network to obtain a first feature map; the first feature map is input into a geometric perception network to obtain a first geometric relation matrix; the first feature map is input into an attention model to obtain a first weighted feature map matrix; first input data is obtained according to the first geometric relation matrix and the first weighted feature map matrix; the first input data is input into a first low-rank learning network, which is trained to predict face key points, yielding a trained second low-rank learning network; and the first residual error network, the geometric perception network, the attention model and the second low-rank learning network are used to predict face key points in face picture test data. The first residual error network obtains a feature map of the face image, the geometric perception network captures the geometric relations among different face components, the attention model filters out irrelevant background information to obtain a clean feature representation, and the low-rank learning network recovers the occluded key points in the face image. Face key points can thus be effectively extracted even when the face picture contains occlusion, which solves the technical problem in the prior art that face key points cannot be detected when the face picture is occluded.
Drawings
Fig. 1 is a flowchart of a method for detecting key points of a human face according to an embodiment of the present invention.
Fig. 2 is a block diagram of a face key point detection device according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of an electronic device according to an embodiment of the invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a detailed description of the present invention will be given below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention; the described embodiments are merely a subset of the embodiments of the present invention, rather than all of them. All other embodiments obtained by a person skilled in the art based on the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
Preferably, the face key point detection method is applied to one or more electronic devices. An electronic device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The electronic device may be a desktop computer, a notebook computer, a tablet computer, a cloud server, or other computing device. The device can be in man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.
Example 1
Fig. 1 is a flowchart of a method for detecting key points of a human face according to an embodiment of the present invention. The order of the steps in the flow chart may be changed and some steps may be omitted according to different needs.
Referring to fig. 1, the method for detecting key points of a human face specifically includes the following steps:
step S11, inputting face picture training data into a first residual error network, and obtaining a first feature map through the processing of the first residual error network, wherein the face picture training data comprises a face image with an occlusion flaw, the first residual error network comprises a convolution layer, a maximum pooling layer and a residual error calculation module consisting of at least one residual error unit, and the residual error calculation module is used for obtaining the face image feature from the face image.
In at least one embodiment of the present invention, inputting face picture training data to a first residual error network, and obtaining a first feature map by processing the face picture training data by the first residual error network includes:
inputting the face picture training data into the convolutional layer in the first residual error network, and obtaining a first calculation result through calculation of the convolutional layer;
inputting the first calculation result into the maximum pooling layer of the first residual error network, and calculating by the maximum pooling layer to obtain a second calculation result;
and inputting the second calculation result into the residual calculation module of the first residual network, and calculating by the residual calculation module to obtain the first feature map.
Specifically, inputting the face image training data into the convolutional layer in the first residual error network, and obtaining a first calculation result through calculation by the convolutional layer includes:
converting the face picture training data into a matrix form;
and carrying out convolution operation on the face picture training data in the converted form by using the convolution kernel of the convolution layer to obtain a first calculation result.
For example, the convolutional layer may be the convolutional layer of the deep residual network ResNet-18; the convolution operation is performed on the converted face picture training data with a stride of 2, and the first calculation result may be a matrix of size 112 × 112.
Specifically, inputting the first calculation result into the maximum pooling layer of the first residual network, and calculating by the maximum pooling layer to obtain a second calculation result includes:
inputting the first calculation result into the maximum pooling layer;
and the maximum pooling layer performs maximum pooling operation on the first calculation result to obtain a second calculation result.
For example, the maximum pooling layer of the first residual error network may be the maximum pooling layer of the deep residual network ResNet-18, and the maximum pooling operation is performed on the first calculation result with a stride of 2.
Specifically, inputting the second calculation result into the residual calculation module of the first residual network, and calculating the first feature map by the residual calculation module includes:
and inputting the second calculation result into the residual calculation module, sequentially calculating by at least one residual unit in the residual calculation module, and taking the output of the last unit in the at least one residual unit as the output of the residual calculation module.
The residual unit is represented as:

$$y_i = h(x_i) + F(x_i, w_i)$$
$$x_{i+1} = f(y_i)$$

where $F$ is the residual function, $f$ is the ReLU function, $w_i$ is a weight matrix, $x_i$ is the input of the $i$-th residual unit, and $y_i$ is its output; the function $h$ is the identity mapping, $h(x_i) = x_i$; the residual function is given by $F(x_i, w_i) = w_i \cdot \sigma(B(w'_i) \cdot B(x_i))$, where $B$ denotes batch normalization, $w'_i$ is the transpose of $w_i$, and $\cdot$ denotes convolution.
For example, the residual calculation module of the first residual error network may be the first three residual units of the deep residual network ResNet-18.
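To make the data flow of step S11 concrete, the following is a minimal PyTorch sketch of such a first residual error network. It is an illustrative assumption, not the patent's reference implementation: the stem follows the ResNet-18 pattern (7 × 7 convolution with stride 2, then 3 × 3 max pooling with stride 2) and is followed by three residual units, and all channel counts are placeholders.

```python
import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    """One residual unit: y_i = h(x_i) + F(x_i, w_i), x_{i+1} = f(y_i),
    with h the identity mapping and f the ReLU function."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))   # first half of F(x, w)
        out = self.bn2(self.conv2(out))            # second half of F(x, w)
        return self.relu(x + out)                  # f(h(x) + F(x, w))

class FirstResidualNetwork(nn.Module):
    """Convolution layer -> max pooling layer -> residual calculation module."""
    def __init__(self, in_channels: int = 3, channels: int = 64):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, channels, 7, stride=2, padding=3)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.residual = nn.Sequential(*(ResidualUnit(channels) for _ in range(3)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv(x)         # first calculation result (112 x 112 for a 224 x 224 input)
        x = self.pool(x)         # second calculation result
        return self.residual(x)  # first feature map

first_feature_map = FirstResidualNetwork()(torch.randn(1, 3, 224, 224))
print(first_feature_map.shape)  # torch.Size([1, 64, 56, 56])
```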
Step S12, inputting the first characteristic diagram into a geometric perception network, and obtaining a first geometric relation matrix through the processing of the geometric perception network.
In at least one embodiment of the present invention, inputting the first feature map into a geometric perception network, and obtaining a first geometric relation matrix through processing by the geometric perception network includes:
inputting the first feature map into a first convolution neural network in the geometric perception network, and obtaining a first matrix through processing of the first convolution neural network, wherein the first convolution neural network is used for obtaining a long-distance geometric relationship between face parts in a face image;
inputting the first feature map into a second convolutional neural network in the geometric perception network, and processing the first feature map by the second convolutional neural network to obtain a second matrix, wherein the second convolutional neural network is used for acquiring a local geometric relationship between face parts in a face image;
and calculating the outer product of the first matrix and the second matrix to obtain the first geometric relation matrix.
Specifically, inputting the first feature map into the first convolutional neural network in the geometric perception network, and obtaining a first matrix through processing by the first convolutional neural network includes:
inputting the first feature map into the first convolution layer of the first convolutional neural network, and performing a convolution operation on the first feature map by using the convolution kernel of the first convolution layer of the first convolutional neural network to obtain a first convolution result;
inputting the first convolution result into the second convolution layer of the first convolutional neural network, and performing a convolution operation on the first convolution result by using the convolution kernel of the second convolution layer of the first convolutional neural network to obtain a second convolution result;
and inputting the second convolution result into the third convolution layer of the first convolutional neural network, and performing a convolution operation on the second convolution result by using the convolution kernel of the third convolution layer of the first convolutional neural network to obtain the first matrix.
For example, the size of the convolution kernel of the first convolution layer of the first convolution neural network may be 1 × 1, the size of the convolution kernel of the second convolution layer of the first convolution neural network may be 5 × 5, and the size of the convolution kernel of the third convolution layer of the first convolution neural network may be 1 × 1.
Specifically, inputting the first feature map into the second convolutional neural network in the geometric perception network, and obtaining a second matrix through processing by the second convolutional neural network includes:
inputting the first feature map into the first convolution layer of the second convolutional neural network, and performing a convolution operation on the first feature map by using the convolution kernel of the first convolution layer of the second convolutional neural network to obtain a third convolution result;
inputting the third convolution result into the second convolution layer of the second convolutional neural network, and performing a convolution operation on the third convolution result by using the convolution kernel of the second convolution layer of the second convolutional neural network to obtain a fourth convolution result;
and inputting the fourth convolution result into the third convolution layer of the second convolutional neural network, and performing a convolution operation on the fourth convolution result by using the convolution kernel of the third convolution layer of the second convolutional neural network to obtain the second matrix.
For example, the size of the convolution kernel of the first convolution layer of the second convolutional neural network may be 1 × 1, the size of the convolution kernel of the second convolution layer of the second convolutional neural network may be 3 × 3, and the size of the convolution kernel of the third convolution layer of the second convolutional neural network may be 1 × 1.
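Under stated assumptions, the geometric perception network of step S12 can be sketched as follows, reusing `first_feature_map` from the earlier sketch. The patent only says that the outer product of the two branch outputs is taken; this sketch additionally collapses each branch to a per-channel vector with global average pooling before the outer product so that the result stays small, which is an assumption on my part, and all channel counts are placeholders.

```python
import torch
import torch.nn as nn

def branch(channels: int, mid_kernel: int) -> nn.Sequential:
    """1x1 -> (mid_kernel x mid_kernel) -> 1x1 convolution stack."""
    return nn.Sequential(
        nn.Conv2d(channels, channels, 1),
        nn.Conv2d(channels, channels, mid_kernel, padding=mid_kernel // 2),
        nn.Conv2d(channels, channels, 1),
    )

class GeometricPerceptionNetwork(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.long_range = branch(channels, 5)  # first CNN: long-distance relations
        self.local = branch(channels, 3)       # second CNN: local relations

    def forward(self, feature_map: torch.Tensor) -> torch.Tensor:
        # Global average pooling to (batch, channels) vectors -- an assumption.
        a = self.long_range(feature_map).mean(dim=(2, 3))  # first matrix
        b = self.local(feature_map).mean(dim=(2, 3))       # second matrix
        # Outer product -> first geometric relation matrix, shape (batch, C, C).
        return torch.einsum('bi,bj->bij', a, b)

geometric_relation = GeometricPerceptionNetwork()(first_feature_map)
print(geometric_relation.shape)  # torch.Size([1, 64, 64])
```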
Step S13, inputting the first feature map into an attention model, and obtaining a first weighted feature map matrix through processing by the attention model.
In at least one embodiment of the present invention, the inputting the first feature map into an attention model, and the obtaining a first weighted feature map matrix by the attention model processing includes:
inputting the first feature map into a second residual error network in the attention model, and processing the first feature map by the second residual error network to obtain a feature vector, wherein the second residual error network comprises a residual error unit used for further extracting the features of the face image;
inputting the first feature map into a third convolutional neural network in the attention model, and processing the first feature map by the third convolutional neural network to obtain a single-channel feature vector, wherein the third convolutional neural network is used for extracting the weight of features in a face image;
calculating the single-channel feature vector by using a sigmoid function to obtain a probability distribution vector;
and performing element-by-element multiplication calculation on the feature vector and the probability distribution vector to obtain the first weighted feature map matrix.
Specifically, inputting the first feature map into a second residual error network in the attention model, and obtaining a feature vector through processing of the second residual error network includes:
and inputting the first feature map into a residual error unit of a second residual error network in the attention model, and calculating by the residual error unit to obtain the feature vector.
Specifically, inputting the first feature map into a third convolutional neural network in the attention model, and obtaining a single-channel feature vector through processing by the third convolutional neural network includes:
inputting the first feature map into the first convolution layer of the third convolutional neural network, and performing a convolution operation on the first feature map by using the convolution kernel of the first convolution layer of the third convolutional neural network to obtain a fifth convolution result;
inputting the fifth convolution result into the second convolution layer of the third convolutional neural network, and performing a convolution operation on the fifth convolution result by using the convolution kernel of the second convolution layer of the third convolutional neural network to obtain a sixth convolution result;
and inputting the sixth convolution result into the third convolution layer of the third convolutional neural network, and performing a convolution operation on the sixth convolution result by using the convolution kernel of the third convolution layer of the third convolutional neural network to obtain the single-channel feature vector.
For example, the size of the convolution kernel of the first convolution layer of the third convolutional neural network may be 1 × 1, the size of the convolution kernel of the second convolution layer of the third convolutional neural network may be 3 × 3, and the size of the convolution kernel of the third convolution layer of the third convolutional neural network may be 1 × 1.
Specifically, calculating the single-channel feature vector by using a sigmoid function, and obtaining a probability distribution vector comprises:
and for each element in the single-channel feature vector, calculating the corresponding sigmoid value of each element by using the sigmoid function, and taking the sigmoid value corresponding to each element as the probability distribution vector.
Wherein the sigmoid function has the form

$$S(x) = \frac{1}{1 + e^{-x}}$$

where $e$ is the base of the natural logarithm and $x$ is the element to be processed.
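The attention model of step S13 can likewise be sketched in a few lines, reusing the `ResidualUnit` class and `first_feature_map` from the earlier sketches; the 1 × 1 / 3 × 3 / 1 × 1 structure of the weight branch follows the example kernel sizes above, and broadcasting the single-channel sigmoid weights across the channels of the feature branch is an assumption.

```python
import torch
import torch.nn as nn

class AttentionModel(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.second_residual = ResidualUnit(channels)  # second residual network: further feature extraction
        self.weight_branch = nn.Sequential(            # third convolutional neural network
            nn.Conv2d(channels, channels, 1),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, 1, 1),                 # single-channel output
        )

    def forward(self, feature_map: torch.Tensor) -> torch.Tensor:
        features = self.second_residual(feature_map)  # feature vector
        logits = self.weight_branch(feature_map)      # single-channel feature vector
        weights = torch.sigmoid(logits)               # probability distribution vector
        # Element-by-element multiplication -> first weighted feature map matrix.
        return features * weights

weighted_feature_map = AttentionModel()(first_feature_map)
print(weighted_feature_map.shape)  # torch.Size([1, 64, 56, 56])
```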
And step S14, obtaining first input data according to the first geometric relation matrix and the first weighted feature map matrix.
In at least one embodiment of the present invention, obtaining the first input data according to the first geometric relationship matrix and the first weighted feature map matrix comprises:
and splicing the first geometric relation matrix and the first weighted feature map matrix to obtain the first input data.
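The patent does not spell out the splicing axis; a plausible reading, shown below as a continuation of the sketches above, is to flatten both matrices and concatenate them along the feature dimension.

```python
import torch

# Flatten both matrices and splice them into one feature vector per sample.
first_input_data = torch.cat(
    [geometric_relation.flatten(start_dim=1),     # (1, 64*64)
     weighted_feature_map.flatten(start_dim=1)],  # (1, 64*56*56)
    dim=1,
)
print(first_input_data.shape)  # torch.Size([1, 204800])
```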
Step S15, inputting the first input data into a first low-rank learning network, training the first low-rank learning network to predict the key points of the face in the face picture training data, and obtaining a trained second low-rank learning network.
In at least one embodiment of the present invention, inputting the first input data to a first low-rank learning network, training the first low-rank learning network to predict face key points in the face picture training data, and obtaining a trained second low-rank learning network includes:
inputting the first input data to a fully connected layer of the first low-rank learning network;
training the first low-rank learning network to predict the face key points in the face picture training data by taking the first input data as the input of the first low-rank learning network and taking the face key points in the face picture training data as the output;
and optimizing the first low-rank learning network according to a preset loss function to obtain a trained second low-rank learning network, wherein the second low-rank learning network can predict face key points in a face picture.
In at least one embodiment of the present invention, optimizing the first low-rank learning network according to a preset loss function to obtain a trained second low-rank learning network includes:
according to the formula

$$\min_{W,M}\ \frac{1}{N}\sum_{i=1}^{N}\left\|\hat{S}_i-S_i\right\|_F^2+\beta\,\mathrm{rank}(M)$$

the weights of the fully connected layer of the first low-rank learning network are optimized to obtain the trained second low-rank learning network, wherein $N$ is the number of samples of the face picture training data, $\hat{S}_i$ denotes the face key points predicted by the first low-rank learning network, with $\hat{S}_i = W^T M^T X_i$; $W^T$ is the transpose of the weight matrix of the fully connected layer of the first low-rank learning network, $M^T$ is the transpose of the structure matrix, $X_i$ is the input data, $S_i = \{S_1, S_2, \ldots, S_L\}$ are the face key points in the face picture training data, $L$ is the number of face key points in the face picture training data, $\|\cdot\|_F^2$ is the square of the F norm, $\beta$ is the regularization parameter for the rank of the structure matrix, and $\mathrm{rank}(M)$ is the rank of the structure matrix.
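Because rank(M) is discrete and non-differentiable, gradient-based training needs a surrogate; the sketch below uses the nuclear norm, the standard convex relaxation of the rank, which is a substitution on my part rather than something the patent states. All shapes and the β value are illustrative.

```python
import torch

def low_rank_loss(X: torch.Tensor, S: torch.Tensor,
                  W: torch.Tensor, M: torch.Tensor,
                  beta: float = 0.01) -> torch.Tensor:
    """X: (d, N) spliced input data, S: (2L, N) ground-truth key point coordinates,
    W: (d, 2L) fully connected layer weights, M: (d, d) structure matrix."""
    S_hat = W.T @ M.T @ X                               # predicted key points, S_hat = W^T M^T X
    data_term = (S_hat - S).pow(2).sum() / X.shape[1]   # mean squared F-norm over the N samples
    rank_term = torch.linalg.matrix_norm(M, ord='nuc')  # nuclear-norm surrogate for rank(M)
    return data_term + beta * rank_term

d, L, N = 256, 68, 32                          # illustrative dimensions
X = torch.randn(d, N)
S = torch.randn(2 * L, N)                      # x- and y-coordinates of L key points
W = torch.randn(d, 2 * L, requires_grad=True)  # fully connected layer weights
M = torch.randn(d, d, requires_grad=True)      # structure matrix
loss = low_rank_loss(X, S, W, M)
loss.backward()                                # gradients for an optimizer step
```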
Step S16, inputting the face picture test data into the first residual error network, obtaining a second feature map through the first residual error network processing, inputting the second feature map into the geometric perception network, obtaining a second geometric relation matrix through the geometric perception network processing, inputting the second feature map into the attention model, obtaining a second weighted feature map matrix through the attention model processing, and obtaining second input data according to the second geometric relation matrix and the second weighted feature map matrix.
Step S17, inputting the second input data to the second low-rank learning network, where the second low-rank learning network predicts the face key points in the face picture test data.
Specifically, inputting the second input data into the second low-rank learning network, where the second low-rank learning network predicts the face key points in the face picture test data, includes:
inputting the second input data to a fully connected layer of the second low rank learning network;
and calculating to obtain the face key points in the face picture test data through the full connection layer of the second low-rank learning network.
And step S18, outputting the face key points in the face picture test data.
Specifically, the outputting the face key points may include:
displaying the face picture test data, identifying the face key points on the face picture test data, and outputting the coordinates of the face key points in the face picture test data.
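Putting steps S16 to S18 together, test-time prediction reuses the trained modules unchanged; the following sketch chains the hypothetical classes and tensors from the previous sketches and is, like them, only an illustration of the data flow.

```python
import torch

def predict_face_key_points(image: torch.Tensor,
                            backbone: FirstResidualNetwork,
                            geometry: GeometricPerceptionNetwork,
                            attention: AttentionModel,
                            W: torch.Tensor, M: torch.Tensor) -> torch.Tensor:
    feat = backbone(image)                           # second feature map
    geo = geometry(feat).flatten(start_dim=1)        # second geometric relation matrix
    weighted = attention(feat).flatten(start_dim=1)  # second weighted feature map matrix
    X = torch.cat([geo, weighted], dim=1)            # second input data (spliced)
    # Fully connected layer of the trained second low-rank learning network,
    # batch-first form of S_hat = W^T M^T X.
    return X @ M @ W

# key_points = predict_face_key_points(test_image, backbone, geometry, attention, W, M)
# key_points has shape (batch, 2L): x- and y-coordinates of the L key points.
```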
It should be noted that, in order to ensure the privacy and security of the data and the output results in the processing process, the data and the output results may be stored in a blockchain, such as the face picture training data, the first feature map, the first geometric relation matrix, the face picture test data, the second input data, the face key points, and the like.
The invention inputs face picture training data, which includes face images with occlusion flaws, into a first residual error network to obtain a first feature map; inputs the first feature map into a geometric perception network to obtain a first geometric relation matrix; inputs the first feature map into an attention model to obtain a first weighted feature map matrix; obtains first input data according to the first geometric relation matrix and the first weighted feature map matrix; inputs the first input data into a first low-rank learning network and trains the first low-rank learning network to predict face key points, obtaining a trained second low-rank learning network; and uses the first residual error network, the geometric perception network, the attention model and the second low-rank learning network to predict face key points in face picture test data. The first residual error network obtains a feature map of the face image, the geometric perception network captures the geometric relations among different face components, the attention model filters out irrelevant background information to obtain a clean feature representation, and the low-rank learning network recovers the occluded key points in the face image. Face key points can thus be effectively extracted even when the face picture contains occlusion, which solves the technical problem in the prior art that face key points cannot be detected when the face picture is occluded.
Example 2
Fig. 2 is a block diagram of a face key point detection device 30 according to an embodiment of the present invention.
In some embodiments, the face keypoint detection apparatus 30 is implemented in an electronic device. The face keypoint detection apparatus 30 may comprise a plurality of functional modules consisting of program code segments. The program codes of the respective program segments in the face keypoint detection apparatus 30 may be stored in a memory and executed by at least one processor for performing a face keypoint detection function.
In this embodiment, the face keypoint detection apparatus 30 may be divided into a plurality of functional modules according to the functions executed by the face keypoint detection apparatus. Referring to fig. 2, the face keypoint detection apparatus 30 may include a residual network computing module 301, a geometric perception network computing module 302, an attention model computing module 303, a splicing module 304, a low-rank learning network training module 305, a test input construction module 306, a low-rank learning network prediction module 307, and an output module 308. The module referred to herein is a series of computer readable instruction segments stored in a memory that can be executed by at least one processor and that can perform a fixed function. In some embodiments, the functionality of the modules will be described in greater detail in subsequent embodiments.
The residual network computing module 301 inputs the face image training data into a first residual network, and obtains a first feature map by processing the face image training data through the first residual network, where the face image training data includes a face image with an occlusion defect, the first residual network includes a convolution layer, a maximum pooling layer, and a residual computing module composed of at least one residual unit, and is used to obtain a face image feature from the face image.
In at least one embodiment of the present invention, inputting face picture training data to a first residual error network, and obtaining a first feature map by processing the face picture training data by the first residual error network includes:
inputting the face picture training data into the convolutional layer in the first residual error network, and obtaining a first calculation result through calculation of the convolutional layer;
inputting the first calculation result into the maximum pooling layer of the first residual error network, and calculating by the maximum pooling layer to obtain a second calculation result;
and inputting the second calculation result into the residual calculation module of the first residual network, and calculating by the residual calculation module to obtain the first feature map.
Specifically, inputting the face image training data into the convolutional layer in the first residual error network, and obtaining a first calculation result through calculation by the convolutional layer includes:
converting the face picture training data into a matrix form;
and carrying out convolution operation on the face picture training data in the converted form by using the convolution kernel of the convolution layer to obtain a first calculation result.
Specifically, inputting the first calculation result into the maximum pooling layer of the first residual network, and calculating by the maximum pooling layer to obtain a second calculation result includes:
inputting the first calculation result into the maximum pooling layer;
and the maximum pooling layer performs maximum pooling operation on the first calculation result to obtain a second calculation result.
Specifically, inputting the second calculation result into the residual calculation module of the first residual network, and calculating the first feature map by the residual calculation module includes:
and inputting the second calculation result into the residual calculation module, sequentially calculating by at least one residual unit in the residual calculation module, and taking the output of the last unit in the at least one residual unit as the output of the residual calculation module.
The residual unit is represented as:

$$y_i = h(x_i) + F(x_i, w_i)$$
$$x_{i+1} = f(y_i)$$

where $F$ is the residual function, $f$ is the ReLU function, $w_i$ is a weight matrix, $x_i$ is the input of the $i$-th residual unit, and $y_i$ is its output; the function $h$ is the identity mapping, $h(x_i) = x_i$; the residual function is given by $F(x_i, w_i) = w_i \cdot \sigma(B(w'_i) \cdot B(x_i))$, where $B$ denotes batch normalization, $w'_i$ is the transpose of $w_i$, and $\cdot$ denotes convolution.
The geometric perception network computing module 302 inputs the first feature map to a geometric perception network, and obtains a first geometric relation matrix through processing of the geometric perception network.
In at least one embodiment of the present invention, inputting the first feature map into a geometric perception network, and obtaining a first geometric relation matrix through processing by the geometric perception network includes:
inputting the first feature map into a first convolution neural network in the geometric perception network, and obtaining a first matrix through processing of the first convolution neural network, wherein the first convolution neural network is used for obtaining a long-distance geometric relationship between face parts in a face image;
inputting the first feature map into a second convolutional neural network in the geometric perception network, and processing the first feature map by the second convolutional neural network to obtain a second matrix, wherein the second convolutional neural network is used for acquiring a local geometric relationship between face parts in a face image;
and calculating the outer product of the first matrix and the second matrix to obtain the first geometric relation matrix.
Specifically, inputting the first feature map into the first convolutional neural network in the geometric perception network, and obtaining a first matrix through processing by the first convolutional neural network includes:
inputting the first feature map into the first convolution layer of the first convolutional neural network, and performing a convolution operation on the first feature map by using the convolution kernel of the first convolution layer of the first convolutional neural network to obtain a first convolution result;
inputting the first convolution result into the second convolution layer of the first convolutional neural network, and performing a convolution operation on the first convolution result by using the convolution kernel of the second convolution layer of the first convolutional neural network to obtain a second convolution result;
and inputting the second convolution result into the third convolution layer of the first convolutional neural network, and performing a convolution operation on the second convolution result by using the convolution kernel of the third convolution layer of the first convolutional neural network to obtain the first matrix.
Specifically, inputting the first feature map into the second convolutional neural network in the geometric perception network, and obtaining a second matrix through processing by the second convolutional neural network includes:
inputting the first feature map into the first convolution layer of the second convolutional neural network, and performing a convolution operation on the first feature map by using the convolution kernel of the first convolution layer of the second convolutional neural network to obtain a third convolution result;
inputting the third convolution result into the second convolution layer of the second convolutional neural network, and performing a convolution operation on the third convolution result by using the convolution kernel of the second convolution layer of the second convolutional neural network to obtain a fourth convolution result;
and inputting the fourth convolution result into the third convolution layer of the second convolutional neural network, and performing a convolution operation on the fourth convolution result by using the convolution kernel of the third convolution layer of the second convolutional neural network to obtain the second matrix.
The attention model calculating module 303 inputs the first feature map into an attention model, and obtains a first weighted feature map matrix through processing of the attention model.
In at least one embodiment of the present invention, the inputting the first feature map into an attention model, and the obtaining a first weighted feature map matrix by the attention model processing includes:
inputting the first feature map into a second residual error network in the attention model, and processing the first feature map by the second residual error network to obtain a feature vector, wherein the second residual error network comprises a residual error unit used for further extracting the features of the face image;
inputting the first feature map into a third convolutional neural network in the attention model, and processing the first feature map by the third convolutional neural network to obtain a single-channel feature vector, wherein the third convolutional neural network is used for extracting the weight of features in a face image;
calculating the single-channel feature vector by using a sigmoid function to obtain a probability distribution vector;
and performing element-by-element multiplication calculation on the feature vector and the probability distribution vector to obtain the first weighted feature map matrix.
Specifically, inputting the first feature map into a second residual error network in the attention model, and obtaining a feature vector through processing of the second residual error network includes:
and inputting the first feature map into a residual error unit of a second residual error network in the attention model, and calculating by the residual error unit to obtain the feature vector.
Specifically, inputting the first feature map into a third convolutional neural network in the attention model, and obtaining a single-channel feature vector through processing by the third convolutional neural network includes:
inputting the first feature map into the first convolution layer of the third convolutional neural network, and performing a convolution operation on the first feature map by using the convolution kernel of the first convolution layer of the third convolutional neural network to obtain a fifth convolution result;
inputting the fifth convolution result into the second convolution layer of the third convolutional neural network, and performing a convolution operation on the fifth convolution result by using the convolution kernel of the second convolution layer of the third convolutional neural network to obtain a sixth convolution result;
and inputting the sixth convolution result into the third convolution layer of the third convolutional neural network, and performing a convolution operation on the sixth convolution result by using the convolution kernel of the third convolution layer of the third convolutional neural network to obtain the single-channel feature vector.
Specifically, calculating the single-channel feature vector by using a sigmoid function, and obtaining a probability distribution vector comprises:
and for each element in the single-channel feature vector, calculating the corresponding sigmoid value of each element by using the sigmoid function, and taking the sigmoid value corresponding to each element as the probability distribution vector.
Wherein the sigmoid function has the form

$$S(x) = \frac{1}{1 + e^{-x}}$$

where $e$ is the base of the natural logarithm and $x$ is the element to be processed.
The splicing module 304 obtains first input data according to the first geometric relationship matrix and the first weighted feature map matrix.
In at least one embodiment of the present invention, obtaining the first input data according to the first geometric relationship matrix and the first weighted feature map matrix comprises:
and splicing the first geometric relation matrix and the first weighted feature map matrix to obtain the first input data.
The low-rank learning network training module 305 inputs the first input data to a first low-rank learning network, trains the first low-rank learning network to predict face key points in the face picture training data, and obtains a trained second low-rank learning network.
In at least one embodiment of the present invention, inputting the first input data to a first low-rank learning network, training the first low-rank learning network to predict face key points in the face picture training data, and obtaining a trained second low-rank learning network includes:
inputting the first input data to a fully connected layer of the first low-rank learning network;
training the first low-rank learning network to predict the face key points in the face picture training data by taking the first input data as the input of the first low-rank learning network and taking the face key points in the face picture training data as the output;
and optimizing the first low-rank learning network according to a preset loss function to obtain a trained second low-rank learning network, wherein the second low-rank learning network can predict face key points in a face picture.
In at least one embodiment of the present invention, optimizing the first low-rank learning network according to a preset loss function to obtain a trained second low-rank learning network includes:
according to the formula

$$\min_{W,M}\ \frac{1}{N}\sum_{i=1}^{N}\left\|\hat{S}_i-S_i\right\|_F^2+\beta\,\mathrm{rank}(M)$$

the weights of the fully connected layer of the first low-rank learning network are optimized to obtain the trained second low-rank learning network, wherein $N$ is the number of samples of the face picture training data, $\hat{S}_i$ denotes the face key points predicted by the first low-rank learning network, with $\hat{S}_i = W^T M^T X_i$; $W^T$ is the transpose of the weight matrix of the fully connected layer of the first low-rank learning network, $M^T$ is the transpose of the structure matrix, $X_i$ is the input data, $S_i = \{S_1, S_2, \ldots, S_L\}$ are the face key points in the face picture training data, $L$ is the number of face key points in the face picture training data, $\|\cdot\|_F^2$ is the square of the F norm, $\beta$ is the regularization parameter for the rank of the structure matrix, and $\mathrm{rank}(M)$ is the rank of the structure matrix.
The test input construction module 306 inputs the face picture test data to the first residual error network, obtains a second feature map through the processing of the first residual error network, inputs the second feature map to the geometric perception network, obtains a second geometric relationship matrix through the processing of the geometric perception network, inputs the second feature map to the attention model, obtains a second weighted feature map matrix through the processing of the attention model, and obtains second input data according to the second geometric relationship matrix and the second weighted feature map matrix.
The low-rank learning network prediction module 307 inputs the second input data to the second low-rank learning network, and the second low-rank learning network predicts the face key points in the face picture test data.
Specifically, inputting the second input data into the second low-rank learning network, where the second low-rank learning network predicts the face key points in the face picture test data, includes:
inputting the second input data to a fully connected layer of the second low rank learning network;
and calculating to obtain the face key points in the face picture test data through the full connection layer of the second low-rank learning network.
The output module 308 outputs the face key points in the face picture test data.
Specifically, the outputting the face key points may include:
displaying the face picture test data, identifying the face key points on the face picture test data, and outputting the coordinates of the face key points in the face picture test data.
It should be noted that, in order to ensure the privacy and security of the data and the output results in the processing process, the data and the output results may be stored in a blockchain, such as the face picture training data, the first feature map, the first geometric relation matrix, the face picture test data, the second input data, the face key points, and the like.
According to the invention, face picture training data containing face images with occlusion flaws is input into a first residual error network to obtain a first feature map; the first feature map is input into a geometric perception network to obtain a first geometric relationship matrix and into an attention model to obtain a first weighted feature map matrix; first input data is built from the first geometric relationship matrix and the first weighted feature map matrix; the first input data is input into a first low-rank learning network, which is trained to predict face key points, yielding a trained second low-rank learning network; and the first residual error network, the geometric perception network, the attention model and the second low-rank learning network are then used together to predict the face key points in face picture test data. In this pipeline, the first residual error network extracts the feature map of the face image, the geometric perception network captures the geometric relationships between different face components, the attention model filters out irrelevant background information to obtain a clean feature representation, and the low-rank learning network recovers the occluded key points. Face key points can therefore be extracted effectively even when the face image is partially occluded, which solves the technical problem in the prior art that face key points cannot be detected when the face image is occluded.
Example 3
Fig. 3 is a schematic diagram of an electronic device 6 according to an embodiment of the invention.
The electronic device 6 comprises a memory 61, a processor 62 and computer readable instructions stored in the memory 61 and executable on the processor 62. The processor 62, when executing the computer readable instructions, implements the steps in the above-mentioned embodiment of the face keypoint detection method, such as the steps S11 to S18 shown in fig. 1. Alternatively, the processor 62, when executing the computer readable instructions, implements the functions of the modules/units in the above-mentioned embodiment of the face keypoint detection apparatus, such as the modules 301 to 308 in fig. 2.
Illustratively, the computer readable instructions may be partitioned into one or more modules/units that are stored in the memory 61 and executed by the processor 62 to implement the present invention. The one or more modules/units may be a series of computer-readable instruction segments capable of performing specific functions, which are used for describing the execution process of the computer-readable instructions in the electronic device 6. For example, the computer readable instructions can be divided into a residual network computing module 301, a geometry-aware network computing module 302, an attention model computing module 303, a splicing module 304, a low-rank learning network training module 305, a test input construction module 306, a low-rank learning network prediction module 307, and an output module 308 in fig. 2, and the specific functions of each module are described in embodiment 2.
In this embodiment, the electronic device 6 may be a computing device such as a desktop computer, a notebook computer, a handheld computer, a server, or a cloud terminal device. Those skilled in the art will appreciate that the schematic diagram is merely an example of the electronic device 6 and does not constitute a limitation of it; the electronic device 6 may include more or fewer components than those shown, combine certain components, or use different components. For example, the electronic device 6 may further include an input-output device, a network access device, a bus, and the like.
The processor 62 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like. A general-purpose processor may be a microprocessor, or the processor 62 may be any conventional processor. The processor 62 is the control center of the electronic device 6, connecting the various parts of the whole electronic device 6 through various interfaces and lines.
The memory 61 may be used to store the computer readable instructions and/or modules/units, and the processor 62 implements the various functions of the electronic device 6 by running or executing the computer readable instructions and/or modules/units stored in the memory 61 and calling the data stored in the memory 61. The memory 61 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and the application programs required by at least one function (such as a sound playing function or an image playing function), and the data storage area may store data created according to the use of the electronic device 6. In addition, the memory 61 may include volatile memory, and may also include non-volatile memory such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another storage device.
If the integrated modules/units of the electronic device 6 are implemented in the form of software functional modules and sold or used as separate products, they may be stored in a computer readable storage medium. Based on this understanding, all or part of the flow of the methods of the above embodiments may be implemented by computer readable instructions that instruct the related hardware; the computer readable instructions may be stored in a computer readable storage medium, and when executed by a processor, implement the steps of the above method embodiments. The computer readable instructions comprise computer readable instruction code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying the computer readable instruction code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), and the like.
The blockchain referred to in the present invention is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, each containing a batch of network transaction information used to verify the validity (anti-counterfeiting) of the information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
In addition, each functional module in each embodiment of the present invention may be integrated into the same processing module, or each module may exist alone physically, or two or more modules may be integrated into the same module. The integrated module can be realized in a hardware form, and can also be realized in a form of hardware and a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is to be understood that the word "comprising" does not exclude other modules or steps, and the singular does not exclude the plural. Several modules or electronic devices recited in the electronic device claims may also be implemented by one and the same module or electronic device by means of software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A face key point detection method is characterized by comprising the following steps:
inputting face picture training data into a first residual error network, and processing the face picture training data by the first residual error network to obtain a first feature map, wherein the face picture training data comprises a face image with an occlusion flaw, and the first residual error network comprises a convolution layer, a maximum pooling layer and a residual error calculation module consisting of at least one residual error unit and is used for acquiring face image features from the face image;
inputting the first characteristic diagram into a geometric perception network, and obtaining a first geometric relation matrix through processing of the geometric perception network;
inputting the first characteristic diagram into an attention model, and processing the first characteristic diagram by the attention model to obtain a first weighted characteristic diagram matrix;
obtaining first input data according to the first geometric relation matrix and the first weighted feature map matrix;
inputting the first input data into a first low-rank learning network, training the first low-rank learning network to predict human face key points in the human face picture training data, and obtaining a trained second low-rank learning network;
inputting face picture test data into the first residual error network, processing the face picture test data by the first residual error network to obtain a second feature map, inputting the second feature map into the geometric perception network, processing the second feature map by the geometric perception network to obtain a second geometric relation matrix, inputting the second feature map into the attention model, processing the second feature map by the attention model to obtain a second weighted feature map matrix, and obtaining second input data according to the second geometric relation matrix and the second weighted feature map matrix;
inputting the second input data to the second low-rank learning network, wherein the second low-rank learning network predicts human face key points in the human face picture test data;
and outputting the face key points in the face picture test data.
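For orientation, the sketch below wires the claimed pipeline end to end in PyTorch, with every sub-network reduced to a single placeholder layer; only the data flow follows the claim, and all layer choices and sizes are assumptions.

```python
import torch
import torch.nn as nn

class KeypointPipeline(nn.Module):
    def __init__(self, feat_dim=256, num_points=68):
        super().__init__()
        self.backbone = nn.Conv2d(3, feat_dim, 3, padding=1)          # stand-in for the first residual network
        self.geometry = nn.Conv2d(feat_dim, feat_dim, 3, padding=1)   # stand-in for the geometric perception network
        self.attention = nn.Conv2d(feat_dim, feat_dim, 3, padding=1)  # stand-in for the attention model
        self.fc = nn.Linear(2 * feat_dim, 2 * num_points)             # low-rank learning network's FC layer

    def forward(self, img):
        fmap = self.backbone(img)                       # first/second feature map
        geo = self.geometry(fmap).mean(dim=(2, 3))      # geometric relationship features, pooled
        att = self.attention(fmap).mean(dim=(2, 3))     # weighted feature map features, pooled
        fused = torch.cat([geo, att], dim=1)            # input data built from both matrices
        return self.fc(fused).view(img.size(0), -1, 2)  # predicted (x, y) key points

model = KeypointPipeline()
points = model(torch.randn(2, 3, 224, 224))             # -> shape (2, 68, 2)
```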
2. The method of claim 1, wherein the inputting the face image training data into a first residual error network and obtaining a first feature map by processing the face image training data through the first residual error network comprises:
inputting the face picture training data into the convolutional layer in the first residual error network, and obtaining a first calculation result through calculation of the convolutional layer;
inputting the first calculation result into the maximum pooling layer of the first residual error network, and calculating by the maximum pooling layer to obtain a second calculation result;
and inputting the second calculation result into the residual calculation module of the first residual network, and calculating by the residual calculation module to obtain the first feature map.
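A minimal sketch of this three-stage structure, using one standard basic block as the residual calculation module; kernel sizes and channel counts are assumptions:

```python
import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        # identity shortcut added to the two-convolution branch
        return self.relu(x + self.conv2(self.relu(self.conv1(x))))

conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)   # convolution layer -> first calculation result
pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)       # maximum pooling layer -> second calculation result
residual = ResidualUnit(64)                                   # residual calculation module

x = torch.randn(1, 3, 224, 224)                               # face picture training data
first_feature_map = residual(pool(conv(x)))                   # -> (1, 64, 56, 56)
```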
3. The method for detecting face key points according to claim 1, wherein inputting the first feature map into a geometric perception network, and obtaining a first geometric relationship matrix through processing of the geometric perception network comprises:
inputting the first feature map into a first convolution neural network in the geometric perception network, and obtaining a first matrix through processing of the first convolution neural network, wherein the first convolution neural network is used for obtaining a long-distance geometric relationship between face parts in a face image;
inputting the first feature map into a second convolutional neural network in the geometric perception network, and processing the first feature map by the second convolutional neural network to obtain a second matrix, wherein the second convolutional neural network is used for acquiring a local geometric relationship between face parts in a face image;
and calculating the outer product of the first matrix and the second matrix to obtain the first geometric relation matrix.
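This two-branch outer-product step might look like the sketch below, where each branch is reduced to one convolution whose output is pooled to a descriptor vector before the outer product; using a dilated kernel for the long-distance branch and a plain kernel for the local branch is an assumption:

```python
import torch
import torch.nn as nn

feat = torch.randn(1, 64, 56, 56)                            # the first feature map

long_branch = nn.Conv2d(64, 32, 3, padding=2, dilation=2)    # wider receptive field: long-distance relations
local_branch = nn.Conv2d(64, 32, 3, padding=1)               # small neighbourhood: local relations

u = long_branch(feat).mean(dim=(2, 3)).squeeze(0)            # (32,) first matrix, pooled to a descriptor
v = local_branch(feat).mean(dim=(2, 3)).squeeze(0)           # (32,) second matrix, pooled to a descriptor
geometric_relationship = torch.outer(u, v)                   # (32, 32) first geometric relationship matrix
```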
4. The method of claim 1, wherein the inputting the first feature map into an attention model and obtaining a first weighted feature map matrix by the attention model processing comprises:
inputting the first feature map into a second residual error network in the attention model, and processing the first feature map by the second residual error network to obtain a feature vector, wherein the second residual error network comprises a residual error unit used for further extracting the features of the face image;
inputting the first feature map into a third convolutional neural network in the attention model, and processing the first feature map by the third convolutional neural network to obtain a single-channel feature vector, wherein the third convolutional neural network is used for extracting the weight of features in a face image;
calculating the single-channel feature vector by using a sigmoid function to obtain a probability distribution vector;
and performing element-by-element multiplication calculation on the feature vector and the probability distribution vector to obtain the first weighted feature map matrix.
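A compact sketch of this attention computation, with the second residual network reduced to a single residual convolution and all sizes assumed:

```python
import torch
import torch.nn as nn

feat = torch.randn(1, 64, 56, 56)                 # the first feature map

refine = nn.Conv2d(64, 64, 3, padding=1)          # stand-in for the second residual network
to_one_channel = nn.Conv2d(64, 1, kernel_size=1)  # third CNN: extracts per-location feature weights

features = feat + refine(feat)                    # refined feature vector (with identity shortcut)
weights = torch.sigmoid(to_one_channel(feat))     # (1, 1, 56, 56) probability distribution map
weighted_feature_map = features * weights         # element-by-element multiplication (broadcast)
```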
5. The method of claim 1, wherein obtaining first input data according to the first geometric relationship matrix and the first weighted feature map matrix comprises:
and splicing the first geometric relation matrix and the first weighted feature map matrix to obtain the first input data.
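This fusion is a plain concatenation; a minimal sketch with assumed flattened per-image shapes:

```python
import torch

geo = torch.randn(8, 1024)        # first geometric relationship matrix, flattened per image
weighted = torch.randn(8, 1024)   # first weighted feature map matrix, flattened per image
first_input = torch.cat([geo, weighted], dim=1)   # -> (8, 2048) first input data
```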
6. The method of claim 1, wherein inputting the first input data into a first low-rank learning network, training the first low-rank learning network to predict face keypoints in the face picture training data, and obtaining a trained second low-rank learning network comprises:
inputting the first input data to a fully connected layer of the first low-rank learning network;
training the first low-rank learning network to predict the face key points in the face picture training data by taking the first input data as the input of the first low-rank learning network and taking the face key points in the face picture training data as the expected output;
and optimizing the first low-rank learning network according to a preset loss function to obtain a trained second low-rank learning network, wherein the second low-rank learning network can predict face key points in a face picture.
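A bare-bones training sketch for this step, with invented sizes and synthetic data; the rank penalty of claim 7 is left out here for brevity:

```python
import torch
import torch.nn as nn

L, D, N = 68, 2048, 16
fc = nn.Linear(D, 2 * L)                           # fully connected layer of the first network
optimizer = torch.optim.Adam(fc.parameters(), lr=1e-3)

inputs = torch.randn(N, D)                         # first input data (synthetic)
targets = torch.randn(N, 2 * L)                    # face key points used as supervision (synthetic)

for epoch in range(10):
    optimizer.zero_grad()
    loss = ((fc(inputs) - targets) ** 2).mean()    # squared-error fit term
    loss.backward()
    optimizer.step()
```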
7. The method for detecting the key points of the human face according to claim 6, wherein the step of optimizing the first low-rank learning network according to a preset loss function to obtain a trained second low-rank learning network comprises the following steps:
according to the formula

$$\min_{W,M}\ \frac{1}{N}\big\|S-\hat{S}\big\|_F^2+\beta\,\operatorname{rank}(M)$$

optimizing the weights of the fully connected layer of the first low-rank learning network to obtain the trained second low-rank learning network, wherein $N$ is the number of samples of the face picture training data, $\hat{S}$ denotes the face key points predicted by the first low-rank learning network, with $\hat{S}=W^{T}M^{T}X$, where $W^{T}$ is the transpose of the weight matrix of the fully connected layer of the first low-rank learning network, $M^{T}$ is the transpose of the structure matrix, $X$ is the first input data, $S=\{S_1,S_2,\dots,S_L\}$ with $L$ the number of face key points in the face picture training data, $\|\cdot\|_F^{2}$ denotes the square of the F norm, $\beta$ is the regularization parameter on the rank of the structure matrix, and $\operatorname{rank}(M)$ is the rank of the structure matrix.
8. A face keypoint detection device, characterized in that it comprises:
the residual error network computing module is used for inputting face picture training data into a first residual error network and obtaining a first feature map through processing of the first residual error network, wherein the face picture training data comprises a face image with an occlusion flaw, the first residual error network comprises a convolution layer, a maximum pooling layer and a residual error computing module consisting of at least one residual error unit and is used for obtaining face image features from the face image;
the geometric perception network computing module is used for inputting the first characteristic diagram into a geometric perception network and obtaining a first geometric relation matrix through processing of the geometric perception network;
the attention model calculation module is used for inputting the first characteristic diagram into an attention model and obtaining a first weighted characteristic diagram matrix through the attention model processing;
the splicing module is used for obtaining first input data according to the first geometric relation matrix and the first weighted feature map matrix;
the low-rank learning network training module is used for inputting the first input data into a first low-rank learning network, training the first low-rank learning network to predict face key points in the face picture training data, and obtaining a trained second low-rank learning network;
the test input construction module is used for inputting the face picture test data into the first residual error network, obtaining a second feature map through the processing of the first residual error network, inputting the second feature map into the geometric perception network, obtaining a second geometric relation matrix through the processing of the geometric perception network, inputting the second feature map into the attention model, obtaining a second weighted feature map matrix through the processing of the attention model, and obtaining second input data according to the second geometric relation matrix and the second weighted feature map matrix;
the low-rank learning network prediction module is used for inputting the second input data into the second low-rank learning network, and the second low-rank learning network predicts the human face key points in the human face picture test data;
and the output module is used for outputting the human face key points in the human face picture test data.
9. An electronic device, characterized in that the electronic device comprises:
a memory storing at least one instruction; and
a processor executing instructions stored in the memory to implement the face keypoint detection method of any of claims 1 to 7.
10. A computer storage medium having computer readable instructions stored thereon which, when executed by a processor, implement a method of face keypoint detection as claimed in any one of claims 1 to 7.
CN202011133910.9A 2020-10-21 2020-10-21 Face key point detection method and device, electronic equipment and storage medium Active CN112257578B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011133910.9A CN112257578B (en) 2020-10-21 2020-10-21 Face key point detection method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112257578A true CN112257578A (en) 2021-01-22
CN112257578B CN112257578B (en) 2023-07-07

Family

ID=74263071

Country Status (1)

Country Link
CN (1) CN112257578B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108846343A (en) * 2018-06-05 2018-11-20 北京邮电大学 Multi-task collaborative analysis method based on three-dimensional video
CN110188227A (en) * 2019-05-05 2019-08-30 华南理工大学 A kind of hashing image search method based on deep learning and low-rank matrix optimization
CN115311730A (en) * 2022-09-23 2022-11-08 北京智源人工智能研究院 Face key point detection method and system and electronic equipment
CN115830414A (en) * 2022-12-08 2023-03-21 北京龙智数科科技服务有限公司 Face key point regression model training method, device, equipment and storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112257578B (en) * 2020-10-21 2023-07-07 平安科技(深圳)有限公司 Face key point detection method and device, electronic equipment and storage medium
CN112767383A (en) * 2021-01-29 2021-05-07 深圳艾摩米智能科技有限公司 Face pox positioning and recognition method
CN112767383B (en) * 2021-01-29 2024-02-27 深圳艾摩米智能科技有限公司 Positioning and identifying method for facial acne
CN112949576A (en) * 2021-03-29 2021-06-11 北京京东方技术开发有限公司 Attitude estimation method, attitude estimation device, attitude estimation equipment and storage medium
CN112949576B (en) * 2021-03-29 2024-04-23 北京京东方技术开发有限公司 Attitude estimation method, apparatus, device and storage medium
CN113095310A (en) * 2021-06-10 2021-07-09 杭州魔点科技有限公司 Face position detection method, electronic device and storage medium
CN113469111A (en) * 2021-07-16 2021-10-01 中国银行股份有限公司 Image key point detection method and system, electronic device and storage medium
CN115063874A (en) * 2022-08-16 2022-09-16 深圳市海清视讯科技有限公司 Control method, device and equipment of intelligent household equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant