CN112257578A - Face key point detection method and device, electronic equipment and storage medium


Info

Publication number
CN112257578A
Authority
CN
China
Prior art keywords
network
face
inputting
feature map
matrix
Prior art date
Legal status
Granted
Application number
CN202011133910.9A
Other languages
Chinese (zh)
Other versions
CN112257578B (en)
Inventor
陈嘉莉
周超勇
刘玉宇
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202011133910.9A
Publication of CN112257578A
Application granted
Publication of CN112257578B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Abstract

The invention relates to the technical field of artificial intelligence, and provides a face key point detection method, a face key point detection device, an electronic device and a storage medium. The face key point detection method comprises the following steps: inputting face picture training data into a first residual error network to obtain a first feature map; inputting the first feature map into a geometric perception network to obtain a first geometric relation matrix; inputting the first feature map into an attention model to obtain a first weighted feature map matrix; obtaining first input data according to the first geometric relation matrix and the first weighted feature map matrix; inputting the first input data into a first low-rank learning network and training the first low-rank learning network to predict face key points, thereby obtaining a second low-rank learning network; and predicting face key points in face picture test data using the first residual error network, the geometric perception network, the attention model and the second low-rank learning network. The method can effectively extract face key points from a face picture with occlusion.

Description

Face key point detection method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of face recognition in artificial intelligence, in particular to a face key point detection method and device, electronic equipment and a storage medium.
Background
In the prior art, the detection of the key points of the face mainly depends on neural network models such as a residual error network, and the key points of the face cannot be well detected when a blocked face image is processed.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a method, an apparatus, an electronic device, and a storage medium for detecting face key points, so as to achieve fast extraction of face key points of a face image with occlusion.
A first aspect of the present application provides a method for detecting a face key point, where the method for detecting a face key point includes:
inputting face picture training data into a first residual error network, and processing the face picture training data by the first residual error network to obtain a first feature map, wherein the face picture training data comprises a face image with an occlusion flaw, and the first residual error network comprises a convolution layer, a maximum pooling layer and a residual error calculation module consisting of at least one residual error unit and is used for acquiring face image features from the face image;
inputting the first feature map into a geometric perception network, and obtaining a first geometric relation matrix through processing by the geometric perception network;
inputting the first feature map into an attention model, and processing the first feature map by the attention model to obtain a first weighted feature map matrix;
obtaining first input data according to the first geometric relation matrix and the first weighted feature map matrix;
inputting the first input data into a first low-rank learning network, training the first low-rank learning network to predict human face key points in the human face picture training data, and obtaining a trained second low-rank learning network;
inputting face picture test data into the first residual error network, processing the face picture test data by the first residual error network to obtain a second feature map, inputting the second feature map into the geometric perception network, processing the second feature map by the geometric perception network to obtain a second geometric relation matrix, inputting the second feature map into the attention model, processing the second feature map by the attention model to obtain a second weighted feature map matrix, and obtaining second input data according to the second geometric relation matrix and the second weighted feature map matrix;
inputting the second input data to the second low-rank learning network, wherein the second low-rank learning network predicts human face key points in the human face picture test data;
and outputting the face key points in the face picture test data.
Preferably, the inputting the face picture training data into a first residual error network, and obtaining a first feature map by processing through the first residual error network includes:
inputting the face picture training data into the convolutional layer in the first residual error network, and obtaining a first calculation result through calculation of the convolutional layer;
inputting the first calculation result into the maximum pooling layer of the first residual error network, and calculating by the maximum pooling layer to obtain a second calculation result;
and inputting the second calculation result into the residual calculation module of the first residual network, and calculating by the residual calculation module to obtain the first feature map.
Preferably, inputting the first feature map into a geometric perception network, and obtaining a first geometric relation matrix through processing by the geometric perception network includes:
inputting the first feature map into a first convolution neural network in the geometric perception network, and obtaining a first matrix through processing of the first convolution neural network, wherein the first convolution neural network is used for obtaining a long-distance geometric relationship between face parts in a face image;
inputting the first feature map into a second convolutional neural network in the geometric perception network, and processing the first feature map by the second convolutional neural network to obtain a second matrix, wherein the second convolutional neural network is used for acquiring a local geometric relationship between face parts in a face image;
and calculating the outer product of the first matrix and the second matrix to obtain the first geometric relation matrix.
Preferably, the inputting the first feature map into an attention model and the obtaining a first weighted feature map matrix by the attention model processing include:
inputting the first feature map into a second residual error network in the attention model, and processing the first feature map by the second residual error network to obtain a feature vector, wherein the second residual error network comprises a residual error unit used for further extracting the features of the face image;
inputting the first feature map into a third convolutional neural network in the attention model, and processing the first feature map by the third convolutional neural network to obtain a single-channel feature vector, wherein the third convolutional neural network is used for extracting the weight of features in a face image;
calculating the single-channel feature vector by using a sigmoid function to obtain a probability distribution vector;
and performing element-by-element multiplication calculation on the feature vector and the probability distribution vector to obtain the first weighted feature map matrix.
Preferably, obtaining the first input data according to the first geometric relationship matrix and the first weighted feature map matrix includes:
and splicing the first geometric relation matrix and the first weighted feature map matrix to obtain the first input data.
Preferably, inputting the first input data into a first low-rank learning network, training the first low-rank learning network to predict face key points in the face picture training data, and obtaining a trained second low-rank learning network includes:
inputting the first input data to a fully connected layer of the first low-rank learning network;
training the first low-rank learning network to predict the face key points in the face picture training data by taking the first input data as the input of the first low-rank learning network and taking the face key points in the face picture training data as the output;
and optimizing the first low-rank learning network according to a preset loss function to obtain a trained second low-rank learning network, wherein the second low-rank learning network can predict face key points in a face picture.
Preferably, optimizing the first low-rank learning network according to a preset loss function to obtain a trained second low-rank learning network includes:
according to the formula

$$\min_{W,M}\ \frac{1}{N}\sum_{i=1}^{N}\left\|\hat{S}_i-S_i\right\|_F^2+\beta\,\mathrm{rank}(M)$$

the weights of the fully connected layer of the first low-rank learning network are optimized to obtain the trained second low-rank learning network, wherein $N$ is the number of samples of the face picture training data, $\hat{S}_i$ denotes the face key points predicted by the first low-rank learning network, with $\hat{S}_i = W^T M^T X_i$; $W^T$ is the transpose of the weight matrix of the fully connected layer of the first low-rank learning network, $M^T$ is the transpose of the structure matrix, $X_i$ is the input data, $S_i = \{S_1, S_2, \ldots, S_L\}$ are the face key points in the face picture training data, $L$ is the number of face key points in the face picture training data, $\|\cdot\|_F^2$ is the square of the F norm, $\beta$ is the regularization parameter for the rank of the structure matrix, and $\mathrm{rank}(M)$ is the rank of the structure matrix.
A second aspect of the present application provides a face keypoint detection apparatus, comprising:
the residual error network computing module is used for inputting face picture training data into a first residual error network and obtaining a first feature map through processing of the first residual error network, wherein the face picture training data comprises a face image with an occlusion flaw, the first residual error network comprises a convolution layer, a maximum pooling layer and a residual error computing module consisting of at least one residual error unit and is used for obtaining face image features from the face image;
the geometric perception network computing module is used for inputting the first characteristic diagram into a geometric perception network and obtaining a first geometric relation matrix through processing of the geometric perception network;
the attention model calculation module is used for inputting the first feature map into an attention model and obtaining a first weighted feature map matrix through the attention model processing;
the splicing module is used for obtaining first input data according to the first geometric relation matrix and the first weighted feature map matrix;
the low-rank learning network training module is used for inputting the first input data into a first low-rank learning network, training the first low-rank learning network to predict face key points in the face picture training data, and obtaining a trained second low-rank learning network;
the test input construction module is used for inputting the face picture test data into the first residual error network, obtaining a second feature map through the processing of the first residual error network, inputting the second feature map into the geometric perception network, obtaining a second geometric relation matrix through the processing of the geometric perception network, inputting the second feature map into the attention model, obtaining a second weighted feature map matrix through the processing of the attention model, and obtaining second input data according to the second geometric relation matrix and the second weighted feature map matrix;
the low-rank learning network prediction module is used for inputting the second input data into the second low-rank learning network, and the second low-rank learning network predicts the human face key points in the human face picture test data;
and the output module is used for outputting the human face key points in the human face picture test data.
A third aspect of the present application provides an electronic device, comprising:
a memory storing at least one instruction; and
a processor, which executes the instructions stored in the memory to implement the face key point detection method.
A fourth aspect of the present application provides a computer storage medium having computer readable instructions stored thereon, which, when executed by a processor, implement the face key point detection method.
In the invention, face picture training data is input into a first residual error network to obtain a first feature map; the first feature map is input into a geometric perception network to obtain a first geometric relation matrix; the first feature map is input into an attention model to obtain a first weighted feature map matrix; first input data is obtained according to the first geometric relation matrix and the first weighted feature map matrix; the first input data is input into a first low-rank learning network, which is trained to predict face key points, yielding a trained second low-rank learning network; and the first residual error network, the geometric perception network, the attention model and the second low-rank learning network are used to predict face key points in face picture test data. The first residual error network obtains a feature map of the face image, the geometric perception network captures the geometric relations among different face components, the attention model filters out irrelevant background information to obtain a clean feature representation, and the low-rank learning network recovers the occluded key points in the face image. Face key points can thus be effectively extracted even when the face picture contains occlusion, which solves the technical problem in the prior art that face key points cannot be detected when the face picture is occluded.
Drawings
Fig. 1 is a flowchart of a method for detecting key points of a human face according to an embodiment of the present invention.
Fig. 2 is a block diagram of a face key point detection device according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of an electronic device according to an embodiment of the invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a detailed description of the present invention will be given below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention; the described embodiments are merely a subset of the embodiments of the present invention, rather than all of them. All other embodiments obtained by a person skilled in the art based on the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
Preferably, the face key point detection method is applied to one or more electronic devices. An electronic device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The electronic device may be a desktop computer, a notebook computer, a tablet computer, a cloud server, or other computing device. The device can be in man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.
Example 1
Fig. 1 is a flowchart of a method for detecting key points of a human face according to an embodiment of the present invention. The order of the steps in the flow chart may be changed and some steps may be omitted according to different needs.
Referring to fig. 1, the method for detecting key points of a human face specifically includes the following steps:
step S11, inputting face picture training data into a first residual error network, and obtaining a first feature map through the processing of the first residual error network, wherein the face picture training data comprises a face image with an occlusion flaw, the first residual error network comprises a convolution layer, a maximum pooling layer and a residual error calculation module consisting of at least one residual error unit, and the residual error calculation module is used for obtaining the face image feature from the face image.
In at least one embodiment of the present invention, inputting face picture training data to a first residual error network, and obtaining a first feature map by processing the face picture training data by the first residual error network includes:
inputting the face picture training data into the convolutional layer in the first residual error network, and obtaining a first calculation result through calculation of the convolutional layer;
inputting the first calculation result into the maximum pooling layer of the first residual error network, and calculating by the maximum pooling layer to obtain a second calculation result;
and inputting the second calculation result into the residual calculation module of the first residual network, and calculating by the residual calculation module to obtain the first feature map.
Specifically, inputting the face image training data into the convolutional layer in the first residual error network, and obtaining a first calculation result through calculation by the convolutional layer includes:
converting the face picture training data into a matrix form;
and carrying out convolution operation on the face picture training data in the converted form by using the convolution kernel of the convolution layer to obtain a first calculation result.
For example, the convolutional layer may be the convolutional layer of the deep residual network ResNet-18; the convolution operation is performed on the converted face picture training data with a stride of 2, and the first calculation result may be a matrix of size 112 × 112.
Specifically, inputting the first calculation result into the maximum pooling layer of the first residual network, and calculating by the maximum pooling layer to obtain a second calculation result includes:
inputting the first calculation result into the maximum pooling layer;
and the maximum pooling layer performs maximum pooling operation on the first calculation result to obtain a second calculation result.
For example, the maximum pooling layer of the first residual error network may be the maximum pooling layer of the deep residual network ResNet-18, and the maximum pooling operation is performed on the first calculation result with a stride of 2.
Specifically, inputting the second calculation result into the residual calculation module of the first residual network, and calculating the first feature map by the residual calculation module includes:
and inputting the second calculation result into the residual calculation module, sequentially calculating by at least one residual unit in the residual calculation module, and taking the output of the last unit in the at least one residual unit as the output of the residual calculation module.
The residual unit is represented as:

$$y_i = h(x_i) + F(x_i, w_i)$$
$$x_{i+1} = f(y_i)$$

where $F$ is the residual function, $f$ is the ReLU function, $w_i$ is a weight matrix, $x_i$ is the input of the $i$-th residual unit, and $y_i$ is its output; the function $h$ is the identity mapping, $h(x_i) = x_i$; the residual function is given by $F(x_i, w_i) = w_i \cdot \sigma(B(w'_i) \cdot B(x_i))$, where $B$ denotes batch normalization, $w'_i$ is the transpose of $w_i$, and $\cdot$ denotes convolution.
For example, the residual calculation module of the first residual error network may be the first three residual units of the deep residual network ResNet-18.
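To make the data flow of step S11 concrete, the following is a minimal PyTorch sketch of such a first residual error network. It is an illustrative assumption, not the patent's reference implementation: the stem follows the ResNet-18 pattern (7 × 7 convolution with stride 2, then 3 × 3 max pooling with stride 2) and is followed by three residual units, and all channel counts are placeholders.

```python
import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    """One residual unit: y_i = h(x_i) + F(x_i, w_i), x_{i+1} = f(y_i),
    with h the identity mapping and f the ReLU function."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))   # first half of F(x, w)
        out = self.bn2(self.conv2(out))            # second half of F(x, w)
        return self.relu(x + out)                  # f(h(x) + F(x, w))

class FirstResidualNetwork(nn.Module):
    """Convolution layer -> max pooling layer -> residual calculation module."""
    def __init__(self, in_channels: int = 3, channels: int = 64):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, channels, 7, stride=2, padding=3)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.residual = nn.Sequential(*(ResidualUnit(channels) for _ in range(3)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv(x)         # first calculation result (112 x 112 for a 224 x 224 input)
        x = self.pool(x)         # second calculation result
        return self.residual(x)  # first feature map

first_feature_map = FirstResidualNetwork()(torch.randn(1, 3, 224, 224))
print(first_feature_map.shape)  # torch.Size([1, 64, 56, 56])
```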
Step S12, inputting the first characteristic diagram into a geometric perception network, and obtaining a first geometric relation matrix through the processing of the geometric perception network.
In at least one embodiment of the present invention, inputting the first feature map into a geometric perception network, and obtaining a first geometric relation matrix through processing by the geometric perception network includes:
inputting the first feature map into a first convolution neural network in the geometric perception network, and obtaining a first matrix through processing of the first convolution neural network, wherein the first convolution neural network is used for obtaining a long-distance geometric relationship between face parts in a face image;
inputting the first feature map into a second convolutional neural network in the geometric perception network, and processing the first feature map by the second convolutional neural network to obtain a second matrix, wherein the second convolutional neural network is used for acquiring a local geometric relationship between face parts in a face image;
and calculating the outer product of the first matrix and the second matrix to obtain the first geometric relation matrix.
Specifically, inputting the first feature map into the first convolutional neural network in the geometric perception network, and obtaining a first matrix through processing by the first convolutional neural network includes:
inputting the first feature map into the first convolution layer of the first convolutional neural network, and performing a convolution operation on the first feature map by using the convolution kernel of the first convolution layer of the first convolutional neural network to obtain a first convolution result;
inputting the first convolution result into the second convolution layer of the first convolutional neural network, and performing a convolution operation on the first convolution result by using the convolution kernel of the second convolution layer of the first convolutional neural network to obtain a second convolution result;
and inputting the second convolution result into the third convolution layer of the first convolutional neural network, and performing a convolution operation on the second convolution result by using the convolution kernel of the third convolution layer of the first convolutional neural network to obtain the first matrix.
For example, the size of the convolution kernel of the first convolution layer of the first convolution neural network may be 1 × 1, the size of the convolution kernel of the second convolution layer of the first convolution neural network may be 5 × 5, and the size of the convolution kernel of the third convolution layer of the first convolution neural network may be 1 × 1.
Specifically, inputting the first feature map into the second convolutional neural network in the geometric perception network, and obtaining a second matrix through processing by the second convolutional neural network includes:
inputting the first feature map into the first convolution layer of the second convolutional neural network, and performing a convolution operation on the first feature map by using the convolution kernel of the first convolution layer of the second convolutional neural network to obtain a third convolution result;
inputting the third convolution result into the second convolution layer of the second convolutional neural network, and performing a convolution operation on the third convolution result by using the convolution kernel of the second convolution layer of the second convolutional neural network to obtain a fourth convolution result;
and inputting the fourth convolution result into the third convolution layer of the second convolutional neural network, and performing a convolution operation on the fourth convolution result by using the convolution kernel of the third convolution layer of the second convolutional neural network to obtain the second matrix.
For example, the size of the convolution kernel of the first convolution layer of the second convolutional neural network may be 1 × 1, the size of the convolution kernel of the second convolution layer of the second convolutional neural network may be 3 × 3, and the size of the convolution kernel of the third convolution layer of the second convolutional neural network may be 1 × 1.
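Under stated assumptions, the geometric perception network of step S12 can be sketched as follows, reusing `first_feature_map` from the earlier sketch. The patent only says that the outer product of the two branch outputs is taken; this sketch additionally collapses each branch to a per-channel vector with global average pooling before the outer product so that the result stays small, which is an assumption on my part, and all channel counts are placeholders.

```python
import torch
import torch.nn as nn

def branch(channels: int, mid_kernel: int) -> nn.Sequential:
    """1x1 -> (mid_kernel x mid_kernel) -> 1x1 convolution stack."""
    return nn.Sequential(
        nn.Conv2d(channels, channels, 1),
        nn.Conv2d(channels, channels, mid_kernel, padding=mid_kernel // 2),
        nn.Conv2d(channels, channels, 1),
    )

class GeometricPerceptionNetwork(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.long_range = branch(channels, 5)  # first CNN: long-distance relations
        self.local = branch(channels, 3)       # second CNN: local relations

    def forward(self, feature_map: torch.Tensor) -> torch.Tensor:
        # Global average pooling to (batch, channels) vectors -- an assumption.
        a = self.long_range(feature_map).mean(dim=(2, 3))  # first matrix
        b = self.local(feature_map).mean(dim=(2, 3))       # second matrix
        # Outer product -> first geometric relation matrix, shape (batch, C, C).
        return torch.einsum('bi,bj->bij', a, b)

geometric_relation = GeometricPerceptionNetwork()(first_feature_map)
print(geometric_relation.shape)  # torch.Size([1, 64, 64])
```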
Step S13, inputting the first feature map into an attention model, and obtaining a first weighted feature map matrix through processing by the attention model.
In at least one embodiment of the present invention, the inputting the first feature map into an attention model, and the obtaining a first weighted feature map matrix by the attention model processing includes:
inputting the first feature map into a second residual error network in the attention model, and processing the first feature map by the second residual error network to obtain a feature vector, wherein the second residual error network comprises a residual error unit used for further extracting the features of the face image;
inputting the first feature map into a third convolutional neural network in the attention model, and processing the first feature map by the third convolutional neural network to obtain a single-channel feature vector, wherein the third convolutional neural network is used for extracting the weight of features in a face image;
calculating the single-channel feature vector by using a sigmoid function to obtain a probability distribution vector;
and performing element-by-element multiplication calculation on the feature vector and the probability distribution vector to obtain the first weighted feature map matrix.
Specifically, inputting the first feature map into a second residual error network in the attention model, and obtaining a feature vector through processing of the second residual error network includes:
and inputting the first feature map into a residual error unit of a second residual error network in the attention model, and calculating by the residual error unit to obtain the feature vector.
Specifically, inputting the first feature map into a third convolutional neural network in the attention model, and obtaining a single-channel feature vector through processing by the third convolutional neural network includes:
inputting the first feature map into the first convolution layer of the third convolutional neural network, and performing a convolution operation on the first feature map by using the convolution kernel of the first convolution layer of the third convolutional neural network to obtain a fifth convolution result;
inputting the fifth convolution result into the second convolution layer of the third convolutional neural network, and performing a convolution operation on the fifth convolution result by using the convolution kernel of the second convolution layer of the third convolutional neural network to obtain a sixth convolution result;
and inputting the sixth convolution result into the third convolution layer of the third convolutional neural network, and performing a convolution operation on the sixth convolution result by using the convolution kernel of the third convolution layer of the third convolutional neural network to obtain the single-channel feature vector.
For example, the size of the convolution kernel of the first convolution layer of the third convolutional neural network may be 1 × 1, the size of the convolution kernel of the second convolution layer of the third convolutional neural network may be 3 × 3, and the size of the convolution kernel of the third convolution layer of the third convolutional neural network may be 1 × 1.
Specifically, calculating the single-channel feature vector by using a sigmoid function, and obtaining a probability distribution vector comprises:
and for each element in the single-channel feature vector, calculating the corresponding sigmoid value of each element by using the sigmoid function, and taking the sigmoid value corresponding to each element as the probability distribution vector.
Wherein the sigmoid function has the form

$$S(x) = \frac{1}{1 + e^{-x}}$$

where $e$ is the base of the natural logarithm and $x$ is the element to be processed.
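The attention model of step S13 can likewise be sketched in a few lines, reusing the `ResidualUnit` class and `first_feature_map` from the earlier sketches; the 1 × 1 / 3 × 3 / 1 × 1 structure of the weight branch follows the example kernel sizes above, and broadcasting the single-channel sigmoid weights across the channels of the feature branch is an assumption.

```python
import torch
import torch.nn as nn

class AttentionModel(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.second_residual = ResidualUnit(channels)  # second residual network: further feature extraction
        self.weight_branch = nn.Sequential(            # third convolutional neural network
            nn.Conv2d(channels, channels, 1),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, 1, 1),                 # single-channel output
        )

    def forward(self, feature_map: torch.Tensor) -> torch.Tensor:
        features = self.second_residual(feature_map)  # feature vector
        logits = self.weight_branch(feature_map)      # single-channel feature vector
        weights = torch.sigmoid(logits)               # probability distribution vector
        # Element-by-element multiplication -> first weighted feature map matrix.
        return features * weights

weighted_feature_map = AttentionModel()(first_feature_map)
print(weighted_feature_map.shape)  # torch.Size([1, 64, 56, 56])
```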
And step S14, obtaining first input data according to the first geometric relation matrix and the first weighted feature map matrix.
In at least one embodiment of the present invention, obtaining the first input data according to the first geometric relationship matrix and the first weighted feature map matrix comprises:
and splicing the first geometric relation matrix and the first weighted feature map matrix to obtain the first input data.
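The patent does not spell out the splicing axis; a plausible reading, shown below as a continuation of the sketches above, is to flatten both matrices and concatenate them along the feature dimension.

```python
import torch

# Flatten both matrices and splice them into one feature vector per sample.
first_input_data = torch.cat(
    [geometric_relation.flatten(start_dim=1),     # (1, 64*64)
     weighted_feature_map.flatten(start_dim=1)],  # (1, 64*56*56)
    dim=1,
)
print(first_input_data.shape)  # torch.Size([1, 204800])
```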
Step S15, inputting the first input data into a first low-rank learning network, training the first low-rank learning network to predict the key points of the face in the face picture training data, and obtaining a trained second low-rank learning network.
In at least one embodiment of the present invention, inputting the first input data to a first low-rank learning network, training the first low-rank learning network to predict face key points in the face picture training data, and obtaining a trained second low-rank learning network includes:
inputting the first input data to a fully connected layer of the first low-rank learning network;
training the first low-rank learning network to predict the face key points in the face picture training data by taking the first input data as the input of the first low-rank learning network and taking the face key points in the face picture training data as the output;
and optimizing the first low-rank learning network according to a preset loss function to obtain a trained second low-rank learning network, wherein the second low-rank learning network can predict face key points in a face picture.
In at least one embodiment of the present invention, optimizing the first low-rank learning network according to a preset loss function to obtain a trained second low-rank learning network includes:
according to the formula

$$\min_{W,M}\ \frac{1}{N}\sum_{i=1}^{N}\left\|\hat{S}_i-S_i\right\|_F^2+\beta\,\mathrm{rank}(M)$$

the weights of the fully connected layer of the first low-rank learning network are optimized to obtain the trained second low-rank learning network, wherein $N$ is the number of samples of the face picture training data, $\hat{S}_i$ denotes the face key points predicted by the first low-rank learning network, with $\hat{S}_i = W^T M^T X_i$; $W^T$ is the transpose of the weight matrix of the fully connected layer of the first low-rank learning network, $M^T$ is the transpose of the structure matrix, $X_i$ is the input data, $S_i = \{S_1, S_2, \ldots, S_L\}$ are the face key points in the face picture training data, $L$ is the number of face key points in the face picture training data, $\|\cdot\|_F^2$ is the square of the F norm, $\beta$ is the regularization parameter for the rank of the structure matrix, and $\mathrm{rank}(M)$ is the rank of the structure matrix.
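Because rank(M) is discrete and non-differentiable, gradient-based training needs a surrogate; the sketch below uses the nuclear norm, the standard convex relaxation of the rank, which is a substitution on my part rather than something the patent states. All shapes and the β value are illustrative.

```python
import torch

def low_rank_loss(X: torch.Tensor, S: torch.Tensor,
                  W: torch.Tensor, M: torch.Tensor,
                  beta: float = 0.01) -> torch.Tensor:
    """X: (d, N) spliced input data, S: (2L, N) ground-truth key point coordinates,
    W: (d, 2L) fully connected layer weights, M: (d, d) structure matrix."""
    S_hat = W.T @ M.T @ X                               # predicted key points, S_hat = W^T M^T X
    data_term = (S_hat - S).pow(2).sum() / X.shape[1]   # mean squared F-norm over the N samples
    rank_term = torch.linalg.matrix_norm(M, ord='nuc')  # nuclear-norm surrogate for rank(M)
    return data_term + beta * rank_term

d, L, N = 256, 68, 32                          # illustrative dimensions
X = torch.randn(d, N)
S = torch.randn(2 * L, N)                      # x- and y-coordinates of L key points
W = torch.randn(d, 2 * L, requires_grad=True)  # fully connected layer weights
M = torch.randn(d, d, requires_grad=True)      # structure matrix
loss = low_rank_loss(X, S, W, M)
loss.backward()                                # gradients for an optimizer step
```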
Step S16, inputting the face picture test data into the first residual error network, obtaining a second feature map through the first residual error network processing, inputting the second feature map into the geometric perception network, obtaining a second geometric relation matrix through the geometric perception network processing, inputting the second feature map into the attention model, obtaining a second weighted feature map matrix through the attention model processing, and obtaining second input data according to the second geometric relation matrix and the second weighted feature map matrix.
Step S17, inputting the second input data to the second low-rank learning network, where the second low-rank learning network predicts the face key points in the face picture test data.
Specifically, inputting the second input data into the second low-rank learning network, where the second low-rank learning network predicts the face key points in the face picture test data, includes:
inputting the second input data to a fully connected layer of the second low rank learning network;
and calculating to obtain the face key points in the face picture test data through the full connection layer of the second low-rank learning network.
And step S18, outputting the face key points in the face picture test data.
Specifically, the outputting the face key points may include:
displaying the face picture test data, identifying the face key points on the face picture test data, and outputting the coordinates of the face key points in the face picture test data.
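Putting steps S16 to S18 together, test-time prediction reuses the trained modules unchanged; the following sketch chains the hypothetical classes and tensors from the previous sketches and is, like them, only an illustration of the data flow.

```python
import torch

def predict_face_key_points(image: torch.Tensor,
                            backbone: FirstResidualNetwork,
                            geometry: GeometricPerceptionNetwork,
                            attention: AttentionModel,
                            W: torch.Tensor, M: torch.Tensor) -> torch.Tensor:
    feat = backbone(image)                           # second feature map
    geo = geometry(feat).flatten(start_dim=1)        # second geometric relation matrix
    weighted = attention(feat).flatten(start_dim=1)  # second weighted feature map matrix
    X = torch.cat([geo, weighted], dim=1)            # second input data (spliced)
    # Fully connected layer of the trained second low-rank learning network,
    # batch-first form of S_hat = W^T M^T X.
    return X @ M @ W

# key_points = predict_face_key_points(test_image, backbone, geometry, attention, W, M)
# key_points has shape (batch, 2L): x- and y-coordinates of the L key points.
```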
It should be noted that, in order to ensure the privacy and security of the data and the output results in the processing process, the data and the output results may be stored in a blockchain, such as the face picture training data, the first feature map, the first geometric relation matrix, the face picture test data, the second input data, the face key points, and the like.
The invention inputs face picture training data, which includes face images with occlusion flaws, into a first residual error network to obtain a first feature map; inputs the first feature map into a geometric perception network to obtain a first geometric relation matrix; inputs the first feature map into an attention model to obtain a first weighted feature map matrix; obtains first input data according to the first geometric relation matrix and the first weighted feature map matrix; inputs the first input data into a first low-rank learning network and trains the first low-rank learning network to predict face key points, obtaining a trained second low-rank learning network; and uses the first residual error network, the geometric perception network, the attention model and the second low-rank learning network to predict face key points in face picture test data. The first residual error network obtains a feature map of the face image, the geometric perception network captures the geometric relations among different face components, the attention model filters out irrelevant background information to obtain a clean feature representation, and the low-rank learning network recovers the occluded key points in the face image. Face key points can thus be effectively extracted even when the face picture contains occlusion, which solves the technical problem in the prior art that face key points cannot be detected when the face picture is occluded.
Example 2
Fig. 2 is a block diagram of a face key point detection device 30 according to an embodiment of the present invention.
In some embodiments, the face keypoint detection apparatus 30 is implemented in an electronic device. The face keypoint detection apparatus 30 may comprise a plurality of functional modules consisting of program code segments. The program codes of the respective program segments in the face keypoint detection apparatus 30 may be stored in a memory and executed by at least one processor for performing a face keypoint detection function.
In this embodiment, the face keypoint detection apparatus 30 may be divided into a plurality of functional modules according to the functions executed by the face keypoint detection apparatus. Referring to fig. 2, the face keypoint detection apparatus 30 may include a residual network computing module 301, a geometric perception network computing module 302, an attention model computing module 303, a splicing module 304, a low-rank learning network training module 305, a test input construction module 306, a low-rank learning network prediction module 307, and an output module 308. The module referred to herein is a series of computer readable instruction segments stored in a memory that can be executed by at least one processor and that can perform a fixed function. In some embodiments, the functionality of the modules will be described in greater detail in subsequent embodiments.
The residual network computing module 301 inputs the face image training data into a first residual network, and obtains a first feature map by processing the face image training data through the first residual network, where the face image training data includes a face image with an occlusion defect, the first residual network includes a convolution layer, a maximum pooling layer, and a residual computing module composed of at least one residual unit, and is used to obtain a face image feature from the face image.
In at least one embodiment of the present invention, inputting face picture training data to a first residual error network, and obtaining a first feature map by processing the face picture training data by the first residual error network includes:
inputting the face picture training data into the convolutional layer in the first residual error network, and obtaining a first calculation result through calculation of the convolutional layer;
inputting the first calculation result into the maximum pooling layer of the first residual error network, and calculating by the maximum pooling layer to obtain a second calculation result;
and inputting the second calculation result into the residual calculation module of the first residual network, and calculating by the residual calculation module to obtain the first feature map.
Specifically, inputting the face image training data into the convolutional layer in the first residual error network, and obtaining a first calculation result through calculation by the convolutional layer includes:
converting the face picture training data into a matrix form;
and carrying out convolution operation on the face picture training data in the converted form by using the convolution kernel of the convolution layer to obtain a first calculation result.
Specifically, inputting the first calculation result into the maximum pooling layer of the first residual network, and calculating by the maximum pooling layer to obtain a second calculation result includes:
inputting the first calculation result into the maximum pooling layer;
and the maximum pooling layer performs maximum pooling operation on the first calculation result to obtain a second calculation result.
Specifically, inputting the second calculation result into the residual calculation module of the first residual network, and calculating the first feature map by the residual calculation module includes:
and inputting the second calculation result into the residual calculation module, sequentially calculating by at least one residual unit in the residual calculation module, and taking the output of the last unit in the at least one residual unit as the output of the residual calculation module.
The residual unit is represented as:

$$y_i = h(x_i) + F(x_i, w_i)$$
$$x_{i+1} = f(y_i)$$

where $F$ is the residual function, $f$ is the ReLU function, $w_i$ is a weight matrix, $x_i$ is the input of the $i$-th residual unit, and $y_i$ is its output; the function $h$ is the identity mapping, $h(x_i) = x_i$; the residual function is given by $F(x_i, w_i) = w_i \cdot \sigma(B(w'_i) \cdot B(x_i))$, where $B$ denotes batch normalization, $w'_i$ is the transpose of $w_i$, and $\cdot$ denotes convolution.
The geometric perception network computing module 302 inputs the first feature map to a geometric perception network, and obtains a first geometric relation matrix through processing of the geometric perception network.
In at least one embodiment of the present invention, inputting the first feature map into a geometric perception network, and obtaining a first geometric relation matrix through processing by the geometric perception network includes:
inputting the first feature map into a first convolution neural network in the geometric perception network, and obtaining a first matrix through processing of the first convolution neural network, wherein the first convolution neural network is used for obtaining a long-distance geometric relationship between face parts in a face image;
inputting the first feature map into a second convolutional neural network in the geometric perception network, and processing the first feature map by the second convolutional neural network to obtain a second matrix, wherein the second convolutional neural network is used for acquiring a local geometric relationship between face parts in a face image;
and calculating the outer product of the first matrix and the second matrix to obtain the first geometric relation matrix.
Specifically, inputting the first feature map into the first convolutional neural network in the geometric perception network, and obtaining a first matrix through processing by the first convolutional neural network includes:
inputting the first feature map into the first convolution layer of the first convolutional neural network, and performing a convolution operation on the first feature map by using the convolution kernel of the first convolution layer of the first convolutional neural network to obtain a first convolution result;
inputting the first convolution result into the second convolution layer of the first convolutional neural network, and performing a convolution operation on the first convolution result by using the convolution kernel of the second convolution layer of the first convolutional neural network to obtain a second convolution result;
and inputting the second convolution result into the third convolution layer of the first convolutional neural network, and performing a convolution operation on the second convolution result by using the convolution kernel of the third convolution layer of the first convolutional neural network to obtain the first matrix.
Specifically, inputting the first feature map into the second convolutional neural network in the geometric perception network, and obtaining a second matrix through processing by the second convolutional neural network includes:
inputting the first feature map into the first convolution layer of the second convolutional neural network, and performing a convolution operation on the first feature map by using the convolution kernel of the first convolution layer of the second convolutional neural network to obtain a third convolution result;
inputting the third convolution result into the second convolution layer of the second convolutional neural network, and performing a convolution operation on the third convolution result by using the convolution kernel of the second convolution layer of the second convolutional neural network to obtain a fourth convolution result;
and inputting the fourth convolution result into the third convolution layer of the second convolutional neural network, and performing a convolution operation on the fourth convolution result by using the convolution kernel of the third convolution layer of the second convolutional neural network to obtain the second matrix.
The attention model calculating module 303 inputs the first feature map into an attention model, and obtains a first weighted feature map matrix through processing of the attention model.
In at least one embodiment of the present invention, the inputting the first feature map into an attention model, and the obtaining a first weighted feature map matrix by the attention model processing includes:
inputting the first feature map into a second residual error network in the attention model, and processing the first feature map by the second residual error network to obtain a feature vector, wherein the second residual error network comprises a residual error unit used for further extracting the features of the face image;
inputting the first feature map into a third convolutional neural network in the attention model, and processing the first feature map by the third convolutional neural network to obtain a single-channel feature vector, wherein the third convolutional neural network is used for extracting the weight of features in a face image;
calculating the single-channel feature vector by using a sigmoid function to obtain a probability distribution vector;
and performing element-by-element multiplication calculation on the feature vector and the probability distribution vector to obtain the first weighted feature map matrix.
Specifically, inputting the first feature map into a second residual error network in the attention model, and obtaining a feature vector through processing of the second residual error network includes:
and inputting the first feature map into a residual error unit of a second residual error network in the attention model, and calculating by the residual error unit to obtain the feature vector.
Specifically, inputting the first feature map into a third convolutional neural network in the attention model, and obtaining a single-channel feature vector through processing by the third convolutional neural network includes:
inputting the first feature map into the first convolution layer of the third convolutional neural network, and performing a convolution operation on the first feature map by using the convolution kernel of the first convolution layer of the third convolutional neural network to obtain a fifth convolution result;
inputting the fifth convolution result into the second convolution layer of the third convolutional neural network, and performing a convolution operation on the fifth convolution result by using the convolution kernel of the second convolution layer of the third convolutional neural network to obtain a sixth convolution result;
and inputting the sixth convolution result into the third convolution layer of the third convolutional neural network, and performing a convolution operation on the sixth convolution result by using the convolution kernel of the third convolution layer of the third convolutional neural network to obtain the single-channel feature vector.
Specifically, calculating the single-channel feature vector by using a sigmoid function, and obtaining a probability distribution vector comprises:
and for each element in the single-channel feature vector, calculating the corresponding sigmoid value of each element by using the sigmoid function, and taking the sigmoid value corresponding to each element as the probability distribution vector.
Wherein the sigmoid function has the form

$$S(x) = \frac{1}{1 + e^{-x}}$$

where $e$ is the base of the natural logarithm and $x$ is the element to be processed.
The splicing module 304 obtains first input data according to the first geometric relationship matrix and the first weighted feature map matrix.
In at least one embodiment of the present invention, obtaining the first input data according to the first geometric relationship matrix and the first weighted feature map matrix comprises:
and splicing the first geometric relation matrix and the first weighted feature map matrix to obtain the first input data.
The low-rank learning network training module 305 inputs the first input data to a first low-rank learning network, trains the first low-rank learning network to predict face key points in the face picture training data, and obtains a trained second low-rank learning network.
In at least one embodiment of the present invention, inputting the first input data to a first low-rank learning network, training the first low-rank learning network to predict face key points in the face picture training data, and obtaining a trained second low-rank learning network includes:
inputting the first input data to a fully connected layer of the first low-rank learning network;
training the first low-rank learning network to predict the face key points in the face picture training data by taking the first input data as the input of the first low-rank learning network and taking the face key points in the face picture training data as the output;
and optimizing the first low-rank learning network according to a preset loss function to obtain a trained second low-rank learning network, wherein the second low-rank learning network can predict face key points in a face picture.
In at least one embodiment of the present invention, optimizing the first low-rank learning network according to a preset loss function to obtain a trained second low-rank learning network includes:
according to the formula

$$\min_{W,M}\ \frac{1}{N}\sum_{i=1}^{N}\left\|\hat{S}_i-S_i\right\|_F^2+\beta\,\mathrm{rank}(M)$$

the weights of the fully connected layer of the first low-rank learning network are optimized to obtain the trained second low-rank learning network, wherein $N$ is the number of samples of the face picture training data, $\hat{S}_i$ denotes the face key points predicted by the first low-rank learning network, with $\hat{S}_i = W^T M^T X_i$; $W^T$ is the transpose of the weight matrix of the fully connected layer of the first low-rank learning network, $M^T$ is the transpose of the structure matrix, $X_i$ is the input data, $S_i = \{S_1, S_2, \ldots, S_L\}$ are the face key points in the face picture training data, $L$ is the number of face key points in the face picture training data, $\|\cdot\|_F^2$ is the square of the F norm, $\beta$ is the regularization parameter for the rank of the structure matrix, and $\mathrm{rank}(M)$ is the rank of the structure matrix.
The test input construction module 306 inputs the face picture test data to the first residual error network, obtains a second feature map through the processing of the first residual error network, inputs the second feature map to the geometric perception network, obtains a second geometric relationship matrix through the processing of the geometric perception network, inputs the second feature map to the attention model, obtains a second weighted feature map matrix through the processing of the attention model, and obtains second input data according to the second geometric relationship matrix and the second weighted feature map matrix.
The low-rank learning network prediction module 307 inputs the second input data to the second low-rank learning network, and the second low-rank learning network predicts the face key points in the face picture test data.
Specifically, inputting the second input data into the second low-rank learning network, where the second low-rank learning network predicts the face key points in the face picture test data, includes:
inputting the second input data to a fully connected layer of the second low rank learning network;
and calculating to obtain the face key points in the face picture test data through the full connection layer of the second low-rank learning network.
The output module 308 outputs the face key points in the face picture test data.
Specifically, the outputting the face key points may include:
displaying the face picture test data, identifying the face key points on the face picture test data, and outputting the coordinates of the face key points in the face picture test data.
It should be noted that, in order to ensure the privacy and security of the data and the output results in the processing process, the data and the output results may be stored in a blockchain, such as the face picture training data, the first feature map, the first geometric relation matrix, the face picture test data, the second input data, the face key points, and the like.
According to the invention, face picture training data containing face images with occlusion flaws is input into a first residual error network to obtain a first feature map; the first feature map is input into a geometric perception network to obtain a first geometric relationship matrix and into an attention model to obtain a first weighted feature map matrix; first input data is built from the first geometric relationship matrix and the first weighted feature map matrix; the first input data is input into a first low-rank learning network, which is trained to predict face key points, yielding a trained second low-rank learning network; and the first residual error network, the geometric perception network, the attention model and the second low-rank learning network are then used together to predict the face key points in face picture test data. In this pipeline, the first residual error network extracts the feature map of the face image, the geometric perception network captures the geometric relationships between different face components, the attention model filters out irrelevant background information to obtain a clean feature representation, and the low-rank learning network recovers the occluded key points. Face key points can therefore be extracted effectively even when the face image is partially occluded, which solves the technical problem in the prior art that face key points cannot be detected when the face image is occluded.
Example 3
Fig. 3 is a schematic diagram of an electronic device 6 according to an embodiment of the invention.
The electronic device 6 comprises a memory 61, a processor 62 and computer readable instructions stored in the memory 61 and executable on the processor 62. The processor 62, when executing the computer readable instructions, implements the steps in the above-mentioned embodiment of the face keypoint detection method, such as the steps S11 to S18 shown in fig. 1. Alternatively, the processor 62, when executing the computer readable instructions, implements the functions of the modules/units in the above-mentioned embodiment of the face keypoint detection apparatus, such as the modules 301 to 308 in fig. 2.
Illustratively, the computer readable instructions may be partitioned into one or more modules/units that are stored in the memory 61 and executed by the processor 62 to implement the present invention. The one or more modules/units may be a series of computer-readable instruction segments capable of performing specific functions, which are used for describing the execution process of the computer-readable instructions in the electronic device 6. For example, the computer readable instructions can be divided into a residual network computing module 301, a geometry-aware network computing module 302, an attention model computing module 303, a splicing module 304, a low-rank learning network training module 305, a test input construction module 306, a low-rank learning network prediction module 307, and an output module 308 in fig. 2, and the specific functions of each module are described in embodiment 2.
In this embodiment, the electronic device 6 may be a computing device such as a desktop computer, a notebook computer, a handheld computer, a server, or a cloud terminal device. Those skilled in the art will appreciate that the schematic diagram is merely an example of the electronic device 6 and does not constitute a limitation of it; the electronic device 6 may include more or fewer components than those shown, combine certain components, or use different components. For example, the electronic device 6 may further include an input-output device, a network access device, a bus, and the like.
The processor 62 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like. A general-purpose processor may be a microprocessor, or the processor 62 may be any conventional processor. The processor 62 is the control center of the electronic device 6, connecting the various parts of the whole electronic device 6 through various interfaces and lines.
The memory 61 may be used to store the computer readable instructions and/or modules/units, and the processor 62 implements the various functions of the electronic device 6 by running or executing the computer readable instructions and/or modules/units stored in the memory 61 and calling the data stored in the memory 61. The memory 61 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and the application programs required by at least one function (such as a sound playing function or an image playing function), and the data storage area may store data created according to the use of the electronic device 6. In addition, the memory 61 may include volatile memory, and may also include non-volatile memory such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another storage device.
If the integrated modules/units of the electronic device 6 are implemented in the form of software functional modules and sold or used as separate products, they may be stored in a computer readable storage medium. Based on this understanding, all or part of the flow of the methods of the above embodiments may be implemented by computer readable instructions that instruct the related hardware; the computer readable instructions may be stored in a computer readable storage medium, and when executed by a processor, implement the steps of the above method embodiments. The computer readable instructions comprise computer readable instruction code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying the computer readable instruction code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), and the like.
The blockchain referred to in the present invention is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, each containing a batch of network transaction information used to verify the validity (anti-counterfeiting) of the information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
In addition, each functional module in each embodiment of the present invention may be integrated into the same processing module, or each module may exist alone physically, or two or more modules may be integrated into the same module. The integrated module can be realized in a hardware form, and can also be realized in a form of hardware and a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is to be understood that the word "comprising" does not exclude other modules or steps, and the singular does not exclude the plural. Several modules or electronic devices recited in the electronic device claims may also be implemented by one and the same module or electronic device by means of software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A face key point detection method is characterized by comprising the following steps:
inputting face picture training data into a first residual error network, and processing the face picture training data by the first residual error network to obtain a first feature map, wherein the face picture training data comprises a face image with an occlusion flaw, and the first residual error network comprises a convolution layer, a maximum pooling layer and a residual error calculation module consisting of at least one residual error unit and is used for acquiring face image features from the face image;
inputting the first characteristic diagram into a geometric perception network, and obtaining a first geometric relation matrix through processing of the geometric perception network;
inputting the first characteristic diagram into an attention model, and processing the first characteristic diagram by the attention model to obtain a first weighted characteristic diagram matrix;
obtaining first input data according to the first geometric relation matrix and the first weighted feature map matrix;
inputting the first input data into a first low-rank learning network, training the first low-rank learning network to predict human face key points in the human face picture training data, and obtaining a trained second low-rank learning network;
inputting face picture test data into the first residual error network, processing the face picture test data by the first residual error network to obtain a second feature map, inputting the second feature map into the geometric perception network, processing the second feature map by the geometric perception network to obtain a second geometric relation matrix, inputting the second feature map into the attention model, processing the second feature map by the attention model to obtain a second weighted feature map matrix, and obtaining second input data according to the second geometric relation matrix and the second weighted feature map matrix;
inputting the second input data to the second low-rank learning network, wherein the second low-rank learning network predicts human face key points in the human face picture test data;
and outputting the face key points in the face picture test data.
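For orientation, the sketch below wires the claimed pipeline end to end in PyTorch, with every sub-network reduced to a single placeholder layer; only the data flow follows the claim, and all layer choices and sizes are assumptions.

```python
import torch
import torch.nn as nn

class KeypointPipeline(nn.Module):
    def __init__(self, feat_dim=256, num_points=68):
        super().__init__()
        self.backbone = nn.Conv2d(3, feat_dim, 3, padding=1)          # stand-in for the first residual network
        self.geometry = nn.Conv2d(feat_dim, feat_dim, 3, padding=1)   # stand-in for the geometric perception network
        self.attention = nn.Conv2d(feat_dim, feat_dim, 3, padding=1)  # stand-in for the attention model
        self.fc = nn.Linear(2 * feat_dim, 2 * num_points)             # low-rank learning network's FC layer

    def forward(self, img):
        fmap = self.backbone(img)                       # first/second feature map
        geo = self.geometry(fmap).mean(dim=(2, 3))      # geometric relationship features, pooled
        att = self.attention(fmap).mean(dim=(2, 3))     # weighted feature map features, pooled
        fused = torch.cat([geo, att], dim=1)            # input data built from both matrices
        return self.fc(fused).view(img.size(0), -1, 2)  # predicted (x, y) key points

model = KeypointPipeline()
points = model(torch.randn(2, 3, 224, 224))             # -> shape (2, 68, 2)
```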
2. The method of claim 1, wherein the inputting the face image training data into a first residual error network and obtaining a first feature map by processing the face image training data through the first residual error network comprises:
inputting the face picture training data into the convolutional layer in the first residual error network, and obtaining a first calculation result through calculation of the convolutional layer;
inputting the first calculation result into the maximum pooling layer of the first residual error network, and calculating by the maximum pooling layer to obtain a second calculation result;
and inputting the second calculation result into the residual calculation module of the first residual network, and calculating by the residual calculation module to obtain the first feature map.
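A minimal sketch of this three-stage structure, using one standard basic block as the residual calculation module; kernel sizes and channel counts are assumptions:

```python
import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        # identity shortcut added to the two-convolution branch
        return self.relu(x + self.conv2(self.relu(self.conv1(x))))

conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)   # convolution layer -> first calculation result
pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)       # maximum pooling layer -> second calculation result
residual = ResidualUnit(64)                                   # residual calculation module

x = torch.randn(1, 3, 224, 224)                               # face picture training data
first_feature_map = residual(pool(conv(x)))                   # -> (1, 64, 56, 56)
```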
3. The method for detecting face key points according to claim 1, wherein inputting the first feature map into a geometric perception network, and obtaining a first geometric relationship matrix through processing of the geometric perception network comprises:
inputting the first feature map into a first convolution neural network in the geometric perception network, and obtaining a first matrix through processing of the first convolution neural network, wherein the first convolution neural network is used for obtaining a long-distance geometric relationship between face parts in a face image;
inputting the first feature map into a second convolutional neural network in the geometric perception network, and processing the first feature map by the second convolutional neural network to obtain a second matrix, wherein the second convolutional neural network is used for acquiring a local geometric relationship between face parts in a face image;
and calculating the outer product of the first matrix and the second matrix to obtain the first geometric relation matrix.
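This two-branch outer-product step might look like the sketch below, where each branch is reduced to one convolution whose output is pooled to a descriptor vector before the outer product; using a dilated kernel for the long-distance branch and a plain kernel for the local branch is an assumption:

```python
import torch
import torch.nn as nn

feat = torch.randn(1, 64, 56, 56)                            # the first feature map

long_branch = nn.Conv2d(64, 32, 3, padding=2, dilation=2)    # wider receptive field: long-distance relations
local_branch = nn.Conv2d(64, 32, 3, padding=1)               # small neighbourhood: local relations

u = long_branch(feat).mean(dim=(2, 3)).squeeze(0)            # (32,) first matrix, pooled to a descriptor
v = local_branch(feat).mean(dim=(2, 3)).squeeze(0)           # (32,) second matrix, pooled to a descriptor
geometric_relationship = torch.outer(u, v)                   # (32, 32) first geometric relationship matrix
```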
4. The method of claim 1, wherein the inputting the first feature map into an attention model and obtaining a first weighted feature map matrix by the attention model processing comprises:
inputting the first feature map into a second residual error network in the attention model, and processing the first feature map by the second residual error network to obtain a feature vector, wherein the second residual error network comprises a residual error unit used for further extracting the features of the face image;
inputting the first feature map into a third convolutional neural network in the attention model, and processing the first feature map by the third convolutional neural network to obtain a single-channel feature vector, wherein the third convolutional neural network is used for extracting the weight of features in a face image;
calculating the single-channel feature vector by using a sigmoid function to obtain a probability distribution vector;
and performing element-by-element multiplication calculation on the feature vector and the probability distribution vector to obtain the first weighted feature map matrix.
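A compact sketch of this attention computation, with the second residual network reduced to a single residual convolution and all sizes assumed:

```python
import torch
import torch.nn as nn

feat = torch.randn(1, 64, 56, 56)                 # the first feature map

refine = nn.Conv2d(64, 64, 3, padding=1)          # stand-in for the second residual network
to_one_channel = nn.Conv2d(64, 1, kernel_size=1)  # third CNN: extracts per-location feature weights

features = feat + refine(feat)                    # refined feature vector (with identity shortcut)
weights = torch.sigmoid(to_one_channel(feat))     # (1, 1, 56, 56) probability distribution map
weighted_feature_map = features * weights         # element-by-element multiplication (broadcast)
```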
5. The method of claim 1, wherein obtaining first input data according to the first geometric relationship matrix and the first weighted feature map matrix comprises:
and splicing the first geometric relation matrix and the first weighted feature map matrix to obtain the first input data.
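This fusion is a plain concatenation; a minimal sketch with assumed flattened per-image shapes:

```python
import torch

geo = torch.randn(8, 1024)        # first geometric relationship matrix, flattened per image
weighted = torch.randn(8, 1024)   # first weighted feature map matrix, flattened per image
first_input = torch.cat([geo, weighted], dim=1)   # -> (8, 2048) first input data
```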
6. The method of claim 1, wherein inputting the first input data into a first low-rank learning network, training the first low-rank learning network to predict face keypoints in the face picture training data, and obtaining a trained second low-rank learning network comprises:
inputting the first input data to a fully connected layer of the first low-rank learning network;
training the first low-rank learning network to predict the face key points in the face picture training data by taking the first input data as the input of the first low-rank learning network and taking the face key points in the face picture training data as the expected output;
and optimizing the first low-rank learning network according to a preset loss function to obtain a trained second low-rank learning network, wherein the second low-rank learning network can predict face key points in a face picture.
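A bare-bones training sketch for this step, with invented sizes and synthetic data; the rank penalty of claim 7 is left out here for brevity:

```python
import torch
import torch.nn as nn

L, D, N = 68, 2048, 16
fc = nn.Linear(D, 2 * L)                           # fully connected layer of the first network
optimizer = torch.optim.Adam(fc.parameters(), lr=1e-3)

inputs = torch.randn(N, D)                         # first input data (synthetic)
targets = torch.randn(N, 2 * L)                    # face key points used as supervision (synthetic)

for epoch in range(10):
    optimizer.zero_grad()
    loss = ((fc(inputs) - targets) ** 2).mean()    # squared-error fit term
    loss.backward()
    optimizer.step()
```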
7. The method for detecting the key points of the human face according to claim 6, wherein the step of optimizing the first low-rank learning network according to a preset loss function to obtain a trained second low-rank learning network comprises the following steps:
according to the formula

$$\min_{W,M}\ \frac{1}{N}\big\|S-\hat{S}\big\|_F^2+\beta\,\operatorname{rank}(M)$$

optimizing the weights of the fully connected layer of the first low-rank learning network to obtain the trained second low-rank learning network, wherein $N$ is the number of samples of the face picture training data, $\hat{S}$ denotes the face key points predicted by the first low-rank learning network, with $\hat{S}=W^{T}M^{T}X$, where $W^{T}$ is the transpose of the weight matrix of the fully connected layer of the first low-rank learning network, $M^{T}$ is the transpose of the structure matrix, $X$ is the first input data, $S=\{S_1,S_2,\dots,S_L\}$ with $L$ the number of face key points in the face picture training data, $\|\cdot\|_F^{2}$ denotes the square of the F norm, $\beta$ is the regularization parameter on the rank of the structure matrix, and $\operatorname{rank}(M)$ is the rank of the structure matrix.
8. A face keypoint detection device, characterized in that it comprises:
the residual error network computing module is used for inputting face picture training data into a first residual error network and obtaining a first feature map through processing of the first residual error network, wherein the face picture training data comprises a face image with an occlusion flaw, the first residual error network comprises a convolution layer, a maximum pooling layer and a residual error computing module consisting of at least one residual error unit and is used for obtaining face image features from the face image;
the geometric perception network computing module is used for inputting the first characteristic diagram into a geometric perception network and obtaining a first geometric relation matrix through processing of the geometric perception network;
the attention model calculation module is used for inputting the first characteristic diagram into an attention model and obtaining a first weighted characteristic diagram matrix through the attention model processing;
the splicing module is used for obtaining first input data according to the first geometric relation matrix and the first weighted feature map matrix;
the low-rank learning network training module is used for inputting the first input data into a first low-rank learning network, training the first low-rank learning network to predict face key points in the face picture training data, and obtaining a trained second low-rank learning network;
the test input construction module is used for inputting the face picture test data into the first residual error network, obtaining a second feature map through the processing of the first residual error network, inputting the second feature map into the geometric perception network, obtaining a second geometric relation matrix through the processing of the geometric perception network, inputting the second feature map into the attention model, obtaining a second weighted feature map matrix through the processing of the attention model, and obtaining second input data according to the second geometric relation matrix and the second weighted feature map matrix;
the low-rank learning network prediction module is used for inputting the second input data into the second low-rank learning network, and the second low-rank learning network predicts the human face key points in the human face picture test data;
and the output module is used for outputting the human face key points in the human face picture test data.
9. An electronic device, characterized in that the electronic device comprises:
a memory storing at least one instruction; and
a processor executing instructions stored in the memory to implement the face keypoint detection method of any of claims 1 to 7.
10. A computer storage medium having computer readable instructions stored thereon which, when executed by a processor, implement a method of face keypoint detection as claimed in any one of claims 1 to 7.
CN202011133910.9A 2020-10-21 2020-10-21 Face key point detection method and device, electronic equipment and storage medium Active CN112257578B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011133910.9A CN112257578B (en) 2020-10-21 2020-10-21 Face key point detection method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112257578A true CN112257578A (en) 2021-01-22
CN112257578B CN112257578B (en) 2023-07-07

Family

ID=74263071

Country Status (1)

Country Link
CN (1) CN112257578B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108846343A (en) * 2018-06-05 2018-11-20 北京邮电大学 Multi-task collaborative analysis method based on three-dimensional video
CN110188227A (en) * 2019-05-05 2019-08-30 华南理工大学 A kind of hashing image search method based on deep learning and low-rank matrix optimization
CN115311730A (en) * 2022-09-23 2022-11-08 北京智源人工智能研究院 Face key point detection method and system and electronic equipment
CN115830414A (en) * 2022-12-08 2023-03-21 北京龙智数科科技服务有限公司 Face key point regression model training method, device, equipment and storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112257578B (en) * 2020-10-21 2023-07-07 平安科技(深圳)有限公司 Face key point detection method and device, electronic equipment and storage medium
CN112767383A (en) * 2021-01-29 2021-05-07 深圳艾摩米智能科技有限公司 Face pox positioning and recognition method
CN112767383B (en) * 2021-01-29 2024-02-27 深圳艾摩米智能科技有限公司 Positioning and identifying method for facial acne
CN112949576A (en) * 2021-03-29 2021-06-11 北京京东方技术开发有限公司 Attitude estimation method, attitude estimation device, attitude estimation equipment and storage medium
CN112949576B (en) * 2021-03-29 2024-04-23 北京京东方技术开发有限公司 Attitude estimation method, apparatus, device and storage medium
CN113095310A (en) * 2021-06-10 2021-07-09 杭州魔点科技有限公司 Face position detection method, electronic device and storage medium
CN113469111A (en) * 2021-07-16 2021-10-01 中国银行股份有限公司 Image key point detection method and system, electronic device and storage medium
CN115063874A (en) * 2022-08-16 2022-09-16 深圳市海清视讯科技有限公司 Control method, device and equipment of intelligent household equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant