CN111126268B - Key point detection model training method and device, electronic equipment and storage medium

Info

Publication number
CN111126268B
CN111126268B (application CN201911346309.5A)
Authority
CN
China
Prior art keywords: key point, preset, point, predicted, distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911346309.5A
Other languages
Chinese (zh)
Other versions
CN111126268A (en)
Inventor
钟韬 (Zhong Tao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201911346309.5A priority Critical patent/CN111126268B/en
Publication of CN111126268A publication Critical patent/CN111126268A/en
Application granted granted Critical
Publication of CN111126268B publication Critical patent/CN111126268B/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161: Detection; Localisation; Normalisation
    • G06V40/168: Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiments of the present application provide a key point detection model training method and device, electronic equipment, and a storage medium, relating to the technical field of computers. The distance from each predicted key point to a preset virtual straight line is calculated, where the line passes through the real key point that has a semantic relation with the predicted key point, and the shortest such distance is taken as the point-line distance of the predicted key point. A loss value is calculated from the point-line distances, so the key point error caused by semantic ambiguity is fully considered, and the preset key point detection model is trained according to the loss value. Because this error is fully considered during training, the problem of poor key point detection precision is alleviated and the accuracy of key point detection is improved.

Description

Key point detection model training method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and apparatus for training a key point detection model, an electronic device, and a storage medium.
Background
With the development of computer vision technology, target detection has become widely used, with broad application prospects in fields such as face recognition, security monitoring, and dynamic tracking. Key points are the parts of a target that carry stable and important semantic information; target key point localization means detecting a specific target, locating the positions of its key points, and outputting their position information. Target key point localization has very important practical value in many fields, such as target attribute analysis, gesture recognition, and posture correction.
For example, most face applications require accurate detection of face key points. The basic principle is to take an input face picture and locate the positions of all face key points in it; common configurations include 21, 68, 106, and 240 key points. The loss function used when training a key point detection model is an important quantity for measuring whether the key points are accurate; commonly used loss functions include MSE (Mean Squared Error), MAE (Mean Absolute Error), the Wing loss function, and so on.
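For concreteness, here is a minimal sketch of the three loss functions named above, applied per coordinate over a set of keypoints. The Wing parameters `w` and `eps` are illustrative defaults (per the published Wing loss formulation by Feng et al.), not values from this application:

```python
import math

def mse(pred, gt):
    """Mean squared error over keypoint coordinates."""
    return sum((p - g) ** 2 for P, G in zip(pred, gt)
               for p, g in zip(P, G)) / (2 * len(pred))

def mae(pred, gt):
    """Mean absolute error over keypoint coordinates."""
    return sum(abs(p - g) for P, G in zip(pred, gt)
               for p, g in zip(P, G)) / (2 * len(pred))

def wing(pred, gt, w=10.0, eps=2.0):
    """Wing loss: logarithmic near zero error, linear for large errors."""
    c = w - w * math.log(1 + w / eps)  # makes the two pieces continuous
    total = 0.0
    for P, G in zip(pred, gt):
        for p, g in zip(P, G):
            x = abs(p - g)
            total += w * math.log(1 + x / eps) if x < w else x - c
    return total / (2 * len(pred))
```

All three treat every coordinate error the same way regardless of where the keypoint lies, which is exactly the limitation the following section identifies.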
During research, the inventors found that these loss functions do not consider key point errors at semantically ambiguous locations, so the key points predicted by the trained key point detection model contain errors and the precision is poor.
Disclosure of Invention
The embodiment of the application aims to provide a key point detection model training method, device, electronic equipment and storage medium, so as to solve the problem of poor key point detection precision and improve the accuracy of key point detection.
The specific technical scheme is as follows:
in a first aspect, an embodiment of the present application provides a method for training a keypoint detection model, where the method includes:
obtaining a sample picture, wherein the sample picture corresponds to a sample picture mark, the sample picture mark comprises a preset number of real key points, and each real key point corresponds to a position coordinate;
inputting the sample picture into a preset key point detection model for processing to obtain a preset number of predicted key points, wherein each predicted key point corresponds to a position coordinate;
calculating the point line distance of each predicted key point according to the position coordinates of each predicted key point and the position coordinates of each real key point, wherein the point line distance of each predicted key point is the shortest distance from any predicted key point to a preset virtual straight line, the preset virtual straight line is a straight line passing through the real key point with semantic relation to the predicted key point, the real key point with semantic relation to the predicted key point is a target real key point, and the preset virtual straight line also passes through the real key point with the same semantic type as the target real key point and adjacent to the target real key point;
calculating a loss value according to each point-line distance;
and training the preset key point detection model according to the loss value to obtain a trained key point detection model.
Optionally, the sample picture is a picture with facial features, a picture with human features or a picture with gesture features.
Optionally, before the step of calculating the loss value according to the point-line distances, the method further includes:
according to the position coordinates of the predicted key points and the position coordinates of the real key points, the Euclidean distance between the predicted key points and the real key points with the same semantic meaning is calculated, and the Euclidean distance of the predicted key points is obtained;
the calculating a loss value according to the point-line distance comprises the following steps:
and calculating a loss value according to each point-line distance, each Euclidean distance, a preset point-line distance weight and a preset Euclidean distance weight.
Optionally, the calculating of the loss value according to each point-line distance, each Euclidean distance, the preset point-line distance weight and the preset Euclidean distance weight includes:
calculating the average value of all the point line distances according to each point line distance to obtain the average value of the point line distances;
calculating the average value of all Euclidean distances according to each Euclidean distance to obtain an average value of Euclidean distances;
and calculating a loss value according to the point-line distance average value, the Euclidean distance average value, the preset point-line distance weight and the preset Euclidean distance weight.
Optionally, the calculating of the loss value according to the point-line distance average value, the Euclidean distance average value, the preset point-line distance weight and the preset Euclidean distance weight includes:
the loss value is calculated according to the following formula:
l = α × l_sa + β × l_mse

where

l_sa = (1/n) × Σ_{i=1..n} F(P_i, G_i)

l_mse = (1/n) × Σ_{i=1..n} √((x_{P_i} − x_{G_i})² + (y_{P_i} − y_{G_i})²)

Here α is the preset point-line distance weight, β is the preset Euclidean distance weight, l_sa is the point-line distance average value, l_mse is the Euclidean distance average value, and l is the loss value; n represents the number of predicted key points and i indexes them, P_i represents the i-th predicted key point, G_i represents the i-th real key point, and F(P_i, G_i) represents the point-line distance of P_i, where P_i and G_i have a semantic relation; x_{P_i} and y_{P_i} represent the x- and y-coordinate values of the i-th predicted key point, x_{G_i} and y_{G_i} represent the x- and y-coordinate values of the i-th real key point, and √((x_{P_i} − x_{G_i})² + (y_{P_i} − y_{G_i})²) represents the Euclidean distance of the i-th predicted key point.
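As an illustrative sketch only, not the applicant's implementation: the combined loss can be written in plain Python. The α = β = 0.5 defaults and the `lines` data layout (a mapping from keypoint index to candidate virtual lines) are assumptions for the example:

```python
import math

def euclid(p, g):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - g[0], p[1] - g[1])

def dist_to_line(p, a, b):
    """Shortest distance from point p to the infinite line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg = math.hypot(dx, dy)
    if seg == 0:  # degenerate line: fall back to point-to-point distance
        return euclid(p, a)
    # |2-D cross product| / |line direction| is the perpendicular distance.
    return abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / seg

def loss(pred, gt, lines, alpha=0.5, beta=0.5):
    """l = alpha * l_sa + beta * l_mse.

    l_sa averages F(P_i, G_i), the shortest distance from predicted point i
    to its candidate virtual lines; l_mse averages |P_i - G_i|.
    lines[i] is a list of ((ax, ay), (bx, by)) pairs defining those lines.
    """
    n = len(pred)
    l_sa = sum(min(dist_to_line(pred[i], a, b) for a, b in lines[i])
               for i in range(n)) / n
    l_mse = sum(euclid(pred[i], gt[i]) for i in range(n)) / n
    return alpha * l_sa + beta * l_mse
```

For example, with a single predicted point (0, 1), its true point (0, 0), and one candidate line through (0, 0) and (1, 0), both averaged terms equal 1 and the loss is 1.0.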
In a second aspect, an embodiment of the present application provides a method for detecting a keypoint, where the method includes:
acquiring a picture to be detected;
inputting the picture to be detected into a preset key point detection model for analysis to obtain a plurality of preset key points of the picture to be detected, wherein the preset key point detection model is trained by the key point detection model training method according to any one of the first aspect.
In a third aspect, an embodiment of the present application provides a device for training a keypoint detection model, where the device includes:
the acquisition module is used for acquiring a sample picture, wherein the sample picture corresponds to a sample picture mark, the sample picture mark comprises a preset number of real key points, and each real key point corresponds to a position coordinate;
the processing module is used for inputting the sample picture into a preset key point detection model for processing to obtain a preset number of predicted key points, and each predicted key point corresponds to a position coordinate;
the first calculation module is used for calculating the point line distance of each predicted key point according to the position coordinates of each predicted key point and the position coordinates of each real key point, wherein the point line distance of each predicted key point is the shortest distance from any predicted key point to a preset virtual straight line, the preset virtual straight line is a straight line passing through the real key point with semantic relation with the predicted key point, the real key point with semantic relation with the predicted key point is a target real key point, and the preset virtual straight line also passes through the real key point with the same semantic type as the target real key point and adjacent to the target real key point;
the second calculation module is used for calculating a loss value according to each point-line distance;
and the training module is used for training the preset key point detection model according to the loss value to obtain a trained key point detection model.
Optionally, the sample picture is a picture with facial features, a picture with human features or a picture with gesture features.
Optionally, the apparatus further includes:
the third calculation module is used for calculating the Euclidean distance between each predicted key point and the real key point with the same semantic according to the position coordinates of each predicted key point and the position coordinates of each real key point to obtain the Euclidean distance of each predicted key point;
the second computing module is specifically configured to:
and calculating a loss value according to each point-line distance, each Euclidean distance, a preset point-line distance weight and a preset Euclidean distance weight.
Optionally, the second computing module is specifically configured to:
calculating the average value of all the point line distances according to each point line distance to obtain the average value of the point line distances;
calculating the average value of all Euclidean distances according to each Euclidean distance to obtain an average value of Euclidean distances;
and calculating a loss value according to the point-line distance average value, the Euclidean distance average value, the preset point-line distance weight and the preset Euclidean distance weight.
Optionally, the second computing module is specifically configured to:
the loss value is calculated according to the following formula:
l = α × l_sa + β × l_mse

where

l_sa = (1/n) × Σ_{i=1..n} F(P_i, G_i)

l_mse = (1/n) × Σ_{i=1..n} √((x_{P_i} − x_{G_i})² + (y_{P_i} − y_{G_i})²)

Here α is the preset point-line distance weight, β is the preset Euclidean distance weight, l_sa is the point-line distance average value, l_mse is the Euclidean distance average value, and l is the loss value; n represents the number of predicted key points and i indexes them, P_i represents the i-th predicted key point, G_i represents the i-th real key point, and F(P_i, G_i) represents the point-line distance of P_i, where P_i and G_i have a semantic relation; x_{P_i} and y_{P_i} represent the x- and y-coordinate values of the i-th predicted key point, x_{G_i} and y_{G_i} represent the x- and y-coordinate values of the i-th real key point, and √((x_{P_i} − x_{G_i})² + (y_{P_i} − y_{G_i})²) represents the Euclidean distance of the i-th predicted key point.
In a fourth aspect, embodiments of the present application provide a keypoint detection device, including:
the acquisition module is used for acquiring a picture to be detected;
the prediction module is configured to input the picture to be detected into a preset key point detection model for analysis, so as to obtain a plurality of preset key points of the picture to be detected, where the preset key point detection model is trained by using the key point detection model training method according to any one of the first aspect.
In a fifth aspect, embodiments of the present application provide an electronic device, including: processor, communication interface, memory and communication bus, wherein:
the processor, the communication interface, the memory accomplish the mutual communication through the communication bus;
the memory is used for storing a computer program;
the processor is configured to implement any one of the above-described key point detection model training methods when executing the program stored in the memory.
In a sixth aspect, an embodiment of the present application provides an electronic device, including: processor, communication interface, memory and communication bus, wherein:
the processor, the communication interface, the memory accomplish the mutual communication through the communication bus;
the memory is used for storing a computer program;
the processor is configured to implement any one of the key point detection methods described in the second aspect when executing the program stored in the memory.
In a seventh aspect, embodiments of the present application provide a storage medium having instructions stored therein, which when executed on a computer, cause the computer to perform the keypoint detection model training method of any of the first aspects described above.
In an eighth aspect, embodiments of the present application provide a storage medium having stored therein instructions that, when executed on a computer, cause the computer to perform the keypoint detection method of any of the above second aspects.
In a ninth aspect, embodiments of the present application provide a computer program product comprising instructions that, when run on a computer, cause the computer to perform the keypoint detection model training method of any of the above-described first aspects.
In a tenth aspect, embodiments of the present application provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform the keypoint detection method of any of the above second aspects.
With the key point detection model training method and device, the electronic equipment, the storage medium, and the computer program product containing instructions provided by the embodiments of the present application, the distance from each predicted key point to a preset virtual straight line passing through the real key point that has a semantic relation with the predicted key point is calculated, and the shortest such distance is taken as the point-line distance of the predicted key point. The loss value is calculated from the point-line distances, so the key point error caused by semantic ambiguity is fully considered, and the preset key point detection model is trained according to the loss value. Because this error is fully considered during training, the problem of poor key point detection precision is alleviated and the accuracy of key point detection is improved. Of course, not all of the above advantages need be achieved simultaneously in practicing any one product or method of the present application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1a is a first schematic diagram of a training method of a key point detection model according to an embodiment of the present application;
FIG. 1b is a second schematic diagram of a training method of a keypoint detection model according to an embodiment of the present application;
FIG. 1c is a third schematic diagram of a training method of a keypoint detection model according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a method for detecting key points according to an embodiment of the present application;
FIG. 3a is a first schematic diagram of a key point detection model training device according to an embodiment of the present application;
FIG. 3b is a second schematic diagram of a training device for a keypoint detection model according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a key point detecting device according to an embodiment of the present application;
fig. 5 is a schematic diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
In order to solve the problem of poor detection precision of key points and improve the detection precision of the key points, the application discloses a training method of a key point detection model, which comprises the following steps:
obtaining a sample picture, wherein the sample picture corresponds to a sample picture mark, the sample picture mark comprises a preset number of real key points, and each real key point corresponds to a position coordinate;
inputting the sample picture into a preset key point detection model for processing to obtain a preset number of predicted key points, wherein each predicted key point corresponds to a position coordinate;
calculating the point line distance of each predicted key point according to the position coordinates of each predicted key point and the position coordinates of each real key point, wherein the point line distance of each predicted key point is the shortest distance from any predicted key point to a preset virtual straight line, the preset virtual straight line is a straight line passing through the real key point with semantic relation to the predicted key point, the real key point with semantic relation to the predicted key point is a target real key point, and the preset virtual straight line also passes through the real key point with the same semantic type as the target real key point and adjacent to the target real key point;
calculating a loss value according to each point-line distance;
and training the preset key point detection model according to the loss value to obtain a trained key point detection model.
By calculating the distance from each predicted key point to a preset virtual straight line passing through the real key point that has a semantic relation with the predicted key point, and taking the shortest such distance as the point-line distance of the predicted key point, the loss value is calculated from the point-line distances so that the key point error caused by semantic ambiguity is fully considered. The preset key point detection model is then trained according to the loss value; since this error is fully considered during training, the problem of poor key point detection precision is alleviated and the accuracy of key point detection is improved.
The embodiment of the application provides a method for training a key point detection model, referring to fig. 1a, fig. 1a is a first schematic diagram of the method for training a key point detection model according to the embodiment of the application, which includes the following steps:
step 110, obtaining a sample picture, wherein the sample picture corresponds to a sample picture mark, the sample picture mark comprises a preset number of real key points, and each real key point corresponds to a position coordinate.
The method for training the key point detection model in the embodiment of the application can be realized through electronic equipment, and specifically, the electronic equipment can be a server or the like.
During training of the preset key point detection model, sample pictures must be obtained so that the model can be trained on them. A sample picture may be a picture with facial features, human body features, or gesture features. Each sample picture corresponds to a sample picture mark; the mark contains a preset number of real key points, and each real key point corresponds to a position coordinate. For example, in face recognition, if the key points of a face need to be detected, the preset sample picture set is a set of pictures with face features, and each sample picture corresponds to a sample picture mark containing, for instance, 21 real key points with position coordinates. The real key points of a face sample picture are generated from the five sense organs and the face outline; the number and positions of the real key points are set according to actual needs. For example, the key positions of the eyes, upper lip, lower lip, and cheeks are marked to generate the sample picture mark. The key points of each part are key points of the same semantic relation type: the semantics can simply be regarded as the meaning of the concepts represented by the key points, the relations among those meanings being the interpretation and logical representation of the key points, and the semantic relation type refers to the category into which a semantic relation falls according to different classification conditions.
Classifying semantic relations helps in understanding their meaning and characteristics so that they can play their role in information organization. For example, on a face, parts may be divided according to the key points, and the key points of one part are called key points of the same semantic relation type: the key points of the cheek form one such group, the key points of the left eye form another, and so on.
Step 120, inputting the sample picture into a preset key point detection model for processing to obtain a preset number of predicted key points, wherein each predicted key point corresponds to a position coordinate.
The sample picture is input into the preset key point detection model for processing to obtain the preset number of predicted key points, each corresponding to a position coordinate. For example, if the sample picture is a picture with face features whose sample picture mark contains 68 real key points, inputting it into the preset key point detection model yields 68 predicted key points, each with a position coordinate. The predicted key points correspond one-to-one to the 68 real key points and have semantic relations with them. Corresponding predicted and real key points can be numbered when they are generated and represented by those numbers. For example, the left eye has 6 real key points, marked left-eye-true 1 through left-eye-true 6, and 6 predicted key points, marked left-eye-predicted 1 through left-eye-predicted 6; left-eye-predicted 1 has a semantic relation with left-eye-true 1, left-eye-predicted 2 with left-eye-true 2, and so on.
Step 130, calculating the point-line distance of each predicted key point according to the position coordinates of each predicted key point and the position coordinates of each real key point, wherein the point-line distance of each predicted key point is the shortest distance from that predicted key point to a preset virtual straight line, the preset virtual straight line is a straight line passing through the real key point having a semantic relation with the predicted key point, that real key point is the target real key point, and the preset virtual straight line also passes through the real key point that has the same semantic type as the target real key point and is adjacent to it.
For any predicted key point, its point-line distance is the shortest distance between the predicted key point and a preset virtual straight line. The preset virtual straight line represents the spatial relation between two points: it passes through the real key point that has a semantic relation with the predicted key point, and also through a real key point that has the same semantic relation type as, and is adjacent to, that real key point. The key points of each part are key points of the same semantic relation type. For example, when the real key points of the cheek are generated, they can be marked with sequential numbers, and each pair of adjacent real key points of the same semantic relation type is connected according to those numbers to obtain the preset virtual straight lines; alternatively, among real key points of the same semantic relation type, each pair at the shortest distance can be connected according to their position coordinates. For example, if the sample picture is one with face features whose mark contains 68 real key points located on the left eye, right eye, nose, upper lip, lower lip, left eyebrow, right eyebrow, and cheek, then the left eye has 6 key points, the right eye has 6 key points, and the cheek has 16 key points.
According to the position coordinates and semantic relation types of the real key points, adjacent real key points of the same semantic relation type are connected to obtain the preset virtual straight lines. The same semantic relation type refers to real key points of the same kind: for example, the real key points of the outer contour of the cheek are of one semantic relation type, the real key points on the outer contour line of the upper lip of another, and the real key points on the outer contour line of the lower lip of yet another. Thus, connecting adjacent real key points among the 16 cheek key points yields the preset virtual straight lines of the cheek part, and connecting adjacent real key points among the 6 left-eye key points yields the preset virtual straight lines of the left-eye part.
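The adjacency described above can be enumerated mechanically. The sketch below assumes a data layout not specified in the application (each semantic group is a list of keypoint indices in contour order) and uses an illustrative function name; it returns the index pairs defining the candidate virtual lines of every keypoint:

```python
def build_virtual_lines(groups):
    """Map each keypoint index to the index pairs of ground-truth points
    defining its candidate virtual lines: the segments joining adjacent
    points within the same semantic group (e.g. cheek, left eye).

    groups: list of index lists, each ordered along its contour.
    Returns: dict mapping keypoint index -> list of (j, k) pairs.
    """
    lines = {}
    for g in groups:
        for pos, i in enumerate(g):
            pairs = []
            if pos > 0:                 # line to the previous contour point
                pairs.append((g[pos - 1], i))
            if pos < len(g) - 1:        # line to the next contour point
                pairs.append((i, g[pos + 1]))
            lines[i] = pairs
    return lines
```

A keypoint in the interior of a contour gets two candidate lines; a keypoint at either end gets one, matching the i = 1 case discussed below.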
Adjacent real key points of the same semantic relationship type are connected according to their position coordinates to obtain the preset virtual straight lines, and the shortest distance from each predicted key point to the preset virtual straight lines is calculated according to the predicted key point's position coordinates.
For example, the sample picture is a picture with facial features in which the cheek part contains n key points. Let G_i denote the i-th real key point, i ∈ {1, …, n}, n > 3. For i > 1, the real key points adjacent to G_i are G_{i−1} and G_{i+1}; connecting G_i and G_{i−1} gives the straight line X_{i−1,i}, and connecting G_i and G_{i+1} gives the straight line X_{i,i+1}. The point-line distance of each predicted key point is calculated from its position coordinates. For example, let P_i denote the i-th predicted key point; because P_i and G_i share the same index, they have a semantic relationship, i.e. P_i is the predicted key point corresponding to G_i. The distances from P_i to the straight lines X_{i−1,i} and X_{i,i+1} are then calculated separately, and the shortest of them is the point-line distance of P_i, denoted F(P_i, G_i). When i = 1, the only real key point adjacent to G_1 is G_2; connecting G_1 and G_2 gives the straight line X_{1,2}. For the predicted key point P_1, which has the same semantics as G_1, the distance from P_1 to X_{1,2} is calculated, and this distance is the point-line distance of P_1.
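The computation of F(P_i, G_i) just described can be sketched as follows. This is an illustrative implementation under the patent's description (function names are assumptions), using the standard perpendicular point-to-line distance:

```python
import math

def point_line_distance(p, a, b):
    """Shortest distance from point p to the infinite straight line
    through points a and b. All arguments are (x, y) tuples; a and b are
    two adjacent real key points of the same semantic type, which define
    one preset virtual straight line."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    # |cross product of (b - a) and (p - a)| / |b - a| is the
    # perpendicular distance from p to the line through a and b
    return abs(dx * (py - ay) - dy * (px - ax)) / math.hypot(dx, dy)

def f_point_line(pred, reals, i):
    """F(P_i, G_i): shortest distance from predicted point `pred` to the
    virtual lines through G_i and its same-type neighbours G_{i-1} and
    G_{i+1}. `reals` is the ordered list of real key points of one
    semantic type; `i` is a 0-based index here."""
    dists = []
    if i > 0:
        dists.append(point_line_distance(pred, reals[i - 1], reals[i]))
    if i < len(reals) - 1:
        dists.append(point_line_distance(pred, reals[i], reals[i + 1]))
    return min(dists)
```

At the endpoints (i = 1 or i = n in the patent's 1-based numbering) only one adjacent line exists, matching the i = 1 case in the text.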
The loss value between the real key points and the predicted key points can then be calculated from the point-line distances. Because the point-line distance is the shortest distance from a predicted key point to a preset virtual straight line passing through the real key point that has a semantic relationship with it, key point errors caused by semantic ambiguity are fully taken into account. Training the preset key point detection model according to this loss value therefore incorporates these errors into the training process, alleviating the problem of poor key point detection precision and improving the accuracy of key point detection.
Step 140: calculating a loss value according to each point-line distance.
The sum of the point-line distances is calculated, the average of the point-line distances is obtained from this sum, and this average is used as the loss value. In this way, the key point errors caused by semantic ambiguity are fully considered; training the preset key point detection model according to the loss value incorporates these errors into the training process, alleviating the problem of poor key point detection precision and improving the accuracy of key point detection.
Step 150: training the preset key point detection model according to the loss value to obtain a trained key point detection model.
The parameters of the preset key point detection model are adjusted according to the loss value, and training then continues on further sample pictures until a preset training-end condition is met; for example, the condition may be that 500 sample pictures have been trained on, or that the loss value does not exceed a preset threshold. The result is a trained key point detection model. Because the model is trained according to the loss value, the key point errors caused by semantic ambiguity are fully considered during training, alleviating the problem of poor key point detection precision and improving the accuracy of key point detection. For the specific training procedure of the preset key point detection model, reference may be made to model training methods in the related art.
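The parameter-adjustment loop with the two preset training-end conditions can be sketched as follows; `model`, `compute_loss`, and `update` are assumed callables introduced purely for illustration, not APIs from the patent, and the default limits mirror the examples in the text:

```python
def train_model(model, samples, compute_loss, update,
                max_samples=500, loss_threshold=0.01):
    """Train until a preset end condition holds: either a sample budget
    (e.g. 500 sample pictures have been trained on) or a loss value not
    exceeding a preset threshold. update(model, loss) stands for the step
    that adjusts the preset model's parameters according to the loss value."""
    seen = 0
    for picture, real_keypoints in samples:
        predicted = model(picture)              # preset number of predicted key points
        loss = compute_loss(predicted, real_keypoints)
        update(model, loss)                     # adjust parameters from the loss value
        seen += 1
        if seen >= max_samples or loss <= loss_threshold:
            break                               # preset training-end condition met
    return model
```

In practice the update step would be a gradient-based optimizer from whatever framework implements the preset key point detection model.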
The distances from each predicted key point to the preset virtual straight lines passing through the real key point that has a semantic relationship with it are calculated, the shortest distance is taken as the predicted key point's point-line distance, and a loss value is calculated from the point-line distances. The key point errors caused by semantic ambiguity are thereby fully considered; training the preset key point detection model according to this loss value incorporates these errors into the training process, alleviating the problem of poor key point detection precision and improving the accuracy of key point detection.
In one possible embodiment, the sample picture is a picture with facial features, a picture with human features, or a picture with gesture features.
A picture with facial features may be a picture with human facial features or a picture with animal facial features. Using a sample picture with facial features, human body features, or gesture features allows the trained model to detect a target with facial features, human body features, or gesture features, respectively.
Referring to fig. 1b, fig. 1b is a second schematic diagram of a method for training a keypoint detection model according to an embodiment of the present application. In one possible implementation, before the step of calculating the loss value according to each of the point-line distances, the method further includes:
Step 160: calculating the Euclidean distance between each predicted key point and the real key point with the same semantics according to the position coordinates of each predicted key point and the position coordinates of each real key point, so as to obtain the Euclidean distance of each predicted key point;
the calculating a loss value according to each of the point-line distances includes:

Step 141: calculating a loss value according to each of the point-line distances, each of the Euclidean distances, a preset point-line distance weight, and a preset Euclidean distance weight.
Based on the position coordinates of each predicted key point and each real key point, the Euclidean distance between each predicted key point and the real key point with the same semantics is calculated. Let x_{P_i} and y_{P_i} denote the x- and y-coordinate values of the i-th predicted key point, and x_{G_i} and y_{G_i} the x- and y-coordinate values of the i-th real key point. The Euclidean distance of the i-th predicted key point is then

√( (x_{P_i} − x_{G_i})² + (y_{P_i} − y_{G_i})² )
A loss value is then calculated according to the point-line distances, the Euclidean distances, a preset point-line distance weight, and a preset Euclidean distance weight. The preset point-line distance weight serves as the coefficient of the point-line distance term and the preset Euclidean distance weight as the coefficient of the Euclidean distance term, and the loss value is the sum of the weighted point-line distance term and the weighted Euclidean distance term. In this way, the loss value used to adjust the parameters of the preset key point detection model considers both the point-line distance from a predicted key point to the preset virtual straight line passing through the real key point that has a semantic relationship with it, and the point-to-point distance between the predicted key point and that real key point. The key point errors caused by semantic ambiguity are thereby fully considered; training the preset key point detection model according to this loss value incorporates these errors into the training process, alleviating the problem of poor key point detection precision and improving the accuracy of key point detection.
Alternatively, the loss value is calculated according to the following formula:
l = α × l_sa + β × l_mse

wherein

l_sa = (1/n) Σ_{i=1}^{n} F(P_i, G_i)

l_mse = (1/n) Σ_{i=1}^{n} √( (x_{P_i} − x_{G_i})² + (y_{P_i} − y_{G_i})² )

wherein α is the preset point-line distance weight, β is the preset Euclidean distance weight, l_sa is the point-line distance average, l_mse is the Euclidean distance average, and l is the loss value; n denotes the number of predicted key points, i indexes the i-th predicted key point, P_i denotes the i-th predicted key point, G_i denotes the i-th real key point, and F(P_i, G_i) denotes the point-line distance of P_i, wherein P_i and G_i have a semantic relationship; x_{P_i} and y_{P_i} denote the x- and y-coordinate values of the i-th predicted key point, x_{G_i} and y_{G_i} denote the x- and y-coordinate values of the i-th real key point, and √( (x_{P_i} − x_{G_i})² + (y_{P_i} − y_{G_i})² ) denotes the Euclidean distance of the i-th predicted key point.
The embodiment of the present application further provides a method for training a keypoint detection model. Referring to fig. 1c, fig. 1c is a third schematic diagram of the method for training a keypoint detection model according to the embodiment of the present application. In one possible implementation, the calculating a loss value according to each of the point-line distances, each of the Euclidean distances, a preset point-line distance weight, and a preset Euclidean distance weight includes:
Step 1411: calculating the average of all the point-line distances according to each point-line distance, to obtain a point-line distance average;

Step 1412: calculating the average of all the Euclidean distances according to each Euclidean distance, to obtain a Euclidean distance average;

Step 1413: calculating a loss value according to the point-line distance average, the Euclidean distance average, the preset point-line distance weight, and the preset Euclidean distance weight.
For example, the cheek part contains n key points. The average of all the point-line distances is calculated from the individual point-line distances, giving the point-line distance average, denoted l_sa:

l_sa = (1/n) Σ_{i=1}^{n} F(P_i, G_i)

The average of all the Euclidean distances is calculated from the individual Euclidean distances, giving the Euclidean distance average, denoted l_mse:

l_mse = (1/n) Σ_{i=1}^{n} √( (x_{P_i} − x_{G_i})² + (y_{P_i} − y_{G_i})² )

With the preset point-line distance weight α and the preset Euclidean distance weight β, the loss value, denoted l, is calculated from the point-line distance average, the Euclidean distance average, and the two weights: l = α × l_sa + β × l_mse.
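A minimal sketch of this loss computation under the formula above (names and the default weight values are illustrative assumptions, not from the patent):

```python
import math

def keypoint_loss(pred, real, point_line_dists, alpha=0.5, beta=0.5):
    """l = alpha * l_sa + beta * l_mse.

    pred, real: equal-length lists of (x, y) coordinates where pred[i]
    and real[i] have a semantic relationship; point_line_dists[i] is the
    precomputed point-line distance F(P_i, G_i); alpha and beta are the
    preset point-line and Euclidean distance weights (defaults assumed).
    """
    n = len(pred)
    l_sa = sum(point_line_dists) / n              # point-line distance average
    l_mse = sum(math.hypot(px - gx, py - gy)      # Euclidean distance average
                for (px, py), (gx, gy) in zip(pred, real)) / n
    return alpha * l_sa + beta * l_mse
```

Note that, as in the patent, l_mse here is an average of Euclidean distances rather than a squared-error mean; the name follows the formula's notation.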
The embodiment of the application provides a key point detection method, referring to fig. 2, fig. 2 is a schematic diagram of the key point detection method of the embodiment of the application, including the following steps:
step 210, obtaining a picture to be detected.
The key point detection method of the embodiment of the application may be implemented by an electronic device, and in particular, the electronic device may be a server or the like.
The picture to be detected is acquired so that it can be input into the preset key point detection model to obtain a plurality of preset key points of the picture to be detected.
Step 220, inputting the picture to be detected into a preset key point detection model to obtain a plurality of preset key points of the picture to be detected, wherein the preset key point detection model is trained by the key point detection model training method according to any one of the embodiments.
After the preset key point detection model has been trained by the key point detection model training method of any one of the above embodiments, the picture to be detected is input into it for analysis, yielding a plurality of preset key points of the picture to be detected. For example, if the sample pictures are face pictures, then after training, the preset key point detection model can predict the key points of face pictures: the picture to be detected is input into the model for analysis, a plurality of preset key points of the picture are obtained, and key point prediction for the picture to be detected is achieved.
The embodiment of the application further provides an apparatus, referring to fig. 3a, fig. 3a is a first schematic diagram of a device for training a keypoint detection model according to the embodiment of the application, where the apparatus includes:
an obtaining module 310, configured to obtain a sample picture, where the sample picture corresponds to a sample picture mark, the sample picture mark includes a preset number of real key points, and each of the real key points corresponds to a position coordinate;
the processing module 320 is configured to input the sample picture into a preset key point detection model for processing, so as to obtain a preset number of predicted key points, where each of the predicted key points corresponds to a position coordinate;
a first calculation module 330, configured to calculate a point-line distance of each of the predicted key points according to the position coordinates of each of the predicted key points and the position coordinates of each of the real key points, where, for any one of the predicted key points, the point-line distance of the predicted key point is a shortest distance among distances from the predicted key point to a preset virtual straight line, the preset virtual straight line is a straight line passing through the real key point having a semantic relationship with the predicted key point, the real key point having a semantic relationship with the predicted key point is a target real key point, and the preset virtual straight line also passes through the real key point having the same semantic type as the target real key point and being adjacent to the target real key point;
A second calculation module 340, configured to calculate a loss value according to each of the point-line distances;
the training module 350 is configured to train the preset keypoint detection model according to the loss value, so as to obtain a trained keypoint detection model.
Referring to fig. 3b, fig. 3b is a second schematic diagram of a keypoint detection model training device according to an embodiment of the present application, and in one possible implementation, the sample picture is a picture with facial features, a picture with human body features, or a picture with gesture features.
In one possible embodiment, the apparatus further includes:
a third calculation module 360, configured to calculate the euclidean distance between each predicted key point and the true key point with the same semantic meaning according to the position coordinates of each predicted key point and the position coordinates of each true key point, so as to obtain the euclidean distance between each predicted key point;
the second computing module 340 is specifically configured to:
and calculating a loss value according to each point-line distance, each Euclidean distance, a preset point-line distance weight, and a preset Euclidean distance weight.
In one possible implementation, the second computing module 340 is specifically configured to:
calculating the average of all the point-line distances according to each point-line distance to obtain a point-line distance average;

calculating the average of all the Euclidean distances according to each Euclidean distance to obtain a Euclidean distance average;

and calculating a loss value according to the point-line distance average, the Euclidean distance average, the preset point-line distance weight, and the preset Euclidean distance weight.
In one possible implementation, the second computing module 340 is specifically configured to:
the loss value is calculated according to the following formula:
l = α × l_sa + β × l_mse

wherein

l_sa = (1/n) Σ_{i=1}^{n} F(P_i, G_i)

l_mse = (1/n) Σ_{i=1}^{n} √( (x_{P_i} − x_{G_i})² + (y_{P_i} − y_{G_i})² )

wherein α is the preset point-line distance weight, β is the preset Euclidean distance weight, l_sa is the point-line distance average, l_mse is the Euclidean distance average, and l is the loss value; n denotes the number of predicted key points, i indexes the i-th predicted key point, P_i denotes the i-th predicted key point, G_i denotes the i-th real key point, and F(P_i, G_i) denotes the point-line distance of P_i, wherein P_i and G_i have a semantic relationship; x_{P_i} and y_{P_i} denote the x- and y-coordinate values of the i-th predicted key point, x_{G_i} and y_{G_i} denote the x- and y-coordinate values of the i-th real key point, and √( (x_{P_i} − x_{G_i})² + (y_{P_i} − y_{G_i})² ) denotes the Euclidean distance of the i-th predicted key point.
The specific manner in which the various modules perform operations in the apparatus of the above embodiments has been described in detail in the embodiments of the method and will not be repeated here.
The embodiment of the present application further provides an apparatus, referring to fig. 4, fig. 4 is a schematic diagram of a key point detection apparatus according to an embodiment of the present application, where the apparatus includes:
the acquisition module 410 is configured to acquire a picture to be detected;
the prediction module 420 is configured to input the picture to be detected into a preset key point detection model for analysis, so as to obtain a plurality of preset key points of the picture to be detected, where the preset key point detection model is trained by using any one of the key point detection model training methods described in the foregoing embodiments.
The specific manner in which the various modules perform operations in the apparatus of the above embodiments has been described in detail in the embodiments of the method and will not be repeated here.
The embodiment of the application further provides an electronic device, referring to fig. 5, fig. 5 is a schematic diagram of the electronic device in the embodiment of the application, including: processor 510, communication interface 520, memory 530, and communication bus 540, wherein processor 510, communication interface 520, memory 530 communicate with each other via communication bus 540,
the memory 530 is used for storing a computer program;
The processor 510 is configured to execute the computer program stored in the memory 530, and implement the following steps:
obtaining a sample picture, wherein the sample picture corresponds to a sample picture mark, the sample picture mark comprises a preset number of real key points, and each real key point corresponds to a position coordinate;
inputting the sample picture into a preset key point detection model for processing to obtain a preset number of predicted key points, wherein each predicted key point corresponds to a position coordinate;
calculating the point line distance of each predicted key point according to the position coordinates of each predicted key point and the position coordinates of each real key point, wherein the point line distance of each predicted key point is the shortest distance from any predicted key point to a preset virtual straight line, the preset virtual straight line is a straight line passing through the real key point with semantic relation to the predicted key point, the real key point with semantic relation to the predicted key point is a target real key point, and the preset virtual straight line also passes through the real key point with the same semantic type as the target real key point and adjacent to the target real key point;
Calculating a loss value according to each point-line distance;
and training the preset key point detection model according to the loss value to obtain a trained key point detection model.
Optionally, the processor 510 is configured to execute the program stored in the memory 530, and may implement any of the above-mentioned keypoint detection model training methods.
The embodiment of the application also provides electronic equipment, which comprises: the processor, the communication interface, the memory and the communication bus, wherein the processor, the communication interface and the memory complete the communication with each other through the communication bus,
the memory is used for storing a computer program;
the processor is configured to execute the computer program stored in the memory, and implement the following steps:
acquiring a picture to be detected;
inputting the picture to be detected into a preset key point detection model to obtain a plurality of preset key points of the picture to be detected, wherein the preset key point detection model is trained by the key point detection model training method according to any one of claims 1-5.
The communication bus mentioned above for the electronic devices may be a peripheral component interconnect standard (Peripheral Component Interconnect, PCI) bus or an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, etc. The communication bus may be classified as an address bus, a data bus, a control bus, or the like. For ease of illustration, only one thick line is shown in the figures, but this does not mean there is only one bus or only one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The Memory may include random access Memory (Random Access Memory, RAM) or may include Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; but also digital signal processors (Digital Signal Processing, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
In an embodiment of the present application, there is further provided a storage medium having instructions stored therein, which when run on a computer, cause the computer to perform any one of the above-described keypoint detection model training methods of the above-described embodiments.
In an embodiment of the present application, there is further provided a storage medium having stored therein instructions that, when executed on a computer, cause the computer to perform any one of the above-described key point detection methods of the embodiments.
In an embodiment of the present application, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform the keypoint detection model training method of any of the above embodiments.
In an embodiment of the present application, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform the keypoint detection method of any of the above embodiments.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product described above includes one or more computer instructions. When the above-described computer program instructions are loaded and executed on a computer, the processes or functions described above according to embodiments of the present invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, from one website, computer, server, or data center via a wired (e.g., coaxial cable, optical fiber, digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer readable storage media may be any available media that can be accessed by a computer or a data storage device such as a server, data center, or the like that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., a floppy Disk, a hard Disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.
It should be noted that, in this document, the technical features in each alternative may be combined to form a solution, so long as they are not contradictory, and all such solutions are within the scope of the disclosure of the present application. Relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In this specification, each embodiment is described in a related manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for embodiments of the apparatus, electronic device and storage medium, the description is relatively simple as it is substantially similar to the method embodiments, where relevant see the section description of the method embodiments.
The foregoing description is merely illustrative of the preferred embodiments of the present application, and is not intended to limit the scope of the present application. Any modifications, equivalent substitutions, improvements, etc. that are within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (16)

1. A method for training a keypoint detection model, the method comprising:
obtaining a sample picture, wherein the sample picture corresponds to a sample picture mark, the sample picture mark comprises a preset number of real key points, and each real key point corresponds to a position coordinate;
inputting the sample picture into a preset key point detection model for processing to obtain a preset number of predicted key points, wherein each predicted key point corresponds to a position coordinate;
calculating the point line distance of each predicted key point according to the position coordinates of each predicted key point and the position coordinates of each real key point, wherein the point line distance of each predicted key point is the shortest distance from any predicted key point to a preset virtual straight line, the preset virtual straight line is a straight line passing through the real key point with semantic relation to the predicted key point, the real key point with semantic relation to the predicted key point is a target real key point, and the preset virtual straight line also passes through the real key point with the same semantic type as the target real key point and adjacent to the target real key point;
Calculating a loss value according to each point-line distance;
and training the preset key point detection model according to the loss value to obtain a trained key point detection model.
2. The method of claim 1, wherein the sample picture is a picture with facial features, a picture with human features, or a picture with gesture features.
3. The method according to claim 1 or 2, wherein prior to the step of calculating a loss value according to each of the point-line distances, the method further comprises:
according to the position coordinates of the predicted key points and the position coordinates of the real key points, the Euclidean distance between the predicted key points and the real key points with the same semantic meaning is calculated, and the Euclidean distance of the predicted key points is obtained;
the calculating a loss value according to each of the point-line distances comprises:

calculating a loss value according to each of the point-line distances, each of the Euclidean distances, a preset point-line distance weight, and a preset Euclidean distance weight.
4. A method according to claim 3, wherein said calculating a loss value according to each of said point-line distances, each of said Euclidean distances, a preset point-line distance weight, and a preset Euclidean distance weight comprises:
calculating the average of all the point-line distances according to each point-line distance to obtain a point-line distance average;

calculating the average of all the Euclidean distances according to each Euclidean distance to obtain a Euclidean distance average;
and calculating a loss value according to the point-line distance average, the Euclidean distance average, the preset point-line distance weight, and the preset Euclidean distance weight.
5. The method of claim 4, wherein the calculating a loss value according to the point-line distance average, the Euclidean distance average, a preset point-line distance weight, and a preset Euclidean distance weight comprises:
the loss value is calculated according to the following formula:

l = α × l_sa + β × l_mse

wherein:

l_sa = (1/n) × Σ_{i=1}^{n} F(P_i, G_i)

l_mse = (1/n) × Σ_{i=1}^{n} d_i, with d_i = √((x_i^P − x_i^G)² + (y_i^P − y_i^G)²)

where α is the preset point-line distance weight; β is the preset Euclidean distance weight; l_sa is the point-line distance average value; l_mse is the Euclidean distance average value; l is the loss value; n is the number of predicted key points and i indexes the i-th predicted key point; P_i denotes the i-th predicted key point and G_i denotes the i-th real key point, where P_i and G_i have a semantic relationship; F(P_i, G_i) denotes the point-line distance of P_i; (x_i^P, y_i^P) are the x- and y-coordinate values of the i-th predicted key point; (x_i^G, y_i^G) are the x- and y-coordinate values of the i-th real key point; and d_i denotes the Euclidean distance of the i-th predicted key point.
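To make the claim-5 formula concrete, here is a minimal NumPy sketch of the combined loss l = α × l_sa + β × l_mse. The helper names (`point_line_distance`, `keypoint_loss`) and the `adj_idx` encoding of which real key point is adjacent to each target real key point are illustrative assumptions, not part of the patent:

```python
import numpy as np

def point_line_distance(p, g, g_adj):
    """Shortest distance from predicted point p to the preset virtual
    straight line through real key point g and the adjacent real key
    point g_adj of the same semantic type."""
    d = g_adj - g
    # Perpendicular distance via the 2-D cross product |d x (p - g)| / |d|.
    return abs(d[0] * (p[1] - g[1]) - d[1] * (p[0] - g[0])) / np.linalg.norm(d)

def keypoint_loss(pred, true, adj_idx, alpha=0.5, beta=0.5):
    """l = alpha * l_sa + beta * l_mse, per claim 5.
    pred, true: (n, 2) arrays of predicted / real key point coordinates;
    adj_idx[i]: index of the real key point adjacent to true[i] with the
    same semantic type (an assumed encoding of the claim's adjacency)."""
    l_sa = np.mean([point_line_distance(pred[i], true[i], true[adj_idx[i]])
                    for i in range(len(pred))])
    l_mse = np.mean(np.linalg.norm(pred - true, axis=1))
    return alpha * l_sa + beta * l_mse
```

Note that the point-line term alone is 0 whenever each predicted key point lies anywhere on its virtual line, which is why the formula also needs the Euclidean term to pin down the exact position along the line.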
6. A method for key point detection, the method comprising:
acquiring a picture to be detected;
inputting the picture to be detected into a preset key point detection model to obtain a plurality of preset key points of the picture to be detected, wherein the preset key point detection model is trained by the key point detection model training method according to any one of claims 1-5.
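Claim 6's two steps (acquire a picture to be detected, run it through the trained model) can be sketched as below. `TinyKeypointModel` is a hypothetical stand-in — the patent does not fix any architecture — serving only to show the input/output contract of a key point detection model:

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyKeypointModel:
    """Stand-in for the preset key point detection model (hypothetical:
    the patent specifies no architecture). A single linear layer mapping
    flattened pixels to n (x, y) coordinate pairs."""
    def __init__(self, img_shape=(32, 32), n_keypoints=5):
        self.n = n_keypoints
        self.w = rng.normal(scale=0.01,
                            size=(img_shape[0] * img_shape[1], n_keypoints * 2))

    def predict(self, img):
        return (img.ravel() @ self.w).reshape(self.n, 2)

def detect_keypoints(model, img):
    """Claim 6: input the picture to be detected into the preset key
    point detection model; return its predicted key points."""
    return model.predict(img)

img = rng.random((32, 32))                   # picture to be detected (grayscale)
pts = detect_keypoints(TinyKeypointModel(), img)
print(pts.shape)                             # (5, 2)
```

In practice the model would be the network trained with the loss of claims 1-5; only the predict interface matters here.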
7. A keypoint detection model training device, the device comprising:
the acquisition module is used for acquiring a sample picture, wherein the sample picture corresponds to a sample picture mark, the sample picture mark comprises a preset number of real key points, and each real key point corresponds to a position coordinate;
the processing module is used for inputting the sample picture into a preset key point detection model for processing to obtain a preset number of predicted key points, and each predicted key point corresponds to a position coordinate;
the first calculation module is used for calculating the point-line distance of each predicted key point according to the position coordinates of each predicted key point and the position coordinates of each real key point, wherein the point-line distance of a predicted key point is the shortest distance from that predicted key point to a preset virtual straight line; the preset virtual straight line passes through the real key point having a semantic relationship with the predicted key point (the target real key point) and also passes through the real key point that has the same semantic type as, and is adjacent to, the target real key point;
the second calculation module is used for calculating a loss value according to the point-line distances;
and the training module is used for training the preset key point detection model according to the loss value to obtain a trained key point detection model.
8. The apparatus of claim 7, wherein the sample picture is a picture with facial features, a picture with human features, or a picture with gesture features.
9. The apparatus according to claim 7 or 8, characterized in that the apparatus further comprises:
the third calculation module is used for calculating, according to the position coordinates of each predicted key point and the position coordinates of each real key point, the Euclidean distance between each predicted key point and the real key point having the same semantics, to obtain the Euclidean distance of each predicted key point;
the second calculation module is specifically configured to:
calculate the loss value according to each point-line distance, each Euclidean distance, a preset point-line distance weight, and a preset Euclidean distance weight.
10. The apparatus of claim 9, wherein the second computing module is specifically configured to:
calculate the average of all point-line distances to obtain a point-line distance average value;
calculate the average of all Euclidean distances to obtain a Euclidean distance average value;
and calculate the loss value according to the point-line distance average value, the Euclidean distance average value, the preset point-line distance weight, and the preset Euclidean distance weight.
11. The apparatus of claim 10, wherein the second computing module is specifically configured to:
the loss value is calculated according to the following formula:

l = α × l_sa + β × l_mse

wherein:

l_sa = (1/n) × Σ_{i=1}^{n} F(P_i, G_i)

l_mse = (1/n) × Σ_{i=1}^{n} d_i, with d_i = √((x_i^P − x_i^G)² + (y_i^P − y_i^G)²)

where α is the preset point-line distance weight; β is the preset Euclidean distance weight; l_sa is the point-line distance average value; l_mse is the Euclidean distance average value; l is the loss value; n is the number of predicted key points and i indexes the i-th predicted key point; P_i denotes the i-th predicted key point and G_i denotes the i-th real key point, where P_i and G_i have a semantic relationship; F(P_i, G_i) denotes the point-line distance of P_i; (x_i^P, y_i^P) are the x- and y-coordinate values of the i-th predicted key point; (x_i^G, y_i^G) are the x- and y-coordinate values of the i-th real key point; and d_i denotes the Euclidean distance of the i-th predicted key point.
12. A keypoint detection device, said device comprising:
The acquisition module is used for acquiring a picture to be detected;
the prediction module is configured to input the picture to be detected into a preset key point detection model to obtain a plurality of preset key points of the picture to be detected, where the preset key point detection model is trained by using the key point detection model training method according to any one of claims 1-5.
13. An electronic device, comprising: a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other via the communication bus;
a memory for storing a computer program;
a processor for implementing the keypoint detection model training method of any one of claims 1-5 when executing a program stored on a memory.
14. An electronic device, comprising: a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other via the communication bus;
a memory for storing a computer program;
a processor for implementing the key point detection method of claim 6 when executing a program stored on a memory.
15. A storage medium having stored therein a computer program which when executed by a processor implements the keypoint detection model training method of any one of claims 1-5.
16. A storage medium having a computer program stored therein, which when executed by a processor, implements the keypoint detection method of claim 6.
CN201911346309.5A 2019-12-24 2019-12-24 Key point detection model training method and device, electronic equipment and storage medium Active CN111126268B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911346309.5A CN111126268B (en) 2019-12-24 2019-12-24 Key point detection model training method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911346309.5A CN111126268B (en) 2019-12-24 2019-12-24 Key point detection model training method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111126268A CN111126268A (en) 2020-05-08
CN111126268B true CN111126268B (en) 2023-04-25

Family

ID=70501951

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911346309.5A Active CN111126268B (en) 2019-12-24 2019-12-24 Key point detection model training method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111126268B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743157A (en) * 2020-05-28 2021-12-03 北京沃东天骏信息技术有限公司 Key point detection model training method and device and key point detection method and device
CN112115894B (en) * 2020-09-24 2023-08-25 北京达佳互联信息技术有限公司 Training method and device of hand key point detection model and electronic equipment
CN114550207B (en) * 2022-01-17 2023-01-17 北京新氧科技有限公司 Method and device for detecting key points of neck and method and device for training detection model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107622252A (en) * 2017-09-29 2018-01-23 百度在线网络技术(北京)有限公司 information generating method and device
CN108305283A (en) * 2018-01-22 2018-07-20 清华大学 Human bodys' response method and device based on depth camera and basic form
CN109614867A (en) * 2018-11-09 2019-04-12 北京市商汤科技开发有限公司 Human body critical point detection method and apparatus, electronic equipment, computer storage medium
CN109948590A (en) * 2019-04-01 2019-06-28 启霖世纪(北京)教育科技有限公司 Pose problem detection method and device
WO2019228040A1 (en) * 2018-05-30 2019-12-05 杭州海康威视数字技术股份有限公司 Facial image scoring method and camera

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107622252A (en) * 2017-09-29 2018-01-23 百度在线网络技术(北京)有限公司 information generating method and device
CN108305283A (en) * 2018-01-22 2018-07-20 清华大学 Human bodys' response method and device based on depth camera and basic form
WO2019228040A1 (en) * 2018-05-30 2019-12-05 杭州海康威视数字技术股份有限公司 Facial image scoring method and camera
CN109614867A (en) * 2018-11-09 2019-04-12 北京市商汤科技开发有限公司 Human body critical point detection method and apparatus, electronic equipment, computer storage medium
CN109948590A (en) * 2019-04-01 2019-06-28 启霖世纪(北京)教育科技有限公司 Pose problem detection method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Hoel Kervadec et al. Boundary loss for highly unbalanced segmentation. arXiv:1812.07032v2, 2019, full text. *
Ismail Elezi et al. The Group Loss for Deep Metric Learning. arXiv:1912.00385v1, 2019, full text. *
景晨凯; 宋涛; 庄雷; 刘刚; 王乐; 刘凯伦. A survey of face recognition technology based on deep convolutional neural networks. Computer Applications and Software, 2018(01), full text. *


Similar Documents

Publication Publication Date Title
WO2022027912A1 (en) Face pose recognition method and apparatus, terminal device, and storage medium.
CN111126268B (en) Key point detection model training method and device, electronic equipment and storage medium
WO2018166114A1 (en) Picture identification method and system, electronic device, and medium
WO2020155518A1 (en) Object detection method and device, computer device and storage medium
WO2017075939A1 (en) Method and device for recognizing image contents
CN109165589B (en) Vehicle weight recognition method and device based on deep learning
CN110222780B (en) Object detection method, device, equipment and storage medium
CN108182433A (en) A kind of meter reading recognition methods and system
EP3690700A1 (en) Image similarity calculation method and device, and storage medium
CN110163086B (en) Body-building action recognition method, device, equipment and medium based on neural network
CN108257670B (en) Method and device for establishing medical interpretation model
JP7192143B2 (en) Method and system for object tracking using online learning
CN110705489B (en) Training method and device for target recognition network, computer equipment and storage medium
WO2021008037A1 (en) A-bilstm neural network-based text classification method, storage medium, and computer device
WO2021217937A1 (en) Posture recognition model training method and device, and posture recognition method and device
WO2020199498A1 (en) Palmar digital vein comparison method and device, computer apparatus, and storage medium
CN111274852B (en) Target object key point detection method and device
CN112364912B (en) Information classification method, device, equipment and storage medium
JP2021089778A (en) Information processing apparatus, information processing method, and program
CN115620082B (en) Model training method, head posture estimation method, electronic device, and storage medium
WO2020228179A1 (en) Picture instance detection method and apparatus, computer device, and storage medium
CN116432608A (en) Text generation method and device based on artificial intelligence, computer equipment and medium
CN110956131A (en) Single-target tracking method, device and system
CN110934565B (en) Method and device for measuring pupil diameter and computer readable storage medium
WO2022162844A1 (en) Work estimation device, work estimation method, and work estimation program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant