CN113569627B - Human body posture prediction model training method, human body posture prediction method and device

Info

Publication number: CN113569627B
Application number: CN202110658308.5A
Authority: CN
Inventors: 杜昂昂; 王志成
Original assignee: Beijing Kuangshi Technology Co Ltd; Beijing Megvii Technology Co Ltd
Current assignee: Beijing Kuangshi Technology Co Ltd; Beijing Megvii Technology Co Ltd
Priority date: 2021-06-11
Filing date: 2021-06-11
Publication date: 2024-06-14
Anticipated expiration: 2041-06-11
Also published as: CN113569627A

Abstract

The application provides a human body posture prediction model training method, a human body posture prediction method and a human body posture prediction device. The training method comprises the following steps: acquiring a labeled training set and an unlabeled training set, wherein the labeled training set comprises a plurality of first human body images containing annotation data, and the annotation data is used for representing real posture information in the first human body images; inputting the first human body image into a generator in the human body posture prediction model to obtain a corresponding first human body posture prediction result, and calculating a first loss value of the generator according to the labeling data and the first human body posture prediction result; inputting a second human body image from the unlabeled training set into the generator to obtain a corresponding second human body posture prediction result; calculating a second loss value corresponding to the discriminator in the human body posture prediction model according to the first human body image, the labeling data, the second human body image and the second human body posture prediction result; and optimizing the generator and the discriminator according to the first loss value and the second loss value to obtain a human body posture prediction model.

Description

Human body posture prediction model training method, human body posture prediction method and device

Technical Field

The application relates to the technical field of artificial intelligence, in particular to a human body posture prediction model training method, a human body posture prediction method and a human body posture prediction device.

Background

Human body posture estimation refers to predicting human body posture and key points in an image. In essence, human body pose estimation abstracts the location of various parts of the human body in an image into a set of structured coordinates. The human body posture estimation technology has important application in the fields of human-computer interaction, image retrieval, anomaly detection, motion prediction and the like.

Existing human body posture estimation methods mostly depend on a large amount of labeled data, and the amount of labeled data has a great influence on the final accuracy of human body posture estimation. Although human body posture estimation datasets have grown considerably compared with earlier ones, constructing a large human body posture estimation dataset is still very difficult. For example, the MPII human body pose estimation dataset contains about 25,000 images and about 40,000 human body poses, far fewer than the million-scale datasets used for image classification and object detection. This is because labeling a human body posture estimation dataset is more demanding, finer-grained and more complex, and thus requires significant labor and time.

Deep learning has attracted a great deal of attention since its emergence and has also been applied in the field of human body posture estimation. In line with deep learning practice, supervised training on a large amount of training data has become the mainstream approach to human body posture estimation. However, relying too heavily on labeled training data has, to some extent, hindered the advancement of human body posture estimation techniques. The rapid rise of internet technology has brought a large amount of data, and manually labeling such a large amount of data is impractical. Therefore, how to utilize massive unlabeled data for human body posture estimation is a problem to be solved in the prior art.

Disclosure of Invention

The embodiments of the application aim to provide a human body posture prediction model training method, a human body posture prediction method and a human body posture prediction device, which perform adversarial training in combination with a plurality of unlabeled second human body images, and can thereby improve accuracy and reduce the demand for labeled first human body images.

In a first aspect, an embodiment of the present application provides a human body posture prediction model training method, including: acquiring a labeled training set and an unlabeled training set, wherein the labeled training set comprises a plurality of first human body images containing annotation data, and the annotation data is used for representing real posture information in the first human body images; the unlabeled training set comprises a plurality of second human body images which do not contain labeling data; inputting the first human body image into a generator in a human body posture prediction model to obtain a corresponding first human body posture prediction result, and calculating a first loss value of the generator according to the labeling data and the first human body posture prediction result; inputting the second human body image into the generator to obtain a corresponding second human body posture prediction result; calculating a second loss value corresponding to a discriminator in the human body posture prediction model according to the first human body image, the labeling data, the second human body image and the second human body posture prediction result; and optimizing the generator and the discriminator according to the first loss value and the second loss value to obtain the human body posture prediction model.

Optionally, in the human body posture prediction model training method according to the embodiment of the present application, the inputting the first human body image into a generator in a human body posture prediction model to obtain a corresponding first human body posture prediction result, and calculating a first loss value of the generator according to the labeling data and the first human body posture prediction result includes: inputting the first human body image into the generator to obtain a predicted human body posture heat map corresponding to multiple channels; the predicted human body posture heat map of each channel predicts a human body key point position; generating a corresponding reference human body posture heat map based on the labeling data corresponding to the first human body image; and calculating the first loss value according to the predicted human body posture heat map and the reference human body posture heat map.

Optionally, in the human body posture prediction model training method according to the embodiment of the present application, the calculating a second loss value corresponding to a discriminator in the human body posture prediction model according to the first human body image, the labeling data, the second human body image, and the second human body posture prediction result includes: inputting the first human body image and the reference human body posture heat map into the discriminator as a real data sequence, and inputting the second human body image and the second human body posture prediction result into the discriminator as a fake data sequence, to respectively obtain discrimination results output by the discriminator; and calculating the second loss value according to the discrimination results.

Optionally, in the human body posture prediction model training method according to the embodiment of the present application, the optimizing the generator and the discriminator according to the first loss value and the second loss value includes: updating the network parameters of the generator according to the first loss value and the second loss value, and updating the network parameters of the discriminator according to the second loss value.

Optionally, in the human body posture prediction model training method according to the embodiment of the present application, the acquiring the labeled training set and the unlabeled training set includes: acquiring an original labeled training set containing annotation data and an original unlabeled training set not containing annotation data; respectively carrying out human body detection on the images in the original labeled training set and the images in the original unlabeled training set by utilizing a pre-trained human body detection model to obtain the first human body image in the labeled training set and the second human body image in the unlabeled training set; wherein the first human body image and the second human body image are single-person images.

In a second aspect, an embodiment of the present application provides a human body posture prediction method, including: acquiring a third human body image; and inputting the third human body image into a generator in a human body posture prediction model obtained by the human body posture prediction model training method in the first aspect, so as to obtain a third human body posture prediction result.

In a third aspect, an embodiment of the present application provides a human body posture prediction model training device, including: a first acquisition module, used for acquiring a labeled training set and an unlabeled training set, wherein the labeled training set comprises a plurality of first human body images containing annotation data, and the annotation data is used for representing real posture information in the first human body images; the unlabeled training set comprises a plurality of second human body images which do not contain labeling data; a first input module, used for inputting the first human body image into a generator in the human body posture prediction model to obtain a corresponding first human body posture prediction result, and calculating a first loss value of the generator according to the labeling data and the first human body posture prediction result; a second input module, used for inputting the second human body image into the generator to obtain a corresponding second human body posture prediction result; a calculation module, used for calculating a second loss value corresponding to the discriminator in the human body posture prediction model according to the first human body image, the labeling data, the second human body image and the second human body posture prediction result; and an optimization module, used for optimizing the generator and the discriminator according to the first loss value and the second loss value to obtain the human body posture prediction model.

Optionally, in the human body posture prediction model training device according to the embodiment of the present application, the first input module is specifically configured to: input the first human body image into the generator to obtain a predicted human body posture heat map corresponding to multiple channels, wherein the predicted human body posture heat map of each channel predicts one human body key point position; generate a corresponding reference human body posture heat map based on the labeling data corresponding to the first human body image; and calculate the first loss value according to the predicted human body posture heat map and the reference human body posture heat map.

Optionally, in the human body posture prediction model training device according to the embodiment of the present application, the calculation module is specifically configured to: input the first human body image and the reference human body posture heat map into the discriminator as a real data sequence, and input the second human body image and the second human body posture prediction result into the discriminator as a fake data sequence, to respectively obtain discrimination results output by the discriminator; and calculate the second loss value according to the discrimination results.

Optionally, in the human body posture prediction model training device according to the embodiment of the present application, the optimization module is specifically configured to: update the network parameters of the generator according to the first loss value and the second loss value, and update the network parameters of the discriminator according to the second loss value.

Optionally, in the human body posture prediction model training device according to the embodiment of the present application, the first acquisition module is specifically configured to: acquire an original labeled training set containing annotation data and an original unlabeled training set not containing annotation data; and respectively carry out human body detection on the images in the original labeled training set and the images in the original unlabeled training set by utilizing a pre-trained human body detection model to obtain the first human body image in the labeled training set and the second human body image in the unlabeled training set; wherein the first human body image and the second human body image are single-person images.

In a fourth aspect, an embodiment of the present application provides a human body posture prediction apparatus, including: the second acquisition module is used for acquiring a third human body image; and the prediction module is used for inputting the third human body image into a generator in the human body posture prediction model obtained by the human body posture prediction model training method in the first aspect to obtain a third human body posture prediction result.

In a fifth aspect, an embodiment of the application provides an electronic device comprising a processor and a memory storing computer readable instructions which, when executed by the processor, perform a method as described in the first aspect above or a method as described in the second aspect above.

In a sixth aspect, embodiments of the present application provide a storage medium having stored thereon a computer program which, when executed by a processor, performs a method as described in the first aspect above or a method as described in the second aspect above.

As can be seen from the above, in the human body posture prediction model training method, the human body posture prediction method and the device provided by the embodiment of the application, a labeled training set and an unlabeled training set are acquired, wherein the labeled training set comprises a plurality of first human body images containing labeled data, and the labeled data is used for representing real posture information in the first human body images; the unlabeled training set comprises a plurality of second human body images which do not contain labeling data; the first human body image is input into a generator in a human body posture prediction model to obtain a corresponding first human body posture prediction result, and a first loss value of the generator is calculated according to the labeling data and the first human body posture prediction result; the second human body image is input into the generator to obtain a corresponding second human body posture prediction result; a second loss value corresponding to a discriminator in the human body posture prediction model is calculated according to the first human body image, the labeling data, the second human body image and the second human body posture prediction result; and the generator and the discriminator are optimized according to the first loss value and the second loss value to obtain the human body posture prediction model. Therefore, because the unlabeled training set is used in the training process, when labeled first human body images are insufficient, adversarial training is carried out in combination with a plurality of unlabeled second human body images, so that the accuracy can be improved and the demand for labeled first human body images can be reduced.

Additional features and advantages of the application will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the embodiments of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and should not be considered as limiting the scope, and other related drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart of a human body posture prediction model training method provided by an embodiment of the present application;

FIG. 2 is a flowchart of a human body posture prediction method according to an embodiment of the present application;

FIG. 3 is a schematic structural diagram of a training device for a human body posture prediction model according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a human body posture predicting device according to an embodiment of the present application;

FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application.

It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only to distinguish the description, and are not to be construed as indicating or implying relative importance.

In recent years, artificial-intelligence-based research in computer vision, deep learning, machine learning, image processing, image recognition and related technologies has advanced significantly. Artificial intelligence (AI) is an emerging science and technology that studies and develops theories, methods, techniques and application systems for simulating and extending human intelligence. Artificial intelligence is a comprehensive discipline involving many technical categories such as chips, big data, cloud computing, the internet of things, distributed storage, deep learning, machine learning and neural networks. Computer vision is an important branch of artificial intelligence that specifically enables machines to recognize the world. Computer vision technologies generally include face recognition, liveness detection, fingerprint recognition and anti-counterfeiting verification, biometric recognition, face detection, pedestrian detection, object detection, pedestrian recognition, image processing, image recognition, image semantic understanding, image retrieval, text recognition, video processing, video content recognition, behavior recognition, three-dimensional reconstruction, virtual reality, augmented reality, simultaneous localization and mapping (SLAM), computational photography, robot navigation and positioning, and the like.
With the research and progress of artificial intelligence technology, the technology has been applied in many fields, such as security, city management, traffic management, building management, park management, face-based access, face-based attendance, logistics management, warehouse management, robotics, intelligent marketing, computational photography, mobile phone imaging, cloud services, smart homes, wearable devices, unmanned driving, autonomous driving, intelligent healthcare, face payment, face unlocking, fingerprint unlocking, identity verification, smart screens, smart televisions, cameras, mobile internet, live streaming, beauty, make-up, medical aesthetics, intelligent temperature measurement and the like.

Referring to FIG. 1, FIG. 1 is a flowchart of a human body posture prediction model training method according to an embodiment of the present application. The human body posture prediction model training method can comprise the following steps:

S101, acquiring a labeled training set and an unlabeled training set.

S102, inputting the first human body image into a generator in the human body posture prediction model to obtain a corresponding first human body posture prediction result, and calculating a first loss value of the generator according to the labeling data and the first human body posture prediction result.

S103, inputting the second human body image into a generator to obtain a corresponding second human body posture prediction result.

S104, calculating a second loss value corresponding to the discriminator in the human body posture prediction model according to the first human body image, the labeling data, the second human body image and the second human body posture prediction result.

S105, optimizing the generator and the discriminator according to the first loss value and the second loss value to obtain a human body posture prediction model.

Specifically, in step S101, the labeled training set includes a plurality of first human body images containing annotation data. As one embodiment, a plurality of first human body images containing annotation data may be obtained from a public database; as another embodiment, human body images containing no annotation data may be acquired from the internet, and a plurality of first human body images containing annotation data may then be obtained from those human body images.

The annotation data is used for representing real posture information in the first human body image. The real posture information can comprise human body key point information (human body key points comprise positions such as the top of the head, neck, left shoulder, left elbow, left wrist, right shoulder, right elbow and right wrist), human body posture information and the like.

It will be appreciated that there are a number of ways in which the first human image may be derived from a human image that does not contain annotation data, for example: processing the human body image by using the existing human body posture prediction model; or manually annotating the human body image, etc., which is not particularly limited in the embodiment of the present application.

Similarly, in step S101, the unlabeled training set includes a plurality of second human body images that do not include labeling data, that is, the second human body images are original images that contain human body poses and have not undergone labeling processing. It will be appreciated that the second human body images may be obtained through a variety of means, such as the unlabeled portions (e.g., COCO unlabeled) of public datasets.

Further, the first human body image and the second human body image may each be a human body image containing only one person. Therefore, a scene image containing multiple people can be preprocessed with a pre-trained human body detection model, and the image of each person can then be cropped out to obtain a corresponding human body image. The specific steps are as follows:

First, an original labeled training set containing annotation data and an original unlabeled training set not containing annotation data are obtained.

Second, human body detection is carried out on the images in the original labeled training set and the images in the original unlabeled training set, respectively, by utilizing a pre-trained human body detection model, to obtain the first human body images in the labeled training set and the second human body images in the unlabeled training set.
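The cropping step can be illustrated with a minimal sketch. It assumes, purely for illustration, that the pre-trained human body detection model returns one bounding box per person as `(x_min, y_min, x_max, y_max)` pixel coordinates; the box format, the grayscale list-of-rows image representation and the function names are hypothetical and are not specified by the application.

```python
from typing import List, Sequence, Tuple

# Hypothetical detector output format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]
# An image as rows of pixel values (grayscale, to keep the sketch short).
Image = List[List[int]]


def crop_person(image: Image, box: Box) -> Image:
    """Return the sub-image covered by `box`, clamped to the image bounds."""
    height, width = len(image), len(image[0])
    x0, y0, x1, y1 = box
    x0, y0 = max(0, x0), max(0, y0)
    x1, y1 = min(width, x1), min(height, y1)
    return [row[x0:x1] for row in image[y0:y1]]


def split_into_single_person_images(image: Image, boxes: Sequence[Box]) -> List[Image]:
    """Produce one crop per detected person and drop empty crops."""
    crops = [crop_person(image, box) for box in boxes]
    return [crop for crop in crops if crop and crop[0]]


# A 4x6 toy "scene" whose pixel value encodes its position (10 * row + column).
scene = [[10 * r + c for c in range(6)] for r in range(4)]
# Two assumed detections; the second box extends past the right image border.
people = split_into_single_person_images(scene, [(0, 0, 3, 4), (3, 1, 8, 3)])
```

The same routine would be applied to both the original labeled and the original unlabeled training sets, so that every training image contains a single person.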

In steps S102 to S105, the human body posture prediction model provided by the embodiment of the present application may be a generative adversarial network (GAN), and the human body posture prediction model may include two parts, namely a generator and a discriminator.

In step S102, for the first human body image input to the generator, the generator outputs a corresponding first human body posture prediction result. As one embodiment, the first human posture prediction result output by the generator may include human posture heat maps of a plurality of channels. The human body posture heat map of each channel can be used for representing one human body key point position predicted by the generator, and therefore, the human body posture heat maps of a plurality of channels are respectively used for representing different human body key point positions predicted by the generator. It will be appreciated that on a human body pose heat map, the probability that the position is a predicted human body keypoint position may be represented in different colors, for example: the closer the color on the human body pose heat map is to red, the more likely that location is the predicted corresponding human body keypoint location.
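As an illustration of how a multi-channel heat map encodes key point positions, the sketch below reads one position per channel by taking the highest-scoring pixel. Argmax decoding is a common convention and is assumed here; the application does not prescribe a decoding rule, and the names `decode_keypoint` and `decode_pose` are hypothetical.

```python
from typing import List, Tuple

# One heat map channel: rows of per-pixel scores.
Heatmap = List[List[float]]


def decode_keypoint(heatmap: Heatmap) -> Tuple[int, int, float]:
    """Return (x, y, score) of the highest-scoring pixel in one channel."""
    best = (0, 0, heatmap[0][0])
    for y, row in enumerate(heatmap):
        for x, score in enumerate(row):
            if score > best[2]:
                best = (x, y, score)
    return best


def decode_pose(heatmaps: List[Heatmap]) -> List[Tuple[int, int, float]]:
    """Decode one key point per channel."""
    return [decode_keypoint(channel) for channel in heatmaps]


# Two toy 3x3 channels, e.g. one for the top of the head and one for the neck.
head = [[0.0, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.1, 0.0]]
neck = [[0.0, 0.0, 0.0], [0.0, 0.2, 0.0], [0.0, 0.7, 0.0]]
pose = decode_pose([head, neck])
```

In this toy input the first channel peaks at pixel (1, 1) and the second at pixel (1, 2), which is what a red region would indicate in a color rendering of each channel.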

After the generator outputs the first human posture prediction result, a first loss value of the generator may be calculated according to the first human posture prediction result and labeling data included in the first human image. The step S102 may specifically include the following steps:

First, the first human body image is input into the generator to obtain predicted human body posture heat maps corresponding to multiple channels.

Second, a corresponding reference human body posture heat map is generated based on the labeling data corresponding to the first human body image.

Third, the first loss value is calculated according to the predicted human body posture heat map and the reference human body posture heat map.

In the above steps, first, the first human body image may be input into the generator to obtain a predicted human body posture heat map; then, the labeling data corresponding to the first human body image can be directly mapped by an algorithm into a reference (that is, real) human body posture heat map; finally, based on the predicted human body posture heat map and the reference human body posture heat map, a first loss value of the generator may be calculated, wherein the first loss value characterizes the difference between the predicted human body posture heat map and the reference human body posture heat map.
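A minimal sketch of the second and third steps follows. The application only states that the labeling data is mapped to a heat map "through an algorithm" and that the first loss value characterizes the difference between the two heat maps; the 2D Gaussian mapping and the mean squared error used below are common choices assumed for illustration, and `sigma` and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Heatmap = List[List[float]]


def gaussian_heatmap(center: Tuple[float, float], width: int, height: int,
                     sigma: float = 1.0) -> Heatmap:
    """Render one key point as a 2D Gaussian peak (assumed mapping)."""
    cx, cy = center
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def reference_heatmaps(keypoints: Sequence[Tuple[float, float]], width: int,
                       height: int, sigma: float = 1.0) -> List[Heatmap]:
    """Build one reference channel per annotated key point."""
    return [gaussian_heatmap(kp, width, height, sigma) for kp in keypoints]


def first_loss(predicted: List[Heatmap], reference: List[Heatmap]) -> float:
    """Mean squared error over all channels and pixels (assumed loss form)."""
    total, count = 0.0, 0
    for pred_channel, ref_channel in zip(predicted, reference):
        for pred_row, ref_row in zip(pred_channel, ref_channel):
            for p, r in zip(pred_row, ref_row):
                total += (p - r) ** 2
                count += 1
    return total / count


# Two annotated key points (x, y) on a 4-wide, 5-high heat map grid.
keypoints = [(2, 1), (0, 3)]
reference = reference_heatmaps(keypoints, width=4, height=5)
# An all-zero prediction stands in for an untrained generator's output.
blank = [[[0.0] * 4 for _ in range(5)] for _ in keypoints]
loss_for_blank = first_loss(blank, reference)
```

A prediction identical to the reference yields a loss of zero, and the loss grows as the predicted peaks drift away from the annotated positions.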

In step S103, for the second human body image input to the generator, the generator outputs a corresponding second human body posture prediction result. Similar to the first human posture prediction result, as an embodiment, the second human posture prediction result output by the generator may also include a human posture heat map of a plurality of channels. Because the second human body image does not contain labeling data, the confidence of the human body posture heat map corresponding to the second human body posture prediction result output by the generator is low.

Step S104 may specifically include the following steps:

First, the first human body image and the reference human body posture heat map are input into the discriminator as a real data sequence, and the second human body image and the second human body posture prediction result are input into the discriminator as a fake data sequence, to respectively obtain discrimination results output by the discriminator.

Second, the second loss value is calculated according to the discrimination results.

In the above steps, the discriminator is the key module that allows the second human body images, which do not contain labeling data, to contribute to training. Similar to existing adversarial training network models, the input of the discriminator provided by embodiments of the present application includes two parts, "real" and "fake". The first human body image containing the annotation data and the reference human body posture heat map generated according to the corresponding annotation data form the "real" input; the second human body image without labeling data and the corresponding second human body posture prediction result output by the generator form the "fake" input. The discriminator judges whether the input data is real or fake and outputs a probability between 0 and 1, from which the second loss value corresponding to the discriminator is calculated.
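The calculation of the second loss value from the discriminator's outputs can be sketched as follows. The application does not give a formula; binary cross-entropy, the usual discriminator loss in generative adversarial networks, is assumed here, and the function name and the `eps` clamp are hypothetical.

```python
import math
from typing import Sequence


def discriminator_loss(real_scores: Sequence[float],
                       fake_scores: Sequence[float],
                       eps: float = 1e-7) -> float:
    """Binary cross-entropy over discriminator outputs (assumed loss form).

    real_scores: outputs for (first image, reference heat map) inputs.
    fake_scores: outputs for (second image, predicted heat map) inputs.
    """
    def clamp(p: float) -> float:
        # Keep probabilities away from 0 and 1 so that log() stays finite.
        return min(max(p, eps), 1.0 - eps)

    real_term = -sum(math.log(clamp(p)) for p in real_scores) / len(real_scores)
    fake_term = -sum(math.log(1.0 - clamp(p)) for p in fake_scores) / len(fake_scores)
    return real_term + fake_term


# A discriminator that separates the two inputs well has a low loss ...
good = discriminator_loss(real_scores=[0.9, 0.8], fake_scores=[0.1, 0.2])
# ... one that cannot tell them apart sits at 2 * ln(2) ...
unsure = discriminator_loss(real_scores=[0.5, 0.5], fake_scores=[0.5, 0.5])
# ... and one that gets them backwards has a high loss.
bad = discriminator_loss(real_scores=[0.1, 0.2], fake_scores=[0.9, 0.8])
```

Under this assumed form, minimizing the loss pushes the output toward 1 for real inputs and toward 0 for fake inputs.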

In this step S105, the corresponding total loss value may be calculated using the first loss value and the second loss value obtained in the above-described embodiment. Then, the total loss value can be adopted to optimize the generator and the discriminator in the human body posture prediction model provided by the embodiment of the application, so that a trained human body posture prediction model is obtained.

As an embodiment, the second loss value is used to optimize both the discriminator and the generator, whereas the first loss value is used to optimize only the generator. Step S105 may specifically include the following step:

Updating the network parameters of the generator according to the first loss value and the second loss value, and updating the network parameters of the discriminator according to the second loss value.

In the above step, when the discriminator is optimized, the second loss value forces the discriminator to discriminate correctly between real and fake inputs: when the input is real data, the second loss value drives the output of the discriminator as close to 1 as possible; when the input is fake data, the second loss value drives the output of the discriminator as close to 0 as possible.

When optimizing the generator, the first loss value and the second loss value can make the human body posture heat map predicted by the generator conform to the real human body posture heat map distribution as much as possible (as true as possible).

By alternately updating the network parameters of the discriminator and the generator using the first loss value and the second loss value, the two modules compete with each other and improve together.
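The generator side of this alternating scheme can be sketched as a combined objective. The application states only that the generator is updated according to both loss values; the specific form below (supervised loss plus a weighted adversarial term that rewards predictions the discriminator scores as real) is the common non-saturating GAN formulation and is an assumption, as are `adv_weight` and the function name.

```python
import math
from typing import Sequence


def generator_total_loss(first_loss: float,
                         fake_scores: Sequence[float],
                         adv_weight: float = 0.1,
                         eps: float = 1e-7) -> float:
    """Supervised loss plus a weighted adversarial term (assumed combination).

    first_loss:  supervised heat map loss on labeled first human body images.
    fake_scores: discriminator outputs for the generator's predictions on
                 unlabeled second human body images.
    adv_weight:  hypothetical balancing coefficient, not from the application.
    """
    # The generator is rewarded when the discriminator scores its output near 1.
    adversarial = -sum(
        math.log(min(max(p, eps), 1.0 - eps)) for p in fake_scores
    ) / len(fake_scores)
    return first_loss + adv_weight * adversarial


# Same supervised loss, but the discriminator is fooled in the first case only.
fooled = generator_total_loss(first_loss=0.2, fake_scores=[0.9, 0.8])
caught = generator_total_loss(first_loss=0.2, fake_scores=[0.1, 0.2])
```

In an alternating schedule, one iteration would update the discriminator with its own loss while the generator is held fixed, and then update the generator with this combined value while the discriminator is held fixed.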

Further, in some embodiments, the step of obtaining the annotation training set in step S101 may include the following sub-steps:

First, a plurality of first original images with annotation data are obtained, wherein each first original image includes posture data of a human body.

Second, the plurality of first original images are scaled to obtain first human body images of the same size specification.

In the above steps, by setting the plurality of first human body images to the same size specification, the accuracy in the subsequent training of the generator can be improved, and the loss can be reduced. Of course, correspondingly, for the unlabeled training set, the same processing manner as for the labeled training set may be adopted, and will not be described herein.
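
The scaling sub-step can be sketched as follows. The source does not specify an interpolation method or say how annotations are handled, so nearest-neighbour resizing and the rescaling of keypoint coordinates are assumptions; `resize_nearest` and `scale_keypoints` are hypothetical helper names.

```python
def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list of pixel values."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[(r * in_h) // out_h][(c * in_w) // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

def scale_keypoints(keypoints, in_size, out_size):
    """Rescale (x, y) annotations so they stay aligned with the resized image."""
    (in_h, in_w), (out_h, out_w) = in_size, out_size
    return [(x * out_w / in_w, y * out_h / in_h) for x, y in keypoints]

original = [[1, 2],
            [3, 4]]
resized = resize_nearest(original, 4, 4)
keypoints = scale_keypoints([(1, 1)], in_size=(2, 2), out_size=(4, 4))
print(resized, keypoints)
```

Applying the same target size to every image lets the generator process them in uniform batches; the annotation coordinates must be scaled by the same factors, otherwise they no longer point at the intended body parts.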

As can be seen from the above, the human body posture prediction model training method provided by the embodiment of the present application acquires a labeled training set and an unlabeled training set, where the labeled training set includes a plurality of first human body images containing labeling data, the labeling data being used to represent real posture information in the first human body images, and the unlabeled training set includes a plurality of second human body images that do not contain labeling data; inputs the first human body image into a generator in the human body posture prediction model to obtain a corresponding first human body posture prediction result, and calculates a first loss value of the generator according to the labeling data and the first human body posture prediction result; inputs the second human body image into the generator to obtain a corresponding second human body posture prediction result; calculates a second loss value corresponding to the discriminator in the human body posture prediction model according to the first human body image, the labeling data, the second human body image and the second human body posture prediction result; and optimizes the generator and the discriminator according to the first loss value and the second loss value to obtain the human body posture prediction model. Because the unlabeled training set is used in the training process, when labeled first human body images are insufficient, adversarial training that also draws on a plurality of unlabeled second human body images can improve prediction accuracy and reduce the demand for labeled first human body images.

Referring to fig. 2, fig. 2 is a flowchart of a human body posture prediction method according to an embodiment of the present application. The human body posture prediction method may include the steps of:

S201, acquiring a third human body image.

S202, inputting the third human body image into a generator in a human body posture prediction model obtained by adopting a human body posture prediction model training method to obtain a third human body posture prediction result.

Specifically, in the step S201, a third human body image to be predicted may be obtained, where the third human body image may be an original image that does not include labeling data, and the human body posture in the third human body image may be predicted by using the human body posture prediction method provided by the embodiment of the present application.

In the step S202, the third human body image obtained in the step S201 may be input into a generator of a pre-trained human body posture prediction model, where the human body posture prediction model may be obtained by training using the human body posture prediction model training method in the above embodiment. The generator can output a third human body posture prediction result corresponding to the third human body image according to the third human body image, so that the prediction of the human body posture in the third human body image is realized.

Referring to fig. 3, fig. 3 is a schematic structural diagram of a human body posture prediction model training device provided by the present application, and the human body posture prediction model training device 300 includes: a first acquisition module 301, a first input module 302, a second input module 303, a calculation module 304, and an optimization module 305.

The first obtaining module 301 is configured to obtain a labeled training set and an unlabeled training set.

The annotation training set comprises a plurality of first human body images containing annotation data. As one embodiment, a plurality of first human body images containing annotation data may be obtained from a public database; as another embodiment, human body images containing no annotation data may be acquired from the internet and then annotated to obtain a plurality of first human body images containing annotation data.

Similarly, the unlabeled training set includes a plurality of second human body images that do not include labeling data; that is, the second human body images are original images that contain human body poses but have not been subjected to labeling processing. It will be appreciated that the second human body images may be obtained in a variety of ways, for example from the unlabeled portion of a public dataset (e.g., COCO unlabeled).

The first input module 302 is configured to input a first human body image into a generator in a human body posture prediction model, obtain a corresponding first human body posture prediction result, and calculate a first loss value of the generator according to the labeling data and the first human body posture prediction result.

For the first human body image input to the generator, the generator outputs a corresponding first human body posture prediction result. As one embodiment, the first human posture prediction result output by the generator may include human posture heat maps of a plurality of channels. The human body posture heat map of each channel can be used for representing one human body key point position predicted by the generator, so that the human body posture heat maps of a plurality of channels are respectively used for representing different human body key point positions predicted by the generator. It will be appreciated that the closer the color on the human body pose heat map is to red, the more likely that location is the predicted corresponding human body keypoint location.
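
Reading keypoint positions out of multi-channel heat maps can be sketched as taking the peak of each channel. The source does not describe a decoding step, so the arg-max rule and the helper name `decode_keypoints` are assumptions.

```python
def decode_keypoints(heatmaps):
    """Return one (x, y, score) triple per channel, taken at that channel's peak.

    heatmaps: list of channels, each a 2-D list indexed as heatmap[y][x].
    """
    keypoints = []
    for heatmap in heatmaps:
        best = (0, 0, heatmap[0][0])
        for y, row in enumerate(heatmap):
            for x, value in enumerate(row):
                if value > best[2]:
                    best = (x, y, value)
        keypoints.append(best)
    return keypoints

# Two channels, each standing for one key point.
channels = [
    [[0.1, 0.2], [0.9, 0.3]],  # peak at x=0, y=1
    [[0.0, 0.7], [0.1, 0.2]],  # peak at x=1, y=0
]
print(decode_keypoints(channels))
```

The peak corresponds to the "reddest" location in the text's description, and its value can serve as a rough confidence for the predicted key point.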

After the generator outputs the first human posture prediction result, a first loss value of the generator may be calculated according to the first human posture prediction result and labeling data included in the first human image.

In the human body posture prediction model training device 300 according to the embodiment of the present application, the first input module 302 is specifically configured to: input the first human body image into the generator to obtain predicted human body posture heat maps corresponding to multiple channels, where the predicted human body posture heat map of each channel predicts one human body key point position; generate a corresponding reference human body posture heat map based on the labeling data corresponding to the first human body image; and calculate the first loss value according to the predicted human body posture heat map and the reference human body posture heat map.

Firstly, a first human body image can be input into a generator to obtain a predicted human body posture heat map; then, the labeling data corresponding to the first human body image can be directly mapped into a real human body posture heat map through an algorithm; finally, based on the predicted human body pose heat map and the real human body pose heat map, a first loss value of the generator may be calculated.
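
The three steps above can be sketched as follows. The source says only that the labeling data is mapped to a heat map "through an algorithm" and does not name the loss, so the Gaussian mapping, the mean squared error, the `sigma` parameter, and the function names below are all assumptions.

```python
import math

def gaussian_heatmap(x0, y0, height, width, sigma=1.0):
    """One channel with a Gaussian bump centred on the annotated key point."""
    return [
        [math.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]

def reference_heatmaps(keypoints, height, width, sigma=1.0):
    """One channel per annotated (x, y) key point."""
    return [gaussian_heatmap(x, y, height, width, sigma) for x, y in keypoints]

def first_loss(predicted, reference):
    """Mean squared error over all channels and pixels (assumed loss form)."""
    total, count = 0.0, 0
    for pred_map, ref_map in zip(predicted, reference):
        for pred_row, ref_row in zip(pred_map, ref_map):
            for p, r in zip(pred_row, ref_row):
                total += (p - r) ** 2
                count += 1
    return total / count

reference = reference_heatmaps([(1, 2)], height=4, width=3)
blank_prediction = [[[0.0] * 3 for _ in range(4)]]
print(first_loss(reference, reference), first_loss(blank_prediction, reference) > 0)
```

A perfect prediction gives a loss of zero, and the loss grows as the predicted heat map departs from the reference heat map derived from the annotation.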

The second input module 303 is configured to input a second human body image into the generator, and obtain a corresponding second human body posture prediction result.

For the second human body image input to the generator, the generator outputs a corresponding second human body posture prediction result. Similar to the first human posture prediction result, as an embodiment, the second human posture prediction result output by the generator may also include a human posture heat map of a plurality of channels. Because the second human body image does not contain labeling data, the confidence of the human body posture heat map corresponding to the second human body posture prediction result output by the generator is low.

The calculation module 304 is configured to calculate a second loss value corresponding to the discriminator in the human body posture prediction model according to the first human body image, the labeling data, the second human body image, and the second human body posture prediction result.

In the human body posture prediction model training device 300 according to the embodiment of the present application, the calculation module 304 is specifically configured to: input the first human body image and the reference human body posture heat map into the discriminator as a true data sequence, and input the second human body image and the second human body posture prediction result into the discriminator as a false data sequence, to obtain the respective discrimination results output by the discriminator; and calculate the second loss value according to the discrimination results.

In the above step, the discriminator is the key module that allows the second human body images, which do not include labeling data, to contribute to training. As in existing adversarial training network models, the input to the discriminator has two parts, "true" and "false". The first human body image containing the labeling data, together with the reference human body posture heat map generated from the corresponding labeling data, forms the input at the true end; the second human body image without labeling data, together with the corresponding second human body posture prediction result output by the generator, forms the input at the false end. The discriminator judges whether the input data is true or false and outputs a probability between 0 and 1 that the input is true, from which the adversarial loss function is calculated.
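
The construction of the true-end and false-end inputs can be sketched as pairing each image with a heat map and attaching a target label. The source does not say how an image and its heat map are combined, so stacking them along the channel axis, requiring equal spatial size, and the names `make_pair` and `build_discriminator_inputs` are assumptions.

```python
def make_pair(image_channels, heatmap_channels):
    """Stack image channels and heat map channels into one input (assumed layout)."""
    height, width = len(image_channels[0]), len(image_channels[0][0])
    for channel in heatmap_channels:
        if (len(channel), len(channel[0])) != (height, width):
            raise ValueError("image and heat maps must share the same spatial size")
    return list(image_channels) + list(heatmap_channels)

def build_discriminator_inputs(labeled, unlabeled_predictions):
    """Label (image, reference heat map) pairs 1 and (image, prediction) pairs 0."""
    real = [(make_pair(img, ref), 1) for img, ref in labeled]
    fake = [(make_pair(img, pred), 0) for img, pred in unlabeled_predictions]
    return real, fake

# A 3-channel 2x2 image and 2 key point channels, used for both ends here.
image = [[[0, 0], [0, 0]] for _ in range(3)]
heatmaps = [[[0.5, 0.1], [0.0, 0.2]] for _ in range(2)]
real, fake = build_discriminator_inputs([(image, heatmaps)], [(image, heatmaps)])
print(len(real[0][0]), real[0][1], fake[0][1])
```

Because the discriminator sees the image alongside the heat map, it can judge whether a heat map is plausible for that particular image rather than merely plausible in general.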

The optimizing module 305 is configured to optimize the generator and the discriminator according to the first loss value and the second loss value, so as to obtain a human body posture prediction model.

The first loss value and the second loss value obtained in the above embodiment may be used to calculate a corresponding total loss value. The total loss value can then be used to optimize the generator and the discriminator in the human body posture prediction model provided by the embodiment of the application, so that a trained human body posture prediction model is obtained.

As an embodiment, the second loss value is used to optimize both the discriminator and the generator, while the first loss value is used to optimize only the generator. In the human body posture prediction model training device 300 according to the embodiment of the present application, the optimization module 305 is specifically configured to: update the network parameters of the generator according to the first loss value and the second loss value, and update the network parameters of the discriminator according to the second loss value.

When the discriminator is optimized, the second loss value forces the discriminator to distinguish true inputs from false ones correctly: when the input is true data, the second loss value drives the output of the discriminator as close to 1 as possible; when the input is false data, the second loss value drives the output of the discriminator as close to 0 as possible.

Further, in some embodiments, the first obtaining module 301 is specifically configured to: acquire a plurality of first original images with annotation data, where each first original image includes pose data of a human body; and scale the plurality of first original images to obtain first human body images with the same size specification.

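By setting the plurality of first human body images to the same size specification, the accuracy of the subsequent training of the generator can be improved, and the loss can be reduced. Of course, correspondingly, for the unlabeled training set, the same processing manner as for the labeled training set may be adopted, and will not be described herein.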

As can be seen from the above, the human body posture prediction model training device 300 provided in the embodiment of the present application acquires a labeled training set and an unlabeled training set, where the labeled training set includes a plurality of first human body images containing labeling data, the labeling data being used to represent real posture information in the first human body images, and the unlabeled training set includes a plurality of second human body images that do not contain labeling data; inputs the first human body image into a generator in the human body posture prediction model to obtain a corresponding first human body posture prediction result, and calculates a first loss value of the generator according to the labeling data and the first human body posture prediction result; inputs the second human body image into the generator to obtain a corresponding second human body posture prediction result; calculates a second loss value corresponding to the discriminator in the human body posture prediction model according to the first human body image, the labeling data, the second human body image and the second human body posture prediction result; and optimizes the generator and the discriminator according to the first loss value and the second loss value to obtain the human body posture prediction model. Because the unlabeled training set is used in the training process, when labeled first human body images are insufficient, adversarial training that also draws on a plurality of unlabeled second human body images can improve prediction accuracy and reduce the demand for labeled first human body images.

Referring to fig. 4, fig. 4 is a schematic structural diagram of a human body posture predicting device provided in an embodiment of the present application, and the human body posture predicting device 400 includes: a second acquisition module 401 and a prediction module 402.

Wherein, the second acquisition module 401 is configured to acquire a third human body image.

The third human body image to be predicted can be obtained, and the third human body image can be an original image which does not contain labeling data.

The prediction module 402 is configured to input a third human body image into a generator in a human body posture prediction model obtained by using a human body posture prediction model training method, so as to obtain a third human body posture prediction result.

The third human body image obtained by the second acquisition module 401 may be input into a generator of a pre-trained human body posture prediction model, where the human body posture prediction model may be obtained by training using the human body posture prediction model training method in the foregoing embodiment. The generator can output a third human body posture prediction result corresponding to the third human body image according to the third human body image, so that the prediction of the human body posture in the third human body image is realized.

Referring to fig. 5, fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 500 includes a processor 501 and a memory 502, which are interconnected and communicate with each other through a communication bus 503 and/or another form of connection mechanism (not shown). The memory 502 stores a computer program executable by the processor 501; when the electronic device is running, the processor 501 executes the computer program to perform the method in any of the alternative implementations of the embodiments described above.

The present application provides a storage medium storing a computer program that, when executed by a processor, performs the method of any of the alternative implementations of the above embodiments.

The storage medium may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.

Further, the units described as separate units may or may not be physically separate, and units displayed as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

Furthermore, functional modules in various embodiments of the present application may be integrated together to form a single portion, or each module may exist alone, or two or more modules may be integrated to form a single portion.

In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.

The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and variations will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims

1. A human body posture prediction model training method, comprising:

Acquiring an annotation training set and an unlabeled training set, wherein the annotation training set comprises a plurality of first human body images containing annotation data, and the annotation data is used for representing real posture information in the first human body images; the unlabeled training set comprises a plurality of second human body images which do not contain labeling data;

Inputting the first human body image into a generator in a human body posture prediction model to obtain a corresponding first human body posture prediction result, and calculating a first loss value of the generator according to the labeling data and the first human body posture prediction result;

Inputting the second human body image into the generator to obtain a corresponding second human body posture prediction result;

calculating a second loss value corresponding to a discriminator in the human body posture prediction model according to the first human body image, the labeling data, the second human body image and the second human body posture prediction result;

Optimizing the generator and the discriminator according to the first loss value and the second loss value to obtain the human body posture prediction model;

the step of inputting the first human body image into a generator in a human body posture prediction model to obtain a corresponding first human body posture prediction result, and calculating a first loss value of the generator according to the labeling data and the first human body posture prediction result, including:

Inputting the first human body image into the generator to obtain a predicted human body posture heat map corresponding to multiple channels; the predicted human body posture heat map of each channel predicts a human body key point position;

Generating a corresponding reference human body posture heat map based on the labeling data corresponding to the first human body image;

Calculating the first loss value according to the predicted human body posture heat map and the reference human body posture heat map;

The calculating a second loss value corresponding to a discriminator in the human body posture prediction model according to the first human body image, the labeling data, the second human body image and the second human body posture prediction result includes:

Inputting the first human body image and the reference human body posture heat map as true data sequences into the discriminator, and inputting the second human body image and the second human body posture prediction result as false data sequences into the discriminator to respectively obtain discrimination results output by the discriminator;

And calculating the second loss value according to the discrimination results.

2. The human body posture prediction model training method of claim 1, wherein the optimizing the generator and the discriminator according to the first loss value and the second loss value comprises:

Updating the network parameters of the generator according to the first loss value and the second loss value, and updating the network parameters of the discriminator according to the second loss value.

3. The method for training a human body posture prediction model according to claim 1 or 2, wherein the obtaining the labeled training set and the unlabeled training set includes:

acquiring an original marked training set containing marked data and an original unmarked training set not containing marked data;

Respectively carrying out human body detection on the images in the original marked training set and the images in the original unmarked training set by utilizing a pre-trained human body detection model to obtain the first human body image in the marked training set and the second human body image in the unmarked training set; wherein the first human body image and the second human body image are single person images.

4. A human body posture prediction method, comprising:

Acquiring a third human body image;

Inputting the third human body image into a generator in a human body posture prediction model obtained by the human body posture prediction model training method according to any one of claims 1-3, so as to obtain a third human body posture prediction result.

5. A human body posture prediction model training device, characterized by comprising:

the first acquisition module is used for acquiring an annotation training set and an unlabeled training set, wherein the annotation training set comprises a plurality of first human body images containing annotation data, and the annotation data is used for representing real posture information in the first human body images; the unlabeled training set comprises a plurality of second human body images which do not contain labeling data;

The first input module is used for inputting the first human body image into a generator in the human body posture prediction model to obtain a corresponding first human body posture prediction result, and calculating a first loss value of the generator according to the labeling data and the first human body posture prediction result;

The second input module is used for inputting the second human body image into the generator to obtain a corresponding second human body posture prediction result;

The calculation module is used for calculating a second loss value corresponding to the discriminator in the human body posture prediction model according to the first human body image, the labeling data, the second human body image and the second human body posture prediction result;

The optimizing module is used for optimizing the generator and the discriminator according to the first loss value and the second loss value to obtain the human body posture prediction model;

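the first input module is specifically configured to:

Inputting the first human body image into the generator to obtain a predicted human body posture heat map corresponding to multiple channels; the predicted human body posture heat map of each channel predicts a human body key point position;

Generating a corresponding reference human body posture heat map based on the labeling data corresponding to the first human body image;

Calculating the first loss value according to the predicted human body posture heat map and the reference human body posture heat map;

The calculation module is specifically configured to:

Inputting the first human body image and the reference human body posture heat map as true data sequences into the discriminator, and inputting the second human body image and the second human body posture prediction result as false data sequences into the discriminator to respectively obtain discrimination results output by the discriminator;

And calculating the second loss value according to the discrimination results.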

6. A human body posture predicting apparatus, comprising:

the second acquisition module is used for acquiring a third human body image;

a prediction module, configured to input the third human body image into a generator in a human body posture prediction model obtained by using the human body posture prediction model training method according to any one of claims 1-3, so as to obtain a third human body posture prediction result.

7. An electronic device comprising a processor and a memory storing computer readable instructions that, when executed by the processor, perform the method of any of claims 1-3 or the method of claim 4.

8. A storage medium having stored thereon a computer program which, when executed by a processor, performs the method of any of claims 1-3 or the method of claim 4.