CN115100691A - Method, device and equipment for acquiring key point detection model and detecting key points - Google Patents

Method, device and equipment for acquiring key point detection model and detecting key points

Info

Publication number
CN115100691A
CN115100691A (application CN202211021088.6A)
Authority
CN
China
Prior art keywords
sample
target
key point
position information
detection model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211021088.6A
Other languages
Chinese (zh)
Other versions
CN115100691B (en)
Inventor
付灿苗
孙冲
李琛
吕静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202211021088.6A priority Critical patent/CN115100691B/en
Publication of CN115100691A publication Critical patent/CN115100691A/en
Application granted granted Critical
Publication of CN115100691B publication Critical patent/CN115100691B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107Static hand or arm
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/22Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V10/235Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition based on user input or interaction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a method, an apparatus, and a device for acquiring a key point detection model and for detecting key points, belonging to the field of computer technology. The method includes: acquiring a training data set and an initial key point detection model, where the training data set includes a sample image and standard position information corresponding to a sample key point, the sample key point being a key point of a target part included in the sample image; calling the initial key point detection model to process the sample image to obtain a sample heatmap corresponding to the sample key point; determining sample position information corresponding to the sample key point according to the sample heatmap; determining a reference loss value according to the standard position information, the sample position information, and the sample heatmap, where the reference loss value indicates the detection precision of the initial key point detection model; and, based on the reference loss value being greater than a loss threshold, updating the initial key point detection model to obtain a target key point detection model. The method improves the accuracy and precision of key point detection.

Description

Method, device and equipment for acquiring key point detection model and detecting key points
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a method, a device and equipment for acquiring a key point detection model and detecting key points.
Background
With the continuous development of computer technology, more and more application scenarios support human-computer interaction; gesture interaction, for example, is a common mode of human-computer interaction. Gesture interaction requires detecting hand key points, i.e., the joint points of the hand.
In the related art, a hand image and first position information of a hand key point included in the hand image are obtained, the first position information being produced by manually annotating the hand image based on experience. An initial hand key point detection model is called to obtain a heatmap corresponding to the hand key point included in the hand image, and the coordinates of the maximum value in the heatmap are taken as second position information of the hand key point. The initial hand key point detection model is then updated according to the first position information and the second position information to obtain a target hand key point detection model, which is used to acquire the position information of hand key points included in hand images.
However, in the above method, the coordinates of the maximum value in the heatmap are used directly as the second position information of the hand key point, so the determined second position information is not accurate enough. Updating the initial hand key point detection model with this less accurate second position information yields a target model with lower detection precision and poorer stability, and hence lower accuracy when that model is used to acquire the position information of hand key points included in a hand image.
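The shortcoming described above can be seen in a minimal sketch (an illustration, not code from the application): decoding a heatmap by taking the argmax quantizes the keypoint to whole-pixel coordinates and discards the evidence in neighboring cells.

```python
import numpy as np

def argmax_decode(heatmap):
    """Related-art decoding: take the coordinates of the maximum value
    in the heatmap as the keypoint position. The result is quantized to
    integer pixel positions, which is the imprecision the application
    sets out to address."""
    idx = np.argmax(heatmap)               # index into the flattened heatmap
    y, x = np.unravel_index(idx, heatmap.shape)
    return float(x), float(y)

# A peak whose true location lies between pixels is snapped to one cell;
# the mass in the neighboring cell is ignored entirely.
hm = np.zeros((4, 4))
hm[1, 2] = 0.9
hm[2, 2] = 0.8
print(argmax_decode(hm))  # (2.0, 1.0)
```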
Disclosure of Invention
The embodiment of the application provides a method, a device and equipment for acquiring a key point detection model and detecting key points, and can be used for solving the problem of low accuracy of key point detection in the related technology.
In a first aspect, an embodiment of the present application provides a method for acquiring a keypoint detection model, where the method includes:
acquiring a training data set and an initial key point detection model, where the training data set includes a sample image and standard position information corresponding to a sample key point, the sample key point being a key point of a target part in the sample image;
calling the initial key point detection model to process the sample image to obtain a sample heatmap corresponding to the sample key point;
determining sample position information corresponding to the sample key point according to the sample heatmap;
determining a reference loss value according to the standard position information, the sample position information, and the sample heatmap, where the reference loss value indicates the detection precision of the initial key point detection model;
and, based on the reference loss value being greater than a loss threshold, updating the initial key point detection model to obtain a target key point detection model, where the target key point detection model is used to detect a target image so as to determine target position information corresponding to a key point of a target part included in the target image.
In a second aspect, an embodiment of the present application provides a method for detecting a keypoint, where the method includes:
acquiring a target image and a target key point detection model, wherein the target image comprises a target part, and the target key point detection model is acquired by any one of the key point detection model acquisition methods in the first aspect;
calling the target key point detection model to process the target image to obtain a target heatmap corresponding to a target key point of the target part;
and determining target position information corresponding to the target key point according to the target heatmap.
In a third aspect, an embodiment of the present application provides an apparatus for acquiring a keypoint detection model, where the apparatus includes:
an acquisition module, configured to acquire a training data set and an initial key point detection model, where the training data set includes a sample image and standard position information corresponding to a sample key point, the sample key point being a key point of a target part included in the sample image;
a processing module, configured to call the initial key point detection model to process the sample image to obtain a sample heatmap corresponding to the sample key point;
a determining module, configured to determine sample position information corresponding to the sample key point according to the sample heatmap;
the determining module being further configured to determine a reference loss value according to the standard position information, the sample position information, and the sample heatmap, where the reference loss value indicates the detection precision of the initial key point detection model;
and an updating module, configured to update the initial key point detection model based on the reference loss value being greater than a loss threshold, to obtain a target key point detection model, where the target key point detection model is used to detect a target image so as to determine target position information corresponding to a key point of a target part included in the target image.
In a possible implementation, the determining module is configured to obtain a first heatmap and a second heatmap, the first heatmap corresponding to a first dimension and the second heatmap corresponding to a second dimension;
and determine the sample position information corresponding to the sample key point according to the sample heatmap, the first heatmap, and the second heatmap.
In a possible implementation, the determining module is configured to determine a first value according to the sample heatmap and the first heatmap, the first value being the value of the sample key point in the first dimension;
determine a second value according to the sample heatmap and the second heatmap, the second value being the value of the sample key point in the second dimension;
and determine the sample position information corresponding to the sample key point according to the first value and the second value.
In one possible implementation, the sample heatmap, the first heatmap, and the second heatmap each include a plurality of values, the three containing the same number of values;
the determining module is configured to multiply the values at the same position in the sample heatmap and the first heatmap to obtain a third value for each position; determine the first value according to the third values of the positions; multiply the values at the same position in the sample heatmap and the second heatmap to obtain a fourth value for each position; and determine the second value according to the fourth values of the positions.
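One common way to realize the position-wise multiply-and-sum described above is a soft-argmax over coordinate grids. The sketch below makes an assumption the source does not spell out: the first and second hotspot graphs are taken to be grids holding the x- and y-index of every cell, so that multiplying each grid with the (normalized) heatmap and summing yields the expected sub-pixel coordinate.

```python
import numpy as np

def soft_argmax(sample_heatmap):
    """Decode a sub-pixel keypoint position as the expected coordinate
    under the heatmap, treated as a probability distribution."""
    h, w = sample_heatmap.shape
    p = sample_heatmap / sample_heatmap.sum()
    # Assumed first hotspot graph: the x index at every position.
    xs = np.tile(np.arange(w, dtype=float), (h, 1))
    # Assumed second hotspot graph: the y index at every position.
    ys = np.tile(np.arange(h, dtype=float).reshape(-1, 1), (1, w))
    x = float((p * xs).sum())  # first value: multiply position-wise, then sum
    y = float((p * ys).sum())  # second value
    return x, y

hm = np.zeros((5, 5))
hm[2, 1] = hm[2, 2] = 1.0  # equal evidence at x=1 and x=2 on row y=2
print(soft_argmax(hm))  # (1.5, 2.0) — between the two cells, unlike argmax
```

Unlike the argmax decoding of the related art, every value in the heatmap contributes to the result, which is what allows sub-pixel precision.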
In a possible implementation, the determining module is configured to determine a first loss value between the standard position information and the sample position information;
determine a second loss value according to the standard position information and the sample heatmap;
and determine the reference loss value according to the first loss value and the second loss value.
In a possible implementation, the determining module is configured to determine a third heatmap corresponding to the standard position information;
and determine the second loss value according to the sample heatmap and the third heatmap.
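A hedged sketch of how the two loss terms might be combined: an L1 loss between the standard and sample positions as the first loss, and a mean-squared error between the sample heatmap and a Gaussian "third heatmap" rendered at the standard position as the second loss. The L1/MSE choices, the Gaussian rendering, and the weights `alpha`/`beta` are illustrative assumptions; the source only states which quantities enter each loss.

```python
import numpy as np

def gaussian_heatmap(h, w, cx, cy, sigma=1.0):
    """Render the "third heatmap": a Gaussian centered on the standard
    (ground-truth) position. sigma is an assumed hyperparameter."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

def reference_loss(std_pos, sample_pos, sample_hm, alpha=1.0, beta=1.0):
    """Combine a coordinate loss (first loss) and a heatmap loss (second
    loss) into the reference loss used to update the model."""
    first_loss = float(np.abs(np.subtract(std_pos, sample_pos)).sum())
    third_hm = gaussian_heatmap(*sample_hm.shape, cx=std_pos[0], cy=std_pos[1])
    second_loss = float(((sample_hm - third_hm) ** 2).mean())
    return alpha * first_loss + beta * second_loss

# A perfect prediction incurs zero loss in both terms.
hm = gaussian_heatmap(5, 5, cx=2.0, cy=2.0)
print(reference_loss((2.0, 2.0), (2.0, 2.0), hm))  # 0.0
```

Supervising both the decoded coordinates and the heatmap shape is what the application credits for the higher accuracy of the resulting loss value.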
In a possible implementation manner, the obtaining module is configured to obtain the sample image;
identifying the sample image to obtain candidate position information corresponding to the sample key point;
adjusting the candidate position information to obtain standard position information corresponding to the sample key points;
and acquiring the training data set according to the sample image and the standard position information corresponding to the sample key points.
In a possible implementation manner, the obtaining module is configured to obtain a shape parameter and a pose parameter corresponding to the target part;
generate a target part model according to the shape parameter and the pose parameter, the target part model including standard position information corresponding to the sample key points;
pasting a texture map on the target part model to obtain the target part model pasted with the texture map;
projecting the target part model attached with the texture map into a background map to obtain the sample image;
and acquiring the training data set according to the sample image and the standard position information corresponding to the sample key points.
In a possible implementation, the updating module is configured to update the initial key point detection model, based on the reference loss value being greater than the loss threshold, to obtain an intermediate key point detection model;
call the intermediate key point detection model to process the sample image to obtain an intermediate heatmap corresponding to the sample key point;
determine intermediate position information corresponding to the sample key point according to the intermediate heatmap;
determine a candidate loss value according to the standard position information, the intermediate position information, and the intermediate heatmap;
and, based on the candidate loss value being not greater than the loss threshold, take the intermediate key point detection model as the target key point detection model.
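The update-until-threshold loop above can be sketched generically. Here `step_fn` and `loss_fn` stand in for the model update and the reference-loss computation, neither of which is concretely specified in the source; the scalar "model" is purely illustrative.

```python
def train_until_threshold(model_params, step_fn, loss_fn, loss_threshold,
                          max_iters=1000):
    """While the loss exceeds the threshold, update the model to obtain an
    intermediate model; once the candidate loss is not greater than the
    threshold, the current model is taken as the target model."""
    for _ in range(max_iters):
        loss = loss_fn(model_params)
        if loss <= loss_threshold:
            return model_params          # target key point detection model
        model_params = step_fn(model_params)  # intermediate model
    return model_params

# Toy illustration: halve a scalar parameter until its loss reaches the threshold.
result = train_until_threshold(
    8.0,
    step_fn=lambda p: p * 0.5,   # stand-in for a gradient update
    loss_fn=lambda p: abs(p),    # stand-in for the reference loss
    loss_threshold=1.0,
)
print(result)  # 1.0
```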
In a fourth aspect, an embodiment of the present application provides a keypoint detection apparatus, including:
an obtaining module, configured to obtain a target image and a target keypoint detection model, where the target image includes a target portion, and the target keypoint detection model is obtained by any one of the key point detection model obtaining devices in the third aspect;
a processing module, configured to call the target key point detection model to process the target image to obtain a target heatmap corresponding to a target key point of the target part;
and a determining module, configured to determine target position information corresponding to the target key point according to the target heatmap.
In a possible implementation manner, the determining module is configured to determine, according to the target hotspot graph, first location information corresponding to the target keypoint;
taking the first position information as target position information corresponding to the target key point; or determining target position information corresponding to the target key point according to the first position information and reference position information, wherein the reference position information is position information of the target key point of a target part included in a reference image, and the acquisition time of the reference image is adjacent to the acquisition time of the target image and is before the acquisition time of the target image.
In a possible implementation manner, the obtaining module is further configured to obtain the reference image, where the reference image includes the target portion;
the processing module is further configured to call the target key point detection model to process the reference image, so as to obtain a reference hotspot graph corresponding to a target key point of the target part;
the determining module is further configured to determine, according to the reference hotspot graph, reference location information corresponding to the target keypoint.
In a possible implementation, the determining module is configured to obtain an optical flow compensation value between the reference image and the target image, where the optical flow compensation value indicates the motion from the reference image to the target image;
determine, according to the reference position information and the first position information, the distance between the positions of the target key point in the reference image and in the target image;
determine a distance weight parameter according to the distance, the distance weight parameter being in direct proportion to the distance;
and determine the target position information corresponding to the target key point according to the reference position information, the first position information, the optical flow compensation value, the distance, and the distance weight parameter.
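The exact fusion formula is not given in the source; the sketch below is one plausible reading that uses the stated ingredients: the reference position shifted by the optical flow compensation, the distance between the two detections, and a weight proportional to that distance (so a large jump trusts the new detection, a small one favors temporal smoothness). The function name and the clamp constant `k` are assumptions.

```python
import numpy as np

def fuse_position(ref_pos, first_pos, flow, k=0.1):
    """Fuse the previous-frame (reference) keypoint position with the
    current-frame detection (first position information)."""
    ref_pos = np.asarray(ref_pos, dtype=float)
    first_pos = np.asarray(first_pos, dtype=float)
    distance = float(np.linalg.norm(first_pos - ref_pos))
    w = min(1.0, k * distance)      # distance weight, proportional to distance
    compensated = ref_pos + np.asarray(flow, dtype=float)  # flow-shifted reference
    return (1.0 - w) * compensated + w * first_pos

# Small motion: the output stays near the flow-compensated previous position.
print(fuse_position((10.0, 10.0), (11.0, 10.0), flow=(1.0, 0.0)))
# Large jump: the weight saturates and the new detection dominates.
print(fuse_position((0.0, 0.0), (100.0, 0.0), flow=(0.0, 0.0)))
```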
In a fifth aspect, an embodiment of the present application provides a computer device, where the computer device includes a processor and a memory, where the memory stores at least one program code, and the at least one program code is loaded into and executed by the processor, so as to enable the computer device to implement the method for acquiring a keypoint detection model according to the first aspect or any possible implementation manner of the first aspect, or to enable the computer device to implement the method for keypoint detection according to any possible implementation manner of the second aspect or the second aspect.
In a sixth aspect, a computer-readable storage medium is further provided, where at least one program code is stored, and the at least one program code is loaded and executed by a processor, so as to enable a computer to implement the method for acquiring a keypoint detection model according to the first aspect or any possible implementation manner of the first aspect, or to implement the method for keypoint detection according to any possible implementation manner of the second aspect or the second aspect.
In a seventh aspect, a computer program or a computer program product is further provided, where at least one computer instruction is stored, and the at least one computer instruction is loaded by a processor and executed to enable a computer to implement the method for acquiring a keypoint detection model according to the first aspect or any possible implementation manner of the first aspect, or to enable a computer to implement the method for keypoint detection according to the second aspect or any possible implementation manner of the second aspect.
The technical scheme provided by the embodiment of the application at least has the following beneficial effects.
According to the above technical solution, the sample position information corresponding to the sample key point of the target part included in the sample image is obtained through the initial key point detection model. The loss value is determined by considering not only the standard position information and the sample position information corresponding to the sample key point, but also the sample heatmap corresponding to the sample key point, so the determined loss value has high accuracy. Updating the initial key point detection model with this more accurate loss value yields a target key point detection model with higher detection accuracy, higher precision, and better stability, thereby improving the accuracy and precision of key point detection.
Drawings
To more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description are only some embodiments of the present application; those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application;
fig. 2 is a flowchart of an obtaining method of a keypoint detection model according to an embodiment of the present application;
FIG. 3 is a schematic diagram of key points of a hand according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a sample image acquiring process according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of an initial keypoint detection model provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of a sample hotspot graph, a first hotspot graph and a second hotspot graph provided by embodiments of the present disclosure;
FIG. 7 is a schematic diagram of a third hotspot graph provided by embodiments of the present application;
fig. 8 is a flowchart of a method for detecting a key point according to an embodiment of the present disclosure;
fig. 9 is a schematic diagram illustrating determination of target position information corresponding to a target key point according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a reference image and a target image provided by an embodiment of the present application;
FIG. 11 is a schematic diagram illustrating detection of key points of a reference image and a target image according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of an apparatus for acquiring a keypoint detection model according to an embodiment of the present application;
fig. 13 is a schematic structural diagram of a key point detecting device according to an embodiment of the present disclosure;
fig. 14 is a schematic structural diagram of a terminal device according to an embodiment of the present application;
fig. 15 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in this application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be implemented in sequences other than those illustrated or described herein. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with aspects of the present application.
In an exemplary embodiment, the method for acquiring a key point detection model and the key point detection method provided by the embodiments of the present application may be applied to various scenarios including, but not limited to, cloud technology, artificial intelligence, smart transportation, driving assistance, games, and the like.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science, which attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, involving both hardware-level and software-level technologies. Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning, autonomous driving, and intelligent transportation.
The solution provided by the embodiments of the present application relates to the machine learning technology within artificial intelligence. Machine Learning (ML) is a multidisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, and algorithmic complexity theory. It specializes in studying how computers can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied throughout the fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
With the research and progress of artificial intelligence technology, AI has been studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, the Internet of Vehicles, and intelligent transportation.
Fig. 1 is a schematic diagram of an implementation environment provided in an embodiment of the present application, and as shown in fig. 1, the implementation environment includes: a terminal device 101 and a server 102.
The method for acquiring the keypoint detection model provided by the embodiment of the present application may be executed by the terminal device 101, or may be executed by the server 102, or may be executed by both the terminal device 101 and the server 102, which is not limited in the embodiment of the present application. In the case that the method for acquiring the key point detection model provided by the embodiment of the present application is executed by the terminal device 101 and the server 102 together, the server 102 undertakes primary computation, and the terminal device 101 undertakes secondary computation; or, the server 102 bears the secondary computing work, and the terminal device 101 bears the primary computing work; or, the server 102 and the terminal device 101 perform cooperative computing by using a distributed computing architecture.
The key point detection method provided by the embodiment of the present application may be executed by the terminal device 101, or may be executed by the server 102, or may be executed by both the terminal device 101 and the server 102, which is not limited in the embodiment of the present application. In the case that the key point detection method provided by the embodiment of the present application is executed by the terminal device 101 and the server 102 together, the server 102 undertakes the primary calculation work, and the terminal device 101 undertakes the secondary calculation work; or, the server 102 bears the secondary computing work, and the terminal device 101 bears the primary computing work; or, the server 102 and the terminal device 101 perform cooperative computing by using a distributed computing architecture.
It should be noted that the execution device of the method for acquiring the keypoint detection model may be the same as or different from the execution device of the method for detecting the keypoint, and this is not limited in this embodiment of the application. Illustratively, the executing device of the method for acquiring the key point detection model is a terminal device 101, and the executing device of the key point detection model is a server 102; or, both the execution device of the method for acquiring the key point detection model and the execution device of the key point detection method are the terminal device 101.
Alternatively, the terminal device 101 may be any electronic product capable of human-computer interaction with a user through one or more of a keyboard, a touch pad, a touch screen, a remote controller, voice interaction, or a handwriting device. The terminal device 101 includes, but is not limited to, a mobile phone, a computer, an intelligent voice interaction device, an intelligent appliance, a vehicle-mounted terminal, an aircraft, and the like. The server 102 is a single server, a server cluster formed by a plurality of servers, or any one of a cloud computing platform and a virtualization center, which is not limited in this embodiment of the present application. The server 102 and the terminal device 101 are communicatively connected via a wired network or a wireless network. The server 102 has a data receiving function, a data processing function, and a data transmitting function. Of course, the server 102 may also have other functions, which are not limited in this embodiment of the present application.
Those skilled in the art will appreciate that the terminal device 101 and the server 102 are only examples, and other existing or future terminal devices or servers, as applicable to the present application, are also included within the scope of the present application and are hereby incorporated by reference.
The embodiment of the present application provides a method for acquiring a keypoint detection model, where the method is executed by a computer device, and the method may be applied to the implementation environment shown in fig. 1, where the computer device may be the terminal device 101 in fig. 1, or may also be the server 102 in fig. 1, and the method is not limited in this embodiment of the present application. Taking a flowchart of an obtaining method of a keypoint detection model provided in the embodiment of the present application shown in fig. 2 as an example, as shown in fig. 2, the method includes the following steps 201 to 205.
In step 201, a training data set and an initial key point detection model are obtained, the training data set includes a sample image and standard position information corresponding to the sample key point, and the sample key point is a key point of a target portion included in the sample image.
In the exemplary embodiment of the present application, the target portion may be a hand, a foot, or another portion of a body, which is not limited in the embodiment of the present application. The sample key points are key points included in the target portion, the number of the sample key points is one or more, and the embodiment of the present application does not limit this. Illustratively, the target portion is a hand, and the hand is included in the sample image, and the sample key points are key points of the hand.
In one possible implementation, there are two implementations described below for obtaining the training data set.
In the first implementation manner, the method comprises: acquiring a sample image; identifying the sample image to obtain candidate position information corresponding to the sample key points; adjusting the candidate position information to obtain standard position information corresponding to the sample key points; and acquiring the training data set according to the sample image and the standard position information corresponding to the sample key points.
The sample image may be an image stored in a storage space of the computer device, an image uploaded by a user, or an image downloaded from a browser, and a source of the sample image is not limited in the embodiment of the present application. The number of sample images may be one or more.
Optionally, the step of identifying the sample image to obtain the candidate position information corresponding to the sample key points includes: inputting the sample image into a super-large model, and taking the output result of the super-large model as the candidate position information corresponding to the sample key points included in the sample image. For example, the super-large model may be a High-Resolution Network (HRNet) model or an Hourglass model (a convolutional neural network).
Before the sample image is input into the super-large model, the super-large model needs to be trained, and the training process of the super-large model comprises the following steps: and acquiring a sample image set, wherein the sample image set comprises a first image and the position information of key points included in the first image, and the first image comprises a target part. And training the super-large model according to the first image and the position information of the key points included in the first image to obtain the trained super-large model. Optionally, the position information of the key points included in the first image is manually labeled.
Optionally, the step of adjusting candidate location information corresponding to the sample key point to obtain standard location information corresponding to the sample key point includes: manually adjusting candidate position information corresponding to the sample key points to obtain standard position information corresponding to the sample key points; or the computer device uniformly adjusts the candidate position information corresponding to the sample key points to obtain the standard position information corresponding to the sample key points.
For the same sample image, because each annotator applies a different standard, the position information of the key points labeled by each person differs. If the initial key point detection model were updated according to the sample image and the manually labeled position information of the key points it includes, the target key point detection model obtained by updating would exhibit larger jitter, and the accuracy of the determined position information of the key points included in an image would be lower. Therefore, the candidate position information of the key points included in the sample image is first obtained with the super-large model, and fine adjustment is then performed on the basis of that candidate position information, manually or by the computer device, to obtain the standard position information corresponding to the key points included in the sample image. In this way the standard position information corresponding to the key points follows a consistent standard, which improves the consistency of the standard position information corresponding to the sample key points. When the initial key point detection model is updated according to the sample image and the sample position information acquired in this manner, the jitter of the obtained target key point detection model is small, and the accuracy of the determined position information of the key points included in an image can be improved.
In addition, when the target portion included in the sample image is a hand, the hand includes 21 key points, and the process of acquiring the standard position information corresponding to each key point is similar, and the embodiment of the present application will be described by taking as an example only the process of acquiring the standard position information corresponding to any one of the 21 key points. Taking the schematic diagram of key points of a hand provided by the embodiment of the present application shown in fig. 3 as an example, the black dots in fig. 3 are key points.
In the second implementation manner, the method comprises: acquiring shape parameters and posture parameters corresponding to the target part, and generating a target part model according to the shape parameters and the posture parameters, wherein the target part model comprises the standard position information corresponding to the sample key points; attaching a texture map to the target part model to obtain the target part model with the texture map attached; projecting the target part model with the texture map attached into a background map to obtain a sample image; and acquiring the training data set according to the sample image and the standard position information corresponding to the sample key points.
The shape parameter is used for controlling the shape of the target part, and the posture (pose) parameter is used for controlling the posture of the target part. The shape parameters and the posture parameters may be input by a user or automatically generated by the computer device; the acquisition mode of the shape parameters and the posture parameters is not limited in the embodiment of the present application.
Optionally, the generating of the target part model according to the shape parameters and the posture parameters includes: inputting the shape parameters and the posture parameters into a MANO model (a hand model with joints and non-rigid deformation), and obtaining the target part model according to the output result of the MANO model.
In one possible implementation, the computer device stores a plurality of candidate texture maps, and the process of attaching a texture map to the target part model to obtain the target part model with the texture map attached includes: determining a texture map among the plurality of candidate texture maps, and attaching the determined texture map to the target part model to obtain the target part model with the texture map attached. The process of determining a texture map among the plurality of candidate texture maps includes, but is not limited to: the computer device randomly selects one candidate texture map from the plurality of candidate texture maps, or takes the candidate texture map selected according to an operation instruction of the user as the determined texture map.
Optionally, the process of attaching a texture map to the target part model to obtain the target part model with the texture map attached includes: the target part model and the texture map are input into a Parametric Hand Texture model, and the target part model with the texture map attached is obtained according to the output result of the Parametric Hand Texture model.
In a possible implementation manner, the computer device stores a plurality of candidate background maps, and the process of projecting the target part model with the texture map attached into a background map to obtain the sample image includes: determining a background map among the candidate background maps, and projecting the target part model with the texture map attached onto the determined background map to obtain the sample image. The process of determining a background map among the plurality of candidate background maps includes, but is not limited to: the computer device randomly selects one candidate background map from the plurality of candidate background maps, or takes the candidate background map selected according to an operation instruction of the user as the determined background map.
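At the image level, projecting the textured target part model into the chosen background amounts to compositing the rendered foreground over the background with a visibility mask. The following is a minimal sketch under that assumption; the arrays and the `composite` helper are illustrative and not part of this application:

```python
import numpy as np

def composite(foreground, background, mask):
    """Paste the rendered part model (foreground) into the background
    wherever the mask is 1, producing a synthetic sample image."""
    mask = mask[..., None]                      # broadcast over color channels
    return mask * foreground + (1 - mask) * background

fg = np.ones((2, 2, 3))                         # toy rendered hand: all white
bg = np.zeros((2, 2, 3))                        # toy background: all black
mask = np.array([[1.0, 0.0], [0.0, 1.0]])       # where the hand is visible
sample_image = composite(fg, bg, mask)
```

A renderer would additionally handle perspective projection and lighting; the mask-based blend above only illustrates the final pasting step.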
Because the shape parameters and the posture parameters are automatically generated by the computer device or manually input by the user, the generated sample images are more diversified. When sample images acquired in this manner are used to update the initial key point detection model, the obtained target key point detection model can detect the position information of the key points of a target part of any shape and in any posture included in an image, so the application range of the target key point detection model is wider.
Fig. 4 is a schematic diagram of a sample image acquisition process provided in an embodiment of the present application. In fig. 4, the shape parameters and the posture parameters are input into the MANO model to obtain the target part model, and the target part model and the texture map are input into the Parametric Hand Texture model to obtain the target part model with the texture map attached; the sample image is then acquired according to the target part model with the texture map attached and the background map.
It should be noted that any one of the above implementation manners may be selected to obtain the training data set, or both the training data sets obtained by the above two implementation manners may be used as the training data set of the initial keypoint detection model, which is not limited in this embodiment of the present application.
Fig. 5 is a schematic diagram of an initial keypoint detection model provided by an embodiment of the present application; in fig. 5, the initial keypoint detection model includes 4 convolutional layers and 2 upsampling layers. The sample image is input into the initial key point detection model, and the sample heat point diagram corresponding to the sample key points included in the sample image is obtained through the 4 convolutional layers and 2 upsampling layers. Of course, the initial keypoint detection model may also include a greater or smaller number of convolutional layers and upsampling layers. In the initial key point detection model provided by the embodiment of the present application, the feature map is smaller and the number of upsampling (upsample) layers is lower, which can further reduce the complexity of the model and improve its processing speed. The initial key point detection model provided by the embodiment of the present application is suitable for, but not limited to, mobile terminals such as smart phones.
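For a rough sense of how the spatial resolution could evolve through such a stack of 4 convolutional layers and 2 upsampling layers, the sketch below walks through the feature-map sizes; the input resolution, kernel size, stride, and padding are assumed values for illustration, not values given by this application:

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Spatial size after one convolution layer (assumed hyper-parameters)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 128                     # assumed input resolution
for _ in range(4):             # 4 convolution layers
    size = conv_out(size)      # 128 -> 64 -> 32 -> 16 -> 8
heatmap_size = size
for _ in range(2):             # 2 upsampling layers, x2 each
    heatmap_size *= 2          # 8 -> 16 -> 32
```

Under these assumptions the heatmap resolution (32 here) stays well below the input resolution, which is what keeps the model light enough for mobile terminals.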
In step 202, an initial keypoint detection model is called to process the sample image, so as to obtain a sample hotspot graph corresponding to the sample keypoint.
In one possible implementation, the sample image is input into the initial keypoint detection model, and a sample hotspot graph corresponding to the sample keypoint is obtained.
Wherein, each key point corresponds to a sample heat point diagram. Taking the target part included in the sample image as a hand and the hand includes 21 key points as an example, after the sample image is input into the initial key point detection model, a sample heat point map corresponding to each key point is obtained, that is, 21 sample heat point maps are obtained.
In step 203, sample position information corresponding to the sample key point is determined according to the sample hotspot graph.
In a possible implementation manner, the process of determining the sample position information corresponding to the sample key points according to the sample hotspot graph includes: performing soft-argmax (a processing manner) processing on the sample hotspot graph to obtain the sample position information corresponding to the sample key points. This processing manner avoids quantization errors, so that the determined sample position information has higher accuracy and stability.
The process of performing soft-argmax processing on the sample hotspot graph to obtain the sample position information corresponding to the sample key point comprises the following steps: and acquiring a first hotspot graph and a second hotspot graph, wherein the first hotspot graph is a hotspot graph corresponding to the first dimension, and the second hotspot graph is a hotspot graph corresponding to the second dimension. And determining sample position information corresponding to the sample key points according to the sample hotspot graph, the first hotspot graph and the second hotspot graph.
The first hotspot graph and the second hotspot graph are set based on experience or adjusted according to an implementation environment, which is not limited in the embodiment of the present application. Fig. 6 is a schematic diagram of a sample hotspot graph, a first hotspot graph and a second hotspot graph provided in an embodiment of the present application. Fig. 6 (1) is a sample hotspot graph, fig. 6 (2) is a first hotspot graph, and fig. 6 (3) is a second hotspot graph.
The method and the device do not limit the process of determining the sample position information corresponding to the sample key point according to the sample hotspot graph, the first hotspot graph and the second hotspot graph. Optionally, determining a first numerical value according to the sample hotspot graph and the first hotspot graph, wherein the first numerical value is a numerical value of the sample key point in a first dimension; determining a second numerical value according to the sample hotspot graph and the second hotspot graph, wherein the second numerical value is the numerical value of the sample key point in a second dimension; and obtaining sample position information corresponding to the sample key points according to the first numerical value and the second numerical value.
The sample hotspot graph, the first hotspot graph and the second hotspot graph respectively comprise a plurality of numerical values, and the number of the numerical values of the sample hotspot graph, the first hotspot graph and the second hotspot graph is the same. From the sample hotspot graph and the first hotspot graph, the process of determining the first value includes, but is not limited to: and multiplying the sample hotspot graph by the numerical values at the same position in the first hotspot graph to obtain a third numerical value corresponding to each position, and determining the first numerical value according to the third numerical value corresponding to each position. Illustratively, the third values corresponding to the respective positions are added to obtain the first value. Determining a second value from the sample hotspot graph and the second hotspot graph comprises: and multiplying the numerical values at the same position in the sample hotspot graph and the second hotspot graph to obtain a fourth numerical value corresponding to each position, and determining the second numerical value according to the fourth numerical value corresponding to each position. Illustratively, the fourth values corresponding to the respective positions are added to obtain the second value.
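The multiply-and-sum procedure described above can be sketched as follows; the 3x3 heatmap and the coordinate grids are made-up values, and the heatmap is assumed to be already normalized so that its values sum to 1:

```python
import numpy as np

def soft_argmax_2d(heatmap, x_grid, y_grid):
    """Multiply the (normalized) heatmap element-wise with each coordinate
    grid and sum, yielding sub-pixel keypoint coordinates."""
    x = float(np.sum(heatmap * x_grid))   # first value: first dimension
    y = float(np.sum(heatmap * y_grid))   # second value: second dimension
    return x, y

# A 3x3 heatmap peaked at row 1, column 2; its values sum to 1.
heatmap = np.array([[0.0, 0.1, 0.1],
                    [0.0, 0.1, 0.6],
                    [0.0, 0.0, 0.1]])
x_grid, y_grid = np.meshgrid(np.arange(3), np.arange(3))  # column / row indices
x, y = soft_argmax_2d(heatmap, x_grid, y_grid)            # x = 1.8, y = 0.9
```

The result (1.8, 0.9) falls between pixel centers, which is exactly the quantization-error-free behavior the soft-argmax processing is chosen for.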
Taking the sample, first, and second hotspot graphs shown in fig. 6 as examples, the first value is determined as x = 0.1×0 + 0.1×0.4 + 0.6×0.4 + 0.1×0.8 + 0.1×0.4 = 0.4; the second value is y = 0.1×0 + 0.1×(−0.4) + 0.6×0 + 0.1×0.4 = 0, and the sample position information corresponding to the sample key point is obtained as (0.4, 0).
It should be noted that the determination process of each key point in the target portion included in the sample image is similar, and a detailed description thereof is omitted here.
In step 204, a reference loss value is determined according to the standard position information, the sample position information and the sample hotspot graph, wherein the reference loss value is used for indicating the detection accuracy of the initial key point detection model.
Optionally, the determining the reference loss value according to the standard position information, the sample position information, and the sample hotspot graph includes: determining a first loss value between the standard location information and the sample location information; and determining a second loss value according to the standard position information and the sample heat point diagram, and determining a reference loss value according to the first loss value and the second loss value.
Wherein the process of determining the first loss value between the standard location information and the sample location information comprises: and calling a target loss function according to the standard position information and the sample position information, and determining a first loss value. Alternatively, the target loss function may be an L2 loss function, but may also be other loss functions. Alternatively, the euclidean distance between the standard position information and the sample position information may also be used as the first loss value.
In one possible implementation, the determining the second loss value according to the standard position information and the sample hotspot graph includes: determining a third hotspot graph corresponding to the standard position information; and determining a second loss value according to the sample heat point diagram and the third heat point diagram. Optionally, a Mean Square Error (MSE) between the sample hotspot graph and the third hotspot graph is determined, the MSE being taken as the second loss value.
Fig. 7 is a schematic diagram of a third hotspot graph provided in the embodiment of the present application. With the third hotspot graph in fig. 7, the second loss value is determined as the mean square error between the sample hotspot graph and the third hotspot graph.
Optionally, the determining the reference loss value according to the first loss value and the second loss value includes: the sum of the first loss value and the second loss value is taken as a reference loss value. The reference loss value may also be determined based on the first loss value, the second loss value, and the hyperparameter. The value of the hyper-parameter can be set based on experience, and can also be adjusted according to the implementation environment, which is not limited in the embodiment of the application. Illustratively, the value of the hyper-parameter is 0.1. For another example, the value of the hyper-parameter is 0.2.
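Under the description above (the first loss value as the L2 distance between the sample and standard position information, the second loss value as the mean square error between the sample hotspot graph and the third hotspot graph, combined through a hyper-parameter weight), the reference loss can be sketched as follows; all values, including the weight of 0.1, are illustrative:

```python
import numpy as np

def reference_loss(sample_pos, std_pos, sample_heatmap, third_heatmap, lam=0.1):
    """Reference loss = first loss + lam * second loss (a sketch)."""
    first_loss = np.linalg.norm(sample_pos - std_pos)             # L2 distance
    second_loss = np.mean((sample_heatmap - third_heatmap) ** 2)  # MSE
    return first_loss + lam * second_loss

sample_pos = np.array([0.4, 0.0])               # predicted keypoint position
std_pos = np.array([0.5, 0.0])                  # standard (label) position
sample_hm = np.array([[0.1, 0.6], [0.1, 0.2]])  # predicted heatmap (toy)
third_hm = np.array([[0.0, 1.0], [0.0, 0.0]])   # heatmap from the standard position
loss = reference_loss(sample_pos, std_pos, sample_hm, third_hm)
```

For a batch of keypoints the two terms would simply be summed or averaged over all keypoints before being combined.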
In one possible implementation, the reference loss value is determined according to the following formula (1) based on the standard position information, the sample position information, and the sample hotspot graph.
$$Loss = \|P - \hat{P}\|_2 + \lambda \cdot \mathrm{MSE}(H, \hat{H}) \qquad \text{Formula (1)}$$
In the above formula (1), $Loss$ is the reference loss value; $P$ is the sample position information, $\hat{P}$ is the standard position information, and $\|P - \hat{P}\|_2$ is the first loss value; $\mathrm{MSE}(H, \hat{H})$ is a regular term, in which $H$ is the sample hotspot graph and $\hat{H}$ is the third hotspot graph determined from the standard position information, and its value is the second loss value; $\lambda$ is the hyper-parameter.
It should be noted that the reference loss value is used to indicate the detection accuracy of the initial keypoint detection model, and the reference loss value and the detection accuracy of the initial keypoint detection model have an inverse relationship. The larger the reference loss value is, the lower the detection precision of the initial key point detection model is, and the lower the detection accuracy is; conversely, the smaller the reference loss value is, the higher the detection precision of the initial key point detection model is, and the higher the detection accuracy is.
In step 205, based on the reference loss value being greater than the loss threshold, the initial keypoint detection model is updated to obtain a target keypoint detection model, where the target keypoint detection model is used to detect the target image so as to determine target position information corresponding to the keypoint of the target portion included in the target image.
When the reference loss value is not greater than the loss threshold, it indicates that the detection precision of the initial key point detection model is high, and therefore the initial key point detection model is taken as the target key point detection model. When the reference loss value is greater than the loss threshold, it indicates that the detection precision of the initial key point detection model is low, and the initial key point detection model needs to be updated to obtain a target key point detection model with high detection precision.
The loss threshold may be set based on experience, or may be adjusted according to an implementation environment, and the loss threshold is not limited in this embodiment.
Optionally, the updating of the initial key point detection model to obtain the target key point detection model includes: updating the initial key point detection model to obtain an intermediate key point detection model; calling the intermediate key point detection model to process the sample image to obtain an intermediate hotspot graph corresponding to the sample key points; determining intermediate position information corresponding to the sample key points according to the intermediate hotspot graph; determining a candidate loss value according to the standard position information, the intermediate position information, and the intermediate hotspot graph; and, based on the candidate loss value being not greater than the loss threshold, taking the intermediate key point detection model as the target key point detection model. Based on the candidate loss value still being greater than the loss threshold, the intermediate key point detection model continues to be updated. In each round, the updated key point detection model is called to acquire a hotspot graph corresponding to the sample image, and position information corresponding to the key points included in the sample image is acquired according to that hotspot graph; once the loss value determined from this position information, the standard position information corresponding to the key points, and the hotspot graph corresponding to the sample image is not greater than the loss threshold, the updated key point detection model is taken as the target key point detection model.
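The iterate-until-the-loss-is-not-above-the-threshold procedure above can be sketched with a toy one-parameter "model" trained by gradient descent; the threshold, learning rate, and step limit are made-up values for illustration only:

```python
def train_until_threshold(param, target, loss_threshold=1e-4, lr=0.1, max_steps=1000):
    """Keep updating the 'model' (a single parameter here) until the
    loss value is no longer greater than the loss threshold."""
    loss = (param - target) ** 2
    for _ in range(max_steps):
        loss = (param - target) ** 2      # stand-in for the candidate loss value
        if loss <= loss_threshold:        # detection precision is high enough: stop
            break
        grad = 2 * (param - target)       # gradient of the squared error
        param -= lr * grad                # update -> intermediate detection model
    return param, loss

param, final_loss = train_until_threshold(param=0.0, target=1.0)
```

In practice the update step would be one pass of back-propagation over the convolutional and upsampling layers rather than a scalar gradient step, but the stopping logic is the same.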
Optionally, the updating of the initial keypoint detection model to obtain the intermediate keypoint detection model includes: adjusting the parameters of each convolutional layer and upsampling layer included in the initial key point detection model to obtain the intermediate key point detection model. The target key point detection model provided by the embodiment of the present application uses only about 30M FLOPs (million floating-point operations) of computing power, is compatible with key point detection in various scenes, supports 3D key points well, yields very stable points during video processing, and is well suited to various key point detection scenes.
According to the method, the sample position information corresponding to the sample key point of the target part included in the sample image is obtained through the initial key point detection model, and when the loss value is determined, not only the standard position information corresponding to the sample key point and the sample position information corresponding to the sample key point are considered, but also the sample heat point diagram corresponding to the sample key point is considered, so that the accuracy of the determined loss value is higher. And updating the initial key point detection model by adopting the loss value with higher accuracy, so that the obtained target key point detection model has higher detection accuracy, higher precision, better stability and higher accuracy and precision of key point detection.
The embodiment of the present application provides a method for detecting a keypoint, where the method is executed by a computer device, and taking application of the method to the implementation environment shown in fig. 1 as an example, the computer device may be the terminal device 101 in fig. 1, or may be the server 102 in fig. 1. Taking a flowchart of a method for detecting a keypoint provided in the embodiment of the present application shown in fig. 8 as an example, the method includes the following steps 801 to 803.
In step 801, a target image and a target keypoint detection model are acquired.
In the exemplary embodiment of the present application, the target keypoint detection model is obtained by the above method for obtaining a keypoint detection model shown in fig. 2. The target image includes a target portion, and the target portion is an arbitrary portion, for example, the target portion is a hand, a foot, or another portion of the body, which is not limited in this embodiment of the present application. The target key point detection model is used for detecting the target image so as to determine the position information of the key points of the target part included in the target image.
The embodiment of the present application does not limit the manner of acquiring the target image. Illustratively, the target image may be acquired in any of four ways.
In the first mode, a plurality of candidate images to be subjected to key point detection are stored in a storage space of the computer device, and one candidate image is determined as the target image from the plurality of candidate images.
For example, the computer device randomly determines one candidate image among a plurality of candidate images as the target image. For another example, the computer device displays a plurality of candidate images, and in response to receiving a selection instruction for any one of the candidate images, takes the selected candidate image as the target image.
And in the second mode, the image uploaded by the user is used as a target image.
Optionally, the computer device displays an image upload control for uploading an image. In response to an operation instruction for the image uploading control, the computer device receives the image uploaded by the user, and the computer device takes the image uploaded by the user as a target image.
And in the third mode, an image including the target portion downloaded from the browser is used as the target image.
And in the fourth mode, the image acquired by the image acquisition device of the computer equipment is used as a target image.
Optionally, the computer device includes an image capturing device, which may be a camera or other components capable of capturing images. The computer equipment collects an image through the image collecting device, identifies the collected image, and takes the collected image as a target image in response to the collected image including a target part.
In step 802, a target keypoint detection model is called to process the target image to obtain a target hotspot graph corresponding to the target keypoint of the target part.
In a possible implementation manner, the target image is input into the target key point detection model, and the target heat point map corresponding to the target key point of the target part included in the target image is obtained according to the output result of the target key point detection model.
It should be noted that the number of key points included in the target portion in the target image is equal to the number of target hotspot maps, that is, each key point corresponds to one target hotspot map.
In step 803, target position information corresponding to the target key point is determined according to the target hotspot graph.
In one possible implementation, the target location information corresponding to the target key point may be determined according to the target hotspot graph in any one of the following two implementations.
According to the first implementation mode, first position information corresponding to the target key point is determined according to the target heat point diagram, and the first position information is used as the target position information corresponding to the target key point.
The process of determining the first location information corresponding to the target key point according to the target hotspot graph is consistent with the process of determining the sample location information corresponding to the sample key point according to the sample hotspot graph in the step 203, and is not repeated here.
As shown in fig. 9, which is a schematic diagram illustrating the determination of the target position information corresponding to the target key point, the target image is input into the target key point detection model to obtain the target hotspot graph corresponding to the target key point of the target portion included in the target image, and soft-argmax processing is performed on the target hotspot graph to obtain the target position information (coordinates) corresponding to the target key point.
According to the second implementation manner, first position information corresponding to the target key point is determined according to the target heat point map, and the target position information corresponding to the target key point is determined according to the first position information and reference position information.
The reference position information is position information of a target key point of a target part included in the reference image, and the acquisition time of the reference image is adjacent to the acquisition time of the target image and is before the acquisition time of the target image. Illustratively, the acquisition time of the target image is t, and the acquisition time of the reference image is t-1. The target key point of the target portion included in the reference image is the index finger tip, and the target key point of the target portion included in the target image is also the index finger tip.
Optionally, the process of determining the reference position information corresponding to the target key point of the target portion included in the reference image includes: acquiring a reference image, wherein the reference image comprises a target part; calling a target key point detection model to process the reference image to obtain a reference heat point diagram corresponding to a target key point of the target part; and determining reference position information corresponding to the target key points according to the reference heat point diagram.
The process of calling the target key point detection model to process the reference image to obtain the reference heat point diagram corresponding to the target key point of the target part is similar to the process of calling the initial key point detection model to process the sample image to obtain the sample heat point diagram corresponding to the sample key point in the step 202; the process of determining the reference position information corresponding to the target key point according to the reference heat point map is similar to the process of determining the sample position information corresponding to the sample key point according to the sample heat point map in step 203, and is not repeated here.
In a possible implementation manner, the process of determining, according to the first location information and the reference location information, the target location information corresponding to the target key point includes: acquiring an optical flow compensation value between a reference image and a target image, wherein the optical flow compensation value is used for indicating the speed from the reference image to the target image; determining the distance between the target key point in the reference image and the target image according to the reference position information and the first position information; determining a distance weight parameter according to the distance, wherein the distance weight parameter is in direct proportion to the distance; and determining target position information corresponding to the target key points according to the reference position information, the first position information, the optical flow compensation value, the distance and the distance weight parameter.
Alternatively, the optical flow compensation value between the reference image and the target image is acquired according to an LK (Lucas-Kanade) optical flow compensation method. The euclidean distance between the reference position information and the first position information may be taken as the distance of the target key point between the reference image and the target image.
Illustratively, the target position information corresponding to the target key point is determined according to the following formula (2) based on the reference position information, the first position information, the optical flow compensation value, the distance, and the distance weight parameter:

ŷ_t = w·y_t + (1 − w)·(y_{t−1} + o)    Formula (2)

In the above formula (2), ŷ_t is the target position information, y_{t−1} is the reference position information, o is the optical flow compensation value, w is the distance weight parameter determined from the distance d, and y_t is the first position information. Because w is in direct proportion to the distance, a large inter-frame displacement places more weight on the first position information detected in the current frame, while a small displacement favors the optical-flow-compensated reference position, which suppresses jitter.
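The second implementation manner can be sketched as a distance-weighted blend of the optical-flow-propagated reference position and the current detection. The exact weighting in formula (2) is not recoverable from this text, so the form below — a weight that grows with the inter-frame distance, matching the stated proportionality — is an assumption, and the names and the `alpha` scale are illustrative.

```python
import numpy as np

def fuse_positions(prev_pos, cur_pos, flow, alpha=0.5):
    """Blend the optical-flow-propagated previous position with the
    current detection. The weight on the current detection grows with
    the inter-frame distance, so fast motion trusts the new detection
    while slow motion is smoothed (suppressing small jitter).
    """
    prev_pos = np.asarray(prev_pos, dtype=float)
    cur_pos = np.asarray(cur_pos, dtype=float)
    flow = np.asarray(flow, dtype=float)
    d = np.linalg.norm(cur_pos - prev_pos)   # Euclidean distance
    w = d / (d + alpha)                      # grows with d, stays in [0, 1)
    return w * cur_pos + (1.0 - w) * (prev_pos + flow)

# A nearly still keypoint: the output stays between the propagated
# previous position and the new detection.
print(fuse_positions((100, 100), (101, 100), (0.2, 0.0)))
```

In practice the `flow` term would come from a sparse Lucas-Kanade optical flow step between the two frames, as the text describes.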
Fig. 10 is a schematic diagram of a reference image and a target image according to an embodiment of the present application. Fig. 10 (1) shows a reference image, and fig. 10 (2) shows a target image. The reference image is the image of the t-1 th frame, and the target image is the image of the t-th frame.
Fig. 11 is a schematic diagram of detecting key points of a reference image and a target image according to an embodiment of the present application. The black dots in (1) and (2) in fig. 11 are the same key point, and the gray dots in (1) and (2) in fig. 11 are the same key point. The target position information of the black dots in (2) in fig. 11 is determined by the second implementation manner, and the target position information of the gray dots in (2) in fig. 11 is determined by the first implementation manner. As can be seen from fig. 11, the black dots remain substantially still while the gray dots jitter within a small range, indicating that the target position information determined by the second implementation manner is more accurate and more stable.
The target key point detection model obtained by the above method has high detection precision and accuracy; therefore, detecting the target image with this model yields target position information for the target key point of the target part that is correspondingly accurate and precise.
Fig. 12 is a schematic structural diagram of an apparatus for acquiring a keypoint detection model according to an embodiment of the present application, and as shown in fig. 12, the apparatus includes:
an obtaining module 1201, configured to obtain a training data set and an initial key point detection model, where the training data set includes a sample image and standard position information corresponding to a sample key point, and the sample key point is a key point of a target portion included in the sample image;
the processing module 1202 is configured to invoke the initial key point detection model to process the sample image, so as to obtain a sample heat point map corresponding to the sample key point;
a determining module 1203, configured to determine, according to the sample hotspot graph, sample position information corresponding to the sample key point;
the determining module 1203 is further configured to determine a reference loss value according to the standard position information, the sample position information, and the sample hotspot graph, where the reference loss value is used to indicate the detection accuracy of the initial keypoint detection model;
an updating module 1204, configured to update the initial keypoint detection model based on that the reference loss value is greater than the loss threshold value, to obtain a target keypoint detection model, where the target keypoint detection model is used to detect the target image, so as to determine target position information corresponding to the keypoint of the target portion included in the target image.
In a possible implementation manner, the determining module 1203 is configured to obtain a first hotspot graph and a second hotspot graph, where the first hotspot graph is a hotspot graph corresponding to a first dimension, and the second hotspot graph is a hotspot graph corresponding to a second dimension; and determining sample position information corresponding to the sample key points according to the sample hotspot graph, the first hotspot graph and the second hotspot graph.
In a possible implementation manner, the determining module 1203 is configured to determine a first numerical value according to the sample hotspot graph and the first hotspot graph, where the first numerical value is a numerical value of the sample keypoint in a first dimension; determining a second numerical value according to the sample hotspot graph and the second hotspot graph, wherein the second numerical value is the numerical value of the sample key point in a second dimension; and determining sample position information corresponding to the sample key points according to the first numerical value and the second numerical value.
In one possible implementation manner, the sample hotspot graph, the first hotspot graph and the second hotspot graph respectively comprise a plurality of numerical values, and the number of the numerical values included in the sample hotspot graph, the first hotspot graph and the second hotspot graph is the same;
a determining module 1203, configured to multiply the sample hotspot graph and the numerical values located at the same position in the first hotspot graph to obtain a third numerical value corresponding to each position; determining a first numerical value according to a third numerical value corresponding to each position; multiplying the sample hotspot graph and the numerical values at the same position in the second hotspot graph to obtain a fourth numerical value corresponding to each position; and determining a second numerical value according to the fourth numerical value corresponding to each position.
In a possible implementation, the determining module 1203 is configured to determine a first loss value between the standard position information and the sample position information; determining a second loss value according to the standard position information and the sample heat point diagram; and determining a reference loss value according to the first loss value and the second loss value.
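The two-term reference loss described above can be sketched as follows. The patent fixes only that a position term and a heatmap term are combined; the L2 and MSE choices, the weighting factor `lam`, and the function name are assumptions for illustration.

```python
import numpy as np

def reference_loss(std_pos, sample_pos, sample_hm, std_hm, lam=1.0):
    """Combine a coordinate loss (standard vs. predicted position) with
    a heatmap loss (predicted heatmap vs. the heatmap rendered from the
    standard position), weighted by an illustrative factor lam."""
    pos_loss = np.linalg.norm(np.asarray(std_pos, dtype=float)
                              - np.asarray(sample_pos, dtype=float))
    hm_loss = np.mean((np.asarray(sample_hm) - np.asarray(std_hm)) ** 2)
    return pos_loss + lam * hm_loss
```

Supervising both the regressed coordinates and the heatmap itself is what lets the loss reflect the full spatial distribution rather than only the peak location.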
In a possible implementation manner, the determining module 1203 is configured to determine a third hotspot graph corresponding to the standard location information; and determining a second loss value according to the sample heat point diagram and the third heat point diagram.
In a possible implementation manner, the obtaining module 1201 is configured to obtain a sample image; identifying the sample image to obtain candidate position information corresponding to the sample key point; adjusting the candidate position information to obtain standard position information corresponding to the sample key points; and acquiring a training data set according to the sample image and the standard position information corresponding to the sample key points.
In a possible implementation manner, the obtaining module 1201 is configured to obtain a shape parameter and an attitude parameter corresponding to the target portion; generating a target part model according to the shape parameters and the posture parameters, wherein the target part model comprises standard position information corresponding to the sample key points; pasting a texture map on the target part model to obtain the target part model pasted with the texture map; projecting the target part model attached with the texture map into a background map to obtain a sample image; and acquiring a training data set according to the sample image and the standard position information corresponding to the sample key points.
In a possible implementation manner, the updating module 1204 is configured to update the initial keypoint detection model based on that the reference loss value is greater than the loss threshold value, so as to obtain an intermediate keypoint detection model; calling an intermediate key point detection model to process the sample image to obtain an intermediate hot point diagram corresponding to the sample key point; determining intermediate position information corresponding to the sample key points according to the intermediate hot spot diagram; determining a candidate loss value according to the standard position information, the middle position information and the middle heat point diagram; and taking the intermediate key point detection model as a target key point detection model based on the candidate loss value not greater than the loss threshold value.
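The updating module's loop — keep refining the intermediate model while the loss exceeds the threshold, then take the intermediate model as the target model — can be sketched generically. The hook-based signature and the toy scalar example are illustrative, not the application's actual training code.

```python
def train_until_converged(step_fn, loss_fn, params, loss_threshold, max_iters=1000):
    """Repeatedly update the model parameters while the loss exceeds
    the threshold; once it does not, return the intermediate model
    as the target model."""
    for _ in range(max_iters):
        loss = loss_fn(params)
        if loss <= loss_threshold:
            break                     # intermediate model becomes the target model
        params = step_fn(params, loss)
    return params

# Toy example: pull a scalar parameter toward 5.0 at a fixed rate.
target = 5.0
loss_fn = lambda p: abs(p - target)
step_fn = lambda p, loss: p + 0.5 * (target - p)
print(train_until_converged(step_fn, loss_fn, 0.0, loss_threshold=0.01))  # ≈ 4.99
```

The `max_iters` guard is a practical addition so the loop terminates even if the loss never falls to the threshold.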
According to the device, the sample position information corresponding to the sample key point of the target part included in the sample image is obtained through the initial key point detection model, and the loss value is determined by considering not only the standard position information corresponding to the sample key point and the sample position information corresponding to the sample key point, but also the sample heat point diagram corresponding to the sample key point, so that the accuracy of the determined loss value is higher. And updating the initial key point detection model by using the loss value with higher accuracy, so that the obtained target key point detection model has higher detection accuracy, higher precision, better stability and better key point detection effect.
Fig. 13 is a schematic structural diagram of a key point detection apparatus according to an embodiment of the present application, and as shown in fig. 13, the apparatus includes:
an obtaining module 1301, configured to obtain a target image and a target key point detection model, where the target image includes a target portion, and the target key point detection model is obtained by an obtaining apparatus of the key point detection model shown in fig. 12;
a processing module 1302, configured to invoke the target key point detection model to process the target image, so as to obtain a target heat point map corresponding to the target key point of the target portion;
and the determining module 1303 is configured to determine, according to the target hotspot graph, target position information corresponding to the target key point.
In a possible implementation manner, the determining module 1303 is configured to determine, according to the target hotspot graph, first location information corresponding to the target keypoint; taking the first position information as target position information corresponding to the target key points; or determining target position information corresponding to the target key point according to the first position information and reference position information, wherein the reference position information is the position information of the target key point of the target part included in the reference image, and the acquisition time of the reference image is adjacent to the acquisition time of the target image and is before the acquisition time of the target image.
In a possible implementation manner, the obtaining module 1301 is further configured to obtain a reference image, where the reference image includes the target portion;
the processing module 1302 is further configured to invoke the target key point detection model to process the reference image, so as to obtain a reference heat point diagram corresponding to the target key point of the target portion;
the determining module 1303 is further configured to determine, according to the reference hotspot graph, reference location information corresponding to the target keypoint.
In a possible implementation manner, the determining module 1303 is configured to obtain an optical flow compensation value between the reference image and the target image, where the optical flow compensation value is used to indicate a speed from the reference image to the target image; determining the distance between the target key point in the reference image and the target image according to the reference position information and the first position information;
determining a distance weight parameter according to the distance, wherein the distance weight parameter is in direct proportion to the distance; and determining target position information corresponding to the target key points according to the reference position information, the first position information, the optical flow compensation value, the distance and the distance weight parameter.
The target key point detection model used by the apparatus has high detection precision and accuracy; therefore, detecting the target image with this model yields target position information for the target key point of the target part that is correspondingly accurate.
It should be understood that when the apparatus provided above implements its functions, the division into the functional modules described is merely illustrative; in practical applications, the functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus embodiments and the method embodiments provided above belong to the same concept; for the specific implementation process, refer to the method embodiments, and details are not repeated here.
Fig. 14 shows a block diagram of a terminal device 1400 according to an exemplary embodiment of the present application. The terminal device 1400 may be a portable mobile terminal, such as: a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. Terminal device 1400 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, desktop terminal, and so on.
In general, terminal device 1400 includes: a processor 1401, and a memory 1402.
Processor 1401 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so forth. The processor 1401 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). Processor 1401 may also include a main processor, which is a processor for Processing data in an awake state, and is also referred to as a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 1401 may be integrated with a GPU (Graphics Processing Unit) that is responsible for rendering and drawing content that the display screen needs to display. In some embodiments, processor 1401 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 1402 may include one or more computer-readable storage media, which may be non-transitory. Memory 1402 may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1402 is used to store at least one instruction for execution by processor 1401 to implement the method for acquiring a keypoint detection model provided by the method embodiment shown in fig. 2 of the present application and/or the method for keypoint detection provided by the method embodiment shown in fig. 8.
In some embodiments, terminal device 1400 may further optionally include: a peripheral device interface 1403 and at least one peripheral device. The processor 1401, the memory 1402, and the peripheral device interface 1403 may be connected by buses or signal lines. Each peripheral device may be connected to the peripheral device interface 1403 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1404, a display 1405, a camera assembly 1406, audio circuitry 1407, and a power supply 1408.
The peripheral device interface 1403 can be used to connect at least one peripheral device related to I/O (Input/Output) to the processor 1401 and the memory 1402. In some embodiments, the processor 1401, memory 1402, and peripheral interface 1403 are integrated on the same chip or circuit board; in some other embodiments, any one or both of the processor 1401, the memory 1402, and the peripheral device interface 1403 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The Radio Frequency circuit 1404 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 1404 communicates with communication networks and other communication devices via electromagnetic signals. The rf circuit 1404 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1404 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuitry 1404 may communicate with other terminal devices via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1404 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 1405 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1405 is a touch display screen, the display screen 1405 also has the ability to capture touch signals on or over the surface of the display screen 1405. The touch signal may be input to the processor 1401 for processing as a control signal. At this point, the display 1405 may also be used to provide virtual buttons and/or virtual keyboards, also referred to as soft buttons and/or soft keyboards. In some embodiments, the display 1405 may be one, disposed on the front panel of the terminal device 1400; in other embodiments, the display 1405 may be at least two, and is disposed on different surfaces of the terminal 1400 or in a foldable design; in other embodiments, the display 1405 may be a flexible display disposed on a curved surface or a folded surface of the terminal device 1400. Even further, the display 1405 may be arranged in a non-rectangular irregular figure, i.e., a shaped screen. The Display 1405 can be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), and the like.
The camera assembly 1406 is used to capture images or video. Optionally, camera assembly 1406 includes a front camera and a rear camera. In general, a front camera is disposed on a front panel of the terminal apparatus 1400, and a rear camera is disposed on a rear surface of the terminal apparatus 1400. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 1406 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp and can be used for light compensation under different color temperatures.
The audio circuit 1407 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 1401 for processing or inputting the electric signals to the radio frequency circuit 1404 to realize voice communication. For the purpose of stereo sound collection or noise reduction, a plurality of microphones may be provided at different positions of the terminal device 1400. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 1401 or the radio frequency circuit 1404 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, the audio circuit 1407 may also include a headphone jack.
Power supply 1408 is used to provide power to various components in terminal device 1400. The power supply 1408 may be ac, dc, disposable or rechargeable. When the power supply 1408 comprises a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, terminal device 1400 further includes one or more sensors 1409. The one or more sensors 1409 include, but are not limited to: acceleration sensor 1410, gyro sensor 1411, pressure sensor 1412, optical sensor 1413, and proximity sensor 1414.
The acceleration sensor 1410 may detect the magnitude of acceleration on three coordinate axes of the coordinate system established with the terminal apparatus 1400. For example, the acceleration sensor 1410 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 1401 can control the display 1405 to display a user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1410. The acceleration sensor 1410 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 1411 may detect a body direction and a rotation angle of the terminal device 1400, and the gyro sensor 1411 may cooperate with the acceleration sensor 1410 to acquire a 3D motion of the user on the terminal device 1400. The processor 1401, based on the data collected by the gyro sensor 1411, may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
Pressure sensors 1412 may be disposed on the side bezel of terminal device 1400 and/or underneath display 1405. When the pressure sensor 1412 is arranged at the side frame of the terminal device 1400, the holding signal of the user to the terminal device 1400 can be detected, and the processor 1401 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 1412. When the pressure sensor 1412 is disposed at the lower layer of the display 1405, the processor 1401 controls the operability control on the UI interface according to the pressure operation of the user on the display 1405. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The optical sensor 1413 is used to collect ambient light intensity. In one embodiment, processor 1401 may control the display brightness of display 1405 based on the ambient light intensity collected by optical sensor 1413. Specifically, when the ambient light intensity is high, the display luminance of the display screen 1405 is increased; when the ambient light intensity is low, the display brightness of the display screen 1405 is reduced. In another embodiment, the processor 1401 can also dynamically adjust the shooting parameters of the camera component 1406 according to the intensity of the ambient light collected by the optical sensor 1413.
Proximity sensors 1414, also known as distance sensors, are typically disposed on the front panel of the terminal device 1400. The proximity sensor 1414 is used to collect the distance between the user and the front of the terminal device 1400. In one embodiment, when the proximity sensor 1414 detects that the distance between the user and the front of the terminal device 1400 gradually decreases, the processor 1401 controls the display 1405 to switch from the bright screen state to the dark screen state; when the proximity sensor 1414 detects that the distance between the user and the front of the terminal device 1400 gradually increases, the processor 1401 controls the display 1405 to switch from the dark screen state to the bright screen state.
Those skilled in the art will appreciate that the architecture shown in fig. 14 is not limiting of terminal device 1400 and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be employed.
Fig. 15 is a schematic structural diagram of a server according to an embodiment of the present application, where the server 1500 may generate a relatively large difference due to different configurations or performances, and may include one or more processors (CPUs) 1501 and one or more memories 1502, where at least one program code is stored in the one or more memories 1502, and is loaded and executed by the one or more processors 1501 to implement the method for obtaining the keypoint detection model according to the method embodiment shown in fig. 2 and/or the method for detecting keypoints according to the method embodiment shown in fig. 8. Certainly, the server 1500 may further have a wired or wireless network interface, a keyboard, an input/output interface, and other components to facilitate input and output, and the server 1500 may further include other components for implementing functions of the device, which is not described herein again.
In an exemplary embodiment, a computer-readable storage medium is further provided, where at least one program code is stored, and the at least one program code is loaded into and executed by a processor, so as to enable a computer to implement the method for acquiring a keypoint detection model provided by the method embodiment shown in fig. 2 and/or the method for detecting keypoints provided by the method embodiment shown in fig. 8.
Alternatively, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program or a computer program product is further provided, in which at least one computer instruction is stored, and the at least one computer instruction is loaded and executed by a processor, so as to enable a computer to implement the method for acquiring the keypoint detection model provided by the method embodiment shown in fig. 2 and/or the method for detecting keypoints provided by the method embodiment shown in fig. 8.
It should be noted that the information (including but not limited to user equipment information, user personal information, and the like), data (including but not limited to data used for analysis, stored data, displayed data, and the like), and signals involved in this application are all authorized by the user or fully authorized by all parties, and the collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions. For example, the sample images, target images, and the like referred to in this application are all obtained with sufficient authorization.
It should be understood that "a plurality" herein means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. The character "/" generally indicates an "or" relationship between the preceding and following associated objects.
The above description is only exemplary of the present application and should not be taken as limiting the present application, and any modifications, equivalents, improvements and the like that are made within the principles of the present application should be included in the protection scope of the present application.

Claims (17)

1. A method for acquiring a keypoint detection model is characterized by comprising the following steps:
acquiring a training data set and an initial key point detection model, wherein the training data set comprises a sample image and standard position information corresponding to a sample key point, and the sample key point is a key point of a target part included in the sample image;
calling the initial key point detection model to process the sample image to obtain a sample heatmap corresponding to the sample key point;
determining sample position information corresponding to the sample key point according to the sample heatmap;
determining a reference loss value according to the standard position information, the sample position information and the sample heatmap, wherein the reference loss value is used for indicating the detection precision of the initial key point detection model;
and updating the initial key point detection model based on the reference loss value being greater than a loss threshold value to obtain a target key point detection model, wherein the target key point detection model is used for detecting a target image so as to determine target position information corresponding to key points of a target part included in the target image.
2. The method of claim 1, wherein the determining sample position information corresponding to the sample key point according to the sample heatmap comprises:
acquiring a first heatmap and a second heatmap, wherein the first heatmap is a heatmap corresponding to a first dimension, and the second heatmap is a heatmap corresponding to a second dimension;
and determining sample position information corresponding to the sample key point according to the sample heatmap, the first heatmap and the second heatmap.
3. The method of claim 2, wherein the determining sample position information corresponding to the sample key point according to the sample heatmap, the first heatmap, and the second heatmap comprises:
determining a first value according to the sample heatmap and the first heatmap, wherein the first value is the value of the sample key point in the first dimension;
determining a second value according to the sample heatmap and the second heatmap, wherein the second value is the value of the sample key point in the second dimension;
and determining sample position information corresponding to the sample key point according to the first value and the second value.
4. The method of claim 3, wherein the sample heatmap, the first heatmap, and the second heatmap each comprise a plurality of values, and the sample heatmap, the first heatmap, and the second heatmap comprise the same number of values;
the determining a first value according to the sample heatmap and the first heatmap comprises: multiplying the values at the same position in the sample heatmap and the first heatmap to obtain a third value corresponding to each position; and determining the first value according to the third value corresponding to each position;
the determining a second value according to the sample heatmap and the second heatmap comprises: multiplying the values at the same position in the sample heatmap and the second heatmap to obtain a fourth value corresponding to each position; and determining the second value according to the fourth value corresponding to each position.
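Claims 2 to 4 describe decoding the keypoint position by multiplying the sample heatmap element-wise with one auxiliary map per dimension and aggregating the products into a single value per dimension. One common technique matching this description is soft-argmax decoding; the sketch below assumes that reading, and the function name, coordinate grids, and summation are illustrative rather than taken from the patent.

```python
import numpy as np

def soft_argmax_2d(heatmap):
    """Decode a keypoint position from an (H, W) heatmap.

    Sketch of claims 2-4: the sample heatmap is multiplied element-wise
    by a first-dimension map and a second-dimension map of the same size,
    and the products are aggregated into one value per dimension. Reading
    the two auxiliary maps as coordinate grids and the aggregation as a
    sum (a soft-argmax) is an assumption, not fixed by the claims.
    """
    h, w = heatmap.shape
    p = heatmap / heatmap.sum()                      # normalize to a distribution
    first = np.tile(np.arange(w), (h, 1))            # first-dimension (x) map
    second = np.tile(np.arange(h)[:, None], (1, w))  # second-dimension (y) map
    x = float((p * first).sum())    # third values per position, then aggregate
    y = float((p * second).sum())   # fourth values per position, then aggregate
    return x, y

# A heatmap peaked at column 3, row 2 decodes to that position.
hm = np.zeros((5, 6))
hm[2, 3] = 1.0
x, y = soft_argmax_2d(hm)  # -> (3.0, 2.0)
```

Because the decoded value is a weighted average rather than a hard maximum, it remains differentiable, which is what makes end-to-end training against a coordinate loss (claim 5) possible.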
5. The method according to any one of claims 1 to 4, wherein the determining a reference loss value according to the standard position information, the sample position information, and the sample heatmap comprises:
determining a first loss value between the standard position information and the sample position information;
determining a second loss value according to the standard position information and the sample heatmap;
and determining the reference loss value according to the first loss value and the second loss value.
6. The method of claim 5, wherein the determining a second loss value according to the standard position information and the sample heatmap comprises:
determining a third heatmap corresponding to the standard position information;
and determining the second loss value according to the sample heatmap and the third heatmap.
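Claims 5 and 6 combine a position loss with a heatmap loss, where the "third heatmap" is rendered from the standard position information. A minimal sketch follows, assuming a Gaussian rendering, an L1 position loss, an MSE heatmap loss, and a weighted sum; none of these specific choices is fixed by the claims.

```python
import numpy as np

def gaussian_heatmap(shape, center, sigma=1.0):
    """Render a heatmap from ground-truth coordinates (claim 6's 'third
    heatmap'); the Gaussian form and sigma are assumptions."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    g = np.exp(-((xs - center[0]) ** 2 + (ys - center[1]) ** 2) / (2 * sigma ** 2))
    return g / g.sum()

def reference_loss(gt_xy, pred_xy, pred_heatmap, w_coord=1.0, w_hm=1.0):
    # First loss value: between standard and sample position information.
    coord_loss = np.abs(np.asarray(gt_xy, float) - np.asarray(pred_xy, float)).mean()
    # Second loss value: predicted heatmap vs. the heatmap rendered from
    # the standard position information.
    target_hm = gaussian_heatmap(pred_heatmap.shape, gt_xy)
    hm_loss = np.mean((pred_heatmap - target_hm) ** 2)
    # Reference loss value: weighted combination (weights are assumptions).
    return w_coord * coord_loss + w_hm * hm_loss
```

The two terms supervise complementary things: the first loss constrains only the decoded coordinate, while the second constrains the shape of the whole heatmap, which tends to stabilize training.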
7. The method of any one of claims 1 to 4 and 6, wherein the acquiring the training data set comprises:
acquiring the sample image;
identifying the sample image to obtain candidate position information corresponding to the sample key point;
adjusting the candidate position information to obtain standard position information corresponding to the sample key points;
and acquiring the training data set according to the sample image and the standard position information corresponding to the sample key points.
8. The method of any one of claims 1 to 4 and 6, wherein the acquiring the training data set comprises:
acquiring shape parameters and pose parameters corresponding to the target part;
generating a target part model according to the shape parameters and the pose parameters, wherein the target part model comprises standard position information corresponding to the sample key points;
pasting a texture map on the target part model to obtain the target part model pasted with the texture map;
projecting the target part model attached with the texture map into a background map to obtain the sample image;
and acquiring the training data set according to the sample image and the standard position information corresponding to the sample key points.
9. The method according to any one of claims 1 to 4 and 6, wherein the updating the initial key point detection model based on the reference loss value being greater than a loss threshold value to obtain a target key point detection model comprises:
updating the initial key point detection model based on the reference loss value being greater than the loss threshold value to obtain an intermediate key point detection model;
calling the intermediate key point detection model to process the sample image to obtain an intermediate heatmap corresponding to the sample key point;
determining intermediate position information corresponding to the sample key point according to the intermediate heatmap;
determining a candidate loss value according to the standard position information, the intermediate position information and the intermediate heatmap;
and taking the intermediate key point detection model as the target key point detection model based on the candidate loss value being not greater than the loss threshold value.
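Claim 9's iteration can be read as a loop that keeps updating the model while the loss exceeds the threshold, and stops at the first intermediate model whose candidate loss does not. A schematic sketch follows; the model, optimizer, and loss function are deliberately abstract callables, since the claim fixes none of them.

```python
def train_until_threshold(model, dataset, loss_fn, update_fn,
                          loss_threshold, max_steps=1000):
    """Update `model` while its loss on `dataset` exceeds `loss_threshold`;
    the first (intermediate) model whose candidate loss is not greater
    than the threshold is returned as the target model. `max_steps` is a
    safety cap, not part of the claim."""
    for _ in range(max_steps):
        loss = loss_fn(model, dataset)
        if loss <= loss_threshold:
            return model                    # target keypoint detection model
        model = update_fn(model, loss)      # intermediate keypoint detection model
    return model
```

With a toy "model" whose loss is its own value and an update that halves it, the loop returns the first value at or below the threshold.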
10. A method of keypoint detection, the method comprising:
acquiring a target image and a target key point detection model, wherein the target image comprises a target part, and the target key point detection model is acquired by the acquisition method of the key point detection model according to any one of claims 1 to 9;
calling the target key point detection model to process the target image to obtain a target heatmap corresponding to the target key point of the target part;
and determining target position information corresponding to the target key point according to the target heatmap.
11. The method according to claim 10, wherein the determining target position information corresponding to the target key point according to the target heatmap comprises:
determining first position information corresponding to the target key point according to the target heatmap;
taking the first position information as the target position information corresponding to the target key point; or determining the target position information corresponding to the target key point according to the first position information and reference position information, wherein the reference position information is position information of the target key point of the target part included in a reference image, and the acquisition time of the reference image is adjacent to and earlier than the acquisition time of the target image.
12. The method of claim 11, further comprising:
acquiring the reference image, wherein the reference image comprises the target part;
calling the target key point detection model to process the reference image to obtain a reference heatmap corresponding to the target key point of the target part;
and determining the reference position information corresponding to the target key point according to the reference heatmap.
13. The method according to claim 11, wherein the determining the target location information corresponding to the target key point according to the first location information and the reference location information comprises:
acquiring an optical flow compensation value between the reference image and the target image, wherein the optical flow compensation value is used for indicating the motion velocity from the reference image to the target image;
determining the distance between the target key point in the reference image and the target image according to the reference position information and the first position information;
determining a distance weight parameter according to the distance, wherein the distance weight parameter is in direct proportion to the distance;
and determining target position information corresponding to the target key points according to the reference position information, the first position information, the optical flow compensation value, the distance and the distance weight parameter.
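Claim 13 fuses the current detection with the previous frame's position using an optical flow compensation value and a weight proportional to the inter-frame distance. The claim does not give the fusion formula; one plausible reading, in which large inter-frame motion favors the fresh detection and small motion favors the flow-compensated previous position, is sketched below with illustrative names throughout.

```python
import math

def fuse_with_reference(ref_xy, cur_xy, flow, distance_weight_fn):
    """Fuse first position information (`cur_xy`) with reference position
    information (`ref_xy`) from the previous frame. `flow` is the optical
    flow compensation value; `distance_weight_fn` maps the inter-frame
    distance to a weight in [0, 1] proportional to that distance."""
    # Flow-compensated prediction of the keypoint from the reference frame.
    propagated = (ref_xy[0] + flow[0], ref_xy[1] + flow[1])
    # Distance between the target keypoint in the reference and target images.
    d = math.hypot(cur_xy[0] - ref_xy[0], cur_xy[1] - ref_xy[1])
    w = distance_weight_fn(d)
    # Large motion: trust the current detection; small motion: smooth.
    return (w * cur_xy[0] + (1.0 - w) * propagated[0],
            w * cur_xy[1] + (1.0 - w) * propagated[1])
```

This kind of distance-gated smoothing suppresses per-frame jitter when the part is nearly still, without lagging behind fast motion.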
14. An apparatus for acquiring a keypoint detection model, the apparatus comprising:
the system comprises an acquisition module, a detection module and a processing module, wherein the acquisition module is used for acquiring a training data set and an initial key point detection model, the training data set comprises a sample image and standard position information corresponding to a sample key point, and the sample key point is a key point of a target part included in the sample image;
the processing module is used for calling the initial key point detection model to process the sample image to obtain a sample heatmap corresponding to the sample key point;
the determining module is used for determining sample position information corresponding to the sample key point according to the sample heatmap;
the determining module is further configured to determine a reference loss value according to the standard position information, the sample position information, and the sample heatmap, wherein the reference loss value is used for indicating the detection precision of the initial key point detection model;
and the updating module is used for updating the initial key point detection model based on the reference loss value being greater than a loss threshold value to obtain a target key point detection model, and the target key point detection model is used for detecting a target image so as to determine target position information corresponding to the key point of the target part included in the target image.
15. A keypoint detection device, the device comprising:
an obtaining module, configured to obtain a target image and a target keypoint detection model, where the target image includes a target portion, and the target keypoint detection model is obtained by the obtaining apparatus of the keypoint detection model according to claim 14;
the processing module is used for calling the target key point detection model to process the target image to obtain a target heatmap corresponding to the target key point of the target part;
and the determining module is used for determining target position information corresponding to the target key point according to the target heatmap.
16. A computer device comprising a processor and a memory, the memory having stored therein at least one program code, the at least one program code being loaded into and executed by the processor, to cause the computer device to carry out a method of acquiring a keypoint detection model according to any one of claims 1 to 9, or to cause the computer device to carry out a method of keypoint detection according to any one of claims 10 to 13.
17. A computer-readable storage medium, having stored therein at least one program code, which is loaded and executed by a processor, to cause a computer to implement the method of acquiring a keypoint detection model according to any one of claims 1 to 9, or to cause the computer to implement the method of keypoint detection according to any one of claims 10 to 13.
CN202211021088.6A 2022-08-24 2022-08-24 Method, device and equipment for acquiring key point detection model and detecting key point Active CN115100691B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211021088.6A CN115100691B (en) 2022-08-24 2022-08-24 Method, device and equipment for acquiring key point detection model and detecting key point


Publications (2)

Publication Number Publication Date
CN115100691A true CN115100691A (en) 2022-09-23
CN115100691B CN115100691B (en) 2023-08-08

Family

ID=83301549

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211021088.6A Active CN115100691B (en) 2022-08-24 2022-08-24 Method, device and equipment for acquiring key point detection model and detecting key point

Country Status (1)

Country Link
CN (1) CN115100691B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110532984A (en) * 2019-09-02 2019-12-03 北京旷视科技有限公司 Critical point detection method, gesture identification method, apparatus and system
CN113436226A (en) * 2020-03-23 2021-09-24 北京沃东天骏信息技术有限公司 Method and device for detecting key points
US20210350620A1 (en) * 2020-05-07 2021-11-11 Imperial College Innovations Limited Generative geometric neural networks for 3d shape modelling
CN113705297A (en) * 2021-03-11 2021-11-26 腾讯科技(深圳)有限公司 Training method and device for detection model, computer equipment and storage medium
CN114519729A (en) * 2020-11-20 2022-05-20 腾讯科技(深圳)有限公司 Image registration quality evaluation model training method and device and computer equipment
US20220230406A1 (en) * 2021-01-20 2022-07-21 Qualcomm Incorporated Enhancing three-dimensional models using multi-view refinement


Non-Patent Citations (5)

Title
AIDEN NIBALI et al.: "3D Human Pose Estimation with 2D Marginal Heatmaps", arXiv:1806.01484, 8 November 2018 (2018-11-08), page 3 *
NENG QIAN et al.: "HTML: A Parametric Hand Texture Model for 3D Hand Reconstruction and Personalization", Computer Vision - ECCV 2020, 30 November 2020 (2020-11-30), page 3 *
NHU-TAI DO et al.: "Face tracking with convolutional neural network heat-map", Proceedings of the 2nd International Conference on Machine Learning and Soft Computing, pages 1-5 *
ZHANG Guangpian; JI Zhongping: "3D human body modeling method based on 2D point cloud images", Computer Engineering and Applications, no. 19, pages 210-220 *
JIA Huixing et al.: "A survey of computer-vision-based pedestrian detection in driver assistance systems", Acta Automatica Sinica, no. 01, 22 January 2007 (2007-01-22), pages 86-92 *

Also Published As

Publication number Publication date
CN115100691B (en) 2023-08-08

Similar Documents

Publication Publication Date Title
CN110136136B (en) Scene segmentation method and device, computer equipment and storage medium
CN110222551B (en) Method and device for identifying action type, electronic equipment and storage medium
CN109712224B (en) Virtual scene rendering method and device and intelligent device
WO2020224479A1 (en) Method and apparatus for acquiring positions of target, and computer device and storage medium
CN112581358B (en) Training method of image processing model, image processing method and device
CN112272311B (en) Method, device, terminal, server and medium for repairing splash screen
CN112749613B (en) Video data processing method, device, computer equipment and storage medium
CN110920631B (en) Method and device for controlling vehicle, electronic equipment and readable storage medium
CN110796248A (en) Data enhancement method, device, equipment and storage medium
CN110288689B (en) Method and device for rendering electronic map
CN114170349A (en) Image generation method, image generation device, electronic equipment and storage medium
CN110675412A (en) Image segmentation method, training method, device and equipment of image segmentation model
CN111178343A (en) Multimedia resource detection method, device, equipment and medium based on artificial intelligence
CN113706678A (en) Method, device and equipment for acquiring virtual image and computer readable storage medium
CN111768507B (en) Image fusion method, device, computer equipment and storage medium
CN110705614A (en) Model training method and device, electronic equipment and storage medium
CN110837858A (en) Network model training method and device, computer equipment and storage medium
CN112135191A (en) Video editing method, device, terminal and storage medium
CN113160031B (en) Image processing method, device, electronic equipment and storage medium
CN110853704B (en) Protein data acquisition method, protein data acquisition device, computer equipment and storage medium
CN114283395A (en) Method, device and equipment for detecting lane line and computer readable storage medium
CN111757146B (en) Method, system and storage medium for video splicing
CN109388732B (en) Music map generating and displaying method, device and storage medium
CN113298040A (en) Key point detection method and device, electronic equipment and computer-readable storage medium
CN112560903A (en) Method, device and equipment for determining image aesthetic information and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40073930

Country of ref document: HK

GR01 Patent grant