CN113468924A - Key point detection model training method and device and key point detection method and device - Google Patents


Info

Publication number
CN113468924A
CN113468924A (application CN202010243835.5A; granted publication CN113468924B)
Authority
CN
China
Prior art keywords
feature map
deep learning
preset
model
key point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010243835.5A
Other languages
Chinese (zh)
Other versions
CN113468924B (en)
Inventor
鲍慊
刘武
梅涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN202010243835.5A priority Critical patent/CN113468924B/en
Publication of CN113468924A publication Critical patent/CN113468924A/en
Application granted granted Critical
Publication of CN113468924B publication Critical patent/CN113468924B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a key point detection model training method and device and a key point detection method and device. The key point detection model training device searches a preset search space for the topological relation between nodes in a deep learning model; processes the feature map of a preset image with the deep learning model to obtain an output feature map; processes the output feature map to generate a key point heat map of the preset image; takes the difference between the key point position coordinates in the key point heat map and preset position coordinates as a loss function, and repeatedly performs the step of searching the preset search space for the topological relation between the nodes according to the loss function until the loss function value meets a preset condition or a preset number of cycles is reached; and trains the deep learning model with training data to obtain a key point detection model. By adjusting the topological relation between the nodes in the deep learning model, an optimal key point detection result can be provided.

Description

Key point detection model training method and device and key point detection method and device
Technical Field
The disclosure relates to the field of information processing, and in particular to a method and a device for training a key point detection model and a method and a device for detecting key points.
Background
Human body key point detection obtains the positions of the key points of a human body in an image or a video through computer vision techniques. In the related art, network models based on deep learning are generally adopted to realize human body key point detection. These network models are designed in advance based on manual experience.
Disclosure of Invention
The inventors found through research that, because existing human body key point detection algorithms rely on network models designed from manual experience, designing such a model usually requires extensive network design and parameter tuning experience and consumes a great deal of the designer's time and effort. In addition, a network model designed from manual experience cannot be fully matched to the corresponding key point detection task, so such a model cannot provide an optimal key point detection result for that task.
Accordingly, the present disclosure provides a keypoint detection model training scheme and a corresponding keypoint detection scheme. By dynamically adjusting the topological relation among the nodes in the deep learning model, the optimal key point detection result can be provided.
According to a first aspect of the embodiments of the present disclosure, there is provided a key point detection model training method, including: searching a preset search space for the topological relation between nodes in a deep learning model; extracting a corresponding feature map from a preset image; processing the feature map of the preset image with the deep learning model to obtain an output feature map; processing the output feature map to generate a key point heat map of the preset image; taking the difference between the key point position coordinates in the key point heat map and preset position coordinates as a loss function, and repeatedly performing the step of searching the preset search space for the topological relation between the nodes in the deep learning model according to the loss function until the loss function value meets a preset condition or a preset number of cycles is reached; and training the deep learning model with training data to obtain a key point detection model.
In some embodiments, the deep learning model comprises a first deep learning submodel, a second deep learning submodel, and a third deep learning submodel; the processing of the feature map of the preset image by using the deep learning model comprises the following steps: processing a feature map of a preset image by using the first deep learning submodel to obtain a first feature map, wherein the first feature map and the feature map of the preset image have the same size; processing the first feature map by using the second deep learning submodel to obtain a second feature map, wherein the size of the second feature map is smaller than that of the first feature map; and performing fusion processing on the second feature map and the first feature map by using the third deep learning submodel to obtain the output feature map.
In some embodiments, the second deep learning submodel includes N transformation network models; processing the first feature map with the second deep learning submodel includes: processing the first feature map with the 1st transformation network model to obtain a 1st output feature map, wherein the size of the 1st output feature map is smaller than that of the first feature map; and processing the (i-1)-th output feature map with the i-th transformation network model to obtain an i-th output feature map, wherein the size of the i-th output feature map is smaller than that of the (i-1)-th output feature map, and 2 ≤ i ≤ N.
In some embodiments, the third deep learning submodel includes N fusion network models; performing the fusion processing on the second feature map and the first feature map with the third deep learning submodel includes: fusing the output feature map of the N-th transformation network model and the output feature map of the (N-1)-th transformation network model with the 1st fusion network model to obtain a 1st fusion feature map; fusing the (j-1)-th fusion feature map and the output feature map of the (N-j)-th transformation network model with the j-th fusion network model to obtain a j-th fusion feature map, wherein 2 ≤ j ≤ N-1; and fusing the (N-1)-th fusion feature map and the first feature map with the N-th fusion network model to obtain the output feature map.
In some embodiments, when performing the fusion processing, each fusion network model takes the smaller of the two received feature maps as a first feature map to be processed and the larger as a second feature map to be processed; upsamples the first feature map to be processed to obtain a third feature map to be processed, wherein the third feature map to be processed and the second feature map to be processed have the same size; and performs fusion processing on the third feature map to be processed and the second feature map to be processed.
In some embodiments, the search space comprises at least one of convolution, pooling, fully connected, and batch normalization operations.
According to a second aspect of the embodiments of the present disclosure, there is provided a key point detection model training device, including: a search module configured to search a preset search space for the topological relation between nodes in a deep learning model; a feature extraction module configured to extract a corresponding feature map from a preset image; a first processing module configured to process the feature map of the preset image with the deep learning model to obtain an output feature map; a second processing module configured to process the output feature map to generate a key point heat map of the preset image; a first training module configured to take the difference between the key point position coordinates in the key point heat map and preset position coordinates as a loss function and, according to the loss function, instruct the search module to repeatedly perform the operation of searching the preset search space for the topological relation between the nodes in the deep learning model until the loss function value meets a preset condition or a preset number of cycles is reached; and a second training module configured to train the deep learning model with training data to obtain a key point detection model.
According to a third aspect of the embodiments of the present disclosure, there is provided a keypoint detection model training device, including: a memory configured to store instructions; a processor coupled to the memory, the processor configured to perform a method of performing keypoint detection model training as described in any of the above embodiments based on instructions stored in the memory.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a keypoint detection method, including: collecting an image to be detected; and performing key point detection processing on the image to be detected by using the key point detection model trained by the key point detection model training method of any embodiment to obtain key points in the image to be detected.
According to a fifth aspect of the embodiments of the present disclosure, there is provided a keypoint detection apparatus, comprising: the acquisition module is configured to acquire an image to be detected; the detection module is configured to perform the keypoint detection processing on the image to be detected by using the keypoint detection model trained by the keypoint detection model training method according to any embodiment, so as to obtain the keypoints in the image to be detected.
According to a sixth aspect of the embodiments of the present disclosure, there is provided a keypoint detection apparatus comprising: a memory configured to store instructions; a processor coupled to the memory, the processor configured to perform a method of performing keypoint detection as described in any of the above embodiments based on instructions stored by the memory.
According to a seventh aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, in which computer instructions are stored, and when executed by a processor, the computer-readable storage medium implements the method according to any of the embodiments described above.
Other features of the present disclosure and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present disclosure, and those skilled in the art can obtain other drawings from these drawings without inventive effort.
FIG. 1 is a schematic flow chart diagram illustrating a method for training a keypoint detection model according to an embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of a keypoint detection model according to an embodiment of the disclosure;
FIG. 3 is a schematic flow chart diagram illustrating a method for training a keypoint detection model according to another embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a keypoint detection model according to another embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a keypoint detection model according to another embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a keypoint detection model training device according to an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of a keypoint detection model training device according to another embodiment of the present disclosure;
FIG. 8 is a schematic flow chart diagram of a keypoint detection method according to an embodiment of the disclosure;
FIG. 9 is a schematic structural diagram of a keypoint detection apparatus according to an embodiment of the present disclosure;
fig. 10 is a schematic structural diagram of a keypoint detection apparatus according to another embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The following description of at least one exemplary embodiment is merely illustrative and in no way limits the disclosure, its application, or its uses. All other embodiments obtained by a person skilled in the art from the embodiments disclosed herein without creative effort shall fall within the protection scope of the present disclosure.
The relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Fig. 1 is a schematic flow chart of a method for training a keypoint detection model according to an embodiment of the disclosure. In some embodiments, the following steps of the keypoint detection model training method are performed by the keypoint detection model training apparatus. The corresponding keypoint detection model is shown in fig. 2.
In step 101, searching out a topological relation between nodes in the deep learning model in a preset search space.
In some embodiments, the search space comprises at least one of convolution, pooling, fully connected, and batch normalization operations.
It should be noted that in deep learning, a neural network can be regarded as being composed of many operator nodes (OPs). The operation space of each OP includes convolution, pooling, fully connected, batch normalization, and similar operations, and each OP may have several inputs and outputs; the connection between any two OPs represents one of these operations. By searching the search space, the corresponding operation of each OP is selected, and the corresponding neural network architecture is determined from the topological relation between the nodes. In addition, limiting the search space speeds up the search.
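The idea of assigning one candidate operation to each connection between OPs can be sketched as follows. This is a hypothetical illustration of a topology-sampling step, not the patent's actual search algorithm; the function and names are invented for the example.

```python
import random

# Each edge between two OP nodes is assigned one candidate operation drawn
# from the limited search space described above (illustrative labels only).
CANDIDATE_OPS = ["conv3x3", "pooling", "fully_connected", "batch_norm"]

def sample_topology(num_nodes, rng):
    """Randomly pick an operation for every ordered pair of nodes (i -> j, i < j)."""
    topology = {}
    for j in range(1, num_nodes):
        for i in range(j):
            topology[(i, j)] = rng.choice(CANDIDATE_OPS)
    return topology

rng = random.Random(0)
arch = sample_topology(4, rng)           # 4 nodes give 6 directed edges
assert len(arch) == 6
assert set(arch.values()) <= set(CANDIDATE_OPS)
```

A real architecture search would score each sampled topology with the loss function and keep the best, rather than sampling uniformly at random.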
Since the search procedure itself is not the focus of the present disclosure, it is not described in detail here.
In step 102, a corresponding feature map is extracted from the preset image.
In some embodiments, the preset image is processed using a preset feature map extraction model (e.g., a convolutional neural network with two convolution kernels of 3 × 3 and a step size of 2) to obtain a feature map of the preset image.
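The spatial effect of the two stride-2, 3 × 3 convolutions can be checked with the standard convolution output-size formula. Padding of 1 is an assumption here, since the disclosure only specifies the kernel size and stride; the 256-pixel input resolution is likewise hypothetical.

```python
def conv_out_size(size, kernel=3, stride=2, padding=1):
    # Standard convolution output-size formula: floor((n + 2p - k) / s) + 1.
    # padding=1 is an assumption; the disclosure specifies only k=3, s=2.
    return (size + 2 * padding - kernel) // stride + 1

h = w = 256                       # hypothetical input resolution
for _ in range(2):                # two stacked 3x3, stride-2 convolutions
    h, w = conv_out_size(h), conv_out_size(w)
print(h, w)                       # 64 64 -> each side is reduced 4x overall
```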
In step 103, the feature map of the preset image is processed by using the deep learning model to obtain an output feature map.
Processing the feature map of the preset image fuses features, which facilitates the subsequent heat map processing.
At step 104, the output feature map is processed to generate a keypoint heat map of the preset image.
For example, the output feature maps are processed using a preset heat map generation model (e.g., a convolutional neural network with a convolution kernel of 1 × 1 and a step size of 1) to generate corresponding keypoint heat maps.
It should be noted that the number of heat maps equals the number of key points. For example, if a human skeleton has 16 key points, 16 heat maps are generated, one per key point. The position of the maximum probability on each heat map is the position coordinate of the corresponding key point. By collecting the key point coordinates obtained from the 16 heat maps, the coordinates of the 16 key points in the human skeleton are obtained.
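The per-map maximum lookup described above can be sketched with NumPy. The heat map sizes and the planted peaks are synthetic, chosen only so the expected coordinates are known.

```python
import numpy as np

# Sketch: recover keypoint coordinates by taking, on each of the 16 heat
# maps, the position of the maximum response. Shapes are illustrative.
num_keypoints, H, W = 16, 64, 48
heatmaps = np.zeros((num_keypoints, H, W))
for k in range(num_keypoints):
    heatmaps[k, 2 * k, 2 * k + 1] = 1.0   # synthetic peak at (y=2k, x=2k+1)

flat = heatmaps.reshape(num_keypoints, -1).argmax(axis=1)
ys, xs = np.unravel_index(flat, (H, W))
coords = np.stack([xs, ys], axis=1)       # (16, 2) array of (x, y) coordinates
assert coords[3].tolist() == [7, 6]       # peak planted at row 6, col 7
```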
Since heat maps themselves are not the focus of the present disclosure, they are not described in detail here.
In step 105, the difference between the key point position coordinates in the key point heat map and the preset position coordinates is taken as a loss function, and the step of searching the preset search space for the topological relation between the nodes in the deep learning model is repeatedly performed according to the loss function until the loss function value meets the preset condition or the preset number of cycles is reached.
For example, if the loss function value is less than a predetermined threshold, the search process may be stopped. In addition, the search process may also be stopped when the number of cycles reaches a preset value.
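The two stopping criteria combine as in the minimal sketch below. The loss values, threshold, and cycle budget are all made-up illustrative numbers.

```python
# Minimal sketch of the two stopping criteria: loss below a preset threshold,
# or a preset maximum number of search cycles reached. Values are invented.
def run_search(losses, threshold=0.05, max_cycles=10):
    for cycle, loss in enumerate(losses, start=1):
        if loss < threshold or cycle >= max_cycles:
            return cycle, loss
    return len(losses), losses[-1]

cycles, final = run_search([0.9, 0.4, 0.2, 0.08, 0.04, 0.03])
print(cycles, final)              # stops at cycle 5, where loss 0.04 < 0.05
```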
In step 106, the deep learning model is trained using the training data to obtain a keypoint detection model.
The deep learning model determined through the search process is trained with the training data to determine the optimal weight parameters of the deep learning model.
In the method for training the key point detection model provided by the embodiment of the disclosure, the deep learning model matched with the key point detection task can be obtained by optimizing the topological relation of the nodes in the deep learning model.
Fig. 3 is a schematic flowchart of a method for training a keypoint detection model according to another embodiment of the disclosure. In some embodiments, the following steps of the keypoint detection model training method are performed by the keypoint detection model training apparatus. The corresponding keypoint detection model is shown in fig. 4.
In step 301, a topological relation between nodes in the first deep learning submodel, the second deep learning submodel, and the third deep learning submodel is searched in a preset search space.
In some embodiments, the search space comprises at least one of convolution, pooling, fully connected, and batch normalization operations.
In step 302, a corresponding feature map is extracted from a preset image.
In some embodiments, the preset image is processed using a preset feature map extraction model (e.g., a convolutional neural network with two convolution kernels of 3 × 3 and a step size of 2) to obtain a feature map of the preset image.
In step 303, the feature map of the preset image is processed with the first deep learning submodel to obtain a first feature map, and the first feature map has the same size as the feature map of the preset image.
In step 304, the first feature map is processed by a second deep learning submodel to obtain a second feature map, wherein the size of the second feature map is smaller than that of the first feature map.
In step 305, the second feature map and the first feature map are fused by using a third deep learning submodel to obtain an output feature map.
It should be noted that, since the second feature map is smaller than the first feature map, its resolution is lower than that of the first feature map. Fusing feature maps of different resolutions facilitates the subsequent key point heat map generation.
In some embodiments, when performing the fusion processing, the third deep learning submodel upsamples the second feature map to obtain a third feature map, and the third feature map has the same size as the first feature map. The third feature map and the first feature map are then fused. In this way, the best fusion result can be obtained.
Upsampling is achieved, for example, by using bilinear interpolation.
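A minimal NumPy bilinear-interpolation upsampler (align-corners style) sketches how the smaller feature map could be enlarged to the larger one's size before fusing. This is an illustrative implementation under those assumptions, not the patent's actual code; in practice a framework routine such as a framework's built-in interpolation would be used.

```python
import numpy as np

# Illustrative bilinear upsampler for a single-channel feature map.
def bilinear_upsample(fm, out_h, out_w):
    in_h, in_w = fm.shape
    # Fractional source coordinates for every target pixel (align-corners).
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = fm[y0][:, x0] * (1 - wx) + fm[y0][:, x1] * wx
    bot = fm[y1][:, x0] * (1 - wx) + fm[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

small = np.array([[0.0, 2.0], [4.0, 6.0]])
big = bilinear_upsample(small, 4, 4)      # enlarge to the larger map's size
fused = big + np.ones((4, 4))             # e.g. element-wise additive fusion
assert fused.shape == (4, 4)
```

The element-wise addition at the end is only one possible fusion; the disclosure leaves the exact fusion operation to the searched network.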
At step 306, the output feature map is processed to generate a keypoint heat map of the preset image.
Here, the output feature maps are processed using a heat map generation model (e.g., a convolutional neural network with a convolution kernel of 1 x 1 and step size of 1) to generate corresponding keypoint heat maps.
In step 307, the difference between the key point position coordinates in the key point heat map and the preset position coordinates is taken as a loss function, and the step of searching the preset search space for the topological relation between the nodes in the first, second, and third deep learning submodels is repeatedly performed according to the loss function until the loss function value meets the preset condition or the preset number of cycles is reached.
In step 308, the first deep learning submodel, the second deep learning submodel, and the third deep learning submodel are trained using the training data to obtain a keypoint detection model.
The first, second, and third deep learning submodels determined through the search process are trained with the training data to determine their optimal weight parameters.
In some embodiments, the second deep learning submodel includes N transformation network models, and the third deep learning submodel includes N fusion network models, where N is a positive integer.
In the second deep learning submodel, the 1st transformation network model processes the first feature map to obtain a 1st output feature map, whose size is smaller than that of the first feature map; the i-th transformation network model processes the (i-1)-th output feature map to obtain an i-th output feature map, whose size is smaller than that of the (i-1)-th output feature map, where 2 ≤ i ≤ N. In the third deep learning submodel, the 1st fusion network model fuses the output feature map of the N-th transformation network model and the output feature map of the (N-1)-th transformation network model to obtain a 1st fusion feature map; the j-th fusion network model fuses the (j-1)-th fusion feature map and the output feature map of the (N-j)-th transformation network model to obtain a j-th fusion feature map, where 2 ≤ j ≤ N-1; and the N-th fusion network model fuses the (N-1)-th fusion feature map and the first feature map to obtain the output feature map.
In some embodiments, when performing the fusion processing, each fusion network model takes the smaller of the two received feature maps as a first feature map to be processed and the larger as a second feature map to be processed. The first feature map to be processed is upsampled to obtain a third feature map to be processed, which has the same size as the second feature map to be processed. The third feature map to be processed and the second feature map to be processed are then fused.
Fig. 5 is a schematic structural diagram of a keypoint detection model according to another embodiment of the present disclosure. As an example, in the embodiment shown in fig. 5, the second deep learning submodel includes 3 transformation network models, and the third deep learning submodel includes 3 fusion network models.
In the second deep learning submodel, transformation network model 1 processes the first feature map to obtain a 1st output feature map, whose size is smaller than that of the first feature map. Transformation network model 2 processes the 1st output feature map to obtain a 2nd output feature map, whose size is smaller than that of the 1st output feature map. Transformation network model 3 processes the 2nd output feature map to obtain a 3rd output feature map, whose size is smaller than that of the 2nd output feature map.
In the third deep learning submodel, fusion network model 1 fuses the output feature map of transformation network model 3 and the output feature map of transformation network model 2 to obtain a 1st fusion feature map. Fusion network model 2 fuses the 1st fusion feature map and the output feature map of transformation network model 1 to obtain a 2nd fusion feature map. Fusion network model 3 fuses the 2nd fusion feature map and the first feature map to obtain the output feature map.
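The dataflow of this three-branch cascade can be traced purely in terms of feature-map sizes. The 64-pixel starting size and the halving per transformation are assumptions for illustration; the disclosure only requires each transformation's output to be smaller than its input.

```python
# Schematic shape trace of the N=3 cascade: three transformations shrink the
# map, three fusions grow it back. Sizes are hypothetical side lengths.
def transform(size):
    return size // 2                      # each transformation shrinks the map

def fuse(small, large):
    # Fusion upsamples the smaller input to the larger one's size.
    assert small <= large
    return large

first = 64
t1 = transform(first)                     # 32
t2 = transform(t1)                        # 16
t3 = transform(t2)                        # 8
f1 = fuse(t3, t2)                         # fusion model 1: 8 and 16 -> 16
f2 = fuse(f1, t1)                         # fusion model 2: 16 and 32 -> 32
out = fuse(f2, first)                     # fusion model 3: 32 and 64 -> 64
assert out == first                       # output matches the first feature map
```

The final assertion reflects the disclosure's property that the output feature map ends up at the same size as the first feature map.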
Fig. 6 is a schematic structural diagram of a keypoint detection model training device according to an embodiment of the present disclosure. As shown in fig. 6, the key point detection model training apparatus includes a search module 61, a feature extraction module 62, a first processing module 63, a second processing module 64, a first training module 65, and a second training module 66. The corresponding keypoint detection model is shown in fig. 2.
The searching module 61 searches the topological relation between the nodes in the deep learning model in a preset search space.
In some embodiments, the search space comprises at least one of convolution, pooling, fully connected, and batch normalization operations.
It should be noted that in deep learning, a neural network can be regarded as being composed of many operator nodes (OPs). The operation space of each OP includes convolution, pooling, fully connected, batch normalization, and similar operations, and each OP may have several inputs and outputs; the connection between any two OPs represents one of these operations. By searching the search space, the corresponding operation of each OP is selected, and the corresponding neural network architecture is determined from the topological relation between the nodes. In addition, limiting the search space speeds up the search.
The feature extraction module 62 extracts a corresponding feature map from the preset image.
In some embodiments, the preset image is processed using a preset feature map extraction model (e.g., a convolutional neural network with two convolution kernels of 3 × 3 and a step size of 2) to obtain a feature map of the preset image.
The first processing module 63 processes the feature map of the preset image with the deep learning model to obtain an output feature map. This processing fuses features, which facilitates the subsequent heat map processing.
The second processing module 64 processes the output feature map to generate a keypoint heat map of the preset image.
For example, the output feature maps are processed using a preset heat map generation model (e.g., a convolutional neural network with a convolution kernel of 1 × 1 and a step size of 1) to generate corresponding keypoint heat maps.
It should be noted that the number of heat maps equals the number of key points. For example, if a human skeleton has 16 key points, 16 heat maps are generated, one per key point. The position of the maximum probability on each heat map is the position coordinate of the corresponding key point. By collecting the key point coordinates obtained from the 16 heat maps, the coordinates of the 16 key points in the human skeleton are obtained.
The first training module 65 uses the difference between the position coordinates of the key points in the key point heat map and the preset position coordinates as a loss function, and, according to the loss function, instructs the search module 61 to repeatedly perform the operation of searching out the topological relation among the nodes in the deep learning model in the preset search space until the loss function value satisfies the preset condition or the preset number of cycles is reached.
For example, if the loss function value is less than a predetermined threshold, the search process may be stopped. In addition, the search process may also be stopped when the number of cycles reaches a preset value.
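A minimal sketch of the loss and the two stopping criteria, assuming a mean-squared-error form for the coordinate difference (the disclosure does not fix the exact form of the loss or the threshold values used here):

```python
import numpy as np

def keypoint_loss(predicted, target):
    """Mean squared difference between predicted and preset keypoint
    coordinates; one plausible form of the loss described above."""
    predicted = np.asarray(predicted, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.mean((predicted - target) ** 2))

def should_stop(loss_value, cycle, threshold=1e-3, max_cycles=100):
    """Stop the search when the loss value satisfies the preset condition
    or when the preset number of cycles is reached."""
    return loss_value < threshold or cycle >= max_cycles
```

For example, a perfect prediction gives a loss of 0.0, which immediately satisfies the stopping condition regardless of the cycle count.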
The second training module 66 trains the deep learning model with the training data to obtain a keypoint detection model.
That is, the deep learning model determined through the search process is trained with the training data to determine the weight parameters of the optimal deep learning model.
In some embodiments, the corresponding keypoint detection model is shown in FIG. 4.
The searching module 61 searches a preset search space for a topological relation among nodes in the first deep learning submodel, the second deep learning submodel and the third deep learning submodel.
The first processing module 63 processes the feature map of the preset image by using the first deep learning submodel to obtain a first feature map. The first feature map and the feature map of the preset image have the same size. And processing the first feature map by using a second deep learning submodel to obtain a second feature map. The size of the second feature map is smaller than the size of the first feature map. And then, carrying out fusion processing on the second feature map and the first feature map by using a third deep learning submodel to obtain an output feature map.
In some embodiments, when performing the fusion processing using the third deep learning submodel, the first processing module 63 up-samples the second feature map to obtain a third feature map, where the third feature map and the first feature map have the same size. Then, the third feature map and the first feature map are subjected to fusion processing. Thereby, a better fusion result can be obtained.
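The up-sample-then-fuse step might look like the following sketch (nearest-neighbor up-sampling and element-wise addition are assumptions for illustration; the searched submodel would determine the actual operations):

```python
import numpy as np

def upsample_nearest(feature_map, factor):
    """Nearest-neighbor up-sampling of a (C, H, W) feature map."""
    return feature_map.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse(small, large):
    """Up-sample the smaller map to the larger map's size, then fuse
    by element-wise addition (one common fusion choice)."""
    factor = large.shape[1] // small.shape[1]
    return upsample_nearest(small, factor) + large

second = np.ones((8, 16, 16))  # smaller second feature map
first = np.ones((8, 32, 32))   # first feature map (same size as the input feature map)
out = fuse(second, first)
print(out.shape)  # → (8, 32, 32)
```

The fused output keeps the resolution of the first feature map while incorporating the deeper, lower-resolution features.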
According to the loss function, the first training module 65 instructs the search module 61 to repeatedly execute the operation of searching out the topological relation among the nodes in the first deep learning submodel, the second deep learning submodel, and the third deep learning submodel in the preset search space until the loss function value satisfies the preset condition or the preset number of cycles is reached.
The second training module 66 trains the first deep learning submodel, the second deep learning submodel, and the third deep learning submodel with the training data to obtain a keypoint detection model.
In some embodiments, the second deep learning submodel includes N transformation network models and the third deep learning submodel includes N fusion network models.
The first processing module 63 processes the first feature map by using the 1st transformation network model to obtain a 1st output feature map, wherein the size of the 1st output feature map is smaller than that of the first feature map. The first processing module 63 processes the (i-1)-th output feature map by using the i-th transformation network model to obtain the i-th output feature map, wherein the size of the i-th output feature map is smaller than that of the (i-1)-th output feature map, and i is greater than or equal to 2 and less than or equal to N.
The first processing module 63 further fuses the output feature map of the N-th transformation network model and the output feature map of the (N-1)-th transformation network model by using the 1st fusion network model to obtain a 1st fused feature map. The first processing module 63 further fuses the (j-1)-th fused feature map and the output feature map of the (N-j)-th transformation network model by using the j-th fusion network model to obtain the j-th fused feature map, wherein j is greater than or equal to 2 and less than or equal to N-1. The first processing module 63 fuses the (N-1)-th fused feature map and the first feature map by using the N-th fusion network model to obtain the output feature map.
In some embodiments, when performing the fusion processing with each fusion network model, the first processing module 63 takes, of the two received feature maps, the one with the smaller size as the first to-be-processed feature map and the one with the larger size as the second to-be-processed feature map. The first to-be-processed feature map is up-sampled to obtain a third to-be-processed feature map, where the third to-be-processed feature map and the second to-be-processed feature map have the same size. The third to-be-processed feature map and the second to-be-processed feature map are then subjected to the fusion processing.
In some embodiments, the corresponding keypoint detection model is shown in FIG. 5.
The first processing module 63 processes the first feature map by using the transformation network model 1 to obtain a 1st output feature map, wherein the size of the 1st output feature map is smaller than that of the first feature map. The first processing module 63 processes the 1st output feature map by using the transformation network model 2 to obtain a 2nd output feature map, wherein the size of the 2nd output feature map is smaller than that of the 1st output feature map. The first processing module 63 processes the 2nd output feature map by using the transformation network model 3 to obtain a 3rd output feature map, wherein the size of the 3rd output feature map is smaller than that of the 2nd output feature map.
In addition, the first processing module 63 fuses the output feature map of the transformation network model 3 and the output feature map of the transformation network model 2 by using the fusion network model 1 to obtain a 1st fused feature map. The first processing module 63 fuses the 1st fused feature map and the output feature map of the transformation network model 1 by using the fusion network model 2 to obtain a 2nd fused feature map. The first processing module 63 fuses the 2nd fused feature map and the first feature map by using the fusion network model 3 to obtain the output feature map.
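The wiring of the three transformation network models and three fusion network models described above can be sketched as follows (2 × 2 average pooling and additive fusion are stand-ins for the searched operations, chosen only to make the shape flow concrete):

```python
import numpy as np

def transform(feature_map):
    """Stand-in transformation network: halves spatial size with 2x2
    average pooling (the searched submodel would learn this mapping)."""
    c, h, w = feature_map.shape
    return feature_map.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def fuse(small, large):
    """Stand-in fusion network: nearest-neighbor up-sample, then add."""
    factor = large.shape[1] // small.shape[1]
    up = small.repeat(factor, axis=1).repeat(factor, axis=2)
    return up + large

first = np.ones((4, 32, 32))  # first feature map
out1 = transform(first)       # transformation network model 1 → (4, 16, 16)
out2 = transform(out1)        # transformation network model 2 → (4, 8, 8)
out3 = transform(out2)        # transformation network model 3 → (4, 4, 4)

fused1 = fuse(out3, out2)     # fusion network model 1
fused2 = fuse(fused1, out1)   # fusion network model 2
output = fuse(fused2, first)  # fusion network model 3
print(output.shape)  # → (4, 32, 32)
```

The cascade progressively restores the resolution of the first feature map while mixing in features from every scale, matching the structure shown in FIG. 5.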
Fig. 7 is a schematic structural diagram of a keypoint detection model training device according to another embodiment of the present disclosure. As shown in fig. 7, the training device comprises a memory 71 and a processor 72.
The memory 71 is used for storing instructions, the processor 72 is coupled to the memory 71, and the processor 72 is configured to execute the method according to any one of the embodiments in fig. 1 or fig. 3 based on the instructions stored in the memory.
As shown in fig. 7, the apparatus further includes a communication interface 73 for information interaction with other devices. Meanwhile, the device also comprises a bus 74, and the processor 72, the communication interface 73 and the memory 71 are communicated with each other through the bus 74.
The memory 71 may comprise a high-speed RAM, and may also include a non-volatile memory, such as at least one disk storage. The memory 71 may also be a memory array. The memory 71 may also be partitioned into blocks, and the blocks may be combined into virtual volumes according to certain rules.
Further, the processor 72 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present disclosure.
The present disclosure also relates to a computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions, and the instructions, when executed by a processor, implement a method according to any one of the embodiments shown in fig. 1 or fig. 3.
Fig. 8 is a schematic flowchart of a keypoint detection method according to an embodiment of the disclosure. In some embodiments, the following keypoint detection method steps are performed by the keypoint detection apparatus.
In step 801, an image to be detected is acquired.
In step 802, a keypoint detection model trained by the keypoint detection model training method according to any embodiment of fig. 1 or fig. 3 is used to perform keypoint detection processing on an image to be detected, so as to obtain keypoints in the image to be detected.
Fig. 9 is a schematic structural diagram of a keypoint detection apparatus according to an embodiment of the present disclosure. As shown in fig. 9, the key point detecting apparatus includes an acquisition module 91 and a detection module 92.
The collecting module 91 is used for collecting an image to be detected.
The detection module 92 performs a keypoint detection process on the image to be detected by using the keypoint detection model trained by the keypoint detection model training method according to any embodiment of fig. 1 or fig. 3, so as to obtain keypoints in the image to be detected.
Fig. 10 is a schematic structural diagram of a keypoint detection apparatus according to another embodiment of the present disclosure. As shown in fig. 10, the key point detecting device includes a memory 1001, a processor 1002, a communication interface 1003, and a bus 1004. Fig. 10 differs from fig. 7 in that, in the embodiment shown in fig. 10, the processor 1002 is configured to perform the method according to any of the embodiments in fig. 8 based on instructions stored in the memory.
The present disclosure also relates to a computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions, and the instructions, when executed by a processor, implement the method according to any one of the embodiments in fig. 8.
In some embodiments, the functional unit modules described above can be implemented as a general-purpose processor, a programmable logic controller (PLC), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any suitable combination thereof for performing the functions described in this disclosure.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk.
The description of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the disclosure in the form disclosed. Many modifications and variations will be apparent to practitioners skilled in this art. The embodiments were chosen and described in order to best explain the principles of the disclosure and its practical application, and to enable others of ordinary skill in the art to understand the disclosure and its various embodiments with various modifications as are suited to the particular use contemplated.

Claims (12)

1. A method for training a key point detection model comprises the following steps:
searching a topological relation among nodes in the deep learning model in a preset search space;
extracting a corresponding feature map from a preset image;
processing the feature map of the preset image by using the deep learning model to obtain an output feature map;
processing the output feature map to generate a key point heat map of the preset image;
taking the difference between the position coordinates of the key points in the key point heat map and preset position coordinates as a loss function, and repeatedly executing the step of searching the topological relation among the nodes in the deep learning model in a preset search space according to the loss function until the loss function value meets a preset condition or reaches a preset cycle number;
and training the deep learning model by using training data to obtain a key point detection model.
2. The method of claim 1, wherein the deep learning model comprises a first deep learning submodel, a second deep learning submodel, and a third deep learning submodel;
the processing of the feature map of the preset image by using the deep learning model comprises the following steps:
processing the feature map of the preset image by using the first deep learning submodel to obtain a first feature map, wherein the first feature map and the feature map of the preset image have the same size;
processing the first feature map by using the second deep learning submodel to obtain a second feature map, wherein the size of the second feature map is smaller than that of the first feature map;
and performing fusion processing on the second feature map and the first feature map by using the third deep learning submodel to obtain the output feature map.
3. The method of claim 2, wherein the second deep learning submodel includes N transformation network models;
processing the first feature map with the second deep learning submodel includes:
processing the first feature map by using a 1 st transformation network model to obtain a 1 st output feature map, wherein the size of the 1 st output feature map is smaller than that of the first feature map;
and processing the (i-1)-th output feature map by using the i-th transformation network model to obtain an i-th output feature map, wherein the size of the i-th output feature map is smaller than that of the (i-1)-th output feature map, and i is more than or equal to 2 and less than or equal to N.
4. The method of claim 3, wherein the third deep learning submodel includes N converged network models;
the fusion processing of the second feature map and the first feature map by using the third deep learning submodel comprises the following steps:
fusing the output feature map of the N-th transformation network model and the output feature map of the (N-1)-th transformation network model by using the 1st fusion network model to obtain a 1st fused feature map;
fusing the (j-1)-th fused feature map and the output feature map of the (N-j)-th transformation network model by using the j-th fusion network model to obtain a j-th fused feature map, wherein j is more than or equal to 2 and less than or equal to N-1;
and fusing the (N-1)-th fused feature map and the first feature map by using the N-th fusion network model to obtain the output feature map.
5. The method of claim 4, wherein,
in the process of fusion processing of each fusion network model, of the two received feature maps, taking the feature map with the smaller size as a first to-be-processed feature map and the feature map with the larger size as a second to-be-processed feature map;
performing up-sampling on the first to-be-processed feature map to obtain a third to-be-processed feature map, wherein the third to-be-processed feature map and the second to-be-processed feature map have the same size;
and performing fusion processing on the third to-be-processed feature map and the second to-be-processed feature map.
6. The method of any one of claims 1-5,
the search space includes at least one of convolution, pooling, full concatenation, and batch normalization.
7. A keypoint detection model training device comprising:
the searching module is configured to search out a topological relation among nodes in the deep learning model in a preset searching space;
the characteristic extraction module is configured to extract a corresponding characteristic graph from a preset image;
the first processing module is configured to process the feature map of the preset image by using the deep learning model to obtain an output feature map;
a second processing module configured to process the output feature map to generate a keypoint heat map of the preset image;
a first training module, configured to use a difference between a key point position coordinate in the key point heat map and a preset position coordinate as a loss function, and instruct a search module to repeatedly execute an operation of searching out a topological relation between nodes in a deep learning model in a preset search space according to the loss function until a loss function value meets a preset condition or reaches a preset cycle number;
a second training module configured to train the deep learning model with training data to obtain a keypoint detection model.
8. A keypoint detection model training device comprising:
a memory configured to store instructions;
a processor coupled to the memory, the processor being configured to implement the method of any one of claims 1-6 based on the instructions stored by the memory.
9. A keypoint detection method comprising:
collecting an image to be detected;
and carrying out key point detection processing on the image to be detected by using the key point detection model trained by the key point detection model training method of any one of claims 1-6 to obtain key points in the image to be detected.
10. A keypoint detection device comprising:
the acquisition module is configured to acquire an image to be detected;
a detection module configured to perform a keypoint detection process on the image to be detected by using the keypoint detection model trained by the keypoint detection model training method of any one of claims 1 to 6 to obtain keypoints in the image to be detected.
11. A keypoint detection device comprising:
a memory configured to store instructions;
a processor coupled to the memory, the processor being configured to implement the method of claim 9 based on the instructions stored by the memory.
12. A computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions which, when executed by a processor, implement the method of any of claims 1-6, 9.
CN202010243835.5A 2020-03-31 2020-03-31 Method and device for training key point detection model, and method and device for detecting key point Active CN113468924B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010243835.5A CN113468924B (en) 2020-03-31 2020-03-31 Method and device for training key point detection model, and method and device for detecting key point


Publications (2)

Publication Number Publication Date
CN113468924A true CN113468924A (en) 2021-10-01
CN113468924B CN113468924B (en) 2024-06-18

Family

ID=77866085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010243835.5A Active CN113468924B (en) 2020-03-31 2020-03-31 Method and device for training key point detection model, and method and device for detecting key point

Country Status (1)

Country Link
CN (1) CN113468924B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017133009A1 (en) * 2016-02-04 2017-08-10 广州新节奏智能科技有限公司 Method for positioning human joint using depth image of convolutional neural network
CN109753910A (en) * 2018-12-27 2019-05-14 北京字节跳动网络技术有限公司 Crucial point extracting method, the training method of model, device, medium and equipment
US20190147298A1 (en) * 2017-11-14 2019-05-16 Magic Leap, Inc. Meta-learning for multi-task learning for neural networks
CN109948526A (en) * 2019-03-18 2019-06-28 北京市商汤科技开发有限公司 Image processing method and device, detection device and storage medium
CN110084221A (en) * 2019-05-08 2019-08-02 南京云智控产业技术研究院有限公司 A kind of serializing face critical point detection method of the tape relay supervision based on deep learning
CN110309706A (en) * 2019-05-06 2019-10-08 深圳市华付信息技术有限公司 Face critical point detection method, apparatus, computer equipment and storage medium
CN110532981A (en) * 2019-09-03 2019-12-03 北京字节跳动网络技术有限公司 Human body key point extracting method, device, readable storage medium storing program for executing and equipment
WO2020010979A1 (en) * 2018-07-10 2020-01-16 腾讯科技(深圳)有限公司 Method and apparatus for training model for recognizing key points of hand, and method and apparatus for recognizing key points of hand
CN110705563A (en) * 2019-09-07 2020-01-17 创新奇智(重庆)科技有限公司 Industrial part key point detection method based on deep learning
CN110728359A (en) * 2019-10-10 2020-01-24 北京百度网讯科技有限公司 Method, device, equipment and storage medium for searching model structure
EP3605394A1 (en) * 2018-08-03 2020-02-05 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for recognizing body movement


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HE KM et al.: "Deep residual learning for image recognition", PROCEEDINGS OF 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 31 December 2016 (2016-12-31) *
MENG LINGJUN; WANG JINGBO: "Face keypoint detection based on PyTorch and OpenCV", Video Engineering, no. 14, 25 July 2019 (2019-07-25) *
FAN YEPING; LI YU; YANG DESHENG; WAN TAO; MA DONG; LI WEITAO: "Intelligent feedback cognition method for faces based on deep ensemble learning", Application of Electronic Technique, no. 05, 6 May 2019 (2019-05-06) *

Also Published As

Publication number Publication date
CN113468924B (en) 2024-06-18

Similar Documents

Publication Publication Date Title
CN111160375B (en) Three-dimensional key point prediction and deep learning model training method, device and equipment
CN111179419B (en) Three-dimensional key point prediction and deep learning model training method, device and equipment
CN109740534B (en) Image processing method, device and processing equipment
CN112990297A (en) Training method, application method and device of multi-mode pre-training model
CN110348447B (en) Multi-model integrated target detection method with abundant spatial information
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN111985414B (en) Joint position determining method and device
CN110782430A (en) Small target detection method and device, electronic equipment and storage medium
CN105096304B (en) The method of estimation and equipment of a kind of characteristics of image
CN114995729A (en) Voice drawing method and device and computer equipment
WO2022213395A1 (en) Light-weighted target detection method and device, and storage medium
US10713479B2 (en) Motion recognition method and motion recognition device for recognizing motion of user received via NUI device by comparing with preset comparison target information
CN110728359B (en) Method, device, equipment and storage medium for searching model structure
CN113468924B (en) Method and device for training key point detection model, and method and device for detecting key point
CN111738086A (en) Composition method and system for point cloud segmentation and point cloud segmentation system and device
CN112801045B (en) Text region detection method, electronic equipment and computer storage medium
Liu et al. SSD small object detection algorithm based on feature enhancement and sample selection
CN114612758A (en) Target detection method based on deep grouping separable convolution
CN110705695B (en) Method, device, equipment and storage medium for searching model structure
CN111325343B (en) Neural network determination, target detection and intelligent driving control method and device
Shi et al. Application research of cnn accelerator design based on FPGA in ADAS
CN109409226B (en) Finger vein image quality evaluation method and device based on cascade optimization CNN
CN116420174A (en) Full scale convolution for convolutional neural networks
CN112528899A (en) Image salient object detection method and system based on implicit depth information recovery
CN111782837A (en) Image retrieval method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant