CN113468924A - Key point detection model training method and device and key point detection method and device - Google Patents


Info

Publication number
CN113468924A
CN113468924A (application CN202010243835.5A; granted publication CN113468924B)
Authority
CN
China
Prior art keywords
feature map
deep learning
preset
model
key point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010243835.5A
Other languages
Chinese (zh)
Other versions
CN113468924B (en)
Inventor
鲍慊
刘武
梅涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN202010243835.5A priority Critical patent/CN113468924B/en
Publication of CN113468924A publication Critical patent/CN113468924A/en
Application granted granted Critical
Publication of CN113468924B publication Critical patent/CN113468924B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a key point detection model training method and device and a key point detection method and device. The key point detection model training device searches a preset search space for the topological relation between nodes in a deep learning model; processes the feature map of a preset image with the deep learning model to obtain an output feature map; processes the output feature map to generate a key point heat map of the preset image; takes the difference between the key point position coordinates in the key point heat map and preset position coordinates as a loss function, and repeatedly performs the step of searching the preset search space for the topological relation between the nodes according to the loss function until the loss function value meets a preset condition or a preset number of cycles is reached; and trains the deep learning model with training data to obtain a key point detection model. By adjusting the topological relation between the nodes in the deep learning model, an optimal key point detection result can be provided.

Description

Key point detection model training method and device and key point detection method and device
Technical Field
The disclosure relates to the field of information processing, and in particular to a method and a device for training a key point detection model and a method and a device for detecting key points.
Background
Human body key point detection obtains the positions of the key points of a human body in an image or a video through computer vision techniques. In the related art, network models based on deep learning are generally adopted to realize human body key point detection. These network models are designed in advance based on manual experience.
Disclosure of Invention
The inventors found through research that, because existing human body key point detection algorithms rely on network models designed from manual experience, designing such a model usually requires extensive network design and parameter tuning experience and consumes a great deal of the designer's time and effort. In addition, a network model designed from manual experience cannot be fully matched to the corresponding key point detection task, so such a model cannot provide an optimal key point detection result for that task.
Accordingly, the present disclosure provides a keypoint detection model training scheme and a corresponding keypoint detection scheme. By dynamically adjusting the topological relation among the nodes in the deep learning model, the optimal key point detection result can be provided.
According to a first aspect of the embodiments of the present disclosure, there is provided a key point detection model training method, including: searching a preset search space for the topological relation between nodes in a deep learning model; extracting a corresponding feature map from a preset image; processing the feature map of the preset image with the deep learning model to obtain an output feature map; processing the output feature map to generate a key point heat map of the preset image; taking the difference between the key point position coordinates in the key point heat map and preset position coordinates as a loss function, and repeatedly performing the step of searching the preset search space for the topological relation between the nodes in the deep learning model according to the loss function until the loss function value meets a preset condition or a preset number of cycles is reached; and training the deep learning model with training data to obtain a key point detection model.
In some embodiments, the deep learning model comprises a first deep learning submodel, a second deep learning submodel, and a third deep learning submodel; the processing of the feature map of the preset image by using the deep learning model comprises the following steps: processing a feature map of a preset image by using the first deep learning submodel to obtain a first feature map, wherein the first feature map and the feature map of the preset image have the same size; processing the first feature map by using the second deep learning submodel to obtain a second feature map, wherein the size of the second feature map is smaller than that of the first feature map; and performing fusion processing on the second feature map and the first feature map by using the third deep learning submodel to obtain the output feature map.
In some embodiments, the second deep learning submodel includes N transformation network models; processing the first feature map with the second deep learning submodel includes: processing the first feature map with the 1st transformation network model to obtain a 1st output feature map, wherein the size of the 1st output feature map is smaller than that of the first feature map; and processing the (i-1)-th output feature map with the i-th transformation network model to obtain an i-th output feature map, wherein the size of the i-th output feature map is smaller than that of the (i-1)-th output feature map, and 2 ≤ i ≤ N.
In some embodiments, the third deep learning submodel includes N fusion network models; performing the fusion processing on the second feature map and the first feature map with the third deep learning submodel includes: fusing the output feature map of the N-th transformation network model and the output feature map of the (N-1)-th transformation network model with the 1st fusion network model to obtain a 1st fusion feature map; fusing the (j-1)-th fusion feature map and the output feature map of the (N-j)-th transformation network model with the j-th fusion network model to obtain a j-th fusion feature map, wherein 2 ≤ j ≤ N-1; and fusing the (N-1)-th fusion feature map and the first feature map with the N-th fusion network model to obtain the output feature map.
In some embodiments, when performing the fusion processing, each fusion network model takes the smaller of the two received feature maps as a first feature map to be processed and the larger as a second feature map to be processed; upsamples the first feature map to be processed to obtain a third feature map to be processed, wherein the third feature map to be processed and the second feature map to be processed have the same size; and performs fusion processing on the third feature map to be processed and the second feature map to be processed.
In some embodiments, the search space comprises at least one of convolution, pooling, fully connected, and batch normalization operations.
According to a second aspect of the embodiments of the present disclosure, there is provided a key point detection model training device, including: a search module configured to search a preset search space for the topological relation between nodes in a deep learning model; a feature extraction module configured to extract a corresponding feature map from a preset image; a first processing module configured to process the feature map of the preset image with the deep learning model to obtain an output feature map; a second processing module configured to process the output feature map to generate a key point heat map of the preset image; a first training module configured to take the difference between the key point position coordinates in the key point heat map and preset position coordinates as a loss function and, according to the loss function, instruct the search module to repeatedly perform the operation of searching the preset search space for the topological relation between the nodes in the deep learning model until the loss function value meets a preset condition or a preset number of cycles is reached; and a second training module configured to train the deep learning model with training data to obtain a key point detection model.
According to a third aspect of the embodiments of the present disclosure, there is provided a keypoint detection model training device, including: a memory configured to store instructions; a processor coupled to the memory, the processor configured to perform a method of performing keypoint detection model training as described in any of the above embodiments based on instructions stored in the memory.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a keypoint detection method, including: collecting an image to be detected; and performing key point detection processing on the image to be detected by using the key point detection model trained by the key point detection model training method of any embodiment to obtain key points in the image to be detected.
According to a fifth aspect of the embodiments of the present disclosure, there is provided a keypoint detection apparatus, comprising: the acquisition module is configured to acquire an image to be detected; the detection module is configured to perform the keypoint detection processing on the image to be detected by using the keypoint detection model trained by the keypoint detection model training method according to any embodiment, so as to obtain the keypoints in the image to be detected.
According to a sixth aspect of the embodiments of the present disclosure, there is provided a keypoint detection apparatus comprising: a memory configured to store instructions; a processor coupled to the memory, the processor configured to perform a method of performing keypoint detection as described in any of the above embodiments based on instructions stored by the memory.
According to a seventh aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, in which computer instructions are stored, and when executed by a processor, the computer-readable storage medium implements the method according to any of the embodiments described above.
Other features of the present disclosure and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present disclosure, and those skilled in the art can obtain other drawings from these drawings without inventive effort.
FIG. 1 is a schematic flow chart diagram illustrating a method for training a keypoint detection model according to an embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of a keypoint detection model according to an embodiment of the disclosure;
FIG. 3 is a schematic flow chart diagram illustrating a method for training a keypoint detection model according to another embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a keypoint detection model according to another embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a keypoint detection model according to another embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a keypoint detection model training device according to an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of a keypoint detection model training device according to another embodiment of the present disclosure;
FIG. 8 is a schematic flow chart diagram of a keypoint detection method according to an embodiment of the disclosure;
FIG. 9 is a schematic structural diagram of a keypoint detection apparatus according to an embodiment of the present disclosure;
fig. 10 is a schematic structural diagram of a keypoint detection apparatus according to another embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The following description of at least one exemplary embodiment is merely illustrative and in no way limits the disclosure, its application, or its uses. All other embodiments obtained by a person skilled in the art from the embodiments disclosed herein without creative effort shall fall within the protection scope of the present disclosure.
The relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Fig. 1 is a schematic flow chart of a method for training a keypoint detection model according to an embodiment of the disclosure. In some embodiments, the following steps of the keypoint detection model training method are performed by the keypoint detection model training apparatus. The corresponding keypoint detection model is shown in fig. 2.
In step 101, searching out a topological relation between nodes in the deep learning model in a preset search space.
In some embodiments, the search space comprises at least one of convolution, pooling, fully connected, and batch normalization operations.
It should be noted that in deep learning, a neural network can be regarded as being composed of many operator nodes (OPs). The operation space of each OP includes convolution, pooling, fully connected, batch normalization, and similar operations, and each OP may have several inputs and outputs; the connection between any two OPs represents one of these operations. By searching the search space, the corresponding operation of each OP is selected, and the corresponding neural network architecture is determined from the topological relation between the nodes. In addition, limiting the search space speeds up the search.
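The idea of assigning one candidate operation to each connection between OPs can be sketched as follows. This is a hypothetical illustration of a topology-sampling step, not the patent's actual search algorithm; the function and names are invented for the example.

```python
import random

# Each edge between two OP nodes is assigned one candidate operation drawn
# from the limited search space described above (illustrative labels only).
CANDIDATE_OPS = ["conv3x3", "pooling", "fully_connected", "batch_norm"]

def sample_topology(num_nodes, rng):
    """Randomly pick an operation for every ordered pair of nodes (i -> j, i < j)."""
    topology = {}
    for j in range(1, num_nodes):
        for i in range(j):
            topology[(i, j)] = rng.choice(CANDIDATE_OPS)
    return topology

rng = random.Random(0)
arch = sample_topology(4, rng)           # 4 nodes give 6 directed edges
assert len(arch) == 6
assert set(arch.values()) <= set(CANDIDATE_OPS)
```

A real architecture search would score each sampled topology with the loss function and keep the best, rather than sampling uniformly at random.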
Since the search procedure itself is not the focus of the present disclosure, it is not described in detail here.
In step 102, a corresponding feature map is extracted from the preset image.
In some embodiments, the preset image is processed using a preset feature map extraction model (e.g., a convolutional neural network with two convolution kernels of 3 × 3 and a step size of 2) to obtain a feature map of the preset image.
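The spatial effect of the two stride-2, 3 × 3 convolutions can be checked with the standard convolution output-size formula. Padding of 1 is an assumption here, since the disclosure only specifies the kernel size and stride; the 256-pixel input resolution is likewise hypothetical.

```python
def conv_out_size(size, kernel=3, stride=2, padding=1):
    # Standard convolution output-size formula: floor((n + 2p - k) / s) + 1.
    # padding=1 is an assumption; the disclosure specifies only k=3, s=2.
    return (size + 2 * padding - kernel) // stride + 1

h = w = 256                       # hypothetical input resolution
for _ in range(2):                # two stacked 3x3, stride-2 convolutions
    h, w = conv_out_size(h), conv_out_size(w)
print(h, w)                       # 64 64 -> each side is reduced 4x overall
```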
In step 103, the feature map of the preset image is processed by using the deep learning model to obtain an output feature map.
Processing the feature map of the preset image fuses features, which facilitates the subsequent heat map processing.
At step 104, the output feature map is processed to generate a keypoint heat map of the preset image.
For example, the output feature maps are processed using a preset heat map generation model (e.g., a convolutional neural network with a convolution kernel of 1 × 1 and a step size of 1) to generate corresponding keypoint heat maps.
It should be noted that the number of heat maps equals the number of key points. For example, if a human skeleton has 16 key points, 16 heat maps are generated, one per key point. The position of the maximum probability on each heat map is the position coordinate of the corresponding key point. By collecting the key point coordinates obtained from the 16 heat maps, the coordinates of the 16 key points in the human skeleton are obtained.
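The per-map maximum lookup described above can be sketched with NumPy. The heat map sizes and the planted peaks are synthetic, chosen only so the expected coordinates are known.

```python
import numpy as np

# Sketch: recover keypoint coordinates by taking, on each of the 16 heat
# maps, the position of the maximum response. Shapes are illustrative.
num_keypoints, H, W = 16, 64, 48
heatmaps = np.zeros((num_keypoints, H, W))
for k in range(num_keypoints):
    heatmaps[k, 2 * k, 2 * k + 1] = 1.0   # synthetic peak at (y=2k, x=2k+1)

flat = heatmaps.reshape(num_keypoints, -1).argmax(axis=1)
ys, xs = np.unravel_index(flat, (H, W))
coords = np.stack([xs, ys], axis=1)       # (16, 2) array of (x, y) coordinates
assert coords[3].tolist() == [7, 6]       # peak planted at row 6, col 7
```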
Since heat maps themselves are not the focus of the present disclosure, they are not described in detail here.
In step 105, the difference between the key point position coordinates in the key point heat map and the preset position coordinates is taken as a loss function, and the step of searching the preset search space for the topological relation between the nodes in the deep learning model is repeatedly performed according to the loss function until the loss function value meets the preset condition or the preset number of cycles is reached.
For example, if the loss function value is less than a predetermined threshold, the search process may be stopped. In addition, the search process may also be stopped when the number of cycles reaches a preset value.
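The two stopping criteria combine as in the minimal sketch below. The loss values, threshold, and cycle budget are all made-up illustrative numbers.

```python
# Minimal sketch of the two stopping criteria: loss below a preset threshold,
# or a preset maximum number of search cycles reached. Values are invented.
def run_search(losses, threshold=0.05, max_cycles=10):
    for cycle, loss in enumerate(losses, start=1):
        if loss < threshold or cycle >= max_cycles:
            return cycle, loss
    return len(losses), losses[-1]

cycles, final = run_search([0.9, 0.4, 0.2, 0.08, 0.04, 0.03])
print(cycles, final)              # stops at cycle 5, where loss 0.04 < 0.05
```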
In step 106, the deep learning model is trained using the training data to obtain a keypoint detection model.
The deep learning model determined through the search process is trained with the training data to determine the optimal weight parameters of the deep learning model.
In the method for training the key point detection model provided by the embodiment of the disclosure, the deep learning model matched with the key point detection task can be obtained by optimizing the topological relation of the nodes in the deep learning model.
Fig. 3 is a schematic flowchart of a method for training a keypoint detection model according to another embodiment of the disclosure. In some embodiments, the following steps of the keypoint detection model training method are performed by the keypoint detection model training apparatus. The corresponding keypoint detection model is shown in fig. 4.
In step 301, a topological relation between nodes in the first deep learning submodel, the second deep learning submodel, and the third deep learning submodel is searched in a preset search space.
In some embodiments, the search space comprises at least one of convolution, pooling, fully connected, and batch normalization operations.
In step 302, a corresponding feature map is extracted from a preset image.
In some embodiments, the preset image is processed using a preset feature map extraction model (e.g., a convolutional neural network with two convolution kernels of 3 × 3 and a step size of 2) to obtain a feature map of the preset image.
In step 303, the feature map of the preset image is processed with the first deep learning submodel to obtain a first feature map, and the first feature map has the same size as the feature map of the preset image.
In step 304, the first feature map is processed by a second deep learning submodel to obtain a second feature map, wherein the size of the second feature map is smaller than that of the first feature map.
In step 305, the second feature map and the first feature map are fused by using a third deep learning submodel to obtain an output feature map.
It should be noted that, since the second feature map is smaller than the first feature map, its resolution is lower than that of the first feature map. Fusing feature maps of different resolutions facilitates the subsequent key point heat map generation.
In some embodiments, when performing the fusion processing, the third deep learning submodel upsamples the second feature map to obtain a third feature map, and the third feature map has the same size as the first feature map. The third feature map and the first feature map are then fused. In this way, the best fusion result can be obtained.
Upsampling is achieved, for example, by using bilinear interpolation.
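A minimal NumPy bilinear-interpolation upsampler (align-corners style) sketches how the smaller feature map could be enlarged to the larger one's size before fusing. This is an illustrative implementation under those assumptions, not the patent's actual code; in practice a framework routine such as a framework's built-in interpolation would be used.

```python
import numpy as np

# Illustrative bilinear upsampler for a single-channel feature map.
def bilinear_upsample(fm, out_h, out_w):
    in_h, in_w = fm.shape
    # Fractional source coordinates for every target pixel (align-corners).
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = fm[y0][:, x0] * (1 - wx) + fm[y0][:, x1] * wx
    bot = fm[y1][:, x0] * (1 - wx) + fm[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

small = np.array([[0.0, 2.0], [4.0, 6.0]])
big = bilinear_upsample(small, 4, 4)      # enlarge to the larger map's size
fused = big + np.ones((4, 4))             # e.g. element-wise additive fusion
assert fused.shape == (4, 4)
```

The element-wise addition at the end is only one possible fusion; the disclosure leaves the exact fusion operation to the searched network.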
At step 306, the output feature map is processed to generate a keypoint heat map of the preset image.
Here, the output feature maps are processed using a heat map generation model (e.g., a convolutional neural network with a convolution kernel of 1 x 1 and step size of 1) to generate corresponding keypoint heat maps.
In step 307, the difference between the key point position coordinates in the key point heat map and the preset position coordinates is taken as a loss function, and the step of searching the preset search space for the topological relation between the nodes in the first, second, and third deep learning submodels is repeatedly performed according to the loss function until the loss function value meets the preset condition or the preset number of cycles is reached.
In step 308, the first deep learning submodel, the second deep learning submodel, and the third deep learning submodel are trained using the training data to obtain a keypoint detection model.
The first, second, and third deep learning submodels determined through the search process are trained with the training data to determine their optimal weight parameters.
In some embodiments, the second deep learning submodel includes N transformation network models, and the third deep learning submodel includes N fusion network models, where N is a positive integer.
In the second deep learning submodel, the 1st transformation network model processes the first feature map to obtain a 1st output feature map, whose size is smaller than that of the first feature map; the i-th transformation network model processes the (i-1)-th output feature map to obtain an i-th output feature map, whose size is smaller than that of the (i-1)-th output feature map, where 2 ≤ i ≤ N. In the third deep learning submodel, the 1st fusion network model fuses the output feature map of the N-th transformation network model and the output feature map of the (N-1)-th transformation network model to obtain a 1st fusion feature map; the j-th fusion network model fuses the (j-1)-th fusion feature map and the output feature map of the (N-j)-th transformation network model to obtain a j-th fusion feature map, where 2 ≤ j ≤ N-1; and the N-th fusion network model fuses the (N-1)-th fusion feature map and the first feature map to obtain the output feature map.
In some embodiments, when performing the fusion processing, each fusion network model takes the smaller of the two received feature maps as a first feature map to be processed and the larger as a second feature map to be processed. The first feature map to be processed is upsampled to obtain a third feature map to be processed, which has the same size as the second feature map to be processed. The third feature map to be processed and the second feature map to be processed are then fused.
Fig. 5 is a schematic structural diagram of a keypoint detection model according to another embodiment of the present disclosure. As an example, in the embodiment shown in fig. 5, the second deep learning submodel includes 3 transformation network models, and the third deep learning submodel includes 3 fusion network models.
In the second deep learning submodel, transformation network model 1 processes the first feature map to obtain a 1st output feature map, whose size is smaller than that of the first feature map. Transformation network model 2 processes the 1st output feature map to obtain a 2nd output feature map, whose size is smaller than that of the 1st output feature map. Transformation network model 3 processes the 2nd output feature map to obtain a 3rd output feature map, whose size is smaller than that of the 2nd output feature map.
In the third deep learning submodel, fusion network model 1 fuses the output feature map of transformation network model 3 and the output feature map of transformation network model 2 to obtain a 1st fusion feature map. Fusion network model 2 fuses the 1st fusion feature map and the output feature map of transformation network model 1 to obtain a 2nd fusion feature map. Fusion network model 3 fuses the 2nd fusion feature map and the first feature map to obtain the output feature map.
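The dataflow of this three-branch cascade can be traced purely in terms of feature-map sizes. The 64-pixel starting size and the halving per transformation are assumptions for illustration; the disclosure only requires each transformation's output to be smaller than its input.

```python
# Schematic shape trace of the N=3 cascade: three transformations shrink the
# map, three fusions grow it back. Sizes are hypothetical side lengths.
def transform(size):
    return size // 2                      # each transformation shrinks the map

def fuse(small, large):
    # Fusion upsamples the smaller input to the larger one's size.
    assert small <= large
    return large

first = 64
t1 = transform(first)                     # 32
t2 = transform(t1)                        # 16
t3 = transform(t2)                        # 8
f1 = fuse(t3, t2)                         # fusion model 1: 8 and 16 -> 16
f2 = fuse(f1, t1)                         # fusion model 2: 16 and 32 -> 32
out = fuse(f2, first)                     # fusion model 3: 32 and 64 -> 64
assert out == first                       # output matches the first feature map
```

The final assertion reflects the disclosure's property that the output feature map ends up at the same size as the first feature map.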
Fig. 6 is a schematic structural diagram of a keypoint detection model training device according to an embodiment of the present disclosure. As shown in fig. 6, the key point detection model training apparatus includes a search module 61, a feature extraction module 62, a first processing module 63, a second processing module 64, a first training module 65, and a second training module 66. The corresponding keypoint detection model is shown in fig. 2.
The searching module 61 searches the topological relation between the nodes in the deep learning model in a preset search space.
In some embodiments, the search space comprises at least one of convolution, pooling, fully connected, and batch normalization operations.
It should be noted that in deep learning, a neural network can be regarded as being composed of many operator nodes (OPs). The operation space of each OP includes convolution, pooling, fully connected, batch normalization, and similar operations, and each OP may have several inputs and outputs; the connection between any two OPs represents one of these operations. By searching the search space, the corresponding operation of each OP is selected, and the corresponding neural network architecture is determined from the topological relation between the nodes. In addition, limiting the search space speeds up the search.
The feature extraction module 62 extracts a corresponding feature map from the preset image.
In some embodiments, the preset image is processed using a preset feature map extraction model (e.g., a convolutional neural network with two convolution kernels of 3 × 3 and a step size of 2) to obtain a feature map of the preset image.
The first processing module 63 processes the feature map of the preset image with the deep learning model to obtain an output feature map. This processing fuses features, which facilitates the subsequent heat map processing.
The second processing module 64 processes the output feature map to generate a keypoint heat map of the preset image.
For example, the output feature maps are processed using a preset heat map generation model (e.g., a convolutional neural network with a convolution kernel of 1 × 1 and a step size of 1) to generate corresponding keypoint heat maps.
It should be noted that the number of heat maps equals the number of key points. For example, if a human skeleton has 16 key points, 16 heat maps are generated, one per key point. The position of the maximum probability on each heat map is the position coordinate of the corresponding key point. By collecting the key point coordinates obtained from the 16 heat maps, the coordinates of the 16 key points in the human skeleton are obtained.
The first training module 65 uses the difference between the position coordinates of the key points in the key point heat map and the preset position coordinates as a loss function, and, according to the loss function, instructs the search module 61 to repeatedly perform the operation of searching out the topological relation among the nodes in the deep learning model in the preset search space until the loss function value satisfies the preset condition or the preset number of cycles is reached.
For example, if the loss function value is less than a predetermined threshold, the search process may be stopped. In addition, the search process may also be stopped when the number of cycles reaches a preset value.
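A minimal sketch of the loss and the two stopping criteria, assuming a mean-squared-error form for the coordinate difference (the disclosure does not fix the exact form of the loss or the threshold values used here):

```python
import numpy as np

def keypoint_loss(predicted, target):
    """Mean squared difference between predicted and preset keypoint
    coordinates; one plausible form of the loss described above."""
    predicted = np.asarray(predicted, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.mean((predicted - target) ** 2))

def should_stop(loss_value, cycle, threshold=1e-3, max_cycles=100):
    """Stop the search when the loss value satisfies the preset condition
    or when the preset number of cycles is reached."""
    return loss_value < threshold or cycle >= max_cycles
```

For example, a perfect prediction gives a loss of 0.0, which immediately satisfies the stopping condition regardless of the cycle count.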
The second training module 66 trains the deep learning model with the training data to obtain a keypoint detection model.
That is, the deep learning model determined through the search process is trained with the training data to determine the weight parameters of the optimal deep learning model.
In some embodiments, the corresponding keypoint detection model is shown in FIG. 4.
The searching module 61 searches a preset search space for a topological relation among nodes in the first deep learning submodel, the second deep learning submodel and the third deep learning submodel.
The first processing module 63 processes the feature map of the preset image by using the first deep learning submodel to obtain a first feature map. The first feature map and the feature map of the preset image have the same size. And processing the first feature map by using a second deep learning submodel to obtain a second feature map. The size of the second feature map is smaller than the size of the first feature map. And then, carrying out fusion processing on the second feature map and the first feature map by using a third deep learning submodel to obtain an output feature map.
In some embodiments, when performing the fusion processing using the third deep learning submodel, the first processing module 63 up-samples the second feature map to obtain a third feature map, where the third feature map and the first feature map have the same size. Then, the third feature map and the first feature map are subjected to fusion processing. Thereby, a better fusion result can be obtained.
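The up-sample-then-fuse step might look like the following sketch (nearest-neighbor up-sampling and element-wise addition are assumptions for illustration; the searched submodel would determine the actual operations):

```python
import numpy as np

def upsample_nearest(feature_map, factor):
    """Nearest-neighbor up-sampling of a (C, H, W) feature map."""
    return feature_map.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse(small, large):
    """Up-sample the smaller map to the larger map's size, then fuse
    by element-wise addition (one common fusion choice)."""
    factor = large.shape[1] // small.shape[1]
    return upsample_nearest(small, factor) + large

second = np.ones((8, 16, 16))  # smaller second feature map
first = np.ones((8, 32, 32))   # first feature map (same size as the input feature map)
out = fuse(second, first)
print(out.shape)  # → (8, 32, 32)
```

The fused output keeps the resolution of the first feature map while incorporating the deeper, lower-resolution features.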
According to the loss function, the first training module 65 instructs the search module 61 to repeatedly execute the operation of searching out the topological relation among the nodes in the first deep learning submodel, the second deep learning submodel, and the third deep learning submodel in the preset search space until the loss function value satisfies the preset condition or the preset number of cycles is reached.
The second training module 66 trains the first deep learning submodel, the second deep learning submodel, and the third deep learning submodel with the training data to obtain a keypoint detection model.
In some embodiments, the second deep learning submodel includes N transformation network models and the third deep learning submodel includes N fusion network models.
The first processing module 63 processes the first feature map by using the 1st transformation network model to obtain a 1st output feature map, wherein the size of the 1st output feature map is smaller than that of the first feature map. The first processing module 63 processes the (i-1)-th output feature map by using the i-th transformation network model to obtain the i-th output feature map, wherein the size of the i-th output feature map is smaller than that of the (i-1)-th output feature map, and i is greater than or equal to 2 and less than or equal to N.
The first processing module 63 further fuses the output feature map of the N-th transformation network model and the output feature map of the (N-1)-th transformation network model by using the 1st fusion network model to obtain a 1st fused feature map. The first processing module 63 further fuses the (j-1)-th fused feature map and the output feature map of the (N-j)-th transformation network model by using the j-th fusion network model to obtain the j-th fused feature map, wherein j is greater than or equal to 2 and less than or equal to N-1. The first processing module 63 fuses the (N-1)-th fused feature map and the first feature map by using the N-th fusion network model to obtain the output feature map.
In some embodiments, when performing the fusion processing with each fusion network model, the first processing module 63 takes, of the two received feature maps, the one with the smaller size as the first to-be-processed feature map and the one with the larger size as the second to-be-processed feature map. The first to-be-processed feature map is up-sampled to obtain a third to-be-processed feature map, where the third to-be-processed feature map and the second to-be-processed feature map have the same size. The third to-be-processed feature map and the second to-be-processed feature map are then subjected to the fusion processing.
In some embodiments, the corresponding keypoint detection model is shown in FIG. 5.
The first processing module 63 processes the first feature map by using the transformation network model 1 to obtain a 1st output feature map, wherein the size of the 1st output feature map is smaller than that of the first feature map. The first processing module 63 processes the 1st output feature map by using the transformation network model 2 to obtain a 2nd output feature map, wherein the size of the 2nd output feature map is smaller than that of the 1st output feature map. The first processing module 63 processes the 2nd output feature map by using the transformation network model 3 to obtain a 3rd output feature map, wherein the size of the 3rd output feature map is smaller than that of the 2nd output feature map.
In addition, the first processing module 63 fuses the output feature map of the transformation network model 3 and the output feature map of the transformation network model 2 by using the fusion network model 1 to obtain a 1st fused feature map. The first processing module 63 fuses the 1st fused feature map and the output feature map of the transformation network model 1 by using the fusion network model 2 to obtain a 2nd fused feature map. The first processing module 63 fuses the 2nd fused feature map and the first feature map by using the fusion network model 3 to obtain the output feature map.
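The wiring of the three transformation network models and three fusion network models described above can be sketched as follows (2 × 2 average pooling and additive fusion are stand-ins for the searched operations, chosen only to make the shape flow concrete):

```python
import numpy as np

def transform(feature_map):
    """Stand-in transformation network: halves spatial size with 2x2
    average pooling (the searched submodel would learn this mapping)."""
    c, h, w = feature_map.shape
    return feature_map.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def fuse(small, large):
    """Stand-in fusion network: nearest-neighbor up-sample, then add."""
    factor = large.shape[1] // small.shape[1]
    up = small.repeat(factor, axis=1).repeat(factor, axis=2)
    return up + large

first = np.ones((4, 32, 32))  # first feature map
out1 = transform(first)       # transformation network model 1 → (4, 16, 16)
out2 = transform(out1)        # transformation network model 2 → (4, 8, 8)
out3 = transform(out2)        # transformation network model 3 → (4, 4, 4)

fused1 = fuse(out3, out2)     # fusion network model 1
fused2 = fuse(fused1, out1)   # fusion network model 2
output = fuse(fused2, first)  # fusion network model 3
print(output.shape)  # → (4, 32, 32)
```

The cascade progressively restores the resolution of the first feature map while mixing in features from every scale, matching the structure shown in FIG. 5.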
Fig. 7 is a schematic structural diagram of a keypoint detection model training device according to another embodiment of the present disclosure. As shown in fig. 7, the training device comprises a memory 71 and a processor 72.
The memory 71 is used for storing instructions, the processor 72 is coupled to the memory 71, and the processor 72 is configured to execute the method according to any one of the embodiments in fig. 1 or fig. 3 based on the instructions stored in the memory.
As shown in fig. 7, the apparatus further includes a communication interface 73 for information interaction with other devices. Meanwhile, the device also comprises a bus 74, and the processor 72, the communication interface 73 and the memory 71 are communicated with each other through the bus 74.
The memory 71 may comprise a high-speed RAM, and may also include a non-volatile memory, such as at least one disk storage. The memory 71 may also be a memory array. The memory 71 may also be partitioned into blocks, and the blocks may be combined into virtual volumes according to certain rules.
Further, the processor 72 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present disclosure.
The present disclosure also relates to a computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions, and the instructions, when executed by a processor, implement a method according to any one of the embodiments shown in fig. 1 or fig. 3.
Fig. 8 is a schematic flowchart of a keypoint detection method according to an embodiment of the disclosure. In some embodiments, the following keypoint detection method steps are performed by the keypoint detection apparatus.
In step 801, an image to be detected is acquired.
In step 802, a keypoint detection model trained by the keypoint detection model training method according to any embodiment of fig. 1 or fig. 3 is used to perform keypoint detection processing on an image to be detected, so as to obtain keypoints in the image to be detected.
Fig. 9 is a schematic structural diagram of a keypoint detection apparatus according to an embodiment of the present disclosure. As shown in fig. 9, the key point detecting apparatus includes an acquisition module 91 and a detection module 92.
The collecting module 91 is used for collecting an image to be detected.
The detection module 92 performs a keypoint detection process on the image to be detected by using the keypoint detection model trained by the keypoint detection model training method according to any embodiment of fig. 1 or fig. 3, so as to obtain keypoints in the image to be detected.
Fig. 10 is a schematic structural diagram of a keypoint detection apparatus according to another embodiment of the present disclosure. As shown in fig. 10, the key point detecting device includes a memory 1001, a processor 1002, a communication interface 1003, and a bus 1004. Fig. 10 differs from fig. 7 in that, in the embodiment shown in fig. 10, the processor 1002 is configured to perform the method according to any of the embodiments in fig. 8 based on instructions stored in the memory.
The present disclosure also relates to a computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions, and the instructions, when executed by a processor, implement the method according to any one of the embodiments in fig. 8.
In some embodiments, the functional unit modules described above can be implemented as a general-purpose processor, a programmable logic controller (PLC), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any suitable combination thereof for performing the functions described in this disclosure.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk.
The description of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the disclosure in the form disclosed. Many modifications and variations will be apparent to practitioners skilled in this art. The embodiments were chosen and described in order to best explain the principles of the disclosure and its practical application, and to enable others of ordinary skill in the art to understand the disclosure and its various embodiments with various modifications as are suited to the particular use contemplated.

Claims (12)

1. A method for training a key point detection model comprises the following steps:
searching a topological relation among nodes in the deep learning model in a preset search space;
extracting a corresponding feature map from a preset image;
processing the feature map of the preset image by using the deep learning model to obtain an output feature map;
processing the output feature map to generate a key point heat map of the preset image;
taking the difference between the position coordinates of the key points in the key point heat map and preset position coordinates as a loss function, and repeatedly executing the step of searching the topological relation among the nodes in the deep learning model in a preset search space according to the loss function until the loss function value meets a preset condition or reaches a preset cycle number;
and training the deep learning model by using training data to obtain a key point detection model.
2. The method of claim 1, wherein the deep learning model comprises a first deep learning submodel, a second deep learning submodel, and a third deep learning submodel;
the processing of the feature map of the preset image by using the deep learning model comprises the following steps:
processing the feature map of the preset image by using the first deep learning submodel to obtain a first feature map, wherein the first feature map and the feature map of the preset image have the same size;
processing the first feature map by using the second deep learning submodel to obtain a second feature map, wherein the size of the second feature map is smaller than that of the first feature map;
and performing fusion processing on the second feature map and the first feature map by using the third deep learning submodel to obtain the output feature map.
3. The method of claim 2, wherein the second deep learning submodel includes N transformation network models;
processing the first feature map with the second deep learning submodel includes:
processing the first feature map by using a 1 st transformation network model to obtain a 1 st output feature map, wherein the size of the 1 st output feature map is smaller than that of the first feature map;
and processing the (i-1)-th output feature map by using the i-th transformation network model to obtain an i-th output feature map, wherein the size of the i-th output feature map is smaller than that of the (i-1)-th output feature map, and i is more than or equal to 2 and less than or equal to N.
4. The method of claim 3, wherein the third deep learning submodel includes N converged network models;
the fusion processing of the second feature map and the first feature map by using the third deep learning submodel comprises the following steps:
fusing the output feature map of the N-th transformation network model and the output feature map of the (N-1)-th transformation network model by using the 1st fusion network model to obtain a 1st fused feature map;
fusing the (j-1)-th fused feature map and the output feature map of the (N-j)-th transformation network model by using the j-th fusion network model to obtain a j-th fused feature map, wherein j is more than or equal to 2 and less than or equal to N-1;
and fusing the (N-1)-th fused feature map and the first feature map by using the N-th fusion network model to obtain the output feature map.
5. The method of claim 4, wherein,
in the process of fusion processing of each fusion network model, of the two received feature maps, taking the feature map with the smaller size as a first to-be-processed feature map and the feature map with the larger size as a second to-be-processed feature map;
performing up-sampling on the first to-be-processed feature map to obtain a third to-be-processed feature map, wherein the third to-be-processed feature map and the second to-be-processed feature map have the same size;
and performing fusion processing on the third to-be-processed feature map and the second to-be-processed feature map.
6. The method of any one of claims 1-5,
the search space includes at least one of convolution, pooling, full concatenation, and batch normalization.
7. A keypoint detection model training device comprising:
the searching module is configured to search out a topological relation among nodes in the deep learning model in a preset searching space;
the characteristic extraction module is configured to extract a corresponding characteristic graph from a preset image;
the first processing module is configured to process the feature map of the preset image by using the deep learning model to obtain an output feature map;
a second processing module configured to process the output feature map to generate a keypoint heat map of the preset image;
a first training module, configured to use a difference between a key point position coordinate in the key point heat map and a preset position coordinate as a loss function, and instruct a search module to repeatedly execute an operation of searching out a topological relation between nodes in a deep learning model in a preset search space according to the loss function until a loss function value meets a preset condition or reaches a preset cycle number;
a second training module configured to train the deep learning model with training data to obtain a keypoint detection model.
8. A keypoint detection model training device comprising:
a memory configured to store instructions;
a processor coupled to the memory, the processor being configured to implement the method of any one of claims 1-6 based on the instructions stored by the memory.
9. A keypoint detection method comprising:
collecting an image to be detected;
and carrying out key point detection processing on the image to be detected by using the key point detection model trained by the key point detection model training method of any one of claims 1-6 to obtain key points in the image to be detected.
10. A keypoint detection device comprising:
the acquisition module is configured to acquire an image to be detected;
a detection module configured to perform a keypoint detection process on the image to be detected by using the keypoint detection model trained by the keypoint detection model training method of any one of claims 1 to 6 to obtain keypoints in the image to be detected.
11. A keypoint detection device comprising:
a memory configured to store instructions;
a processor coupled to the memory, the processor being configured to implement the method of claim 9 based on the instructions stored by the memory.
12. A computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions which, when executed by a processor, implement the method of any of claims 1-6, 9.
CN202010243835.5A 2020-03-31 2020-03-31 Method and device for training key point detection model, and method and device for detecting key point Active CN113468924B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010243835.5A CN113468924B (en) 2020-03-31 2020-03-31 Method and device for training key point detection model, and method and device for detecting key point


Publications (2)

Publication Number Publication Date
CN113468924A true CN113468924A (en) 2021-10-01
CN113468924B CN113468924B (en) 2024-06-18

Family

ID=77866085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010243835.5A Active CN113468924B (en) 2020-03-31 2020-03-31 Method and device for training key point detection model, and method and device for detecting key point

Country Status (1)

Country Link
CN (1) CN113468924B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017133009A1 (en) * 2016-02-04 2017-08-10 广州新节奏智能科技有限公司 Method for positioning human joint using depth image of convolutional neural network
CN109753910A (en) * 2018-12-27 2019-05-14 北京字节跳动网络技术有限公司 Crucial point extracting method, the training method of model, device, medium and equipment
US20190147298A1 (en) * 2017-11-14 2019-05-16 Magic Leap, Inc. Meta-learning for multi-task learning for neural networks
CN109948526A (en) * 2019-03-18 2019-06-28 北京市商汤科技开发有限公司 Image processing method and device, detection device and storage medium
CN110084221A (en) * 2019-05-08 2019-08-02 南京云智控产业技术研究院有限公司 A kind of serializing face critical point detection method of the tape relay supervision based on deep learning
CN110309706A (en) * 2019-05-06 2019-10-08 深圳市华付信息技术有限公司 Face critical point detection method, apparatus, computer equipment and storage medium
CN110532981A (en) * 2019-09-03 2019-12-03 北京字节跳动网络技术有限公司 Human body key point extracting method, device, readable storage medium storing program for executing and equipment
WO2020010979A1 (en) * 2018-07-10 2020-01-16 腾讯科技(深圳)有限公司 Method and apparatus for training model for recognizing key points of hand, and method and apparatus for recognizing key points of hand
CN110705563A (en) * 2019-09-07 2020-01-17 创新奇智(重庆)科技有限公司 Industrial part key point detection method based on deep learning
CN110728359A (en) * 2019-10-10 2020-01-24 北京百度网讯科技有限公司 Method, device, equipment and storage medium for searching model structure
EP3605394A1 (en) * 2018-08-03 2020-02-05 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for recognizing body movement


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HE KM et al.: "Deep residual learning for image recognition", PROCEEDINGS OF 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 31 December 2016 (2016-12-31) *
MENG LINGJUN; WANG JINGBO: "Face keypoint detection based on PyTorch and OpenCV", Video Engineering, no. 14, 25 July 2019 (2019-07-25) *
FAN YEPING; LI YU; YANG DESHENG; WAN TAO; MA DONG; LI WEITAO: "Intelligent feedback cognition method for faces based on deep ensemble learning", Application of Electronic Technique, no. 05, 6 May 2019 (2019-05-06) *

Also Published As

Publication number Publication date
CN113468924B (en) 2024-06-18

Similar Documents

Publication Publication Date Title
CN111160375B (en) Three-dimensional key point prediction and deep learning model training method, device and equipment
CN111179419B (en) Three-dimensional key point prediction and deep learning model training method, device and equipment
CN109740534B (en) Image processing method, device and processing equipment
CN112990297A (en) Training method, application method and device of multi-mode pre-training model
CN110348447B (en) Multi-model integrated target detection method with abundant spatial information
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN111985414B (en) Joint position determining method and device
CN110782430A (en) Small target detection method and device, electronic equipment and storage medium
CN105096304B (en) The method of estimation and equipment of a kind of characteristics of image
CN114995729A (en) Voice drawing method and device and computer equipment
WO2022213395A1 (en) Light-weighted target detection method and device, and storage medium
US10713479B2 (en) Motion recognition method and motion recognition device for recognizing motion of user received via NUI device by comparing with preset comparison target information
CN110728359B (en) Method, device, equipment and storage medium for searching model structure
CN113468924B (en) Method and device for training key point detection model, and method and device for detecting key point
CN111738086A (en) Composition method and system for point cloud segmentation and point cloud segmentation system and device
CN112801045B (en) Text region detection method, electronic equipment and computer storage medium
Liu et al. SSD small object detection algorithm based on feature enhancement and sample selection
CN114612758A (en) Target detection method based on deep grouping separable convolution
CN110705695B (en) Method, device, equipment and storage medium for searching model structure
CN111325343B (en) Neural network determination, target detection and intelligent driving control method and device
Shi et al. Application research of cnn accelerator design based on FPGA in ADAS
CN109409226B (en) Finger vein image quality evaluation method and device based on cascade optimization CNN
CN116420174A (en) Full scale convolution for convolutional neural networks
CN112528899A (en) Image salient object detection method and system based on implicit depth information recovery
CN111782837A (en) Image retrieval method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant