CN113468924B - Method and device for training key point detection model, and method and device for detecting key point

Method and device for training key point detection model, and method and device for detecting key point

Info

Publication number
CN113468924B
Authority
CN
China
Prior art keywords: feature map, model, deep learning, preset, key point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010243835.5A
Other languages
Chinese (zh)
Other versions
CN113468924A (en)
Inventor
鲍慊 (Bao Qian)
刘武 (Liu Wu)
梅涛 (Mei Tao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN202010243835.5A
Publication of CN113468924A
Application granted
Publication of CN113468924B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks


Abstract

The disclosure provides a method and device for training a key point detection model, and a method and device for key point detection. The key point detection model training device searches out the topological relations among the nodes of a deep learning model in a preset search space; processes the feature map of a preset image with the deep learning model to obtain an output feature map; processes the output feature map to generate a key point heat map of the preset image; takes the difference between the key point position coordinates in the key point heat map and preset position coordinates as a loss function, and repeatedly executes, according to the loss function, the step of searching out the topological relations among the nodes of the deep learning model in the preset search space until the loss function value meets a preset condition or a preset number of cycles is reached; and trains the deep learning model with training data to obtain the key point detection model. By adjusting the topological relations among the nodes of the deep learning model, the present disclosure provides optimal key point detection results.

Description

Method and device for training key point detection model, and method and device for detecting key point
Technical Field
The disclosure relates to the field of information processing, and in particular to a method and device for training a key point detection model and a method and device for key point detection.
Background
Human body key point detection obtains the positions of human body key points in an image or video through computer vision techniques. In the related art, network models based on deep learning are generally adopted to detect human body key points. These network models are designed in advance based on human experience.
Disclosure of Invention
The inventors found through research that, because existing human body key point detection algorithms rely on network models designed from manual experience, designing such a network model generally requires extensive network design and parameter tuning experience and demands considerable time and effort from the designer. In addition, a network model designed from manual experience cannot fully match the corresponding key point detection task, so it cannot provide an optimal key point detection result for that task.
Accordingly, the present disclosure provides a keypoint detection model training scheme and a corresponding keypoint detection scheme. By dynamically adjusting the topological relation among the nodes in the deep learning model, an optimal key point detection result can be provided.
According to a first aspect of the embodiments of the present disclosure, there is provided a key point detection model training method, including: searching out topological relations among the nodes in a deep learning model in a preset search space; extracting a corresponding feature map from a preset image; processing the feature map of the preset image by using the deep learning model to obtain an output feature map; processing the output feature map to generate a key point heat map of the preset image; taking the difference between the key point position coordinates in the key point heat map and the preset position coordinates as a loss function, and repeatedly executing, according to the loss function, the step of searching out the topological relations among the nodes in the deep learning model in the preset search space until the loss function value meets a preset condition or a preset number of cycles is reached; and training the deep learning model by using training data to obtain a key point detection model.
In some embodiments, the deep learning model includes a first deep learning sub-model, a second deep learning sub-model, and a third deep learning sub-model; the processing of the feature map of the preset image by using the deep learning model includes: processing the feature map of the preset image by using the first deep learning sub-model to obtain a first feature map, wherein the first feature map and the feature map of the preset image have the same size; processing the first feature map by using the second deep learning sub-model to obtain a second feature map, wherein the size of the second feature map is smaller than that of the first feature map; and fusing the second feature map and the first feature map by using the third deep learning sub-model to obtain the output feature map.
In some embodiments, the second deep learning sub-model includes N transformation network models; processing the first feature map using the second deep learning sub-model includes: processing the first feature map by using the 1st transformation network model to obtain a 1st output feature map, wherein the size of the 1st output feature map is smaller than that of the first feature map; and processing the (i-1)-th output feature map by using the i-th transformation network model to obtain the i-th output feature map, wherein the size of the i-th output feature map is smaller than that of the (i-1)-th output feature map, and 2 ≤ i ≤ N.
In some embodiments, the third deep learning sub-model includes N fusion network models; the fusing of the second feature map and the first feature map using the third deep learning sub-model includes: fusing the output feature map of the N-th transformation network model and the output feature map of the (N-1)-th transformation network model by using the 1st fusion network model to obtain a 1st fused feature map; fusing the (j-1)-th fused feature map and the output feature map of the (N-j)-th transformation network model by using the j-th fusion network model to obtain a j-th fused feature map, wherein 2 ≤ j ≤ N-1; and fusing the (N-1)-th fused feature map and the first feature map by using the N-th fusion network model to obtain the output feature map.
In some embodiments, when performing the fusion process, each fusion network model takes the smaller of the two received feature maps as a first to-be-processed feature map and the larger as a second to-be-processed feature map; up-samples the first to-be-processed feature map to obtain a third to-be-processed feature map, the third to-be-processed feature map and the second to-be-processed feature map being the same size; and fuses the third to-be-processed feature map with the second to-be-processed feature map.
In some embodiments, the search space includes at least one of convolution, pooling, fully connected, and batch normalization operations.
According to a second aspect of the embodiments of the present disclosure, there is provided a key point detection model training device, including: a search module configured to search out topological relations among the nodes in a deep learning model in a preset search space; a feature extraction module configured to extract a corresponding feature map from a preset image; a first processing module configured to process the feature map of the preset image by using the deep learning model to obtain an output feature map; a second processing module configured to process the output feature map to generate a key point heat map of the preset image; a first training module configured to take the difference between the key point position coordinates in the key point heat map and the preset position coordinates as a loss function, and to instruct the search module to repeatedly execute, according to the loss function, the operation of searching out the topological relations among the nodes in the deep learning model in the preset search space until the loss function value meets a preset condition or a preset number of cycles is reached; and a second training module configured to train the deep learning model by using training data to obtain a key point detection model.
According to a third aspect of embodiments of the present disclosure, there is provided a keypoint detection model training device, including: a memory configured to store instructions; a processor coupled to the memory, the processor configured to execute instructions stored in the memory to implement the keypoint detection model training method as described in any of the embodiments above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a keypoint detection method, including: collecting an image to be detected; and performing key point detection processing on the image to be detected by using the key point detection model trained by the key point detection model training method according to any embodiment, so as to obtain the key points in the image to be detected.
According to a fifth aspect of embodiments of the present disclosure, there is provided a keypoint detection apparatus, comprising: the acquisition module is configured to acquire an image to be detected; the detection module is configured to perform a keypoint detection process on the image to be detected by using the keypoint detection model trained by the keypoint detection model training method according to any one of the embodiments, so as to obtain the keypoints in the image to be detected.
According to a sixth aspect of the embodiments of the present disclosure, there is provided a keypoint detection apparatus, including: a memory configured to store instructions; a processor coupled to the memory, the processor configured to implement the keypoint detection method as described in any of the embodiments above based on execution of instructions stored by the memory.
According to a seventh aspect of embodiments of the present disclosure, there is provided a computer readable storage medium, wherein the computer readable storage medium stores computer instructions which, when executed by a processor, implement a method as referred to in any of the embodiments above.
Other features of the present disclosure and its advantages will become apparent from the following detailed description of exemplary embodiments of the disclosure, which proceeds with reference to the accompanying drawings.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present disclosure, and other drawings can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a flow chart of a method for training a keypoint detection model in accordance with one embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a key point detection model according to an embodiment of the present disclosure;
FIG. 3 is a flow chart of a method for training a keypoint detection model according to another embodiment of the disclosure;
FIG. 4 is a schematic diagram of a key point detection model according to another embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a key point detection model according to another embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a training device for a keypoint detection model according to an embodiment of the disclosure;
FIG. 7 is a schematic structural diagram of a training device for a keypoint detection model according to another embodiment of the disclosure;
FIG. 8 is a flow chart of a method for keypoint detection according to an embodiment of the disclosure;
FIG. 9 is a schematic diagram of a key point detecting device according to an embodiment of the disclosure;
fig. 10 is a schematic structural diagram of a key point detection device according to another embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The following description of at least one exemplary embodiment is merely illustrative and in no way limits the disclosure, its application, or uses. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present disclosure without inventive effort fall within the protection scope of the present disclosure.
The relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless it is specifically stated otherwise.
Meanwhile, it should be understood that, for convenience of description, the sizes of the various parts shown in the drawings are not drawn to actual scale.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but should be considered part of the specification where appropriate.
In all examples shown and discussed herein, any specific values should be construed as merely illustrative, and not a limitation. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
Fig. 1 is a flow chart of a method for training a keypoint detection model according to an embodiment of the disclosure. In some embodiments, the following keypoint detection model training method steps are performed by the keypoint detection model training device. The corresponding keypoint detection model is shown in fig. 2.
In step 101, a topological relation among nodes in the deep learning model is searched out in a preset search space.
In some embodiments, the search space includes at least one of convolution, pooling, fully connected, and batch normalization operations.
It should be noted that, in deep learning, a neural network can be regarded as being composed of many neuron nodes (OPs). The operation space of each OP includes convolution, pooling, fully connected, batch normalization, and the like, and each OP may have several inputs and outputs. The connection between any two OPs represents one of these operations. By searching the search space to select the operation of each OP, the corresponding neural network architecture is determined from the topological relations among the nodes. In addition, limiting the search space can speed up the search.
Since the search procedure itself is not the focus of the present disclosure, it is not detailed here.
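Purely as an illustration and not as part of the claimed method, such a candidate operation set might be sketched in PyTorch as follows; the channel count C and the particular operation list are assumptions for the example, and a per-pixel fully connected layer is represented by its 1×1 convolution equivalent:

    import torch.nn as nn

    # A minimal sketch of a search space, assuming PyTorch and a fixed channel
    # count C. Every candidate maps (N, C, H, W) to the same shape, so any of
    # them can serve as the operation on an edge between two OPs.
    def candidate_ops(C: int) -> nn.ModuleDict:
        return nn.ModuleDict({
            "conv_3x3": nn.Sequential(
                nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False),
                nn.BatchNorm2d(C),
                nn.ReLU(inplace=True),
            ),
            "max_pool_3x3": nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            "batch_norm": nn.BatchNorm2d(C),
            # A per-pixel fully connected layer is a 1x1 convolution.
            "fully_connected_1x1": nn.Conv2d(C, C, kernel_size=1),
        })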
In step 102, a corresponding feature map is extracted from a preset image.
In some embodiments, the preset image is processed using a preset feature map extraction model (e.g., two 3×3 convolutional layers with stride 2) to obtain the feature map of the preset image.
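A minimal sketch of this extraction step, under the stated example of two 3×3 convolutions with stride 2 (the 3→64 channel widths are assumptions for the illustration):

    import torch.nn as nn

    # Two 3x3 convolutions with stride 2: each halves the spatial resolution,
    # so the extracted feature map is one quarter of the input size.
    stem = nn.Sequential(
        nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(64),
        nn.ReLU(inplace=True),
        nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(64),
        nn.ReLU(inplace=True),
    )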
In step 103, the feature map of the preset image is processed by using the deep learning model to obtain an output feature map.
Processing the feature map of the preset image, for example through feature fusion, facilitates the subsequent heat map processing.
In step 104, the output feature map is processed to generate a keypoint heat map of the preset image.
For example, the output feature map is processed using a preset heat map generation model (e.g., a convolutional layer with a 1×1 kernel and stride 1) to generate the corresponding key point heat maps.
Here, the number of heat maps equals the number of key points. For example, if a human skeleton has 16 key points, 16 heat maps are generated, each corresponding to one key point. On each heat map, the position with the maximum probability is taken as the position coordinate of the corresponding key point. Collecting the key point position coordinates obtained from the 16 heat maps yields the position coordinates of the 16 key points of the human skeleton.
Since the heat map technique itself is not the focus of the present disclosure, it is not detailed here.
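As a sketch only, the 1×1 head and the per-heat-map argmax decoding described above might look as follows in PyTorch; the input channel width (64) is an assumption carried over from the stem sketch, and the 16 channels match the human-skeleton example:

    import torch
    import torch.nn as nn

    # One heat map per key point: a 1x1 convolution with stride 1 maps the
    # output feature map to K = 16 channels.
    heatmap_head = nn.Conv2d(64, 16, kernel_size=1, stride=1)

    def decode_keypoints(heatmaps: torch.Tensor) -> torch.Tensor:
        """Take the argmax position of each heat map as the key point coordinate.

        heatmaps: (N, K, H, W) -> (N, K, 2) coordinates as (x, y).
        """
        w = heatmaps.shape[-1]
        flat_idx = heatmaps.flatten(2).argmax(dim=2)  # (N, K)
        xs = (flat_idx % w).float()
        ys = torch.div(flat_idx, w, rounding_mode="floor").float()
        return torch.stack([xs, ys], dim=2)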
In step 105, the difference between the key point position coordinates in the key point heat map and the preset position coordinates is taken as the loss function, and the step of searching out the topological relations among the nodes in the deep learning model in the preset search space is repeatedly executed according to the loss function until the loss function value meets a preset condition or a preset number of cycles is reached.
For example, the search process may be stopped if the loss function value is less than a preset threshold, or if the number of cycles reaches a preset value.
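A sketch of this search loop and its stopping logic, where search_topology and build_model are hypothetical callables standing in for the architecture search and model construction (which the disclosure leaves open), and decode_keypoints is reused from the sketch above:

    import torch.nn.functional as F

    def run_search(search_topology, build_model, preset_image, preset_coords,
                   loss_threshold=1e-3, max_cycles=100):
        """Repeat the topology search (step 101) until the loss meets the
        preset condition or the preset number of cycles is reached (step 105)."""
        model, loss = None, None
        for cycle in range(max_cycles):
            topology = search_topology(loss)          # search guided by the loss
            model = build_model(topology)             # deep learning model
            heatmaps = model(preset_image)            # steps 103-104
            coords = decode_keypoints(heatmaps)       # key point coordinates
            loss = F.mse_loss(coords, preset_coords)  # coordinate difference
            if loss.item() < loss_threshold:          # preset condition met
                break
        return model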
In step 106, the deep learning model is trained using the training data to obtain a keypoint detection model.
The deep learning model determined by the search process is trained with training data to determine the corresponding weight parameters of the optimal deep learning model.
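A minimal sketch of this weight training stage, assuming a mean-squared-error loss on heat maps, the Adam optimizer, and a data loader yielding (image, target heat maps) pairs; all three choices are assumptions, since the disclosure only requires training with training data:

    import torch

    def train_weights(model, loader, epochs=10, lr=1e-3):
        # The architecture is already fixed by the search; only the weight
        # parameters are learned here.
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        criterion = torch.nn.MSELoss()
        for _ in range(epochs):
            for images, target_heatmaps in loader:
                optimizer.zero_grad()
                loss = criterion(model(images), target_heatmaps)
                loss.backward()
                optimizer.step()
        return model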
The key point detection model training method provided by the embodiments of the present disclosure optimizes the topological relations of the nodes in the deep learning model, so that a deep learning model matched with the key point detection task can be obtained.
Fig. 3 is a flowchart illustrating a method for training a keypoint detection model according to another embodiment of the disclosure. In some embodiments, the following keypoint detection model training method steps are performed by the keypoint detection model training device. The corresponding keypoint detection model is shown in fig. 4.
In step 301, topological relations among the nodes in the first deep learning sub-model, the second deep learning sub-model, and the third deep learning sub-model are searched out in a preset search space.
In some embodiments, the search space includes at least one of convolution, pooling, fully connected, and batch normalization operations.
In step 302, a corresponding feature map is extracted from the preset image.
In some embodiments, the preset image is processed using a preset feature map extraction model (e.g., two 3×3 convolutional layers with stride 2) to obtain the feature map of the preset image.
In step 303, the feature map of the preset image is processed by using the first deep learning sub-model to obtain a first feature map, where the first feature map and the feature map of the preset image have the same size.
In step 304, the first feature map is processed using the second deep learning sub-model to obtain a second feature map, the second feature map having a size that is smaller than the size of the first feature map.
In step 305, a fusion process is performed on the second feature map and the first feature map using the third deep learning sub-model to obtain an output feature map.
Here, since the second feature map is smaller than the first feature map, its resolution is lower than that of the first feature map. Fusing feature maps of different resolutions facilitates the subsequent generation of the key point heat map.
In some embodiments, when performing the fusion process, the third deep learning sub-model up-samples the second feature map to obtain a third feature map, the third feature map being the same size as the first feature map. The third feature map and the first feature map are then fused, whereby an optimal fusion result can be obtained.
For example, up-sampling is achieved by using bilinear interpolation.
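A sketch of this fusion rule, assuming element-wise addition as the combining operation (the disclosure fixes only the bilinear resizing step, not how the resized maps are combined):

    import torch
    import torch.nn.functional as F

    def fuse(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        # The smaller input plays the role of the second feature map above.
        small, large = ((feat_a, feat_b) if feat_a.shape[-1] <= feat_b.shape[-1]
                        else (feat_b, feat_a))
        # Bilinear up-sampling yields the "third" feature map, sized like the
        # larger input, which is then fused with it.
        third = F.interpolate(small, size=large.shape[-2:],
                              mode="bilinear", align_corners=False)
        return third + large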
In step 306, the output feature map is processed to generate a keypoint heat map of the preset image.
Here, the output feature map is processed using a heat map generation model (e.g., a convolutional layer with a 1×1 kernel and stride 1) to generate the corresponding key point heat maps.
In step 307, the difference between the key point position coordinates in the key point heat map and the preset position coordinates is taken as the loss function, and the step of searching out the topological relations among the nodes in the first, second, and third deep learning sub-models in the preset search space is repeatedly executed according to the loss function until the loss function value meets a preset condition or a preset number of cycles is reached.
In step 308, the first, second, and third deep learning sub-models are trained using the training data to obtain a keypoint detection model.
The first, second, and third deep learning sub-models determined by the search process are trained with training data to determine the weight parameters of the optimal sub-models.
In some embodiments, the second deep learning sub-model includes N transformation network models, and the third deep learning sub-model includes N fusion network models, N being a positive integer.
In the second deep learning sub-model, the first feature map is processed by the 1st transformation network model to obtain the 1st output feature map, whose size is smaller than that of the first feature map. The (i-1)-th output feature map is processed by the i-th transformation network model to obtain the i-th output feature map, whose size is smaller than that of the (i-1)-th output feature map, where 2 ≤ i ≤ N. In the third deep learning sub-model, the output feature map of the N-th transformation network model and the output feature map of the (N-1)-th transformation network model are fused by the 1st fusion network model to obtain the 1st fused feature map. The (j-1)-th fused feature map and the output feature map of the (N-j)-th transformation network model are fused by the j-th fusion network model to obtain the j-th fused feature map, where 2 ≤ j ≤ N-1. The (N-1)-th fused feature map and the first feature map are fused by the N-th fusion network model to obtain the output feature map.
In some embodiments, when performing the fusion process, each fusion network model takes the smaller of the two received feature maps as a first to-be-processed feature map and the larger as a second to-be-processed feature map, and up-samples the first to-be-processed feature map to obtain a third to-be-processed feature map of the same size as the second to-be-processed feature map. The third to-be-processed feature map and the second to-be-processed feature map are then fused.
Fig. 5 is a schematic structural diagram of a keypoint detection model according to another embodiment of the disclosure. By way of example, in the embodiment shown in fig. 5, the second deep learning sub-model includes 3 transformation network models and the third deep learning sub-model includes 3 fusion network models.
In the second deep learning sub-model, the first feature map is processed by transformation network model 1 to obtain the 1st output feature map, whose size is smaller than that of the first feature map. The 1st output feature map is processed by transformation network model 2 to obtain the 2nd output feature map, whose size is smaller than that of the 1st output feature map. The 2nd output feature map is processed by transformation network model 3 to obtain the 3rd output feature map, whose size is smaller than that of the 2nd output feature map.
In the third deep learning sub-model, the output feature map of transformation network model 3 and the output feature map of transformation network model 2 are fused by fusion network model 1 to obtain the 1st fused feature map. The 1st fused feature map and the output feature map of transformation network model 1 are fused by fusion network model 2 to obtain the 2nd fused feature map. The 2nd fused feature map and the first feature map are fused by fusion network model 3 to obtain the output feature map.
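The N = 3 wiring of fig. 5 can be sketched as follows, reusing the fuse helper from the earlier sketch. Representing each transformation network model as a stride-2 convolution block is an assumption for the illustration; the disclosure requires only that each output be smaller than its input, and the actual sub-models are determined by the search:

    import torch.nn as nn

    class CascadeN3(nn.Module):
        """Three transformation network models followed by three fusion steps."""
        def __init__(self, C: int = 64):
            super().__init__()
            def transform():
                return nn.Sequential(
                    nn.Conv2d(C, C, kernel_size=3, stride=2, padding=1, bias=False),
                    nn.BatchNorm2d(C),
                    nn.ReLU(inplace=True),
                )
            self.t1, self.t2, self.t3 = transform(), transform(), transform()

        def forward(self, first):
            o1 = self.t1(first)      # 1st output feature map (smaller)
            o2 = self.t2(o1)         # 2nd output feature map
            o3 = self.t3(o2)         # 3rd output feature map
            f1 = fuse(o3, o2)        # fusion network model 1
            f2 = fuse(f1, o1)        # fusion network model 2
            return fuse(f2, first)   # fusion network model 3 -> output feature map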
Fig. 6 is a schematic structural diagram of a training device for a key point detection model according to an embodiment of the present disclosure. As shown in fig. 6, the keypoint detection model training device includes a search module 61, a feature extraction module 62, a first processing module 63, a second processing module 64, a first training module 65, and a second training module 66. The corresponding keypoint detection model is shown in fig. 2.
The search module 61 searches the preset search space for the topological relation among the nodes in the deep learning model.
In some embodiments, the search space includes at least one of convolution, pooling, fully connected, and batch normalization operations.
It should be noted that, in deep learning, a neural network can be regarded as being composed of many neuron nodes (OPs). The operation space of each OP includes convolution, pooling, fully connected, batch normalization, and the like, and each OP may have several inputs and outputs. The connection between any two OPs represents one of these operations. By searching the search space to select the operation of each OP, the corresponding neural network architecture is determined from the topological relations among the nodes. In addition, limiting the search space can speed up the search.
The feature extraction module 62 extracts a corresponding feature map from the preset image.
In some embodiments, the preset image is processed using a preset feature map extraction model (e.g., two 3×3 convolutional layers with stride 2) to obtain the feature map of the preset image.
The first processing module 63 processes the feature map of the preset image by using the deep learning model to obtain an output feature map. Processing the feature map of the preset image, for example through feature fusion, facilitates the subsequent heat map processing.
The second processing module 64 processes the output feature map to generate a keypoint heat map of the preset image.
For example, the output feature map is processed using a preset heat map generation model (e.g., a convolutional layer with a 1×1 kernel and stride 1) to generate the corresponding key point heat maps.
Here, the number of heat maps equals the number of key points. For example, if a human skeleton has 16 key points, 16 heat maps are generated, each corresponding to one key point. On each heat map, the position with the maximum probability is taken as the position coordinate of the corresponding key point. Collecting the key point position coordinates obtained from the 16 heat maps yields the position coordinates of the 16 key points of the human skeleton.
The first training module 65 takes the difference between the key point position coordinates in the key point heat map and the preset position coordinates as the loss function, and instructs the search module 61 to repeatedly execute, according to the loss function, the operation of searching out the topological relations among the nodes in the deep learning model in the preset search space until the loss function value meets the preset condition or the preset number of cycles is reached.
For example, the search process may be stopped if the loss function value is less than a preset threshold, or if the number of cycles reaches a preset value.
The second training module 66 trains the deep learning model with training data to obtain a keypoint detection model.
The deep learning model determined by the search process is trained with training data to determine the corresponding weight parameters of the optimal deep learning model.
In some embodiments, the corresponding keypoint detection model is shown in fig. 4.
The search module 61 searches the preset search space for the topological relations among the nodes in the first deep learning sub-model, the second deep learning sub-model, and the third deep learning sub-model.
The first processing module 63 processes the feature map of the preset image by using the first deep learning sub-model to obtain a first feature map. The first feature map and the feature map of the preset image have the same size. The first feature map is then processed using a second deep learning sub-model to obtain a second feature map. The second feature map has a smaller size than the first feature map. And then, carrying out fusion processing on the second feature map and the first feature map by using a third deep learning submodel so as to obtain an output feature map.
In some embodiments, when performing the fusion process using the third deep learning sub-model, the first processing module 63 up-samples the second feature map to obtain a third feature map, the third feature map and the first feature map having the same size. The third feature map and the first feature map are then fused, whereby an optimal fusion result can be obtained.
The first training module 65 instructs the search module 61 to repeatedly execute, according to the loss function, the operation of searching out the topological relations among the nodes in the first deep learning sub-model, the second deep learning sub-model, and the third deep learning sub-model in the preset search space until the loss function value meets the preset condition or the preset number of cycles is reached.
The second training module 66 trains the first, second, and third deep learning sub-models with training data to obtain a keypoint detection model.
In some embodiments, the second deep learning sub-model includes N transformation network models and the third deep learning sub-model includes N fusion network models.
The first processing module 63 processes the first feature map using the 1st transformation network model to obtain the 1st output feature map, whose size is smaller than that of the first feature map. The first processing module 63 processes the (i-1)-th output feature map using the i-th transformation network model to obtain the i-th output feature map, whose size is smaller than that of the (i-1)-th output feature map, where 2 ≤ i ≤ N.
The first processing module 63 further fuses the output feature map of the N-th transformation network model with the output feature map of the (N-1)-th transformation network model using the 1st fusion network model to obtain the 1st fused feature map. The first processing module 63 further fuses the (j-1)-th fused feature map with the output feature map of the (N-j)-th transformation network model using the j-th fusion network model to obtain the j-th fused feature map, where 2 ≤ j ≤ N-1. The first processing module 63 fuses the (N-1)-th fused feature map with the first feature map using the N-th fusion network model to obtain the output feature map.
In some embodiments, when performing the fusion process with each fusion network model, the first processing module 63 takes the smaller of the two received feature maps as a first to-be-processed feature map and the larger as a second to-be-processed feature map, up-samples the first to-be-processed feature map to obtain a third to-be-processed feature map of the same size as the second to-be-processed feature map, and fuses the third to-be-processed feature map with the second to-be-processed feature map.
In some embodiments, the corresponding keypoint detection model is shown in fig. 5.
The first processing module 63 processes the first feature map using transformation network model 1 to obtain the 1st output feature map, whose size is smaller than that of the first feature map. The first processing module 63 processes the 1st output feature map using transformation network model 2 to obtain the 2nd output feature map, whose size is smaller than that of the 1st output feature map. The first processing module 63 processes the 2nd output feature map using transformation network model 3 to obtain the 3rd output feature map, whose size is smaller than that of the 2nd output feature map.
In addition, the first processing module 63 fuses the output feature map of transformation network model 3 with the output feature map of transformation network model 2 using fusion network model 1 to obtain the 1st fused feature map. The first processing module 63 fuses the 1st fused feature map with the output feature map of transformation network model 1 using fusion network model 2 to obtain the 2nd fused feature map. The first processing module 63 fuses the 2nd fused feature map with the first feature map using fusion network model 3 to obtain the output feature map.
Fig. 7 is a schematic structural diagram of a training device for a key point detection model according to another embodiment of the present disclosure. As shown in fig. 7, the training device includes a memory 71 and a processor 72.
The memory 71 is configured to store instructions, and the processor 72, coupled to the memory 71, is configured to perform the method referred to in any of the embodiments of fig. 1 or fig. 3 based on the instructions stored in the memory.
As shown in fig. 7, the apparatus further comprises a communication interface 73 for information interaction with other devices. Also, the apparatus includes a bus 74, and the processor 72, the communication interface 73, and the memory 71 communicate with each other via the bus 74.
The memory 71 may comprise high-speed RAM, or may further comprise non-volatile memory, such as at least one disk memory. The memory 71 may also be a memory array. The memory 71 may also be partitioned into blocks, which may be combined into virtual volumes according to certain rules.
Further, the processor 72 may be a central processing unit (CPU), an application specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present disclosure.
The present disclosure also relates to a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement a method as referred to in any of the embodiments of fig. 1 or 3.
Fig. 8 is a flowchart of a key point detection method according to an embodiment of the disclosure. In some embodiments, the following keypoint detection method steps are performed by the keypoint detection device.
In step 801, an image to be detected is acquired.
In step 802, a keypoint detection process is performed on an image to be detected using a keypoint detection model trained by the keypoint detection model training method according to any one of the embodiments of fig. 1 or 3, so as to obtain a keypoint in the image to be detected.
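An illustrative inference flow for these two steps, assuming the trained model and the decode_keypoints helper from the earlier sketches, with a preprocessed image tensor of shape (C, H, W):

    import torch

    def detect_keypoints(model, image):
        """Run the trained key point detection model on a collected image."""
        model.eval()
        with torch.no_grad():
            heatmaps = model(image.unsqueeze(0))  # add batch dim: (1, C, H, W)
        return decode_keypoints(heatmaps)[0]      # (K, 2) key point coordinates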
Fig. 9 is a schematic structural diagram of a key point detection device according to an embodiment of the present disclosure. As shown in fig. 9, the key point detection device includes an acquisition module 91 and a detection module 92.
The acquisition module 91 is used for acquiring an image to be detected.
The detection module 92 performs a keypoint detection process on the image to be detected by using a keypoint detection model trained by the keypoint detection model training method according to any one of the embodiments of fig. 1 or fig. 3, so as to obtain a keypoint in the image to be detected.
Fig. 10 is a schematic structural diagram of a key point detection device according to another embodiment of the present disclosure. As shown in fig. 10, the key point detecting apparatus includes a memory 1001, a processor 1002, a communication interface 1003, and a bus 1004. Fig. 10 differs from fig. 7 in that in the embodiment shown in fig. 10, the processor 1002 is configured to perform a method as referred to in any of the embodiments of fig. 8 based on memory-stored instructions.
The present disclosure also relates to a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement a method as referred to in any of the embodiments of fig. 8.
In some embodiments, the functional unit blocks described above may be implemented as general purpose processors, programmable logic controllers (PLCs), digital signal processors (DSPs), application specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or any suitable combination thereof for performing the functions described in the present disclosure.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or by a program instructing relevant hardware, where the program may be stored in a computer readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.
The description of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the disclosure in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described in order to best explain the principles of the disclosure and the practical application, and to enable others of ordinary skill in the art to understand the disclosure for various embodiments with various modifications as are suited to the particular use contemplated.

Claims (10)

1. A key point detection model training method, comprising:
Searching out topological relations among all nodes in the deep learning model in a preset search space;
extracting a corresponding feature map from a preset image;
Processing the feature map of the preset image by using the deep learning model to obtain an output feature map;
Processing the output feature map to generate a key point heat map of the preset image;
Taking the difference between the key point position coordinates in the key point heat map and the preset position coordinates as a loss function, and repeatedly executing, according to the loss function, the step of searching out the topological relations among the nodes in the deep learning model in a preset search space until the loss function value meets a preset condition or a preset number of cycles is reached;
training the deep learning model by using training data to obtain a key point detection model;
the deep learning model comprises a first deep learning sub-model, a second deep learning sub-model and a third deep learning sub-model, wherein the second deep learning sub-model comprises N transformation network models, and the third deep learning sub-model comprises N fusion network models;
the processing of the feature map of the preset image by using the deep learning model comprises the following steps:
Processing the feature map of the preset image by using the first deep learning sub-model to obtain a first feature map, wherein the first feature map and the feature map of the preset image have the same size;
Processing the first feature map by using the second deep learning sub-model to obtain a second feature map, wherein the size of the second feature map is smaller than that of the first feature map;
performing fusion processing on the second feature map and the first feature map by using the third deep learning sub-model so as to obtain the output feature map;
The fusing of the second feature map and the first feature map using the third deep learning sub-model includes:
Fusing the output feature map of the N-th transformation network model and the output feature map of the (N-1)-th transformation network model by using the 1st fusion network model to obtain a 1st fused feature map;
Fusing the (j-1)-th fused feature map and the output feature map of the (N-j)-th transformation network model by using the j-th fusion network model to obtain a j-th fused feature map, wherein 2 ≤ j ≤ N-1;
and fusing the (N-1)-th fused feature map and the first feature map by using the N-th fusion network model to obtain the output feature map.
2. The method of claim 1, wherein,
Processing the first feature map using the second deep learning sub-model includes:
processing the first feature map by using the 1st transformation network model to obtain a 1st output feature map, wherein the size of the 1st output feature map is smaller than that of the first feature map;
and processing the (i-1)-th output feature map by using the i-th transformation network model to obtain the i-th output feature map, wherein the size of the i-th output feature map is smaller than that of the (i-1)-th output feature map, and 2 ≤ i ≤ N.
3. The method of claim 2, wherein,
When performing the fusion process, each fusion network model takes the smaller of the two received feature maps as a first to-be-processed feature map and the larger as a second to-be-processed feature map;
Up-sampling the first to-be-processed feature map to obtain a third to-be-processed feature map, wherein the third to-be-processed feature map and the second to-be-processed feature map are the same in size;
And carrying out fusion processing on the third to-be-processed feature map and the second to-be-processed feature map.
4. The method according to any one of claim 1 to 3, wherein,
The search space includes at least one of convolution, pooling, fully connected, and batch normalization operations.
5. A keypoint detection model training device, comprising:
The searching module is configured to search out topological relations among all nodes in the deep learning model in a preset searching space;
the feature extraction module is configured to extract a corresponding feature map from a preset image;
The first processing module is configured to process the feature map of the preset image by using the deep learning model to obtain an output feature map, wherein the deep learning model comprises a first deep learning sub-model, a second deep learning sub-model, and a third deep learning sub-model, the second deep learning sub-model comprises N transformation network models, and the third deep learning sub-model comprises N fusion network models; the feature map of the preset image is processed by using the first deep learning sub-model to obtain a first feature map, the first feature map and the feature map of the preset image having the same size; the first feature map is processed by using the second deep learning sub-model to obtain a second feature map, the size of the second feature map being smaller than that of the first feature map; and the second feature map and the first feature map are fused by using the third deep learning sub-model to obtain the output feature map, wherein the output feature map of the N-th transformation network model and the output feature map of the (N-1)-th transformation network model are fused by using the 1st fusion network model to obtain a 1st fused feature map, the (j-1)-th fused feature map and the output feature map of the (N-j)-th transformation network model are fused by using the j-th fusion network model to obtain a j-th fused feature map, wherein 2 ≤ j ≤ N-1, and the (N-1)-th fused feature map and the first feature map are fused by using the N-th fusion network model to obtain the output feature map;
the second processing module is configured to process the output feature map to generate a key point heat map of the preset image;
The first training module is configured to take the difference between the key point position coordinates in the key point heat map and the preset position coordinates as a loss function, and to instruct the search module to repeatedly execute, according to the loss function, the operation of searching out the topological relations among the nodes in the deep learning model in the preset search space until the loss function value meets a preset condition or a preset number of cycles is reached;
And the second training module is configured to train the deep learning model by using training data so as to obtain a key point detection model.
6. A keypoint detection model training device, comprising:
A memory configured to store instructions;
A processor coupled to the memory, the processor configured to perform the method of any of claims 1-4 based on instructions stored by the memory.
7. A keypoint detection method comprising:
Collecting an image to be detected;
Performing keypoint detection processing on the image to be detected by using the keypoint detection model trained by the keypoint detection model training method according to any one of claims 1 to 4, so as to obtain the keypoints in the image to be detected.
8. A keypoint detection device comprising:
the acquisition module is configured to acquire an image to be detected;
A detection module configured to perform a keypoint detection process on the image to be detected using the keypoint detection model trained by the keypoint detection model training method of any one of claims 1 to 4, to obtain keypoints in the image to be detected.
9. A keypoint detection device comprising:
A memory configured to store instructions;
A processor coupled to the memory, the processor configured to perform the key point detection method of claim 7 based on instructions stored by the memory.
10. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the method of any one of claims 1-4, 7.
CN202010243835.5A 2020-03-31 2020-03-31 Method and device for training key point detection model, and method and device for detecting key point Active CN113468924B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010243835.5A CN113468924B (en) 2020-03-31 2020-03-31 Method and device for training key point detection model, and method and device for detecting key point

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010243835.5A CN113468924B (en) 2020-03-31 2020-03-31 Method and device for training key point detection model, and method and device for detecting key point

Publications (2)

Publication Number Publication Date
CN113468924A CN113468924A (en) 2021-10-01
CN113468924B true CN113468924B (en) 2024-06-18

Family ID: 77866085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010243835.5A Active CN113468924B (en) 2020-03-31 2020-03-31 Method and device for training key point detection model, and method and device for detecting key point

Country Status (1)

Country Link
CN (1) CN113468924B (en)

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105787439B (en) * 2016-02-04 2019-04-05 广州新节奏智能科技股份有限公司 A kind of depth image human synovial localization method based on convolutional neural networks
IL274424B2 (en) * 2017-11-14 2024-07-01 Magic Leap Inc Meta-learning for multi-task learning for neural networks
CN110163048B (en) * 2018-07-10 2023-06-02 腾讯科技(深圳)有限公司 Hand key point recognition model training method, hand key point recognition method and hand key point recognition equipment
CN108985259B (en) * 2018-08-03 2022-03-18 百度在线网络技术(北京)有限公司 Human body action recognition method and device
CN109753910B (en) * 2018-12-27 2020-02-21 北京字节跳动网络技术有限公司 Key point extraction method, model training method, device, medium and equipment
CN109948526B (en) * 2019-03-18 2021-10-29 北京市商汤科技开发有限公司 Image processing method and device, detection equipment and storage medium
CN110309706B (en) * 2019-05-06 2023-05-12 深圳华付技术股份有限公司 Face key point detection method and device, computer equipment and storage medium
CN110084221B (en) * 2019-05-08 2023-02-03 南京云智控产业技术研究院有限公司 Serialized human face key point detection method with relay supervision based on deep learning
CN110532981B (en) * 2019-09-03 2022-03-15 北京字节跳动网络技术有限公司 Human body key point extraction method and device, readable storage medium and equipment
CN110705563B (en) * 2019-09-07 2020-12-29 创新奇智(重庆)科技有限公司 Industrial part key point detection method based on deep learning
CN110728359B (en) * 2019-10-10 2022-04-26 北京百度网讯科技有限公司 Method, device, equipment and storage medium for searching model structure

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Deep residual learning for image recognition; He KM et al.; Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition; 2016-12-31; full text *
Face key point detection based on PyTorch and OpenCV (基于Pytorch和Opencv的人脸关键点检测); Meng Lingjun; Wang Jingbo; Video Engineering (电视技术); 2019-07-25 (No. 14); full text *

Also Published As

Publication number Publication date
CN113468924A (en) 2021-10-01


Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant