WO2022247403A1 - Key point detection method, electronic device, program, and storage medium

Key point detection method, electronic device, program, and storage medium

Info

Publication number
WO2022247403A1
Authority
WO
WIPO (PCT)
Prior art keywords
key point
key point detection
target object
offsets
Application number
PCT/CN2022/081229
Other languages
English (en)
French (fr)
Inventor
李帮怀
袁野
Original Assignee
北京迈格威科技有限公司 (Beijing Megvii Technology Co., Ltd.)
Application filed by 北京迈格威科技有限公司 (Beijing Megvii Technology Co., Ltd.)
Publication of WO2022247403A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Definitions

  • The present application relates to the technical field of image recognition, and in particular to a key point detection method, electronic device, program, and storage medium.
  • Key point detection is widely used in daily life. Common face recognition algorithms usually rely on key point detection, and popular applications such as virtual makeup try-on, beautification, and face swapping are also built on key point detection technology. At the same time, these applications place high requirements on the accuracy of key point detection.
  • Common key point detection methods are usually two-stage: first, a target detection model obtains the position of the target object in the image; then the target object is cropped out according to the detected target frame, and a key point detection model detects the key points. This is often referred to as the "top-down" approach. Since this method proceeds step by step, when one image contains multiple target objects it must be cropped multiple times and the key point detection model must be run on each crop separately, so detection efficiency is low.
  • The embodiments of the present application provide a key point detection method, electronic device, program, and storage medium that overcome the above problems, or at least partially solve them.
  • In a first aspect, a key point detection method is provided, including: extracting image features of an image to be detected through a backbone network to obtain a feature map, the image to be detected including a target object;
  • performing key point detection on the feature map according to multiple pose key point templates to obtain at least one set of key points corresponding to the target object, where each pose key point template represents the relative positional relationship of the multiple key points in that template;
  • screening the at least one set of key points to obtain a key point detection result of the target object.
  • In a second aspect, a key point detection method is provided, including: extracting image features of an image to be detected through a backbone network to obtain a feature map, the image to be detected including a target object;
  • performing key point detection on the feature map according to a plurality of pose key point templates to obtain at least one set of key point offsets corresponding to the target object, where each pose key point template represents the relative positional relationship of a plurality of key points in that template, and each set of key point offsets characterizes the offsets between a set of key points in the feature map and the key points in one pose key point template;
  • obtaining a key point detection result of the target object according to the at least one set of key point offsets.
  • In a third aspect, a key point detection device is provided, including:
  • the feature extraction module is used to extract the image features of the image to be detected through the backbone network to obtain a feature map, and the image to be detected includes the target object;
  • a key point detection module configured to perform key point detection on the feature map according to a plurality of pose key point templates to obtain at least one set of key points corresponding to the target object, where each pose key point template represents the relative positional relationship of multiple key points in that template;
  • a detection result determining module configured to screen the at least one set of key points to obtain the key point detection result of the target object.
  • In a fourth aspect, a key point detection device is provided, including:
  • the feature extraction module is used to extract the image features of the image to be detected through the backbone network to obtain a feature map, and the image to be detected includes the target object;
  • a key point detection module configured to perform key point detection on the feature map according to a plurality of pose key point templates to obtain at least one set of key point offsets corresponding to the target object, where each pose key point template represents the relative positional relationship of multiple key points in that template, and each set of key point offsets represents the offsets between the key points in the feature map and the key points in one pose key point template;
  • a detection result determining module configured to obtain a key point detection result of the target object according to the at least one set of key point offsets.
  • In a fifth aspect, an electronic device is provided, including: a processor, a memory, and a computer program stored in the memory and executable on the processor, where the computer program, when executed by the processor, implements the key point detection method described in the first aspect or the second aspect.
  • In a sixth aspect, a computer program is provided, including computer-readable code which, when run on an electronic device, causes the electronic device to execute the key point detection method described in the first aspect or the second aspect.
  • In a seventh aspect, a computer-readable storage medium is provided, storing the computer program described in the sixth aspect.
  • In the key point detection method, electronic device, program, and storage medium provided in the embodiments of the present application, after the feature map of the image to be detected is extracted through the backbone network, key point detection is performed on the feature map according to multiple pose key point templates to obtain at least one set of key points corresponding to the target object, and the at least one set of key points is screened to obtain the key point detection result of the target object. Since the key point detection results of all target objects in the image to be detected can be determined directly based on the multiple pose key point templates, there is no need to first locate the target objects in the image and then detect key points, so the detection efficiency of key points can be improved.
  • Fig. 1 is a flow chart of the steps of a key point detection method provided in an embodiment of the present application.
  • Figs. 2a-2c are illustrations of pose key point templates in an embodiment of the present application.
  • Fig. 3 is a flow chart of the steps of another key point detection method provided in an embodiment of the present application.
  • Fig. 4 is a structural block diagram of a key point detection device provided in an embodiment of the present application.
  • Fig. 5 is a structural block diagram of another key point detection device provided in an embodiment of the present application.
  • Fig. 6 schematically shows a block diagram of an electronic device for performing the method according to the present application.
  • Fig. 7 schematically shows a storage unit for holding or carrying program code for realizing the method according to the present application.
  • Fig. 1 is a flow chart of the steps of a key point detection method provided in the embodiment of the present application. As shown in Fig. 1, the method may include:
  • Step 101: extract image features of an image to be detected through a backbone network to obtain a feature map, where the image to be detected includes a target object.
  • the backbone network is used to extract image features of the image to be detected, for example, it may be a ResNet-50 network or the like.
  • the target object may be, for example, a human face, a human body, a pet, a vehicle, and the like.
  • the image to be detected is input into a backbone network such as ResNet-50, and the high-dimensional feature representation of the image to be detected is obtained, that is, the feature map is obtained.
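As a toy illustration of this step (not the patent's backbone; a single strided convolution with ReLU stands in for ResNet-50, and all weights are random placeholders), the following numpy sketch turns an image into a coarser feature map:

```python
import numpy as np

def extract_feature_map(image, weights, stride=4):
    """Toy stand-in for a backbone (e.g. ResNet-50): one strided k x k
    convolution plus ReLU, turning an HxWx3 image into a coarser feature map."""
    H, W, _ = image.shape
    k = weights.shape[0]       # kernel size; weights has shape (k, k, 3, C)
    C = weights.shape[-1]
    out_h = (H - k) // stride + 1
    out_w = (W - k) // stride + 1
    fmap = np.zeros((out_h, out_w, C))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + k, j * stride:j * stride + k, :]
            # convolution at one location, followed by ReLU
            fmap[i, j] = np.maximum(np.einsum("hwc,hwco->o", patch, weights), 0)
    return fmap

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))                 # placeholder "image to be detected"
w = rng.standard_normal((3, 3, 3, 8)) * 0.1   # placeholder backbone weights
fm = extract_feature_map(img, w)
print(fm.shape)  # (8, 8, 8): a spatially downsampled feature representation
```

A real backbone stacks many such layers with learned weights; the point here is only the image-to-feature-map shape transformation.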
  • Step 102: perform key point detection on the feature map according to multiple pose key point templates to obtain at least one set of key points corresponding to the target object, where each pose key point template represents the relative positional relationship of the multiple key points corresponding to one pose.
  • The key point detection method in the embodiment of the present application is a "bottom-up" method: a single-stage, fully end-to-end approach that takes any image to be detected as input and directly outputs the key point positions of all target objects in it, without using a target detection model to locate the target objects and crop the image. The entire model needs only one forward pass, whereas the traditional "top-down" method must crop each target, so its number of forward passes is proportional to the number of target objects in the image.
  • Each pose key point template corresponds to the key points of one pose. Because some poses cause key points to be occluded, the number of key points is not necessarily the same in every pose key point template.
  • Figs. 2a-2c are illustrations of pose key point templates in the embodiment of the present application. As shown in Figs. 2a-2c, when the target object is a human face, three different pose key point templates are given; one pose key point template defines the relative positional relationship of the key points corresponding to one pose. The relative positional relationship can be, for example, the position of each key point relative to one center point. Every pose key point template includes the same center point; when the target object is a human face, the center point can be, for example, the key point corresponding to the center of the nose (that is, the position of the nose tip), which is not limited in this embodiment of the present application.
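The template idea can be sketched as follows; the face key point coordinates and template names below are purely hypothetical, chosen only to show that templates share a center point and may have different numbers of visible key points:

```python
import numpy as np

# Hypothetical pose key point templates for a face. Each template stores the
# (dx, dy) position of each visible key point relative to the shared center
# point (e.g. the nose tip at (0, 0)). Occluded key points are simply absent,
# so templates may contain different numbers of key points.
TEMPLATES = {
    "frontal": np.array([(-6.0, -4.0), (6.0, -4.0), (0.0, 0.0),
                         (-4.0, 6.0), (4.0, 6.0)]),
    "left_profile": np.array([(5.0, -4.0), (0.0, 0.0), (3.0, 6.0)]),
    "right_profile": np.array([(-5.0, -4.0), (0.0, 0.0), (-3.0, 6.0)]),
}

def place_template(template, center):
    """Absolute key point coordinates when a template is attached at `center`."""
    return template + np.asarray(center)

pts = place_template(TEMPLATES["frontal"], (50.0, 40.0))
print(pts[2])  # the center key point lands exactly on (50., 40.)
```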
  • The feature map can be directly input into the key point detection model, and the key point detection model performs regression on the key points in the image to be detected based on the multiple pose key point templates to obtain at least one set of key points corresponding to the target object.
  • Step 103: screen the at least one set of key points to obtain a key point detection result of the target object.
  • After at least one set of key points corresponding to the target object is obtained, the sets of key points are screened to retain the key point groups that actually correspond to a target object, yielding the key point detection result of the target objects in the image to be detected.
  • The key point detection model in the embodiment of the present application does not need to first crop the target objects out of the image to be detected; the key point detection results of all target objects can be obtained directly based on the pose key point templates. When the image to be detected includes multiple target objects, the key point detection results of all of them can be obtained in a single detection pass.
  • In some embodiments, performing key point detection on the feature map to obtain at least one set of key points corresponding to the target object includes: performing key point detection on the feature map to obtain multiple candidate key point offsets and a confidence corresponding to each candidate key point offset, where a candidate key point offset is the offset between a feature point in the feature map and the key points in one pose key point template; and determining at least one set of key points corresponding to the target object according to the multiple candidate key point offsets and the confidences.
  • the feature point may be a pixel in the feature map, or may be a collection of multiple pixel points in a specific area in the feature map.
  • The feature map is input into the key point detection model, which performs key point detection based on the multiple pose key point templates: it regresses the offsets between the feature points in the feature map and the key points in each pose key point template, obtains multiple candidate key point offsets, and determines a confidence corresponding to each candidate key point offset. After obtaining the candidate key point offsets and confidences, the candidate offsets can first be screened by confidence to select those whose confidence meets a preset condition; key point coordinates are then determined from the pose key point templates, the feature points, and the filtered candidate key point offsets, yielding at least one set of key points corresponding to the target object. Alternatively, a set of key point coordinates can be determined directly for each set of candidate key point offsets based on the templates, feature points, and offsets, again yielding at least one set of key points corresponding to the target object.
  • In some embodiments, determining at least one set of key points corresponding to the target object according to the multiple candidate key point offsets and the confidences includes: screening out, from the multiple candidate key point offsets, the combinations of key point offsets whose confidence is greater than or equal to a confidence threshold, and determining each screened combination as one set of key point offsets, where each set of key point offsets characterizes the offsets between a set of key points in the feature map and the key points in one pose key point template; and determining at least one set of key points corresponding to the target object according to the at least one set of key point offsets and the pose key point template corresponding to each set.
  • Among the multiple candidate key point offsets, some combinations may have a relatively low confidence; such candidates will not yield correct key points. Therefore, the combinations whose confidence is greater than or equal to the confidence threshold are screened out first, each screened combination is determined as one set of key point offsets, and multiple screened combinations give multiple sets, so that at least one set of key point offsets is obtained.
  • Based on the filtered key point offsets and the corresponding pose key point templates, the coordinates of at least one set of key points can be determined, yielding at least one set of key points corresponding to the target object. Preliminary screening of the candidate key points based on confidence reduces the amount of computation and improves processing speed.
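A minimal sketch of this screening-and-decoding step, under the assumption that each candidate group stores one offset per key point relative to its anchor feature point (the threshold value is illustrative):

```python
import numpy as np

def decode_keypoints(anchors, offsets, confidences, threshold=0.5):
    """Keep candidate offset groups whose confidence passes the threshold and
    decode them into absolute key point coordinates: anchor + offset.
    anchors: (N, 2); offsets: (N, K, 2); confidences: (N,)."""
    keep = confidences >= threshold
    # broadcast each kept anchor over its K per-key-point offsets
    groups = anchors[keep][:, None, :] + offsets[keep]   # (M, K, 2)
    return groups, confidences[keep]

anchors = np.array([[10.0, 10.0], [30.0, 12.0]])
offsets = np.array([[[-2, -2], [2, -2], [0, 3]],
                    [[-2, -2], [2, -2], [0, 3]]], dtype=float)
conf = np.array([0.9, 0.2])   # the second candidate falls below the threshold
groups, kept_conf = decode_keypoints(anchors, offsets, conf)
print(groups.shape)  # (1, 3, 2): one surviving group of 3 key points
```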
  • the at least one set of key points is screened to obtain the key point detection result of the target object, including: according to the at least one set of key points and the confidence corresponding to each set of key points , to determine the key point detection result of the target object.
  • The confidence of each set of key points in the at least one set of key points is the confidence of the candidate key point offsets from which that set was obtained, so the sets of key points can be screened based on their corresponding confidences to determine the key point detection result of the target object.
  • When screening the at least one set of key points, non-maximum suppression can be used. Non-maximum suppression can be performed directly on the sets of key points, or, after a target frame is determined for each set of key points, on the target frames corresponding to the sets of key points.
  • In some embodiments, determining the key point detection result of the target object according to the at least one set of key points and the confidence corresponding to each set includes: respectively determining the target frame corresponding to each set of key points; and performing, according to the confidence corresponding to each set of key points, non-maximum suppression on the target frames corresponding to the at least one set of key points to obtain the key point detection result of the target object.
  • the target frame represents the location of the target object.
  • The target frame corresponding to each set of key points can be determined first, and then, based on the confidence corresponding to each set, non-maximum suppression is performed on the target frames corresponding to the at least one set of key points to obtain the final key point detection result.
  • Performing non-maximum suppression on the target frames, compared with performing it directly on the key points, reduces the amount of data to process and further improves detection efficiency.
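A compact sketch of greedy non-maximum suppression over target frames, as described above (the IoU threshold of 0.5 and the example boxes are illustrative choices, not taken from the patent):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = np.argsort(scores)[::-1]      # highest confidence first
    keep = []
    while len(order):
        i = order[0]
        keep.append(int(i))
        # drop every remaining box that overlaps the kept one too much
        order = order[1:][[iou(boxes[i], boxes[j]) < iou_thresh for j in order[1:]]]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]: the overlapping lower-score box is suppressed
```

In the scheme above, each box would be the target frame of one key point group and each score that group's confidence; the surviving indices select the final key point detection results.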
  • In some embodiments, separately determining the target frame corresponding to each set of key points includes: determining the minimum circumscribed rectangle of each set of key points as the target frame corresponding to that set.
  • For each set of key points, the minimum circumscribed rectangle of the set can be determined first, and that rectangle is taken as the target frame corresponding to the set.
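Computing the axis-aligned minimum circumscribed rectangle of a key point group is a simple min/max reduction; a small sketch (coordinates are made-up example values):

```python
import numpy as np

def min_bounding_rect(keypoints):
    """Axis-aligned minimum circumscribed rectangle of one group of key
    points, returned as (x1, y1, x2, y2)."""
    pts = np.asarray(keypoints, dtype=float)
    x1, y1 = pts.min(axis=0)
    x2, y2 = pts.max(axis=0)
    return x1, y1, x2, y2

group = [(12.0, 8.0), (20.0, 6.0), (16.0, 15.0)]
print(min_bounding_rect(group))  # (12.0, 6.0, 20.0, 15.0)
```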
  • In this way, the key points of the target object can be accurately screened out from the key point detection results, improving the accuracy of key point detection.
  • In some embodiments, performing key point detection on the feature map to obtain multiple candidate key point offsets and the confidence corresponding to each candidate key point offset includes: matching, based on each feature point in the feature map, the multiple pose key point templates with the feature map, and determining the offsets between the feature point and the key points in each pose key point template together with the confidence of the feature point, to obtain at least one set of key point offsets and the confidence corresponding to each set.
  • Each feature point in the feature map is used as the center point for matching the feature map with the pose key point templates. The multiple pose key point templates are matched with the feature map respectively, and according to the relative positional relationship between the key points and the center point in each template, the offsets between the center point and the key points of each template are determined. Since one feature point corresponds to multiple pose key point templates, multiple sets of key point offsets are obtained per feature point, so after matching every feature point with the templates, at least one set of key point offsets is obtained. During the matching process, a confidence is obtained for each set of key point offsets.
  • In some embodiments, screening the at least one set of key points according to the confidence corresponding to each set to obtain the key point detection result of the target object includes: performing, according to the at least one set of key points and the confidence corresponding to each set, non-maximum suppression directly on the at least one set of key points to obtain the key point detection result of the target object;
  • the method further includes: determining a target frame according to the key point detection result, and the target frame represents the location of the target object.
  • The at least one set of key points can be screened by non-maximum suppression; that is, non-maximum suppression can be performed directly on the sets of key points to obtain the key point detection result of the target object.
  • A group of key points belonging to the same target object in the key point detection result can be identified, the minimum circumscribed rectangle of that group determined, and that rectangle taken as the group's target frame. By determining the target frame, the key point detection result and the corresponding target frame can be displayed at the same time.
  • In the key point detection method, after the feature map of the image to be detected is extracted through the backbone network, key point detection is performed on the feature map according to multiple pose key point templates to obtain at least one set of key points corresponding to the target object, and the sets of key points are screened to obtain the key point detection result. Since the key point detection results of all target objects in the image to be detected can be determined directly based on the multiple pose key point templates, there is no need to first locate the target objects and then detect key points, so detection efficiency is improved.
  • In some embodiments, performing key point detection on the feature map according to a plurality of pose key point templates to obtain at least one set of key points corresponding to the target object includes: performing key point detection on the feature map through a key point detection model according to the plurality of pose key point templates, to obtain at least one set of key point offsets corresponding to the target object and a confidence corresponding to each set of key point offsets.
  • The key point detection model can regress, based on the multiple pose key point templates, the offsets between the key points in the feature map and the key points in the templates, and determine a confidence for each set of key point offsets.
  • The feature map is input into the key point detection model, which uses each feature point in the feature map as a center point for matching with the pose key point templates, that is, as an anchor. Each pose key point template is attached to the anchor, and the key point offsets of the key points relative to the anchor are determined based on each attached template. Each pose key point template yields one set of key point offsets, so for multiple templates, multiple sets of key point offsets are obtained for each feature point, and the model outputs a corresponding confidence for each set of key point offsets.
  • The regression can be performed with a convolutional layer.
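The per-location regression can be illustrated with a 1x1 convolution written as an einsum (a 1x1 convolution is just a matmul at each spatial location); the template count T, key point count K, channel width, and random weights below are arbitrary placeholders, and a real model would use learned weights and more layers:

```python
import numpy as np

def regression_head(fmap, w_off, w_cls):
    """From a feature map of shape (H, W, C), regress for each of T pose
    templates with K key points a set of K (dx, dy) offsets plus one
    confidence per location. w_off: (C, T*K*2); w_cls: (C, T)."""
    H, W, C = fmap.shape
    T = w_cls.shape[1]
    K = w_off.shape[1] // (T * 2)
    # 1x1 conv == per-pixel linear map over channels
    offsets = np.einsum("hwc,co->hwo", fmap, w_off).reshape(H, W, T, K, 2)
    logits = np.einsum("hwc,ct->hwt", fmap, w_cls)
    conf = 1.0 / (1.0 + np.exp(-logits))   # sigmoid -> per-template confidence
    return offsets, conf

rng = np.random.default_rng(1)
fm = rng.random((8, 8, 16))                         # placeholder feature map
offs, conf = regression_head(fm,
                             rng.standard_normal((16, 3 * 5 * 2)),  # T=3, K=5
                             rng.standard_normal((16, 3)))
print(offs.shape, conf.shape)  # (8, 8, 3, 5, 2) (8, 8, 3)
```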
  • In some embodiments, the method further includes: training an initial key point detection model based on the pose key point templates and sample images to obtain the key point detection model.
  • the initial key point detection model can be obtained by randomly initializing the network parameters, and the initial key point detection model is trained based on the pose key point template and sample images to obtain the trained key point detection model.
  • the training steps of the key point detection model include:
  • The key point annotations can be position coordinates detected by other key point detection methods. For example, after a target detection model detects the target object in a sample image, the detected target object can be cropped out, and a traditional key point detection model can detect the coordinates of its key points.
  • After the sample image is obtained, key point detection is performed on it with the help of other key point detection methods to obtain the key point annotations of the target object in the sample image. The sample image is input into the backbone network, which performs feature extraction to obtain the sample feature map corresponding to the sample image. The sample feature map is input into the initial key point detection model, which performs key point detection on it based on the pose key point templates to obtain predicted offsets for multiple key point sets. Based on each predicted offset and its corresponding pose key point template, the key point coordinates of each key point set are determined, and based on the key point coordinates and the key point annotations, the network parameters of the initial key point detection model are adjusted to obtain the trained key point detection model.
  • In some embodiments, the training step of the key point detection model further includes: determining, based on the distance or offset between the key points in each pose key point template and the key point annotations, a confidence label for each sample feature point in the sample feature map with respect to each pose key point template;
  • Correspondingly, training the initial key point detection model to obtain the trained key point detection model includes: training the initial key point detection model based on the predicted offsets, the pose key point templates corresponding to the predicted offsets, the key point annotations, and the confidence labels, to obtain the trained key point detection model.
  • A classification network can be included in the key point detection model to predict, for each feature point, the confidence of attaching each of the multiple pose key point templates; this requires determining the confidence labels for the sample feature map so that the training of the key point detection model can be supervised.
  • Each feature point in the sample feature map is used as the center point for matching with the pose key point templates. When each pose key point template is attached to the center point, the distance or offset between the key points in the template and the key point annotations is determined, and based on this distance or offset, the confidence label of each feature point in the sample feature map relative to each pose key point template can be determined.
  • A feature point matched against multiple pose key point templates yields as many confidence labels as there are templates. For example, with 24 pose key point templates, one feature point in the sample feature map corresponds to 24 confidence labels.
  • The initial key point detection model can be trained based on the predicted offsets, the pose key point templates corresponding to them, the key point annotations, and the confidence labels; training ends when the end-of-training condition is met, and the trained key point detection model is obtained.
  • In some embodiments, the distance or offset can be mapped into the interval (0, 1) to obtain the confidence label of each feature point in the sample feature map relative to each pose key point template.
  • Here, the distance between the key points in a pose key point template and the key point annotations is taken as the offset between them.
  • The distance or offset between the pose key point template attached at each feature point and the key point annotations can be mapped into (0, 1), for example by a sigmoid, and the resulting value is used as the confidence label of that feature point relative to that pose key point template. Since a larger distance means lower confidence and a smaller distance means higher confidence, a larger distance maps to a smaller label and a smaller distance maps to a larger label.
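One possible mapping of this kind (the shifted sigmoid and the `scale` normalizer are assumptions for illustration, not the patent's formula) looks like:

```python
import numpy as np

def confidence_label(distance, scale=8.0):
    """Map a distance between a template's key points (attached at a feature
    point) and the annotated key points into (0, 1): a larger distance yields
    a smaller label. `scale` is a hypothetical normalizer."""
    return 1.0 / (1.0 + np.exp(distance / scale - 1.0))  # shifted sigmoid

# the label decreases monotonically as the distance grows
for d in (0.0, 8.0, 40.0):
    print(round(float(confidence_label(d)), 3))
```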
  • In some embodiments, the initial key point detection model also outputs a predicted confidence corresponding to each predicted offset.
  • Correspondingly, training the initial key point detection model to obtain the trained key point detection model includes: adjusting the network parameters of the initial key point detection model based on the predicted offsets, the predicted confidences, the key point annotations, and the confidence labels, to obtain the trained key point detection model.
  • From each predicted offset and its corresponding pose key point template, the predicted key point coordinates can be obtained. The predicted coordinates and the key point annotations corresponding to the sample feature map are substituted into a regression loss function to obtain a regression loss value, and the predicted confidence and confidence label corresponding to each set of predicted offsets are substituted into a confidence loss function to obtain a confidence loss value. The regression loss value and the confidence loss value are added as the target loss value, and the network parameters of the initial key point detection model are adjusted based on the target loss value to obtain the trained key point detection model.
  • The predicted key point coordinates can be obtained from the predicted offsets output by the key point detection model and the pose key point templates, and the absolute value of the difference between the predicted coordinates and the key point annotations is taken as the regression loss.
  • the confidence loss function may be a cross-entropy loss function (Cross-Entropy Loss).
  • The regression loss and the confidence loss jointly constrain the training of the key point detection model, so that the trained model can accurately output key points and their corresponding confidences, improving overall detection accuracy.
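The combined objective can be sketched as follows; this is a generic L1-plus-cross-entropy formulation consistent with the description above (mean reductions and example values are assumptions), not the patent's exact loss:

```python
import numpy as np

def total_loss(pred_kpts, gt_kpts, pred_conf, gt_conf):
    """Target loss = L1 regression loss on key point coordinates plus
    binary cross-entropy on the per-template confidences."""
    reg = np.abs(pred_kpts - gt_kpts).mean()             # |pred - annotation|
    p = np.clip(pred_conf, 1e-7, 1 - 1e-7)               # numerical safety
    ce = -(gt_conf * np.log(p) + (1 - gt_conf) * np.log(1 - p)).mean()
    return reg + ce                                      # summed target loss

pred = np.array([[10.0, 10.0], [20.0, 22.0]])
gt = np.array([[10.0, 12.0], [20.0, 20.0]])
loss = total_loss(pred, gt, np.array([0.8]), np.array([1.0]))
print(round(float(loss), 3))  # 1.0 (L1) + -ln(0.8) (CE) ≈ 1.223
```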
  • Fig. 3 is a flow chart of the steps of another key point detection method provided in an embodiment of the present application. As shown in Fig. 3, the method may include:
  • Step 301: extract image features of an image to be detected through a backbone network to obtain a feature map, where the image to be detected includes a target object.
  • the backbone network is used to extract image features of the image to be detected, for example, it may be a ResNet-50 network or the like.
  • the target object may be, for example, a human face, a human body, a pet, a vehicle, and the like.
  • the image to be detected is input into a backbone network such as ResNet-50, and the high-dimensional feature representation of the image to be detected is obtained, that is, the feature map is obtained.
  • Step 302: perform key point detection on the feature map according to multiple pose key point templates to obtain at least one set of key point offsets corresponding to the target object, wherein each pose key point template represents the relative positional relationship of the multiple key points corresponding to a pose, and each set of key point offsets represents the offsets between that set of key points in the feature map and the key points in each pose key point template.
  • The multiple pose key point templates are predefined relative positional relationships of key points for different poses; each template corresponds to the key points of one pose. Because some poses occlude key points, the number of key points in each pose key point template is not necessarily the same.
  • A pose key point template can regress one set of key point offsets at a feature point, so from the feature points in the feature map and the multiple pose key point templates, at least one set of key point offsets corresponding to the target object can be obtained.
  • The coordinates of a group of key points can be obtained based on each set of key point offsets and the corresponding pose key point template.
  • In one embodiment, performing key point detection on the feature map according to multiple pose key point templates to obtain at least one set of key point offsets corresponding to the target object includes: performing key point detection on the feature map according to the multiple pose key point templates to obtain multiple candidate key point offsets and a confidence corresponding to each candidate key point offset; and determining at least one set of key point offsets corresponding to the target object according to the multiple candidate key point offsets and the confidence corresponding to each candidate key point offset.
  • Regressing the key point offsets for the feature points in the feature map yields multiple groups of key point offsets per feature point, together with a confidence for each group; these are the candidate key point offsets. Based on each group's confidence, the combinations of candidate key point offsets whose confidence is greater than or equal to a confidence threshold can be selected from the candidates, and non-maximum suppression is applied to the selected combinations to obtain at least one set of key point offsets corresponding to the target object. Determining the at least one set of key point offsets based on the confidences of the candidates yields more accurate key point offsets.
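The selection step above can be sketched as follows: threshold the candidate key point groups by confidence, then run greedy non-maximum suppression on the minimum bounding rectangles of the surviving groups. The greedy NMS variant, the IoU threshold, and the helper names are illustrative assumptions; the patent does not fix these details.

```python
import numpy as np

def enclosing_box(kps):
    # Minimum bounding rectangle (x1, y1, x2, y2) of one group of key points.
    return np.array([kps[:, 0].min(), kps[:, 1].min(), kps[:, 0].max(), kps[:, 1].max()])

def iou(a, b):
    # Intersection-over-union of two axis-aligned boxes.
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def select_groups(groups, confs, conf_thresh=0.5, iou_thresh=0.5):
    """Keep key point groups with confidence >= conf_thresh, then greedily
    suppress any group whose box overlaps an already-kept, higher-confidence box."""
    order = [i for i in np.argsort(confs)[::-1] if confs[i] >= conf_thresh]
    kept = []
    for i in order:
        box = enclosing_box(groups[i])
        if all(iou(box, enclosing_box(groups[j])) < iou_thresh for j in kept):
            kept.append(i)
    return kept
```

Each surviving index then identifies one detected target object, whose key point coordinates follow from the group's offsets and its pose key point template.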
  • Step 303: obtain a key point detection result of the target object according to the at least one set of key point offsets.
  • Since each set of key point offsets represents the offsets between that set of key points in the feature map and the key points in each pose key point template, the key point detection result of the target object can be obtained based on the at least one set of key point offsets and the corresponding pose key point templates.
  • the key point detection result of the target object is obtained according to the at least one set of key point offsets, including:
  • a key point detection result of the target object is determined according to the at least one set of key point offsets and the pose key point template corresponding to the at least one set of key point offsets.
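The determination above reduces to simple addition of three terms. A minimal sketch, assuming the anchor is a feature map cell mapped back to image space by a backbone stride (the `stride` parameter is an assumption for illustration, not named in the text):

```python
import numpy as np

def decode_keypoints(anchor_xy, template, offsets, stride=1.0):
    # Absolute coordinates = anchor position (feature point, mapped to image
    # space) + the template's relative key point positions + the regressed offsets.
    anchor = np.asarray(anchor_xy, dtype=float) * stride
    return anchor + np.asarray(template, dtype=float) + np.asarray(offsets, dtype=float)
```

For example, `decode_keypoints((2, 3), [[0, 0], [1, -1]], [[0.5, 0.5], [0, 0]], stride=4)` places the two key points at (8.5, 12.5) and (9.0, 11.0).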
  • With this key point detection method, after the feature map of the image to be detected is extracted through the backbone network, key point detection is performed on the feature map according to multiple pose key point templates to obtain at least one set of key point offsets corresponding to the target object, and the key point detection result of the target object is determined according to the at least one set of key point offsets. Because the key point offsets of all target objects in the image to be detected can be determined directly based on the multiple pose key point templates, the key point detection results of all target objects can be obtained from the offsets, and it is not necessary to first locate the target objects in the image before detecting key points, thereby improving the detection efficiency of key points.
  • Fig. 4 is a structural block diagram of a key point detection device provided in an embodiment of the present application. As shown in Fig. 4, the key point detection device may include:
  • the feature extraction module 401 is used to extract the image features of the image to be detected through the backbone network to obtain a feature map, and the image to be detected includes the target object;
  • the key point detection module 402 is configured to perform key point detection on the feature map according to a plurality of pose key point templates to obtain at least one group of key points corresponding to the target object, wherein each pose key point template represents the relative positional relationship of the multiple key points corresponding to a pose;
  • the detection result determining module 403 is configured to filter the at least one group of key points to obtain a key point detection result of the target object.
  • the key point detection module includes:
  • the key point detection unit is used to perform key point detection on the feature map according to a plurality of pose key point templates to obtain a plurality of candidate key point offsets and a confidence corresponding to each candidate key point offset, wherein a candidate key point offset is the offset between a feature point in the feature map and the key points in each pose key point template;
  • a key point determining unit configured to determine at least one group of key points corresponding to the target object according to the plurality of candidate key point offsets and the confidence.
  • the detection result determination module is specifically used for:
  • a key point detection result of the target object is determined according to the at least one group of key points and the confidence corresponding to each group of key points.
  • the detection result determination module includes:
  • a target frame determination unit, used to respectively determine the target frame corresponding to each group of key points;
  • the detection result determination unit is used to perform non-maximum value suppression processing on the target frame corresponding to the at least one group of key points according to the confidence degree corresponding to each group of key points, so as to obtain the key point detection result of the target object.
  • the target frame determining unit is specifically used for:
  • the minimum bounding rectangle corresponding to each group of key points is determined as the target box corresponding to each group of key points.
  • the key point determining unit is specifically used for: selecting, from the plurality of candidate key point offsets, combinations of key point offsets whose confidence is greater than or equal to a confidence threshold, and determining the selected combinations as at least one set of key point offsets, each set of key point offsets representing the offsets between that set of key points in the feature map and the key points in each pose key point template; and determining at least one group of key points corresponding to the target object according to the at least one set of key point offsets and the pose key point template corresponding to each set of key point offsets.
  • the key point detection unit is specifically used for: based on each feature point in the feature map, matching the plurality of pose key point templates with the feature map, and determining the offsets between the feature point and the key points in each pose key point template and the confidence of the feature point, to obtain at least one set of key point offsets and the confidence corresponding to each set of key point offsets.
  • the detection result determination module includes:
  • a detection result determination unit configured to perform non-maximum value suppression processing on the at least one group of key points according to the at least one group of key points and the confidence corresponding to each group of key points, to obtain the key point detection of the target object result;
  • the device also includes:
  • the target frame determining module is configured to determine a target frame according to the key point detection result, and the target frame represents the location of the target object.
  • the key point detection module is specifically used for:
  • key point detection is performed on the feature map through a key point detection model according to the plurality of pose key point templates, to obtain at least one set of key point offsets corresponding to the target object and a confidence corresponding to each set of key point offsets.
  • the device also includes:
  • the training module is used to train the initial key point detection model based on the posture key point template and the sample image, so as to obtain the key point detection model.
  • the training module includes:
  • a sample acquisition unit, configured to acquire a sample image and the key point labels corresponding to the target object in the sample image;
  • the sample feature extraction unit is used to extract the image features of the sample image through the backbone network to obtain a sample feature map corresponding to the sample image;
  • a model processing unit configured to perform key point detection on the sample feature map through the initial key point detection model based on the posture key point template, and obtain a predicted offset of the key point set output by the initial key point detection model;
  • a model training unit configured to train an initial key point detection model based on the predicted offset, the pose key point template corresponding to the predicted offset, and the key point label, to obtain a trained key point detection model.
  • the training module also includes:
  • a confidence label determination unit, used to determine, based on the distance or offset between the key points in each pose key point template and the key point labels, the confidence label corresponding to each pose key point template at each sample feature point in the sample feature map;
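The unit's distance-to-label mapping can be sketched as below; the description elsewhere suggests a sigmoid-style mapping into the interval (0, 1), with larger distances giving smaller labels. The mean-distance aggregation and the `2*sigmoid(-d)` form (chosen so that distance 0 maps to label 1) are illustrative assumptions.

```python
import math

def mean_keypoint_distance(template_kps, label_kps):
    # Mean Euclidean distance between the anchored template key points
    # and the labelled key points.
    return sum(math.dist(t, l) for t, l in zip(template_kps, label_kps)) / len(template_kps)

def confidence_label(dist, scale=1.0):
    # Monotonically decreasing map of distance into (0, 1]:
    # distance 0 -> 1.0, large distance -> approaches 0.
    return 2.0 / (1.0 + math.exp(dist / scale))
```

The resulting value per (feature point, template) pair supervises the classification branch, matching the intuition that a larger distance should yield a smaller label.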
  • the model training unit is specifically used to: train the initial key point detection model based on the predicted offsets, the pose key point templates corresponding to the predicted offsets, the key point labels, and the confidence labels, to obtain the trained key point detection model.
  • the initial key point detection model also outputs the prediction confidence corresponding to the prediction offset.
  • the model training unit is specifically used for: determining a regression loss value according to the predicted offsets, the pose key point templates corresponding to the predicted offsets, and the key point labels; determining a confidence loss value according to the prediction confidence and the confidence labels; and adjusting the network parameters of the initial key point detection model according to the regression loss value and the confidence loss value, to obtain the trained key point detection model.
  • With this key point detection device, after the feature map of the image to be detected is extracted through the backbone network, key point detection is performed on the feature map according to multiple pose key point templates to obtain at least one group of key points, and the at least one group of key points is screened to obtain the key point detection result of the target object. Because the key point detection results of all target objects in the image to be detected can be determined directly based on the multiple pose key point templates, it is not necessary to first locate the target objects in the image before detecting key points, so the detection efficiency of key points can be improved.
  • Fig. 5 is a structural block diagram of a key point detection device provided in an embodiment of the present application. As shown in Fig. 5, the key point detection device may include:
  • the feature extraction module 501 is used to extract the image features of the image to be detected through the backbone network to obtain a feature map, and the image to be detected includes a target object;
  • the key point detection module 502 is configured to perform key point detection on the feature map according to a plurality of pose key point templates to obtain at least one set of key point offsets corresponding to the target object, wherein each pose key point template represents the relative positional relationship of the multiple key points corresponding to a pose, and each set of key point offsets represents the offsets between that set of key points in the feature map and the key points in each pose key point template;
  • the detection result determination module 503 is configured to obtain a key point detection result of the target object according to the at least one set of key point offsets.
  • the key point detection module includes:
  • a key point detection unit configured to perform key point detection on the feature map according to a plurality of posture key point templates, to obtain a plurality of candidate key point offsets and confidence degrees corresponding to each candidate key point offset;
  • An offset determination unit configured to determine at least one set of key point offsets corresponding to the target object according to the plurality of candidate key point offsets and the confidence corresponding to each candidate key point offset.
  • the detection result determination module is specifically used for:
  • a key point detection result of the target object is determined according to the at least one set of key point offsets and the pose key point template corresponding to the at least one set of key point offsets.
  • With this key point detection device, after the feature map of the image to be detected is extracted through the backbone network, key point detection is performed on the feature map according to multiple pose key point templates to obtain at least one set of key point offsets corresponding to the target object, and the key point detection result of the target object is determined according to the at least one set of key point offsets. Because the key point offsets in the image to be detected can be determined directly based on the multiple pose key point templates, the key point detection results of all target objects can be obtained from the offsets without first locating the target objects before detecting key points, thereby improving the detection efficiency of key points.
  • The description is relatively simple; for related details, refer to the description of the method embodiments.
  • the device embodiments described above are only illustrative, and the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in One place, or it can be distributed to multiple network elements. Part or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment. It can be understood and implemented by those skilled in the art without any creative efforts.
  • the various component embodiments of the present application may be realized in hardware, or in software modules running on one or more processors, or in a combination thereof.
  • a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all functions of some or all components in the computing processing device according to the embodiments of the present application.
  • the present application can also be implemented as an apparatus or apparatus program (eg, computer program and computer program product) for performing a part or all of the methods described herein.
  • Such a program implementing the present application may be stored on a computer-readable medium, or may be in the form of one or more signals.
  • Such a signal may be downloaded from an Internet site, or provided on a carrier signal, or provided in any other form.
  • FIG. 6 shows an electronic device that can implement the method according to the present application.
  • the electronic device conventionally includes a processor 610 and a computer program product or computer-readable medium in the form of a memory 620.
  • Memory 620 may be electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM, hard disk, or ROM.
  • the memory 620 has a storage space 630 for program code 631 for performing any method steps in the methods described above.
  • the storage space 630 for program codes may include respective program codes 631 for respectively implementing various steps in the above methods. These program codes can be read from or written into one or more computer program products.
  • These computer program products comprise program code carriers such as hard disks, compact disks (CDs), memory cards or floppy disks.
  • Such a computer program product is typically a portable or fixed storage unit as described with reference to FIG. 7 .
  • the storage unit may have storage segments, storage spaces, etc. arranged similarly to the memory 620 in the computing processing device of FIG. 7 .
  • the program code can, for example, be compressed in a suitable form.
  • the memory unit includes computer readable code 631', i.e. code readable by a processor such as 610, which, when executed by the electronic device, causes the electronic device to perform each step of the methods described above.
  • The embodiments of the present application may be provided as methods, devices, or computer program products. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • Embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to the embodiments of the present application. It should be understood that each procedure and/or block in the flowcharts and/or block diagrams, and combinations of procedures and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal equipment to produce a machine, such that the instructions executed by the computer or the processor of the other programmable data processing terminal equipment produce means for realizing the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing terminal equipment to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, the instruction means implementing the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.


Abstract

The embodiments of the present application provide a key point detection method, an electronic device, a program, and a storage medium. The method includes: extracting image features of an image to be detected through a backbone network to obtain a feature map, the image to be detected including a target object; performing key point detection on the feature map according to multiple pose key point templates to obtain at least one group of key points corresponding to the target object, wherein each pose key point template represents the relative positional relationship of multiple key points corresponding to a pose; and screening the at least one group of key points to obtain a key point detection result of the target object. Because the key point detection results of all target objects in the image to be detected can be determined directly based on the multiple pose key point templates, the present application can improve the detection efficiency of key points.

Description

Key point detection method, electronic device, program, and storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on May 24, 2021, with application number 202110568016.2 and the invention title "Key point detection method, apparatus, electronic device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of image recognition, and in particular to a key point detection method, an electronic device, a program, and a storage medium.
Background
Key point detection is very widely used in daily life. Common face recognition algorithms usually also rely on key point detection, and popular applications such as virtual makeup try-on, beautification, and face swapping are likewise built on key point detection technology, which from another angle underscores the high accuracy requirements placed on key point detection.
In the prior art, common key point detection methods are usually two-stage: in the first stage, an object detection model obtains the positions of the target objects in the image; in the second stage, each target object is cropped out according to the detected target frame, and a key point detection model then detects its key points. This approach is usually also called the "top-down" approach. Because it must proceed step by step, when an image contains multiple target objects, multiple crops are needed and the key point detection model must be run separately multiple times, so detection efficiency is low.
Summary
In view of the above problems, the embodiments of the present application are proposed in order to provide a key point detection method, an electronic device, a program, and a storage medium that overcome the above problems or at least partially solve them.
According to a first aspect of the embodiments of the present application, a key point detection method is provided, including:
extracting image features of an image to be detected through a backbone network to obtain a feature map, the image to be detected including a target object;
performing key point detection on the feature map according to multiple pose key point templates to obtain at least one group of key points corresponding to the target object, wherein each pose key point template represents the relative positional relationship of multiple key points in the pose key point template;
screening the at least one group of key points to obtain a key point detection result of the target object.
According to a second aspect of the embodiments of the present application, a key point detection method is provided, including:
extracting image features of an image to be detected through a backbone network to obtain a feature map, the image to be detected including a target object;
performing key point detection on the feature map according to multiple pose key point templates to obtain at least one set of key point offsets corresponding to the target object, wherein each pose key point template represents the relative positional relationship of multiple key points in the pose key point template, and each set of key point offsets represents the offsets between that set of key points in the feature map and the key points in each pose key point template;
obtaining a key point detection result of the target object according to the at least one set of key point offsets.
According to a third aspect of the embodiments of the present application, a key point detection apparatus is provided, including:
a feature extraction module, used to extract image features of an image to be detected through a backbone network to obtain a feature map, the image to be detected including a target object;
a key point detection module, used to perform key point detection on the feature map according to multiple pose key point templates to obtain at least one group of key points corresponding to the target object, wherein each pose key point template represents the relative positional relationship of multiple key points in the pose key point template;
a detection result determination module, used to screen the at least one group of key points to obtain a key point detection result of the target object.
According to a fourth aspect of the embodiments of the present application, a key point detection apparatus is provided, including:
a feature extraction module, used to extract image features of an image to be detected through a backbone network to obtain a feature map, the image to be detected including a target object;
a key point detection module, used to perform key point detection on the feature map according to multiple pose key point templates to obtain at least one set of key point offsets corresponding to the target object, wherein each pose key point template represents the relative positional relationship of multiple key points in the pose key point template, and each set of key point offsets represents the offsets between that set of key points in the feature map and the key points in each pose key point template;
a detection result determination module, used to obtain a key point detection result of the target object according to the at least one set of key point offsets.
According to a fifth aspect of the embodiments of the present application, an electronic device is provided, including a processor, a memory, and a computer program stored on the memory and runnable on the processor, the computer program, when executed by the processor, implementing the key point detection method according to the first or second aspect.
According to a sixth aspect of the embodiments of the present application, a computer program is provided, including computer-readable code which, when run on an electronic device, causes the electronic device to perform the key point detection method according to the first or second aspect.
According to a seventh aspect of the embodiments of the present application, a computer-readable storage medium is provided, on which the computer program according to the sixth aspect is stored.
With the key point detection method, electronic device, program, and storage medium provided by the embodiments of the present application, after the feature map of the image to be detected is extracted through the backbone network, key point detection is performed on the feature map according to multiple pose key point templates to obtain at least one group of key points corresponding to the target object, and the at least one group of key points is screened to obtain the key point detection result of the target object. Because the key point detection results of all target objects in the image to be detected can be determined directly based on the multiple pose key point templates, it is not necessary to first locate the target objects in the image before detecting key points, so the detection efficiency of key points can be improved.
The above description is only an overview of the technical solution of the present application. In order to understand the technical means of the present application more clearly so that it can be implemented according to the contents of the specification, and to make the above and other objects, features, and advantages of the present application more apparent and understandable, specific embodiments of the present application are set forth below.
Brief Description of the Drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the present application.
Fig. 1 is a flow chart of the steps of a key point detection method provided in an embodiment of the present application;
Figs. 2a-2c are example diagrams of pose key point templates in an embodiment of the present application;
Fig. 3 is a flow chart of the steps of another key point detection method provided in an embodiment of the present application;
Fig. 4 is a structural block diagram of a key point detection device provided in an embodiment of the present application;
Fig. 5 is a structural block diagram of another key point detection device provided in an embodiment of the present application;
Fig. 6 schematically shows a block diagram of an electronic device for performing the method according to the present application; and
Fig. 7 schematically shows a storage unit for holding or carrying program code implementing the method according to the present application.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present application, it should be understood that the present application can be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present application will be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
Fig. 1 is a flow chart of the steps of a key point detection method provided in an embodiment of the present application. As shown in Fig. 1, the method may include:
Step 101: extract image features of an image to be detected through a backbone network to obtain a feature map, the image to be detected including a target object.
The backbone network is used to extract the image features of the image to be detected and may be, for example, a ResNet-50 network. The target object may be, for example, a human face, a human body, a pet, or a vehicle.
The image to be detected is input into a backbone network such as ResNet-50 to obtain a high-dimensional feature representation of the image, that is, the feature map.
Step 102: perform key point detection on the feature map according to multiple pose key point templates to obtain at least one group of key points corresponding to the target object, wherein each pose key point template represents the relative positional relationship of multiple key points corresponding to a pose.
The key point detection method in the embodiments of the present application is a "bottom-up" method: a single-stage, fully end-to-end method that can take any image to be detected as input and directly produce the key point positions of all target objects in the image. No object detection model is needed to locate and crop out target objects, and the whole model runs forward only once, whereas the traditional "top-down" method, which requires cropping, runs forward a number of times proportional to the number of target objects in the image.
The multiple pose key point templates are predefined relative positional relationships of the key points corresponding to different poses; each template corresponds to the key points of one pose. Because some poses occlude key points, the number of key points in each pose key point template is not necessarily the same. Figs. 2a-2c are example diagrams of pose key point templates in an embodiment of the present application. As shown in Figs. 2a-2c, three different pose key point templates are given for the case where the target object is a human face; each template defines the relative positional relationship of the key points corresponding to one pose. The relative positional relationship may be, for example, the position of each key point relative to a common center point that every template includes; when the target object is a human face, the center point may be, for example, the point corresponding to the nose center (i.e., the tip of the nose), which is not limited in the embodiments of the present application.
After the feature map of the image to be detected is extracted, it can be input directly into a key point detection model, which performs regression on the key points in the image based on the multiple pose key point templates to obtain at least one group of key points corresponding to the target object.
Step 103: screen the at least one group of key points to obtain a key point detection result of the target object.
After the at least one group of key points corresponding to the target object is obtained, the groups are screened to select those that indeed contain a target object, yielding the key point detection result of the target object in the image to be detected. The key point detection model in the embodiments of the present application does not need to crop target objects out of the image first; based on the pose key point templates, the key point detection results of all target objects in the image can be obtained directly, so when the image contains multiple target objects, one detection pass yields the key point detection results for all of them.
In one embodiment of the present application, performing key point detection on the feature map according to the multiple pose key point templates to obtain at least one group of key points corresponding to the target object includes: obtaining multiple candidate key point offsets and a confidence corresponding to each candidate key point offset, wherein a candidate key point offset is the offset between a feature point in the feature map and the key points in a pose key point template; and determining at least one group of key points corresponding to the target object according to the multiple candidate key point offsets and the confidences.
A feature point may be a single pixel in the feature map, or a set of multiple pixels in a specific region of the feature map.
The feature map is input into the key point detection model, which performs key point detection based on the multiple pose key point templates by regressing the offsets between the feature points in the feature map and the key points in each template, obtaining multiple candidate key point offsets and determining the confidence corresponding to each candidate. After the candidate offsets and confidences are obtained, the candidates can first be pre-screened by confidence to select those meeting a preset condition, and the key point coordinates are determined from the templates, the feature points, and the selected candidate offsets, yielding at least one group of key points corresponding to the target object; alternatively, after the candidate offsets and confidences are obtained, a group of key point coordinates corresponding to each group of candidate offsets can be determined directly from the templates, the feature points, and the offsets, yielding at least one group of key points corresponding to the target object.
In one embodiment of the present application, determining at least one group of key points corresponding to the target object according to the multiple candidate key point offsets and the confidences includes: selecting, from the multiple candidate key point offsets, combinations of key point offsets whose confidence is greater than or equal to a confidence threshold, and determining the selected combinations as at least one set of key point offsets, each set representing the offsets between that set of key points in the feature map and the key points in each pose key point template; and determining at least one group of key points corresponding to the target object according to the at least one set of key point offsets and the pose key point template corresponding to each set.
Some combinations among the candidate key point offsets may have rather low confidence, and such candidates will not yield correct key points. Therefore, the combinations whose confidence is greater than or equal to the confidence threshold can first be selected from the candidates; each selected combination is taken as one set of key point offsets, and selecting multiple combinations yields multiple sets, so that at least one set of key point offsets is obtained. Based on the at least one set of offsets, the pose key point template corresponding to each set, and the feature point from which each set was regressed, the coordinates of at least one group of key points can be determined, yielding the at least one group of key points corresponding to the target object. Pre-screening the candidates by confidence reduces the amount of computation and increases processing speed.
In one embodiment of the present application, screening the at least one group of key points to obtain the key point detection result of the target object includes: determining the key point detection result of the target object according to the at least one group of key points and the confidence corresponding to each group of key points.
After the at least one group of key points corresponding to the target object is determined based on the multiple candidate key point offsets and confidences, the confidence of each group of key points is the confidence of the candidate key point offsets from which that group was obtained, so the groups can be screened based on their respective confidences to determine the key point detection result of the target object. The screening can be based on non-maximum suppression: the suppression can be applied directly to the at least one group of key points, or a target frame can first be determined for each group and the suppression applied to the target frames corresponding to the groups.
In an optional implementation, determining the key point detection result of the target object according to the at least one group of key points and the confidence corresponding to each group includes: respectively determining the target frame corresponding to each group of key points; and performing non-maximum suppression on the target frames corresponding to the at least one group of key points according to the confidence corresponding to each group, to obtain the key point detection result of the target object.
The target frame represents the location of the target object.
After the at least one group of key points and the confidence corresponding to each group are obtained, the target frame corresponding to each group can first be determined, and then non-maximum suppression is applied to the target frames based on each group's confidence, yielding the final key point detection result. Compared with applying non-maximum suppression directly to the key points, performing it on the target frames after they are determined reduces the amount of data to process and further improves detection efficiency.
In an optional implementation, respectively determining the target frame corresponding to each group of key points includes: respectively determining the minimum bounding rectangle of each group of key points as that group's target frame.
When determining the target frame of a group of key points, the minimum bounding rectangle of the group can first be determined and taken as the group's target frame.
By obtaining the at least one group of key points and the confidence of each group while performing feature point detection, and screening the groups based on their confidences to obtain the key point detection result, the key points of the target object can be accurately selected, improving the accuracy of key point detection.
In one embodiment of the present application, performing key point detection on the feature map according to the multiple pose key point templates to obtain multiple candidate key point offsets and the confidence of each candidate includes: based on each feature point in the feature map, matching the multiple pose key point templates with the feature map, and determining the offsets between the feature point and the key points in each template and the confidence of the feature point, to obtain at least one set of key point offsets and the confidence corresponding to each set.
In some embodiments of the present application, each feature point in the feature map is used in turn as the center point for matching the feature map with the pose key point templates. The multiple templates are matched with the feature map, and the offsets between the center point and the key points in each template are determined from the relative positional relationships between the key points and the center point in the templates. One feature point with multiple templates yields multiple sets of key point offsets, and multiple feature points yield further sets, so that after every feature point has been matched with the templates, at least one set of key point offsets is obtained, and the confidence corresponding to each set is obtained during matching.
In one embodiment of the present application, screening the at least one group of key points according to the confidence corresponding to each group to obtain the key point detection result of the target object includes: performing non-maximum suppression on the at least one group of key points according to the groups and their confidences, to obtain the key point detection result of the target object;
the method further includes: determining a target frame according to the key point detection result, the target frame representing the location of the target object.
When screening the at least one group of key points according to each group's confidence, the screening can be done by applying non-maximum suppression directly to the groups of key points, yielding the key point detection result of the target object. After the result is obtained, the groups of key points belonging to the same target object can be identified, the minimum bounding rectangle of each such group determined, and that rectangle taken as the group's target frame. Determining the target frame makes it possible to display the key point detection result and the corresponding target frame at the same time.
With the key point detection method provided in this embodiment, after the feature map of the image to be detected is extracted through the backbone network, key point detection is performed on the feature map according to multiple pose key point templates to obtain at least one group of key points corresponding to the target object, and the groups are screened to obtain the key point detection result. Because the key point detection results of all target objects in the image to be detected can be determined directly based on the multiple pose key point templates, it is not necessary to first locate the target objects before detecting key points, so the detection efficiency of key points can be improved.
On the basis of the above technical solution, performing key point detection on the feature map according to the multiple pose key point templates to obtain at least one group of key points corresponding to the target object includes: performing key point detection on the feature map through a key point detection model according to the multiple pose key point templates, to obtain at least one set of key point offsets corresponding to the target object and the confidence corresponding to each set.
The key point detection model can regress, based on the multiple pose key point templates, the offsets between the key points in the feature map and the key points in the templates, and determine the confidence of each set of key point offsets.
The feature map is input into the key point detection model, which uses each feature point in the feature map in turn as the center point for matching with the pose key point templates, that is, as an anchor. Each pose key point template is attached to the anchor, and the key point offsets relative to the anchor are regressed based on each attached template. Each template yields one set of key point offsets, so multiple templates yield multiple sets of offsets for each feature point, and for each set the model outputs a corresponding confidence. For example, if the feature map is 5×5, i.e., it contains 25 feature points, and there are 24 pose key point templates, then 25×24 sets of key point offsets are obtained after regression through the key point regression network. The regression of the key point offsets can be performed with convolutional layers.
By matching the multiple pose key point templates with the feature map centered at every feature point, more precise key point positions are regressed, so that even when an image contains multiple target objects, they can all be detected in one pass, which improves detection efficiency.
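The 5×5 feature map / 24-template example above fixes the shape of the dense regression output. A sketch of that bookkeeping follows; the number of key points per template, here 17, is an arbitrary assumption (in the description it can differ per template):

```python
import numpy as np

H, W, T, K = 5, 5, 24, 17   # feature map height/width, templates, key points per template

# One offset group is regressed per (feature point, template) pair,
# with two coordinates (dx, dy) per key point:
offsets = np.zeros((H, W, T, K, 2))
# and one confidence per offset group:
confidences = np.zeros((H, W, T))

n_groups = H * W * T        # 25 feature points x 24 templates = 600 offset groups
```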
In some embodiments of the present application, the method further includes: training an initial key point detection model based on the pose key point templates and sample images, to obtain the key point detection model.
The initial key point detection model may be obtained by randomly initializing its network parameters; training it based on the pose key point templates and the sample images yields the trained key point detection model.
In some embodiments of the present application, the training steps of the key point detection model include:
acquiring a sample image and the key point labels corresponding to the target objects in the sample image; extracting the image features of the sample image through the backbone network to obtain a sample feature map corresponding to the sample image; performing key point detection on the sample feature map through the initial key point detection model based on the pose key point templates, to obtain the predicted offsets of the key point sets output by the initial key point detection model; and training the initial key point detection model based on the predicted offsets, the pose key point templates corresponding to the predicted offsets, and the key point labels, to obtain the trained key point detection model.
The key point labels may be labels of key point position coordinates detected with the help of other key point detection methods; for example, after the target objects in the sample image are detected with an object detection model, the detected objects may be cropped out and a traditional key point detection model used to obtain the key point coordinates.
After the sample image is acquired, key point detection is performed on it with the help of other key point detection methods to obtain the key point labels corresponding to its target objects. The sample image is input into the backbone network, which extracts its features to obtain the sample feature map; the sample feature map is input into the initial key point detection model, which performs key point detection based on the pose key point templates to obtain the predicted offsets of multiple key point sets. The key point coordinates in each set are determined from the predicted offsets and the corresponding pose key point templates, and the network parameters of the initial model are adjusted based on these coordinates and the key point labels, yielding the trained key point detection model.
On the basis of the above technical solution, the training steps of the key point detection model further include: determining, based on the distance or offset between the key points in each pose key point template and the key point labels, the confidence label corresponding to each pose key point template at each sample feature point in the sample feature map;
training the initial key point detection model based on the predicted offsets, the corresponding pose key point templates, and the key point labels to obtain the trained model then includes: training the initial key point detection model based on the predicted offsets, the corresponding pose key point templates, the key point labels, and the confidence labels, to obtain the trained key point detection model.
When there are multiple pose key point templates, it must ultimately be decided which template's result to take as output. For this, a classification network can be set up in the key point detection model to predict the confidences of the multiple templates attached at a feature point, which requires determining the confidence labels corresponding to the sample feature map in order to supervise training. Each feature point in the sample feature map is used in turn as the center point for matching with the pose key point templates; when each template is attached at the center point, the distance or offset of the template's key points relative to the key point labels is determined, and from this distance or offset the confidence label of each feature point relative to each template can be determined. One feature point with multiple templates yields as many confidence label results as there are templates; for example, with 24 pose key point templates, one feature point in the sample feature map corresponds to 24 confidence label results.
After the confidence labels of the sample feature map are obtained, the initial key point detection model can be trained based on the predicted offsets, the corresponding pose key point templates, the key point labels, and the confidence labels; training ends when the end condition is met, yielding the trained key point detection model.
When determining the confidence label of each pose key point template at each sample feature point based on the offsets between the template's key points and the key point labels, the distance or offset can be mapped into the interval between 0 and 1 to obtain the confidence label of each feature point relative to each template. In some embodiments, the distance between a template's key points and the key point labels is simply the offset between the template's key points and the key point labels.
The distance or offset between the pose key point template and the key point labels at each feature point can be mapped into the interval between 0 and 1 by means such as a sigmoid, and the resulting value taken as the confidence label of the feature point relative to each template. Since a larger distance means lower confidence and a smaller distance means higher confidence, a larger distance maps to a smaller label and a smaller distance maps to a larger label.
In some embodiments of the present application, the initial key point detection model also outputs the prediction confidence corresponding to the predicted offsets.
On the basis of the above technical solution, training the initial key point detection model based on the predicted offsets, the corresponding pose key point templates, the key point labels, and the confidence labels to obtain the trained model includes:
determining a regression loss value according to the predicted offsets, the pose key point templates corresponding to the predicted offsets, and the key point labels; determining a confidence loss value according to the prediction confidence and the confidence labels;
adjusting the network parameters of the initial key point detection model according to the regression loss value and the confidence loss value, to obtain the trained key point detection model.
From the predicted offsets and the corresponding pose key point templates, the predicted coordinates of the key points corresponding to each template can be obtained; the predicted coordinates and the key point labels of the sample feature map are substituted into the regression loss function to obtain the regression loss value, and the prediction confidence and confidence label corresponding to each set of predicted offsets are substituted into the confidence loss function to obtain the confidence loss value. The regression loss value and the confidence loss value are added together as the target loss value, and the network parameters of the initial key point detection model are adjusted based on the target loss value, yielding the trained key point detection model.
In some embodiments of the present application, the predicted key point coordinates can be obtained from the predicted offsets output by the key point detection model and the pose key point templates, and the absolute value of the difference between the predicted coordinates and the key point labels is used as the regression loss function. The confidence loss function may be a cross-entropy loss function (Cross-Entropy Loss).
Using the regression loss and the confidence loss to constrain the training of the key point detection model enables the trained model to output key points and their corresponding confidences accurately, thereby improving overall detection accuracy.
Fig. 3 is a flow chart of the steps of a key point detection method provided in an embodiment of the present application. As shown in Fig. 3, the method may include:
Step 301: extract image features of an image to be detected through a backbone network to obtain a feature map, the image to be detected including a target object.
The backbone network is used to extract the image features of the image to be detected and may be, for example, a ResNet-50 network. The target object may be, for example, a human face, a human body, a pet, or a vehicle.
The image to be detected is input into a backbone network such as ResNet-50 to obtain a high-dimensional feature representation of the image, that is, the feature map.
Step 302: perform key point detection on the feature map according to multiple pose key point templates to obtain at least one set of key point offsets corresponding to the target object, wherein each pose key point template represents the relative positional relationship of the multiple key points corresponding to a pose, and each set of key point offsets represents the offsets between that set of key points in the feature map and the key points in each pose key point template.
The multiple pose key point templates are predefined relative positional relationships of the key points corresponding to different poses; each template corresponds to the key points of one pose, and because some poses occlude key points, the number of key points in each pose key point template is not necessarily the same.
Based on the multiple pose key point templates, the key point offsets corresponding to the feature points in the feature map are regressed. One pose key point template can regress one set of key point offsets at one feature point, so from the feature points in the feature map and the multiple pose key point templates, at least one set of key point offsets corresponding to the target object can be obtained. From each set of key point offsets and the corresponding pose key point template, the coordinates of one group of key points can be obtained.
In one embodiment of the present application, performing key point detection on the feature map according to the multiple pose key point templates to obtain at least one set of key point offsets corresponding to the target object includes: performing key point detection on the feature map according to the multiple pose key point templates to obtain multiple candidate key point offsets and a confidence corresponding to each candidate key point offset; and determining at least one set of key point offsets corresponding to the target object according to the multiple candidate key point offsets and the confidence corresponding to each candidate key point offset.
Based on the multiple pose key point templates, the key point offsets corresponding to the feature points in the feature map are regressed; multiple sets of key point offsets are obtained per feature point, together with the confidence of each set. These are the candidate key point offsets. Based on each candidate set's confidence, the combinations of candidate key point offsets whose confidence is greater than or equal to a confidence threshold can be selected from the candidates, and non-maximum suppression is applied to the selected combinations to obtain at least one set of key point offsets corresponding to the target object. Determining the at least one set of key point offsets based on the confidences of the candidates yields more accurate key point offsets.
Step 303: obtain a key point detection result of the target object according to the at least one set of key point offsets.
Since each set of key point offsets represents the offsets between that set of key points in the feature map and the key points in each pose key point template, the key point detection result of the target object can be obtained based on the at least one set of key point offsets and the corresponding pose key point templates.
In one embodiment of the present application, obtaining the key point detection result of the target object according to the at least one set of key point offsets includes:
determining the key point detection result of the target object according to the at least one set of key point offsets and the pose key point templates corresponding to the at least one set of key point offsets.
The at least one set of key point offsets, the coordinates of the key points in the pose key point template corresponding to each set, and the coordinates of the feature point from which that set of offsets was obtained are added together to obtain the coordinates of the target object's key points, that is, the key point detection result of the target object.
With the key point detection method provided in this embodiment, after the feature map of the image to be detected is extracted through the backbone network, key point detection is performed on the feature map according to multiple pose key point templates to obtain at least one set of key point offsets corresponding to the target object, and the key point detection result of the target object is determined according to the at least one set of key point offsets. Because the key point offsets of all target objects in the image to be detected can be determined directly based on the multiple pose key point templates, the key point detection results of all target objects can be obtained from the offsets, and it is not necessary to first locate the target objects in the image before detecting key points, so the detection efficiency of key points can be improved.
It should be noted that, for simplicity of description, the method embodiments are expressed as a series of action combinations, but those skilled in the art should know that the embodiments of the present application are not limited by the described order of actions, because according to the embodiments of the present application, some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present application.
Fig. 4 is a structural block diagram of a key point detection apparatus provided by an embodiment of the present application. As shown in Fig. 4, the key point detection apparatus may include:
a feature extraction module 401, configured to extract image features of an image to be detected through a backbone network to obtain a feature map, the image to be detected including a target object;
a key point detection module 402, configured to perform key point detection on the feature map according to a plurality of pose key point templates to obtain at least one group of key points corresponding to the target object, wherein each pose key point template characterizes the relative positional relationship of the plurality of key points corresponding to a pose; and
a detection result determination module 403, configured to screen the at least one group of key points to obtain a key point detection result of the target object.
Optionally, the key point detection module includes:
a key point detection unit, configured to perform key point detection on the feature map according to the plurality of pose key point templates to obtain a plurality of candidate key point offsets and the confidence corresponding to each candidate key point offset, wherein a candidate key point offset is the offset between a feature point in the feature map and a key point in each pose key point template; and
a key point determination unit, configured to determine the at least one group of key points corresponding to the target object according to the plurality of candidate key point offsets and the confidences.
Optionally, the detection result determination module is specifically configured to:
determine the key point detection result of the target object according to the at least one group of key points and the confidence corresponding to each group of key points.
Optionally, the detection result determination module includes:
a target box determination unit, configured to determine a target box corresponding to each group of key points respectively; and
a detection result determination unit, configured to perform non-maximum suppression on the target boxes corresponding to the at least one group of key points according to the confidence corresponding to each group of key points, to obtain the key point detection result of the target object.
Optionally, the target box determination unit is specifically configured to:
determine the minimum enclosing rectangle corresponding to each group of key points as the target box corresponding to that group of key points.
Optionally, the key point determination unit is specifically configured to:
select, from the plurality of candidate key point offsets, combinations of key point offsets whose confidence is greater than or equal to a confidence threshold, and determine the selected combinations as at least one group of key point offsets, each group of key point offsets characterizing the offsets between that group of key points in the feature map and the key points in each pose key point template; and
determine the at least one group of key points corresponding to the target object according to the at least one group of key point offsets and the pose key point template corresponding to each group of key point offsets.
Optionally, the key point detection unit is specifically configured to:
match the plurality of pose key point templates with the feature map based on each feature point in the feature map respectively, and determine the offsets between the feature point and the key points in each pose key point template as well as the confidence of the feature point, to obtain at least one group of key point offsets and the confidence corresponding to each group of key point offsets.
Optionally, the detection result determination module includes:
a detection result determination unit, configured to perform non-maximum suppression on the at least one group of key points according to the at least one group of key points and the confidence corresponding to each group of key points, to obtain the key point detection result of the target object;
and the apparatus further includes:
a target box determination module, configured to determine a target box according to the key point detection result, the target box characterizing the position of the target object.
Optionally, the key point detection module is specifically configured to:
perform key point detection on the feature map through a key point detection model according to the plurality of pose key point templates, to obtain at least one group of key point offsets corresponding to the target object and the confidence corresponding to each group of key point offsets.
Optionally, the apparatus further includes:
a training module, configured to train an initial key point detection model based on the pose key point templates and sample images, to obtain the key point detection model.
Optionally, the training module includes:
a sample acquisition unit, configured to acquire sample images and the key point annotations corresponding to the target objects in the sample images;
a sample feature extraction unit, configured to extract image features of a sample image through the backbone network to obtain a sample feature map corresponding to the sample image;
a model processing unit, configured to perform key point detection on the sample feature map through the initial key point detection model based on the pose key point templates, to obtain predicted offsets of key point sets output by the initial key point detection model; and
a model training unit, configured to train the initial key point detection model based on the predicted offsets, the pose key point templates corresponding to the predicted offsets and the key point annotations, to obtain a trained key point detection model.
Optionally, the training module further includes:
a confidence label determination unit, configured to determine, based on the distances or offsets between the key points in each pose key point template and the key point annotations, the confidence label corresponding to each pose key point template at each sample feature point in the sample feature map;
and the model training unit is specifically configured to: train the initial key point detection model based on the predicted offsets, the pose key point templates corresponding to the predicted offsets, the key point annotations and the confidence labels, to obtain the trained key point detection model.
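A minimal sketch of how such a confidence label might be derived is given below. The use of a mean L1 distance and the particular threshold value are assumptions for illustration; the text only requires that the label be based on distances or offsets between the template key points and the key point annotations:

```python
def confidence_label(feature_point, template, annotations, dist_thresh=5.0):
    """Label a pose key point template at a sample feature point as positive
    (1.0) when the template's key points, placed at that feature point, lie
    close on average to the annotated key points; otherwise negative (0.0)."""
    fx, fy = feature_point
    dists = [abs(fx + tx - gx) + abs(fy + ty - gy)
             for (tx, ty), (gx, gy) in zip(template, annotations)]
    mean_dist = sum(dists) / len(dists)
    return 1.0 if mean_dist <= dist_thresh else 0.0
```

The resulting labels serve as the targets for the confidence branch during training.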
Optionally, the initial key point detection model further outputs the predicted confidence corresponding to the predicted offsets.
Optionally, the model training unit is specifically configured to:
determine a regression loss value according to the predicted offsets, the pose key point templates corresponding to the predicted offsets and the key point annotations, and determine a confidence loss value according to the predicted confidence and the confidence labels; and
adjust network parameters of the initial key point detection model according to the regression loss value and the confidence loss value, to obtain the trained key point detection model.
In the key point detection apparatus provided by this embodiment, after the feature map of the image to be detected is extracted through the backbone network, key point detection is performed on the feature map according to a plurality of pose key point templates to obtain at least one group of key points, and the at least one group of key points is screened to obtain the key point detection result of the target object. Since the key point detection results of all target objects in the image to be detected can be determined directly based on the plurality of pose key point templates, there is no need to first locate the target objects in the image before detecting the key points, which improves key point detection efficiency.
Fig. 5 is a structural block diagram of a key point detection apparatus provided by an embodiment of the present application. As shown in Fig. 5, the key point detection apparatus may include:
a feature extraction module 501, configured to extract image features of an image to be detected through a backbone network to obtain a feature map, the image to be detected including a target object;
a key point detection module 502, configured to perform key point detection on the feature map according to a plurality of pose key point templates to obtain at least one group of key point offsets corresponding to the target object, wherein each pose key point template characterizes the relative positional relationship of the plurality of key points corresponding to a pose, and each group of key point offsets characterizes the offsets between that group of key points in the feature map and the key points in each pose key point template; and
a detection result determination module 503, configured to obtain a key point detection result of the target object according to the at least one group of key point offsets.
Optionally, the key point detection module includes:
a key point detection unit, configured to perform key point detection on the feature map according to the plurality of pose key point templates to obtain a plurality of candidate key point offsets and the confidence corresponding to each candidate key point offset; and
an offset determination unit, configured to determine the at least one group of key point offsets corresponding to the target object according to the plurality of candidate key point offsets and the confidence corresponding to each candidate key point offset.
Optionally, the detection result determination module is specifically configured to:
determine the key point detection result of the target object according to the at least one group of key point offsets and the pose key point templates corresponding to the at least one group of key point offsets.
In the key point detection apparatus provided by this embodiment, after the feature map of the image to be detected is extracted through the backbone network, key point detection is performed on the feature map according to a plurality of pose key point templates to obtain at least one group of key point offsets corresponding to the target object, and the key point detection result of the target object is determined according to the at least one group of key point offsets. Since the key point offsets in the image to be detected can be determined directly based on the plurality of pose key point templates, the key point detection results of all target objects can be obtained from the key point offsets without first locating the target objects in the image before detecting the key points, which improves key point detection efficiency.
As for the apparatus embodiments, since they are basically similar to the method embodiments, the description is relatively brief; for relevant details, reference may be made to the corresponding description of the method embodiments.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
The component embodiments of the present application may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the computing and processing device according to the embodiments of the present application. The present application may also be implemented as a device or apparatus program (e.g., a computer program and a computer program product) for performing part or all of the methods described herein. Such a program implementing the present application may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
For example, Fig. 6 shows an electronic device that can implement the method according to the present application. The electronic device conventionally includes a processor 610 and a computer program product or computer-readable medium in the form of a memory 620. The memory 620 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk, or ROM. The memory 620 has a storage space 630 for program code 631 for performing any of the method steps described above. For example, the storage space 630 for program code may include individual program codes 631 for implementing the various steps of the above methods respectively. These program codes may be read from or written into one or more computer program products. These computer program products include program code carriers such as hard disks, compact discs (CDs), memory cards, or floppy disks. Such a computer program product is usually a portable or fixed storage unit as described with reference to Fig. 7. The storage unit may have storage segments, storage space, etc. arranged similarly to the memory 620 in the computing and processing device of Fig. 7. The program code may, for example, be compressed in a suitable form. Usually, the storage unit includes computer-readable code 631', i.e., code that can be read by a processor such as 610, which, when run by the electronic device, causes the electronic device to perform the steps of the methods described above.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the same or similar parts among the embodiments, reference may be made to one another.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, an apparatus, or a computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The embodiments of the present application are described with reference to flowcharts and/or block diagrams of the method, terminal device (system), and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing terminal device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing terminal device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing terminal device, so that a series of operational steps are performed on the computer or other programmable terminal device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present application have been described, those skilled in the art, once aware of the basic inventive concept, may make additional changes and modifications to these embodiments. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications falling within the scope of the embodiments of the present application.
Finally, it should also be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or terminal device that includes a series of elements includes not only those elements, but also other elements not explicitly listed, or elements inherent to such a process, method, article or terminal device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or terminal device that includes the element.
The key point detection method, electronic device, program and storage medium provided by the present application have been introduced in detail above. Specific examples have been used herein to explain the principles and implementations of the present application, and the descriptions of the above embodiments are only intended to help understand the method of the present application and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementations and the scope of application based on the idea of the present application. In summary, the content of this specification should not be construed as a limitation on the present application.

Claims (20)

  1. A key point detection method, characterized by comprising:
    extracting image features of an image to be detected through a backbone network to obtain a feature map, the image to be detected comprising a target object;
    performing key point detection on the feature map according to a plurality of pose key point templates to obtain at least one group of key points corresponding to the target object, wherein each pose key point template characterizes a relative positional relationship of a plurality of key points corresponding to a pose; and
    screening the at least one group of key points to obtain a key point detection result of the target object.
  2. The method according to claim 1, characterized in that performing key point detection on the feature map according to a plurality of pose key point templates to obtain at least one group of key points corresponding to the target object comprises:
    performing key point detection on the feature map according to the plurality of pose key point templates to obtain a plurality of candidate key point offsets and a confidence corresponding to each candidate key point offset, wherein a candidate key point offset is an offset between a feature point in the feature map and a key point in each pose key point template; and
    determining the at least one group of key points corresponding to the target object according to the plurality of candidate key point offsets and the confidences.
  3. The method according to claim 2, characterized in that screening the at least one group of key points to obtain the key point detection result of the target object comprises:
    determining the key point detection result of the target object according to the at least one group of key points and the confidence corresponding to each group of key points.
  4. The method according to claim 3, characterized in that determining the key point detection result of the target object according to the at least one group of key points and the confidence corresponding to each group of key points comprises:
    determining a target box corresponding to each group of key points respectively, the target box characterizing a position of the target object; and
    performing non-maximum suppression on the target boxes corresponding to the at least one group of key points according to the confidence corresponding to each group of key points, to obtain the key point detection result of the target object.
  5. The method according to claim 4, characterized in that determining a target box corresponding to each group of key points respectively comprises:
    determining a minimum enclosing rectangle corresponding to each group of key points as the target box corresponding to that group of key points.
  6. The method according to any one of claims 2-5, characterized in that determining the at least one group of key points corresponding to the target object according to the plurality of candidate key point offsets and the confidences comprises:
    selecting, from the plurality of candidate key point offsets, combinations of key point offsets whose confidence is greater than or equal to a confidence threshold, and determining the selected combinations of key point offsets as at least one group of key point offsets, each group of key point offsets characterizing offsets between that group of key points in the feature map and the key points in each pose key point template; and
    determining the at least one group of key points corresponding to the target object according to the at least one group of key point offsets and the pose key point template corresponding to each group of key point offsets.
  7. The method according to claim 6, characterized in that screening the at least one group of key points to obtain the key point detection result of the target object comprises:
    performing non-maximum suppression on the at least one group of key points according to the at least one group of key points and the confidence corresponding to each group of key points, to obtain the key point detection result of the target object;
    and the method further comprises:
    determining a target box according to the key point detection result, the target box characterizing a position of the target object.
  8. The method according to any one of claims 2-7, characterized in that performing key point detection on the feature map according to the plurality of pose key point templates to obtain a plurality of candidate key point offsets and the confidence corresponding to each candidate key point offset comprises:
    matching the plurality of pose key point templates with the feature map based on each feature point in the feature map respectively, and determining offsets between the feature point and the key points in each pose key point template as well as a confidence of the feature point, to obtain at least one group of key point offsets and a confidence corresponding to each group of key point offsets.
  9. The method according to any one of claims 1-8, characterized in that performing key point detection on the feature map according to a plurality of pose key point templates to obtain at least one group of key points corresponding to the target object comprises:
    performing key point detection on the feature map through a key point detection model according to the plurality of pose key point templates, to obtain at least one group of key point offsets corresponding to the target object and a confidence corresponding to each group of key point offsets.
  10. The method according to claim 8, characterized by further comprising: training an initial key point detection model based on the pose key point templates and sample images, to obtain the key point detection model.
  11. The method according to claim 10, characterized in that the step of training the key point detection model comprises:
    acquiring sample images and key point annotations corresponding to target objects in the sample images;
    extracting image features of a sample image through the backbone network to obtain a sample feature map corresponding to the sample image;
    performing key point detection on the sample feature map through the initial key point detection model based on the pose key point templates, to obtain predicted offsets of key point sets output by the initial key point detection model; and
    training the initial key point detection model based on the predicted offsets, the pose key point templates corresponding to the predicted offsets and the key point annotations, to obtain a trained key point detection model.
  12. The method according to claim 11, characterized in that the step of training the key point detection model further comprises:
    determining, based on distances or offsets between the key points in each pose key point template and the key point annotations, a confidence label corresponding to each pose key point template at each sample feature point in the sample feature map;
    wherein training the initial key point detection model based on the predicted offsets, the pose key point templates corresponding to the predicted offsets and the key point annotations, to obtain the trained key point detection model, comprises:
    training the initial key point detection model based on the predicted offsets, the pose key point templates corresponding to the predicted offsets, the key point annotations and the confidence labels, to obtain the trained key point detection model.
  13. The method according to claim 11 or 12, characterized in that the initial key point detection model further outputs a predicted confidence corresponding to the predicted offsets.
  14. The method according to claim 12, characterized in that training the initial key point detection model based on the predicted offsets, the pose key point templates corresponding to the predicted offsets, the key point annotations and the confidence labels, to obtain the trained key point detection model, comprises:
    determining a regression loss value according to the predicted offsets, the pose key point templates corresponding to the predicted offsets and the key point annotations, and determining a confidence loss value according to the predicted confidence and the confidence labels; and
    adjusting network parameters of the initial key point detection model according to the regression loss value and the confidence loss value, to obtain the trained key point detection model.
  15. A key point detection method, characterized by comprising:
    extracting image features of an image to be detected through a backbone network to obtain a feature map, the image to be detected comprising a target object;
    performing key point detection on the feature map according to a plurality of pose key point templates to obtain at least one group of key point offsets corresponding to the target object, wherein each pose key point template characterizes a relative positional relationship of a plurality of key points corresponding to a pose, and each group of key point offsets characterizes offsets between that group of key points in the feature map and the key points in each pose key point template; and
    obtaining a key point detection result of the target object according to the at least one group of key point offsets.
  16. The method according to claim 15, characterized in that performing key point detection on the feature map according to a plurality of pose key point templates to obtain at least one group of key point offsets corresponding to the target object comprises:
    performing key point detection on the feature map according to the plurality of pose key point templates to obtain a plurality of candidate key point offsets and a confidence corresponding to each candidate key point offset; and
    determining the at least one group of key point offsets corresponding to the target object according to the plurality of candidate key point offsets and the confidence corresponding to each candidate key point offset.
  17. The method according to claim 15 or 16, characterized in that obtaining the key point detection result of the target object according to the at least one group of key point offsets comprises:
    determining the key point detection result of the target object according to the at least one group of key point offsets and the pose key point templates corresponding to the at least one group of key point offsets.
  18. An electronic device, characterized by comprising:
    a memory having computer-readable code stored therein; and
    one or more processors, wherein when the computer-readable code is executed by the one or more processors, the electronic device performs the key point detection method according to any one of claims 1-14 or claims 15-17.
  19. A computer program, characterized by comprising computer-readable code which, when run on an electronic device, causes the electronic device to perform the key point detection method according to any one of claims 1-14 or claims 15-17.
  20. A computer-readable storage medium, characterized in that the computer program according to claim 19 is stored therein.
PCT/CN2022/081229 2021-05-24 2022-03-16 Key point detection method, electronic device, program and storage medium WO2022247403A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110568016.2 2021-05-24
CN202110568016.2A CN113378852A (zh) 2021-05-24 2021-05-24 Key point detection method and apparatus, electronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2022247403A1 true WO2022247403A1 (zh) 2022-12-01

Family

ID=77571822

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/081229 WO2022247403A1 (zh) 2021-05-24 2022-03-16 关键点检测方法、电子设备、程序及存储介质

Country Status (2)

Country Link
CN (1) CN113378852A (zh)
WO (1) WO2022247403A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378852A (zh) * 2021-05-24 2021-09-10 北京迈格威科技有限公司 Key point detection method and apparatus, electronic device and storage medium
CN116563371A (zh) * 2023-03-28 2023-08-08 北京纳通医用机器人科技有限公司 Key point determination method, apparatus, device and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017037584A (ja) * 2015-08-14 2017-02-16 株式会社デンソーアイティーラボラトリ Keypoint detector, keypoint detection method, and keypoint detection program
CN109584276A (zh) * 2018-12-04 2019-04-05 北京字节跳动网络技术有限公司 Key point detection method, apparatus, device and readable medium
CN110738110A (zh) * 2019-09-11 2020-01-31 北京迈格威科技有限公司 Anchor-based face key point detection method, apparatus, *** and storage medium
CN111160288A (zh) * 2019-12-31 2020-05-15 北京奇艺世纪科技有限公司 Gesture key point detection method and apparatus, computer device and storage medium
CN112733700A (zh) * 2021-01-05 2021-04-30 风变科技(深圳)有限公司 Face key point detection method and apparatus, computer device and storage medium
CN112784739A (zh) * 2021-01-21 2021-05-11 北京百度网讯科技有限公司 Model training method, key point positioning method, apparatus, device and medium
CN113378852A (zh) * 2021-05-24 2021-09-10 北京迈格威科技有限公司 Key point detection method and apparatus, electronic device and storage medium


Also Published As

Publication number Publication date
CN113378852A (zh) 2021-09-10

Similar Documents

Publication Publication Date Title
Kamal et al. Automatic traffic sign detection and recognition using SegU-Net and a modified Tversky loss function with L1-constraint
CN109977262B (zh) Method, apparatus and processing device for obtaining candidate segments from a video
CN108805170B (zh) Forming datasets for fully supervised learning
WO2022247403A1 (zh) Key point detection method, electronic device, program and storage medium
Joshi et al. Comparing random forest approaches to segmenting and classifying gestures
WO2019018063A1 (en) FINAL GRAIN IMAGE RECOGNITION
CN108921204B (zh) Electronic apparatus, picture sample set generation method and computer-readable storage medium
WO2020244075A1 (zh) Sign language recognition method and apparatus, computer device and storage medium
WO2016015621A1 (zh) Face picture person-name recognition method and ***
JP6997369B2 (ja) Program, distance measuring method, and distance measuring device
CN113221918B (zh) Target detection method, and training method and apparatus for a target detection model
CN111523537A (zh) Character recognition method, storage medium and ***
JP6989450B2 (ja) Image analysis apparatus, image analysis method and program
JP6014120B2 (ja) Memory with set-operation function and set-operation processing method using the same
CN116958957A (zh) Training method for a multimodal feature extraction network and three-dimensional feature representation method
CN104021372A (zh) Face recognition method and apparatus
Bilgin et al. Road sign recognition system on Raspberry Pi
CN113780116A (zh) *** classification method, apparatus, computer device and storage medium
CN112861934A (zh) Image classification method and apparatus for an embedded terminal, and embedded terminal
CN110516638B (zh) Sign language recognition method based on trajectories and random forests
CN109977737A (zh) Robust character recognition method based on a recurrent neural network
JP5413156B2 (ja) Image processing program and image processing apparatus
CN110059180B (zh) Method, apparatus and storage medium for identifying article authors and training evaluation models
CN111767710A (zh) Sentiment classification method, apparatus, device and medium for Indonesian
US20120051647A1 (en) Icon design and method of icon recognition for human computer interface

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22810137

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE