WO2022247403A1 - Key point detection method, electronic device, program, and storage medium - Google Patents

Key point detection method, electronic device, program, and storage medium

Info

Publication number
WO2022247403A1
Authority
WO
WIPO (PCT)
Prior art keywords
key point
key
point detection
target object
offsets
Prior art date
Application number
PCT/CN2022/081229
Other languages
English (en)
Chinese (zh)
Inventor
李帮怀
袁野
Original Assignee
北京迈格威科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京迈格威科技有限公司
Publication of WO2022247403A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Definitions

  • The present application relates to the technical field of image recognition, and in particular to a key point detection method, electronic device, program, and storage medium.
  • Key point detection is widely used in daily life. Common face recognition algorithms typically rely on key point detection, and popular applications such as virtual makeup, beautification, and face swapping are also built on key point detection technology. At the same time, these applications place high demands on key point detection accuracy.
  • Common key point detection methods are usually two-stage. First, a target detection model is used to obtain the position of the target object in the image. The target object is then cropped out according to the detected target frame, and a key point detection model detects key points on the crop; this is often referred to as the "top-down" approach. Since this method proceeds step by step, an image containing multiple target objects must be cropped multiple times and passed through the key point detection model once per crop, so detection efficiency is low.
  • The embodiments of the present application provide a key point detection method, electronic device, program, and storage medium that overcome the above problems or at least partially solve them.
  • In a first aspect, a key point detection method is provided, including:
  • extracting image features of an image to be detected through a backbone network to obtain a feature map, where the image to be detected includes a target object;
  • performing key point detection on the feature map according to multiple pose key point templates to obtain at least one set of key points corresponding to the target object, where a pose key point template represents the relative positional relationship of multiple key points in that template;
  • screening the at least one set of key points to obtain a key point detection result of the target object.
  • In a second aspect, a key point detection method is provided, including:
  • extracting image features of an image to be detected through a backbone network to obtain a feature map, where the image to be detected includes a target object;
  • performing key point detection on the feature map according to a plurality of pose key point templates to obtain at least one set of key point offsets corresponding to the target object, where a pose key point template represents the relative positional relationship of a plurality of key points in that template, and each set of key point offsets characterizes the offsets between that set of key points in the feature map and the key points in a pose key point template;
  • obtaining a key point detection result of the target object according to the at least one set of key point offsets.
  • In a third aspect, a key point detection device is provided, including:
  • a feature extraction module, used to extract image features of an image to be detected through a backbone network to obtain a feature map, where the image to be detected includes a target object;
  • a key point detection module, configured to perform key point detection on the feature map according to a plurality of pose key point templates to obtain at least one set of key points corresponding to the target object, where a pose key point template represents the relative positional relationship of multiple key points in that template;
  • a detection result determining module, configured to screen the at least one set of key points to obtain a key point detection result of the target object.
  • In a fourth aspect, a key point detection device is provided, including:
  • a feature extraction module, used to extract image features of an image to be detected through a backbone network to obtain a feature map, where the image to be detected includes a target object;
  • a key point detection module, configured to perform key point detection on the feature map according to a plurality of pose key point templates to obtain at least one set of key point offsets corresponding to the target object, where a pose key point template represents the relative positional relationship of multiple key points in that template, and each set of key point offsets characterizes the offsets between that set of key points in the feature map and the key points in a pose key point template;
  • a detection result determining module, configured to obtain a key point detection result of the target object according to the at least one set of key point offsets.
  • In a fifth aspect, an electronic device is provided, including: a processor, a memory, and a computer program stored in the memory and operable on the processor, where the computer program, when executed by the processor, implements the key point detection method described in the first aspect or the second aspect.
  • In a sixth aspect, a computer program is provided, including computer readable codes which, when run on an electronic device, cause the electronic device to execute the key point detection method described in the first aspect or the second aspect.
  • In a seventh aspect, a computer-readable storage medium is provided, storing the computer program described in the sixth aspect.
  • With the key point detection method, electronic device, program, and storage medium provided in the embodiments of the present application, after the feature map of the image to be detected is extracted through the backbone network, key point detection is performed on the feature map according to multiple pose key point templates to obtain at least one set of key points corresponding to the target object, and the at least one set of key points is screened to obtain the key point detection result of the target object. Since the key point detection results of all target objects in the image to be detected can be determined directly based on the multiple pose key point templates, it is not necessary to first determine the positions of the target objects in the image to be detected and then detect key points, so the efficiency of key point detection can be improved.
  • Fig. 1 is a flow chart of the steps of a key point detection method provided in the embodiment of the present application.
  • Figures 2a-2c are illustrations of pose key point templates in the embodiment of the present application.
  • FIG. 3 is a flow chart of steps of another key point detection method provided in the embodiment of the present application.
  • FIG. 4 is a structural block diagram of a key point detection device provided in an embodiment of the present application.
  • Fig. 5 is a structural block diagram of another key point detection device provided by the embodiment of the present application.
  • Figure 6 schematically shows a block diagram of an electronic device for performing the method according to the present application.
  • Fig. 7 schematically shows a storage unit for holding or carrying program codes for realizing the method according to the present application.
  • Fig. 1 is a flow chart of the steps of a key point detection method provided in the embodiment of the present application. As shown in Fig. 1, the method may include:
  • Step 101: Extract image features of an image to be detected through a backbone network to obtain a feature map, where the image to be detected includes a target object.
  • the backbone network is used to extract image features of the image to be detected, for example, it may be a ResNet-50 network or the like.
  • the target object may be, for example, a human face, a human body, a pet, a vehicle, and the like.
  • the image to be detected is input into a backbone network such as ResNet-50, and the high-dimensional feature representation of the image to be detected is obtained, that is, the feature map is obtained.
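  • As a concrete illustration, the following minimal sketch extracts such a feature map with a ResNet-50 backbone, assuming PyTorch and torchvision. The patent names ResNet-50 only as one possible backbone network; the input size and the truncation point are illustrative assumptions.

```python
# Sketch of backbone feature extraction, assuming PyTorch/torchvision.
import torch
import torchvision

backbone = torchvision.models.resnet50(weights=None)
# Drop the classification head; keep the convolutional stages as the feature extractor.
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-2])

image = torch.randn(1, 3, 512, 512)         # image to be detected (batch of 1)
with torch.no_grad():
    feature_map = feature_extractor(image)  # high-dimensional feature representation
print(feature_map.shape)                    # torch.Size([1, 2048, 16, 16])
```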
  • Step 102: Perform key point detection on the feature map according to multiple pose key point templates to obtain at least one set of key points corresponding to the target object, where each pose key point template represents the relative positional relationship of the multiple key points corresponding to one pose.
  • The key point detection method in the embodiment of the present application is a "bottom-up" method: a single-stage, fully end-to-end approach that takes any image to be detected as input and directly produces the key point positions of all target objects in it, without using a target detection model to locate the target objects and crop the image. The entire model requires only one forward pass, whereas the traditional "top-down" method must crop the image per object, so its number of forward passes is proportional to the number of target objects in the picture.
  • Each pose key point template corresponds to the key points of one pose. Because some poses cause key points to be occluded, the number of key points in each pose key point template is not necessarily the same.
  • Figures 2a-2c are illustrations of pose key point templates in the embodiment of the present application. As shown in Figures 2a-2c, when the target object is a human face, three different pose key point templates are given, and each pose key point template defines the relative positional relationship of the key points corresponding to one pose. The relative positional relationship may be, for example, the position of each key point relative to one center point, and every pose key point template includes the same center point. When the target object is a human face, the center point may be, for example, the key point corresponding to the center of the nose (that is, the position of the nose tip), which is not limited in this embodiment of the present application.
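  • For illustration, a pose key point template can be represented simply as the positions of its key points relative to the shared center point. The coordinates and point names below are illustrative assumptions rather than values from the present application; note how a pose that occludes key points yields a template with fewer entries.

```python
# Hypothetical pose key point templates: (dx, dy) positions of each key
# point relative to the shared center point (the nose tip for faces).
import numpy as np

frontal_face_template = np.array([
    [-30.0, -40.0],   # left eye
    [ 30.0, -40.0],   # right eye
    [  0.0,   0.0],   # nose tip (center point)
    [-25.0,  35.0],   # left mouth corner
    [ 25.0,  35.0],   # right mouth corner
])

# A profile pose occludes some key points, so its template has fewer entries.
profile_face_template = np.array([
    [ 20.0, -40.0],   # visible eye
    [  0.0,   0.0],   # nose tip (center point)
    [ 15.0,  35.0],   # visible mouth corner
])
```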
  • The feature map can be directly input into the key point detection model, which performs regression on the key points in the image to be detected based on the multiple pose key point templates to obtain at least one set of key points corresponding to the target object.
  • Step 103: Screen the at least one set of key points to obtain a key point detection result of the target object.
  • After at least one set of key points corresponding to the target object is obtained, the sets of key points are screened to keep the groups that actually contain the target object, yielding the key point detection result of the target object in the image to be detected.
  • The key point detection model in the embodiment of the present application does not need to first crop out the target objects in the image to be detected; the key point detection results of all target objects can be obtained directly based on the pose key point templates. When the image to be detected includes multiple target objects, the key point detection results of all of them can be obtained in a single detection.
  • In some embodiments, performing key point detection on the feature map to obtain at least one set of key points corresponding to the target object includes: performing key point detection on the feature map according to the multiple pose key point templates to obtain multiple candidate key point offsets and a confidence corresponding to each candidate key point offset, where a candidate key point offset is the offset between a feature point in the feature map and the key points in a pose key point template; and determining at least one set of key points corresponding to the target object according to the multiple candidate key point offsets and the confidences.
  • A feature point may be a single pixel in the feature map, or a collection of multiple pixels in a specific area of the feature map.
  • The feature map is input into the key point detection model, which performs key point detection based on the multiple pose key point templates, regressing the offsets between the feature points in the feature map and the key points in each pose key point template to obtain multiple candidate key point offsets, and determining a confidence for each candidate key point offset. After the candidate key point offsets and confidences are obtained, the candidates can be preliminarily screened based on confidence to select those whose confidence meets a preset condition, and key point coordinates can then be determined from the pose key point templates, the feature points, and the screened candidate key point offsets, yielding at least one set of key points corresponding to the target object. Alternatively, a set of key point coordinates can be determined directly for each set of candidate key point offsets based on the templates, feature points, and offsets, likewise yielding at least one set of key points corresponding to the target object.
  • In some embodiments, determining at least one set of key points corresponding to the target object according to the multiple candidate key point offsets and the confidences includes: screening out, from the multiple candidate key point offsets, combinations of key point offsets whose confidence is greater than or equal to a confidence threshold, and taking the screened-out combinations as at least one set of key point offsets, where each set of key point offsets characterizes the offsets between that set of key points in the feature map and the key points in a pose key point template; and determining at least one set of key points corresponding to the target object according to the at least one set of key point offsets and the pose key point template corresponding to each set.
  • Among the multiple candidate key point offsets, some combinations have relatively low confidence and will not yield correct key points. Therefore, combinations whose confidence is greater than or equal to the confidence threshold are first screened out from the candidates, each screened-out combination is taken as one set of key point offsets, and screening out multiple combinations yields multiple sets, so that at least one set of key point offsets is obtained.
  • Based on the at least one set of key point offsets and the pose key point template corresponding to each set, the coordinates of at least one set of key points can be determined, yielding at least one set of key points corresponding to the target object. Preliminary screening of the candidate key points based on confidence reduces the amount of computation and improves processing speed.
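  • The following is a minimal sketch of this filter-then-decode step, assuming NumPy and a decoding convention of anchor position (scaled by the feature-map stride) plus template position plus regressed offset; the threshold, stride, and all names are illustrative assumptions rather than values fixed by the present application.

```python
# Sketch of confidence screening followed by key point decoding.
import numpy as np

def screen_and_decode(anchors, offsets, confidences, templates, template_ids,
                      stride=32, conf_thresh=0.5):
    """anchors: (N, 2) feature-map grid positions of each candidate;
    offsets: (N, K, 2) candidate key point offset groups;
    confidences: (N,) confidence per group;
    templates: list of (K, 2) relative key point templates;
    template_ids: (N,) index of the template each candidate was matched with.
    Returns decoded key point groups (in image space) and their confidences."""
    keep = confidences >= conf_thresh          # preliminary confidence screening
    groups = []
    for i in np.flatnonzero(keep):
        center = anchors[i].astype(np.float64) * stride
        # coordinates = anchor (image space) + template position + regressed offset
        groups.append(center + templates[template_ids[i]] + offsets[i])
    return groups, confidences[keep]
```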
  • In some embodiments, screening the at least one set of key points to obtain the key point detection result of the target object includes: determining the key point detection result of the target object according to the at least one set of key points and the confidence corresponding to each set of key points.
  • The confidence of each set of key points is the confidence of the candidate key point offsets from which that set was obtained, so the at least one set of key points can be screened based on these confidences to determine the key point detection result of the target object.
  • When screening the at least one set of key points, non-maximum suppression can be used. Non-maximum suppression can be applied directly to the sets of key points, or, after a target frame is determined for each set of key points, to the target frames corresponding to the sets of key points.
  • In some embodiments, determining the key point detection result of the target object according to the at least one set of key points and the confidence corresponding to each set includes: separately determining the target frame corresponding to each set of key points; and performing non-maximum suppression on the target frames corresponding to the at least one set of key points according to the confidences, to obtain the key point detection result of the target object.
  • the target frame represents the location of the target object.
  • The target frame corresponding to each set of key points can be determined first, and then, based on the confidence corresponding to each set, non-maximum suppression is performed on the target frames corresponding to the at least one set of key points to obtain the final key point detection result.
  • Performing non-maximum suppression on the target frames, compared with performing it directly on the key points, reduces the amount of data processed and further improves detection efficiency.
  • In some embodiments, separately determining the target frame corresponding to each set of key points includes: determining the minimum circumscribed rectangle of each set of key points as the target frame corresponding to that set.
  • For each set of key points, the minimum circumscribed rectangle of the set may be determined first, and that rectangle is taken as the target frame corresponding to the set.
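  • A compact sketch of this box-based screening, assuming NumPy: the minimum circumscribed rectangle is the axis-aligned bounding box of a key point group, and duplicates are then removed with standard greedy non-maximum suppression. The IoU threshold is an illustrative assumption.

```python
import numpy as np

def keypoints_to_box(points):
    """points: (K, 2) -> (x1, y1, x2, y2) minimum circumscribed rectangle."""
    x1, y1 = points.min(axis=0)
    x2, y2 = points.max(axis=0)
    return np.array([x1, y1, x2, y2])

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = scores.argsort()[::-1]  # highest confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        if order.size == 1:
            break
        rest = boxes[order[1:]]
        # Intersection-over-union between the kept box and the rest.
        xx1 = np.maximum(boxes[i, 0], rest[:, 0])
        yy1 = np.maximum(boxes[i, 1], rest[:, 1])
        xx2 = np.minimum(boxes[i, 2], rest[:, 2])
        yy2 = np.minimum(boxes[i, 3], rest[:, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (rest[:, 2] - rest[:, 0]) * (rest[:, 3] - rest[:, 1])
        iou = inter / (area_i + area_r - inter + 1e-9)
        order = order[1:][iou <= iou_thresh]  # drop overlapping duplicates
    return keep
```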
  • In this way, the key point detection result can accurately screen out the key points of the target object, improving the accuracy of key point detection.
  • In some embodiments, performing key point detection on the feature map to obtain multiple candidate key point offsets and the confidence corresponding to each includes: matching, based on each feature point in the feature map, the multiple pose key point templates with the feature map, and determining the offsets between the feature point and the key points in each pose key point template as well as the confidence of the feature point, to obtain at least one set of key point offsets and the confidence corresponding to each set.
  • Each feature point in the feature map is used as the center point for matching the feature map with the pose key point templates, and the multiple pose key point templates are matched with the feature map respectively. According to the relative positional relationship between the key points and the center point in each template, the offsets between the center point and the key points in each pose key point template are determined. Since one feature point corresponds to multiple pose key point templates, multiple sets of key point offsets are obtained at each feature point, so matching every feature point against the templates yields at least one set of key point offsets, and the confidence of each set of key point offsets is obtained during the matching process.
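  • A sketch of this matching pass over the whole feature map, assuming NumPy; predict_fn is a hypothetical stand-in for the model's regression and classification output at one location, and the stride and shapes are illustrative assumptions.

```python
# Sketch of the matching pass: every feature point serves as a center
# point and is matched against every pose key point template, yielding
# one candidate key point group and one confidence per pair.
import numpy as np

def match_all(h, w, templates, predict_fn, stride=32):
    """templates: list of (K_t, 2) arrays. predict_fn(y, x, t) -> (offsets, conf)
    for feature point (y, x) and template index t (a hypothetical hook into
    the model). Returns a list of (keypoints, confidence) candidates."""
    candidates = []
    for y in range(h):
        for x in range(w):
            center = np.array([x, y], dtype=np.float64) * stride
            for t, template in enumerate(templates):
                offsets, conf = predict_fn(y, x, t)   # (K_t, 2), scalar
                keypoints = center + template + offsets
                candidates.append((keypoints, conf))
    return candidates
```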
  • In some embodiments, screening the at least one set of key points according to the confidence corresponding to each set to obtain the key point detection result of the target object includes: performing non-maximum suppression on the at least one set of key points according to the sets of key points and the confidence corresponding to each set, to obtain the key point detection result of the target object;
  • the method further includes: determining a target frame according to the key point detection result, where the target frame represents the location of the target object.
  • That is, the at least one set of key points can be screened by performing non-maximum suppression directly on the sets of key points to obtain the key point detection result of the target object.
  • A set of key points belonging to the same target object in the key point detection result can be identified, the minimum circumscribed rectangle of that set determined, and the rectangle taken as the target frame of the set. By determining the target frame, the key point detection result and the corresponding target frame can be displayed at the same time.
  • In the key point detection method, after the feature map of the image to be detected is extracted through the backbone network, key point detection is performed on the feature map according to the multiple pose key point templates to obtain at least one set of key points corresponding to the target object, and the at least one set of key points is screened to obtain the key point detection result of the target object. Since the key point detection results of all target objects in the image to be detected can be determined directly based on the multiple pose key point templates, it is not necessary to first determine the positions of the target objects and then detect key points, so the efficiency of key point detection can be improved.
  • In some embodiments, performing key point detection on the feature map according to a plurality of pose key point templates to obtain at least one set of key points corresponding to the target object includes: performing key point detection on the feature map through a key point detection model according to the plurality of pose key point templates, to obtain at least one set of key point offsets corresponding to the target object and a confidence corresponding to each set of key point offsets.
  • The key point detection model can regress, based on the multiple pose key point templates, the offsets between the key points in the feature map and the key points in the pose key point templates, and determine the confidence of each set of key point offsets.
  • The feature map is input into the key point detection model, which uses each feature point in the feature map as a center point for matching with the pose key point templates, that is, as an anchor. Each pose key point template is attached to the anchor, and the offsets of the key points relative to the anchor are determined for each attached template. Each pose key point template yields one set of key point offsets, so with multiple templates a feature point yields multiple sets of key point offsets, and the key point detection model outputs a corresponding confidence for each set.
  • In some embodiments, the regression can be performed by convolutional layers.
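  • A hypothetical detection head along these lines, assuming PyTorch: at each feature-map location it regresses, per pose template, K (x, y) key point offsets plus one confidence score through 1x1 convolutions. The channel sizes are illustrative, the 24 templates follow the example given later in this text, and using a fixed K per template is a simplification (the present application notes that templates may have different numbers of key points).

```python
import torch
import torch.nn as nn

class KeypointHead(nn.Module):
    def __init__(self, in_channels=2048, num_templates=24, num_keypoints=5):
        super().__init__()
        self.num_templates = num_templates
        self.num_keypoints = num_keypoints
        # One (x, y) offset per key point, per template, at every anchor.
        self.offset_conv = nn.Conv2d(in_channels, num_templates * num_keypoints * 2, 1)
        # One confidence score per template at every anchor (the classification branch).
        self.conf_conv = nn.Conv2d(in_channels, num_templates, 1)

    def forward(self, feature_map):
        b, _, h, w = feature_map.shape
        offsets = self.offset_conv(feature_map)
        offsets = offsets.view(b, self.num_templates, self.num_keypoints, 2, h, w)
        confidence = torch.sigmoid(self.conf_conv(feature_map))  # (b, T, h, w)
        return offsets, confidence
```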
  • In some embodiments, the method further includes: training an initial key point detection model based on the pose key point templates and sample images to obtain the key point detection model.
  • the initial key point detection model can be obtained by randomly initializing the network parameters, and the initial key point detection model is trained based on the pose key point template and sample images to obtain the trained key point detection model.
  • the training steps of the key point detection model include:
  • The key point annotations can be position coordinates detected with the help of other key point detection methods; for example, after a target detection model detects the target object in a sample image, the detected target object can be cropped out and a traditional key point detection model used to detect the key point coordinates.
  • After the sample image is obtained, key point detection is performed on it with the help of other key point detection methods to obtain the key point annotations corresponding to the target object in the sample image. The sample image is input into the backbone network, which performs feature extraction to obtain the sample feature map corresponding to the sample image. The sample feature map is input into the initial key point detection model, which performs key point detection on it based on the pose key point templates and obtains the predicted offsets of multiple key point sets. Based on the predicted offsets and the pose key point templates corresponding to them, the key point coordinates of each key point set are determined, and based on these coordinates and the key point annotations, the network parameters of the initial key point detection model are adjusted to obtain the trained key point detection model.
  • In some embodiments, the training of the key point detection model also includes: determining, based on the distance or offset between the key points in each pose key point template and the key point annotations, the confidence label corresponding to each pose key point template at each sample feature point in the sample feature map;
  • and training the initial key point detection model to obtain the trained model then includes: training the initial key point detection model based on the predicted offsets, the pose key point templates corresponding to the predicted offsets, the key point annotations, and the confidence labels, to obtain the trained key point detection model.
  • A classification network can be included in the key point detection model to predict, through the classification network, the confidence of attaching each of the multiple pose key point templates at a feature point. This requires determining the confidence labels corresponding to the sample feature map in order to supervise the training of the key point detection model.
  • Each feature point in the sample feature map is used as the center point for matching with the pose key point templates. When each pose key point template is attached at the center point, the distance or offset between the key points in the template and the key point annotations is determined, and based on this distance or offset, the confidence label of each feature point in the sample feature map relative to each pose key point template can be determined.
  • Since one feature point corresponds to multiple pose key point templates, it receives as many confidence labels as there are templates; for example, when there are 24 pose key point templates, each feature point in the sample feature map corresponds to 24 confidence labels.
  • The initial key point detection model can then be trained based on the predicted offsets, the pose key point templates corresponding to the predicted offsets, the key point annotations, and the confidence labels until a training end condition is met, at which point the trained key point detection model is obtained.
  • In some embodiments, the distance or offset can be mapped to a value between 0 and 1 to obtain the confidence label of each feature point in the sample feature map relative to each pose key point template.
  • Here, the distance between the key points in a pose key point template and the key point annotations corresponds to the magnitude of the offset between them.
  • The distance or offset between a pose key point template attached at a feature point and the key point annotations can be mapped to between 0 and 1, for example by means of a sigmoid-style function, and the resulting value used as that feature point's confidence label relative to the template. Since a greater distance means lower confidence and a smaller distance means higher confidence, the mapping should produce a smaller label for a larger distance and a larger label for a smaller distance.
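  • A sketch of one such label construction, assuming NumPy; the text only requires a decreasing mapping into the 0-1 range ("sigmoid, etc."), so the exponential form and the scale parameter below are illustrative assumptions.

```python
# Sketch of building a confidence label from the distance between a pose
# template (attached at a feature point) and the key point annotations.
import numpy as np

def confidence_label(template_points, annotated_points, scale=10.0):
    """Both inputs: (K, 2) image-space coordinates. Returns a label in (0, 1]."""
    d = np.linalg.norm(template_points - annotated_points, axis=1).mean()
    return float(np.exp(-d / scale))  # d = 0 -> 1.0; large d -> close to 0.0
```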
  • the initial key point detection model also outputs a prediction confidence corresponding to the prediction offset.
  • In some embodiments, training the initial key point detection model to obtain the trained key point detection model includes: determining a target loss value from the predicted offsets, the predicted confidences, the key point annotations, and the confidence labels, and adjusting the network parameters of the initial key point detection model based on the target loss value to obtain the trained key point detection model.
  • Based on the predicted offsets and the pose key point templates corresponding to them, the predicted key point coordinates can be obtained. The predicted coordinates and the key point annotations corresponding to the sample feature map are substituted into a regression loss function to obtain a regression loss value, and the predicted confidence and confidence label corresponding to each set of predicted offsets are substituted into a confidence loss function to obtain a confidence loss value.
  • The regression loss value and the confidence loss value are added to form the target loss value, the network parameters of the initial key point detection model are adjusted based on the target loss value, and the trained key point detection model is obtained.
  • The predicted key point coordinates can be obtained from the predicted offsets output by the key point detection model and the pose key point templates, and the absolute value of the difference between the predicted coordinates and the key point annotations is taken as the regression loss.
  • the confidence loss function may be a cross-entropy loss function (Cross-Entropy Loss).
  • Regression loss and confidence loss are used to constrain the training of the key point detection model, so that the trained key point detection model can accurately give key points and corresponding confidence levels, thereby improving the overall detection accuracy.
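  • A sketch of this combined objective, assuming PyTorch: an L1 regression term on the decoded coordinates plus a cross-entropy term on the confidences, summed into the target loss. The reduction choices and shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def target_loss(pred_coords, gt_coords, pred_conf, conf_labels):
    """pred_coords/gt_coords: (N, K, 2); pred_conf/conf_labels: (N,) in [0, 1]."""
    regression_loss = (pred_coords - gt_coords).abs().mean()          # |prediction - annotation|
    confidence_loss = F.binary_cross_entropy(pred_conf, conf_labels)  # cross-entropy term
    return regression_loss + confidence_loss                          # target loss = sum
```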
  • Fig. 3 is a flow chart of the steps of another key point detection method provided in the embodiment of the present application. As shown in Fig. 3, the method may include:
  • Step 301: Extract image features of an image to be detected through a backbone network to obtain a feature map, where the image to be detected includes a target object.
  • the backbone network is used to extract image features of the image to be detected, for example, it may be a ResNet-50 network or the like.
  • the target object may be, for example, a human face, a human body, a pet, a vehicle, and the like.
  • the image to be detected is input into a backbone network such as ResNet-50, and the high-dimensional feature representation of the image to be detected is obtained, that is, the feature map is obtained.
  • Step 302: Perform key point detection on the feature map according to multiple pose key point templates to obtain at least one set of key point offsets corresponding to the target object, where each pose key point template represents the relative positional relationship of the multiple key points corresponding to one pose, and each set of key point offsets represents the offsets between that set of key points in the feature map and the key points in a pose key point template.
  • The multiple pose key point templates are predefined relative positional relationships of key points for different poses. Each template corresponds to the key points of one pose, and because some poses cause key points to be occluded, the number of key points in each pose key point template is not necessarily the same.
  • One pose key point template can regress one set of key point offsets at a feature point, so from the feature points in the feature map and the multiple pose key point templates, at least one set of key point offsets corresponding to the target object can be obtained.
  • the coordinates of a group of key points can be obtained based on the offset of each group of key points and the corresponding pose key point template.
  • In some embodiments, performing key point detection on the feature map according to multiple pose key point templates to obtain at least one set of key point offsets corresponding to the target object includes: performing key point detection on the feature map according to the multiple pose key point templates to obtain multiple candidate key point offsets and a confidence corresponding to each candidate key point offset; and determining at least one set of key point offsets corresponding to the target object according to the multiple candidate key point offsets and the confidence corresponding to each.
  • Regressing the key point offsets corresponding to the feature points in the feature map yields multiple sets of key point offsets at each feature point, together with a confidence for each set; these are the candidate key point offsets. Based on the confidence of each set of candidate key point offsets, the combinations whose confidence is greater than or equal to the confidence threshold can be selected from the candidates, and non-maximum suppression performed on the selected combinations to obtain at least one set of key point offsets corresponding to the target object. Determining the at least one set of key point offsets based on the confidences of the candidates yields more accurate key point offsets.
  • Step 303: Obtain a key point detection result of the target object according to the at least one set of key point offsets.
  • Since each set of key point offsets represents the offsets between that set of key points in the feature map and the key points in a pose key point template, the key point detection result of the target object can be obtained based on the at least one set of key point offsets and the corresponding pose key point templates.
  • the key point detection result of the target object is obtained according to the at least one set of key point offsets, including:
  • a key point detection result of the target object is determined according to the at least one set of key point offsets and the pose key point template corresponding to the at least one set of key point offsets.
  • In this key point detection method, after the feature map of the image to be detected is extracted through the backbone network, key point detection is performed on the feature map according to the multiple pose key point templates to obtain at least one set of key point offsets corresponding to the target object, and the key point detection result of the target object is determined according to the at least one set of key point offsets. Since the key point offsets of all target objects in the image to be detected can be determined directly based on the multiple pose key point templates, the key point detection results of all target objects can be obtained from these offsets without first determining the positions of the target objects in the image to be detected, thereby improving the efficiency of key point detection.
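  • Putting the pieces together, the following end-to-end inference sketch stitches the earlier hypothetical helpers into one pass: backbone feature extraction, per-anchor and per-template regression, preliminary confidence screening, decoding, and box-level non-maximum suppression. It reuses the keypoints_to_box and nms functions and the backbone/head modules sketched above, and all thresholds are illustrative assumptions.

```python
import numpy as np

def detect_keypoints(image, backbone, head, templates,
                     conf_thresh=0.5, iou_thresh=0.5, stride=32):
    """templates: list of (K, 2) NumPy arrays; backbone/head: the modules
    sketched earlier. Returns the surviving key point groups (image space)."""
    feature_map = backbone(image)             # single forward pass for the whole image
    offsets, confidence = head(feature_map)   # all anchors x all pose templates
    groups, scores = [], []
    b, T, h, w = confidence.shape
    for t in range(T):
        for y in range(h):
            for x in range(w):
                conf = float(confidence[0, t, y, x])
                if conf < conf_thresh:        # preliminary confidence screening
                    continue
                off = offsets[0, t, :, :, y, x].detach().numpy()  # (K, 2), CPU assumed
                center = np.array([x, y], dtype=np.float64) * stride
                groups.append(center + templates[t] + off)        # decode to image space
                scores.append(conf)
    if not groups:
        return []
    boxes = np.stack([keypoints_to_box(g) for g in groups])  # minimum circumscribed rectangles
    keep = nms(boxes, np.array(scores), iou_thresh)          # one NMS pass over all objects
    return [groups[i] for i in keep]
```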
  • Fig. 4 is a structural block diagram of a key point detection device provided in an embodiment of the present application. As shown in Fig. 4, the key point detection device may include:
  • the feature extraction module 401 is used to extract the image features of the image to be detected through the backbone network to obtain a feature map, and the image to be detected includes the target object;
  • the key point detection module 402 is configured to perform key point detection on the feature map according to a plurality of pose key point templates to obtain at least one set of key points corresponding to the target object, where each pose key point template represents the relative positional relationship of the multiple key points corresponding to one pose;
  • the detection result determining module 403 is configured to filter the at least one group of key points to obtain a key point detection result of the target object.
  • the key point detection module includes:
  • the key point detection unit is used to perform key point detection on the feature map according to a plurality of pose key point templates and obtain multiple candidate key point offsets and a confidence corresponding to each candidate key point offset, where a candidate key point offset is the offset between a feature point in the feature map and the key points in a pose key point template;
  • a key point determining unit configured to determine at least one group of key points corresponding to the target object according to the plurality of candidate key point offsets and the confidence.
  • the detection result determination module is specifically used for:
  • a key point detection result of the target object is determined according to the at least one group of key points and the confidence corresponding to each group of key points.
  • the detection result determination module includes:
  • a target frame determination unit, used to separately determine the target frame corresponding to each group of key points;
  • the detection result determination unit is used to perform non-maximum value suppression processing on the target frame corresponding to the at least one group of key points according to the confidence degree corresponding to each group of key points, so as to obtain the key point detection result of the target object.
  • the target frame determining unit is specifically used for:
  • the minimum bounding rectangle corresponding to each group of key points is determined as the target box corresponding to each group of key points.
  • the key point determining unit is specifically used for: screening out, from the plurality of candidate key point offsets, combinations of key point offsets whose confidence is greater than or equal to a confidence threshold as at least one set of key point offsets, and determining at least one set of key points corresponding to the target object according to the at least one set of key point offsets and the pose key point template corresponding to each set.
  • the key point detection unit is specifically used for: matching, based on each feature point in the feature map, the plurality of pose key point templates with the feature map, and determining the offsets between the feature point and the key points in each pose key point template as well as the confidence of the feature point, to obtain at least one set of key point offsets and the confidence corresponding to each set of key point offsets.
  • the detection result determination module includes:
  • a detection result determination unit, configured to perform non-maximum suppression on the at least one group of key points according to the at least one group of key points and the confidence corresponding to each group, to obtain the key point detection result of the target object;
  • the device also includes:
  • the target frame determining module is configured to determine a target frame according to the key point detection result, and the target frame represents the location of the target object.
  • the key point detection module is specifically used for:
  • key point detection is performed on the feature map through a key point detection model, and at least one set of key point offsets corresponding to the target object and confidence corresponding to each set of key point offsets are obtained.
  • the device also includes:
  • the training module is used to train the initial key point detection model based on the pose key point templates and the sample images, so as to obtain the key point detection model.
  • the training module includes:
  • a sample acquisition unit configured to acquire a sample image and a key point label corresponding to a target object in the sample image
  • the sample feature extraction unit is used to extract the image features of the sample image through the backbone network to obtain a sample feature map corresponding to the sample image;
  • a model processing unit configured to perform key point detection on the sample feature map through the initial key point detection model based on the pose key point templates, and obtain the predicted offsets of the key point sets output by the initial key point detection model;
  • a model training unit configured to train an initial key point detection model based on the predicted offset, the pose key point template corresponding to the predicted offset, and the key point label, to obtain a trained key point detection model.
  • the training module also includes:
  • a confidence label determination unit, used to determine, based on the distance or offset between the key points in each pose key point template and the key point annotations, the confidence label corresponding to each pose key point template at each sample feature point in the sample feature map;
  • the model training unit is specifically used to: train the initial key point detection model based on the predicted offsets, the pose key point templates corresponding to the predicted offsets, the key point annotations, and the confidence labels, to obtain the trained key point detection model.
  • the initial key point detection model also outputs the prediction confidence corresponding to the prediction offset.
  • model training unit is specifically used for:
  • determining a target loss value by adding a regression loss value and a confidence loss value, and adjusting the network parameters of the initial key point detection model based on the target loss value to obtain the trained key point detection model.
  • With the key point detection device, after the feature map of the image to be detected is extracted through the backbone network, key point detection is performed on the feature map according to multiple pose key point templates to obtain at least one set of key points, and the at least one set of key points is screened to obtain the key point detection result of the target object. Since the key point detection results of all target objects in the image to be detected can be determined directly based on the multiple pose key point templates, it is not necessary to first determine the positions of the target objects in the image to be detected and then detect key points, so the efficiency of key point detection can be improved.
  • Fig. 5 is a structural block diagram of another key point detection device provided in an embodiment of the present application. As shown in Fig. 5, the key point detection device may include:
  • the feature extraction module 501 is used to extract the image features of the image to be detected through the backbone network to obtain a feature map, and the image to be detected includes a target object;
  • the key point detection module 502 is configured to perform key point detection on the feature map according to a plurality of pose key point templates to obtain at least one set of key point offsets corresponding to the target object, where each pose key point template represents the relative positional relationship of the multiple key points corresponding to one pose, and each set of key point offsets characterizes the offsets between that set of key points in the feature map and the key points in a pose key point template;
  • the detection result determination module 503 is configured to obtain a key point detection result of the target object according to the at least one set of key point offsets.
  • the key point detection module includes:
  • a key point detection unit configured to perform key point detection on the feature map according to a plurality of posture key point templates, to obtain a plurality of candidate key point offsets and confidence degrees corresponding to each candidate key point offset;
  • An offset determination unit configured to determine at least one set of key point offsets corresponding to the target object according to the plurality of candidate key point offsets and the confidence corresponding to each candidate key point offset.
  • the detection result determination module is specifically used for:
  • a key point detection result of the target object is determined according to the at least one set of key point offsets and the pose key point template corresponding to the at least one set of key point offsets.
  • With this key point detection device, after the feature map of the image to be detected is extracted through the backbone network, key point detection is performed on the feature map according to multiple pose key point templates to obtain at least one set of key point offsets corresponding to the target object, and the key point detection result of the target object is determined according to the at least one set of key point offsets. Since the key point offsets in the image to be detected can be determined directly based on the multiple pose key point templates, the key point detection results of all target objects can be obtained from these offsets without first determining the positions of the target objects in the image to be detected, thereby improving the efficiency of key point detection.
  • Since the device embodiments are substantially similar to the method embodiments, their description is relatively brief; for related parts, refer to the description of the method embodiments.
  • The device embodiments described above are only illustrative. The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network elements. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment, which can be understood and implemented by those skilled in the art without creative effort.
  • the various component embodiments of the present application may be realized in hardware, or in software modules running on one or more processors, or in a combination thereof.
  • a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all functions of some or all components in the computing processing device according to the embodiments of the present application.
  • the present application can also be implemented as an apparatus or apparatus program (eg, computer program and computer program product) for performing a part or all of the methods described herein.
  • Such a program implementing the present application may be stored on a computer-readable medium, or may be in the form of one or more signals.
  • Such a signal may be downloaded from an Internet site, or provided on a carrier signal, or provided in any other form.
  • FIG. 6 shows an electronic device that can implement the method according to the present application.
  • the electronic device conventionally includes a processor 610 and a computer program product in the form of a memory 620 or a computer readable medium.
  • Memory 620 may be electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM, hard disk, or ROM.
  • the memory 620 has a storage space 630 for program code 631 for performing any method steps in the methods described above.
  • the storage space 630 for program codes may include respective program codes 631 for respectively implementing various steps in the above methods. These program codes can be read from or written into one or more computer program products.
  • These computer program products comprise program code carriers such as hard disks, compact disks (CDs), memory cards or floppy disks.
  • Such a computer program product is typically a portable or fixed storage unit as described with reference to FIG. 7 .
  • the storage unit may have storage segments, storage spaces, etc. arranged similarly to the memory 620 in the electronic device of Fig. 6.
  • the program code can, for example, be compressed in a suitable form.
  • the storage unit includes computer readable code 631', i.e. code readable by a processor such as the processor 610; when executed by the electronic device, the code causes the electronic device to perform each step of the methods described above.
  • The embodiments of the present application may be provided as methods, devices, or computer program products. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • Embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to the embodiments of the present application. It should be understood that each procedure and/or block in the flowchart and/or block diagram, and combinations of procedures and/or blocks, can be realized by computer program instructions. These computer program instructions may be provided to a general purpose computer, a special purpose computer, an embedded processor, or the processor of other programmable data processing terminal equipment to produce a machine, such that the instructions executed by the computer or the processor of the other programmable data processing terminal equipment produce means for realizing the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing terminal to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, the The instruction means implements the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application provide a key point detection method, an electronic device, a program, and a storage medium. The method includes: extracting image features of an image to be detected through a backbone network to obtain a feature map, the image to be detected including a target object; performing key point detection on the feature map according to multiple pose key point templates to obtain at least one set of key points corresponding to the target object, the pose key point templates representing relative positional relationships between multiple key points corresponding to a pose; and screening the set or sets of key points to obtain a key point detection result of the target object. In the present application, the key point detection results of all target objects in an image to be detected can be determined directly on the basis of multiple pose key point templates, thereby improving the efficiency of key point detection.
PCT/CN2022/081229 2021-05-24 2022-03-16 Key point detection method, electronic device, program, and storage medium WO2022247403A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110568016.2 2021-05-24
CN202110568016.2A CN113378852A (zh) 2021-05-24 2021-05-24 关键点检测方法、装置、电子设备及存储介质

Publications (1)

Publication Number Publication Date
WO2022247403A1 (fr)

Family

ID=77571822

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/081229 WO2022247403A1 (fr) 2021-05-24 2022-03-16 Procédé de détection de point clé, dispositif électronique, programme, et support de stockage

Country Status (2)

Country Link
CN (1) CN113378852A (fr)
WO (1) WO2022247403A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378852A (zh) * 2021-05-24 2021-09-10 北京迈格威科技有限公司 关键点检测方法、装置、电子设备及存储介质
CN116563371A (zh) * 2023-03-28 2023-08-08 北京纳通医用机器人科技有限公司 关键点确定方法、装置、设备及存储介质


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017037584A (ja) * 2015-08-14 2017-02-16 株式会社デンソーアイティーラボラトリ キーポイント検出器、キーポイント検出方法、及びキーポイント検出プログラム
CN109584276A (zh) * 2018-12-04 2019-04-05 北京字节跳动网络技术有限公司 关键点检测方法、装置、设备及可读介质
CN110738110A (zh) * 2019-09-11 2020-01-31 北京迈格威科技有限公司 基于锚点的人脸关键点检测方法、装置、***和存储介质
CN111160288A (zh) * 2019-12-31 2020-05-15 北京奇艺世纪科技有限公司 手势关键点检测方法、装置、计算机设备和存储介质
CN112733700A (zh) * 2021-01-05 2021-04-30 风变科技(深圳)有限公司 人脸关键点检测方法、装置、计算机设备和存储介质
CN112784739A (zh) * 2021-01-21 2021-05-11 北京百度网讯科技有限公司 模型的训练方法、关键点定位方法、装置、设备和介质
CN113378852A (zh) * 2021-05-24 2021-09-10 北京迈格威科技有限公司 关键点检测方法、装置、电子设备及存储介质

Also Published As

Publication number Publication date
CN113378852A (zh) 2021-09-10

Similar Documents

Publication Publication Date Title
Kamal et al. Automatic traffic sign detection and recognition using SegU-Net and a modified Tversky loss function with L1-constraint
CN109977262B (zh) 从视频中获取候选片段的方法、装置及处理设备
CN108805170B (zh) 形成用于全监督式学习的数据集
WO2022247403A1 (fr) Procédé de détection de point clé, dispositif électronique, programme, et support de stockage
Joshi et al. Comparing random forest approaches to segmenting and classifying gestures
WO2019018063A1 (fr) Reconnaissance d'image à grain fin
TW201926140A (zh) 影像標註方法、電子裝置及非暫態電腦可讀取儲存媒體
CN108921204B (zh) 电子装置、图片样本集生成方法和计算机可读存储介质
WO2020244075A1 (fr) Procédé et appareil de reconnaissance de langage des signes, dispositif informatique et support d'informations
WO2016015621A1 (fr) Procédé et système de reconnaissance de nom d'image de visage humain
JP6997369B2 (ja) プログラム、測距方法、及び測距装置
CN111523537A (zh) 一种文字识别方法、存储介质及***
JP6989450B2 (ja) 画像解析装置、画像解析方法及びプログラム
JP6014120B2 (ja) 集合演算機能を備えたメモリ及びこれを用いた集合演算処理方法
CN116958957A (zh) 多模态特征提取网络的训练方法及三维特征表示方法
CN110516638B (zh) 一种基于轨迹和随机森林的手语识别方法
CN104021372A (zh) 一种人脸识别方法及装置
Bilgin et al. Road sign recognition system on Raspberry Pi
CN115203408A (zh) 一种多模态试验数据智能标注方法
CN113780116A (zh) ***分类方法、装置、计算机设备和存储介质
CN112861934A (zh) 一种嵌入式终端的图像分类方法、装置及嵌入式终端
CN109977737A (zh) 一种基于循环神经网络的字符识别鲁棒性方法
JP5413156B2 (ja) 画像処理プログラム及び画像処理装置
CN110059180B (zh) 文章作者身份识别及评估模型训练方法、装置及存储介质
US20120051647A1 (en) Icon design and method of icon recognition for human computer interface

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22810137

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE