CN111191622A - Posture recognition method and system based on thermodynamic diagram and offset vector and storage medium - Google Patents

Posture recognition method and system based on thermodynamic diagram and offset vector and storage medium

Info

Publication number
CN111191622A
CN111191622A (application CN202010006031.3A)
Authority
CN
China
Prior art keywords
key points
thermodynamic diagram
target image
offset vector
offset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010006031.3A
Other languages
Chinese (zh)
Other versions
CN111191622B (en)
Inventor
肖菁
李海超
屈光卓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China Normal University
Original Assignee
South China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China Normal University filed Critical South China Normal University
Priority to CN202010006031.3A priority Critical patent/CN111191622B/en
Publication of CN111191622A publication Critical patent/CN111191622A/en
Application granted granted Critical
Publication of CN111191622B publication Critical patent/CN111191622B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method, a system and a storage medium for gesture recognition based on thermodynamic diagrams and offset vectors, wherein the method comprises the following steps: acquiring a target image to be identified; extracting features of the target image to be recognized; predicting the positions of key points according to the extracted features; correcting the predicted key points and determining the final positions of the key points; and determining the attitude information of the target to be recognized according to the key points. By extracting image features, predicting the positions of the key points and correcting the prediction results, the invention finally recognizes and obtains the attitude information.

Description

Posture recognition method and system based on thermodynamic diagram and offset vector and storage medium
Technical Field
The invention relates to the technical field of deep learning, in particular to a method, a system and a storage medium for gesture recognition based on thermodynamic diagrams and offset vectors.
Background
Thermodynamic diagram (heat map): a probability map in which the probability of a pixel closer to the center point is closer to 1 and the probability of a pixel farther from the center point is closer to 0; the map can be modeled by a corresponding function, such as a Gaussian.
Offset vector: the displacement of a point relative to a reference point, derived from the distance between the point and the reference point.
Pose estimation: the task of determining the pose of an object in an image (or stereo image, or image sequence); here specifically, reconstructing the joints and limbs of a person.
People routinely record their lives by taking photos. To better understand the people in those photos, we want to locate where each person is and know what activity they are performing; achieving these goals is the central problem of human pose estimation. Pose estimation, also known as human keypoint detection, primarily identifies the locations of key parts of the human body, such as the nose, left eye, right eye, left ear, right ear, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left hip, right hip, left knee, right knee, left ankle, and right ankle. Despite years of research, it remains a very challenging problem in computer vision; the difficulties mainly come from complex backgrounds in natural scenes, blurring, shadows, illumination, and the colors of clothing. Furthermore, interaction between people's limbs causes strong interference, such as overlapping limbs and occlusion between limbs.
Because actual application scenes often contain more than one person, current pose estimation algorithms are mainly multi-person pose algorithms. Multi-person pose estimation has two main lines of work: top-down methods and bottom-up methods. Top-down methods first use an object detection method, such as Faster R-CNN (Faster Region-based Convolutional Neural Networks) or SSD (Single Shot MultiBox Detector), to obtain the detection boxes of the multiple people in the image, then crop them from the original image and pass them to a subsequent pose estimation network, which predicts the human key points separately for each cropped image. The top-down approach thus converts the multi-person pose estimation problem into single-person pose estimation. Bottom-up multi-person pose estimation first detects the key points of all people and then clusters the key points, connecting the different key points belonging to each person together, so that the different individuals are produced by clustering. Bottom-up methods therefore focus on exploring key point clustering, that is, how to construct the relationship between different key points.
With the rapid development of deep learning in computer vision, a large amount of research applying deep learning to human key point detection has emerged in recent years. However, most existing work focuses on how to design the data transmission paths inside the network so as to obtain rich spatial and detail information from the picture, for example Feature Pyramid Networks (Feature Pyramid Networks for Object Detection), cascaded pyramid networks (Cascaded Pyramid Network for Multi-Person Pose Estimation), and stacked hourglass networks (Stacked Hourglass Networks for Human Pose Estimation). These methods improve the accuracy of human key point detection, but they neglect the small offset that occurs when the predicted point is mapped from low resolution back to high resolution, which causes a certain loss of precision.
Disclosure of Invention
In view of this, embodiments of the present invention provide a high-accuracy method, system, and storage medium for gesture recognition based on a thermodynamic diagram and an offset vector.
The invention provides a gesture recognition method based on thermodynamic diagrams and offset vectors, which comprises the following steps:
acquiring a target image to be identified;
extracting the characteristics of the target image to be recognized;
predicting the positions of key points according to the extracted features;
correcting the predicted key points and determining the final positions of the key points; and
and determining the attitude information of the target to be recognized according to the key points.
Further, the step of performing feature extraction on the target image to be recognized includes:
cutting the obtained target image to be recognized;
inputting each image obtained by cutting into a residual error network; and
and carrying out coding processing through the residual error network to obtain a first characteristic diagram.
Further, the residual error network comprises five convolutional layers;
in addition, the step of obtaining the feature map by performing the encoding process through the residual error network includes the steps of:
carrying out dimension-changing processing on each channel of the feature map through convolution kernels, wherein the dimension-changing processing comprises dimension increase and dimension reduction;
carrying out normalization processing on each channel; and
and carrying out nonlinear activation processing on the result after the normalization processing.
Further, the step of extracting the features of the target image to be recognized further includes a decoding step, and the decoding step includes:
inputting the obtained first feature map into a deconvolution structure;
decoding the first feature map by a deconvolution structure; and
and acquiring a characteristic response graph of each channel.
Further, the predicting the location of the keypoint based on the extracted features comprises:
acquiring thermodynamic diagrams from output results of the channels;
calculating the maximum value of each thermodynamic diagram to obtain the position information of each key point on the thermodynamic diagram; and
and mapping the position information of the key points to the target image to be recognized according to the size relation between the target image to be recognized and the thermodynamic diagram.
Further, the step of correcting the predicted key points and determining the final positions of the key points includes the following steps:
determining the offset vector of the key point according to the output result of each channel; and
and adding the offset vector to the position of the maximum value of the thermodynamic diagram to determine the final position of the key point.
Further, the method also comprises the following steps:
training a thermodynamic diagram by adopting a mean square error loss function; and
in training the offset vector, a smooth penalty function is used to handle the gap between the true offset and the predicted offset.
The invention provides in a second aspect a system for gesture recognition based on thermodynamic diagrams and offset vectors, comprising:
the acquisition module is used for acquiring a target image to be identified;
the characteristic extraction module is used for extracting the characteristics of the target image to be identified;
the key point prediction module is used for predicting the position of a key point according to the extracted features;
the key point correction module is used for correcting the predicted key points and determining the final positions of the key points; and
and the gesture determining module is used for determining gesture information of the target to be recognized according to the key points.
A third aspect of the invention provides a system for gesture recognition based on thermodynamic diagrams and offset vectors, comprising:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method.
A fourth aspect of the invention provides a storage medium having stored therein processor-executable instructions for performing the method when executed by a processor.
One or more of the above-described embodiments of the present invention have the following advantages: according to the embodiment of the invention, the characteristics of the image are extracted, the positions of the key points are predicted, the prediction result can be corrected, and finally the attitude information is obtained through recognition.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and other drawings can be obtained by those skilled in the art from these drawings without creative effort.
FIG. 1 is a flowchart illustrating the overall steps of an embodiment of the present invention;
FIG. 2 is a first exemplary flow chart of an embodiment of the present invention;
FIG. 3 is a second exemplary flow chart of an embodiment of the present invention;
FIG. 4 is a schematic diagram of using coordinate offsets to correct the coordinate locations predicted by the thermodynamic diagram according to an embodiment of the present invention;
FIG. 5 is a comparison of the results of various algorithms on the MSCOCO data set according to embodiments of the present invention;
FIG. 6 is a comparison of various algorithms on an MPII data set according to an embodiment of the present invention;
FIG. 7 is a comparison of results of various algorithms of embodiments of the present invention on the CROWDPOSE data set;
FIG. 8 shows the results of the detection of HOPE on the MSCOCO data set according to the embodiment of the present invention;
FIG. 9 shows the result of the detection of the HOPE on the MPII data set according to the embodiment of the present invention;
fig. 10 shows the detection result of the HOPE on the CROWDPOSE data set according to the embodiment of the invention.
Detailed Description
The invention will be further explained and illustrated with reference to the drawings and the embodiments in the description. The step numbers in the embodiments of the present invention are set for convenience of illustration only; the order of the steps is not limited, and the execution order of each step in the embodiments can be adjusted according to the understanding of those skilled in the art.
Most prior art innovates only on the network architecture of thermodynamic-diagram-based methods and focuses mainly on the loss function. However, thermodynamic-diagram-based methods involve a coordinate mapping process, and the prior art ignores the loss incurred when the predicted coordinates obtained from the low-resolution thermodynamic diagram are mapped back to the original image, which limits further improvement in accuracy.
Therefore, the application provides a human body pose estimation method based on thermodynamic diagrams and coordinate offsets: features are extracted by a robust convolutional neural network to predict the thermodynamic diagrams and the offset vectors of the key points, the thermodynamic diagrams are used to predict the key point coordinates, and the offset vectors are used to correct those coordinates so as to obtain more accurate position information.
Referring to fig. 1, the specific implementation steps of the embodiment of the present application include:
S1: acquiring a target image to be identified;
S2: extracting the characteristics of the target image to be recognized;
As shown in fig. 2 and fig. 3, feature extraction in the embodiment of the present application converts a picture into features. The network structure of the model is mainly divided into two parts: an encoding module and a decoding module. The encoding module of the present application adopts a 50-layer residual network with the last 1x1 convolutional layer removed; this module extracts features from the input image in a fully convolutional manner. In particular, the residual design makes the encoding module perform very well in many computer vision tasks and gives it strong feature expression capability.
The residual network of this embodiment is composed of five groups of convolutional layers, c1, c2, c3, c4 and c5, and each group contains N residual modules. A residual module consists of alternating convolutional layers, BN layers and ReLU activations. The 1x1 convolution kernels are mainly used to reduce or increase the channel dimension of the feature map; reducing the dimension with a 1x1 convolution before the 3x3 convolution effectively lowers the computation of the following kernel. The BN layer is a batch normalization layer; each channel has four corresponding parameters, namely a mean, a variance, a scaling coefficient and an offset, which are used to normalize the features fed into the BN layer, alleviating the problem that the data distribution of intermediate layers shifts during training and causes vanishing or exploding gradients. The ReLU serves as the nonlinear activation function: on one hand it improves the nonlinear expression capability of the network, and on the other hand it avoids the slow parameter updates that the Sigmoid function suffers in its saturation region. As shown in fig. 2, the encoding step of the embodiment of the present application is implemented as follows: first, the acquired image is cropped; second, the image is input into the residual network; and third, the encoded feature map is obtained from the residual network.
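For illustration only, the following is a minimal PyTorch-style sketch of a bottleneck residual module of the kind described above (1x1 dimension reduction, 3x3 convolution, 1x1 dimension restoration, with BN and ReLU); the class name, channel arguments and shortcut handling are assumptions for this sketch, not the patent's implementation.

```python
import torch.nn as nn

class Bottleneck(nn.Module):
    """Bottleneck residual module: 1x1 reduce -> 3x3 -> 1x1 restore, with BN and ReLU."""
    def __init__(self, in_ch, mid_ch, out_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1, bias=False),              # 1x1: reduce channels before the 3x3
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, stride, 1, bias=False),   # 3x3 convolution
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, 1, bias=False),              # 1x1: restore the channel dimension
            nn.BatchNorm2d(out_ch),
        )
        # projection shortcut when the shape changes, identity otherwise
        self.shortcut = (nn.Sequential(nn.Conv2d(in_ch, out_ch, 1, stride, bias=False),
                                       nn.BatchNorm2d(out_ch))
                         if stride != 1 or in_ch != out_ch else nn.Identity())
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.shortcut(x))
```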
In addition, the embodiment of the present application further includes a decoding step. As shown in fig. 2, the decoding step comprises: first, inputting the obtained feature map into a deconvolution structure; second, decoding the feature map with the deconvolution structure; and third, obtaining a feature response map with 3×n channels from a 1x1 convolution. As shown in fig. 3, the network outputs thermodynamic diagrams and offset vectors respectively: the thermodynamic diagrams correspond to n channels and are used to predict the positions of the n key points, and the offset vectors correspond to 2×n channels and are used to predict the offset of the key point at each position in the x and y directions. The size of the final feature map at the end of the network is 64×48, one quarter of the input image in both width and height.
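As an illustrative sketch of the decoding head described above (deconvolution upsampling followed by a 1x1 convolution that emits 3×n channels, i.e. n thermodynamic-diagram channels and 2×n offset channels); the layer count, channel widths and names here are assumed values, not the patented configuration.

```python
import torch.nn as nn

class HeatmapOffsetHead(nn.Module):
    """Deconvolution decoder: upsample the encoder features, then a 1x1 convolution
    emits 3*n channels (n thermodynamic diagrams + 2*n offset maps)."""
    def __init__(self, in_ch=2048, n_keypoints=17, deconv_ch=256):
        super().__init__()
        layers, ch = [], in_ch
        for _ in range(3):   # three stride-2 deconvolutions: 8x upsampling, e.g. 8x6 -> 64x48
            layers += [nn.ConvTranspose2d(ch, deconv_ch, 4, stride=2, padding=1, bias=False),
                       nn.BatchNorm2d(deconv_ch), nn.ReLU(inplace=True)]
            ch = deconv_ch
        self.deconv = nn.Sequential(*layers)
        self.out = nn.Conv2d(deconv_ch, 3 * n_keypoints, kernel_size=1)

    def forward(self, features):
        y = self.out(self.deconv(features))
        n = y.shape[1] // 3
        return y[:, :n], y[:, n:]   # thermodynamic diagrams (B, n, h, w), offsets (B, 2n, h, w)
```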
S3: predicting the positions of key points according to the extracted features;
Specifically, the embodiment of the present application assumes that the location of the k-th key point is l_k. If the distance between a position x_i on the thermodynamic diagram and the key point l_k does not exceed the radius R, the probability that the position is the real key point follows a Gaussian distribution, which is more favorable for network learning; that is, h_k(x_i) = G(x_i − l_k) if ||x_i − l_k|| ≤ R, and h_k(x_i) = 0 otherwise, where G denotes a Gaussian function. Clearly, the closer a position on the thermodynamic diagram is to the key point l_k, the greater the probability that it is the key point. The implementation of key point prediction comprises the following steps: first, obtaining the thermodynamic diagrams from the output channels of the network; second, since each key point l_k corresponds to a thermodynamic diagram h_k, locating the position of each key point on its thermodynamic diagram; and third, mapping the coordinates from the thermodynamic diagram to the input image according to the size ratio between the input image and the thermodynamic diagram.
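A minimal sketch, under the assumptions of a stride-4 thermodynamic diagram and illustrative values of R and the Gaussian width, of how the Gaussian target h_k and the argmax-plus-mapping prediction described above could be computed (the function names are hypothetical):

```python
import numpy as np

def gaussian_heatmap(h, w, center, radius=3.0, sigma=1.5):
    """Target thermodynamic diagram for one key point l_k given as center = (cx, cy)
    in heatmap coordinates: Gaussian inside radius R, zero elsewhere."""
    ys, xs = np.mgrid[0:h, 0:w]
    d2 = (xs - center[0]) ** 2 + (ys - center[1]) ** 2
    hm = np.exp(-d2 / (2.0 * sigma ** 2))      # h_k(x_i) = G(x_i - l_k)
    hm[d2 > radius ** 2] = 0.0                 # h_k(x_i) = 0 when ||x_i - l_k|| > R
    return hm

def predict_from_heatmap(heatmap, stride=4):
    """Step S3: take the maximum of the thermodynamic diagram and map it back
    to input-image coordinates by the size ratio (here 4)."""
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return x * stride, y * stride
```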
S4: correcting the predicted key points and determining the final positions of the key points;
specifically, in the embodiment of the present application, there is a precision loss when the key point is mapped from the low-resolution image to the high-resolution image, as shown in fig. 4(b), each grid represents the position of one pixel, and the area enclosed by the rectangular frame in fig. 4(a) is intended to thermally predict the position of the left wrist, but when the predicted coordinates are mapped to the resolution of the input image, a large precision loss occurs. As can be seen from fig. 4(b), one pixel in the thermodynamic diagram actually represents the position of 16 pixels of the original image, because the width and the height are both one fourth of the original image, and each time the coordinate product 4 on the thermodynamic diagram can only be mapped to the first pixel of the corresponding area of the input image, that is, the position of the upper left corner of the 16 grids in fig. 4(b), which is the root of the precision loss in the coordinate mapping process. Many efforts have been made to reduce the loss of accuracy in coordinate mapping by manually shifting the thermodynamic predicted keypoint locations by a quarter of a pixel at this stage, i.e., by a distance of 1 pixel on the original input image, which does reduce the expected error between the mapped keypoint and the true keypoint, resulting in a slight improvement in accuracy, but does not solve the problem of loss of accuracy at the root.
Based on this situation, in addition to outputting the thermodynamic diagrams, the network of the present application predicts for each position x_i a two-dimensional offset vector o_k(x_i) with respect to the input image, letting the neural network actively learn the offset between the mapped key point and the true key point. Here o_k(x_i) represents the offset, after mapping, of a position x_i on the k-th thermodynamic diagram with respect to the k-th key point on the input image, and its purpose is to correct the predicted position of the key point. Since there are k key points, the network of the present application generates k such offset fields; for each key point and the positions near it, this amounts to solving a two-dimensional regression problem.
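For illustration, a sketch of how the two-dimensional offset training target o_k(x_i) described above could be constructed for one key point; the stride and radius values and the function name are assumptions:

```python
import numpy as np

def offset_targets(h, w, keypoint_xy, stride=4, radius=3.0):
    """Two-dimensional offset training target o_k(x_i) for one key point l_k = keypoint_xy
    (given in input-image coordinates). For every heatmap position x_i within radius R of
    the key point, the target is the displacement from the mapped position back to l_k."""
    ys, xs = np.mgrid[0:h, 0:w]
    dx = keypoint_xy[0] - xs * stride                      # x-offset on the input image
    dy = keypoint_xy[1] - ys * stride                      # y-offset on the input image
    mask = ((xs - keypoint_xy[0] / stride) ** 2 +
            (ys - keypoint_xy[1] / stride) ** 2) <= radius ** 2
    return np.stack([dx, dy]), mask                        # offsets (2, h, w) and the supervised region
```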
Referring to fig. 2, the correction step is implemented as follows: first, after the network generates the thermodynamic diagrams and the offset vectors, the position of the maximum value of each thermodynamic diagram is taken; second, the offset vector is added to that maximum position to obtain the key point position finally mapped onto the input image.
S5: and determining the attitude information of the target to be recognized according to the key points.
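Putting steps S1-S5 together, the following is a hypothetical end-to-end inference sketch (PyTorch-style, with an assumed model interface returning thermodynamic diagrams and offsets, and an assumed interleaved dx/dy channel order); it is not the patented implementation:

```python
import torch

def recognize_pose(image, model, stride=4):
    """Hypothetical end-to-end sketch of steps S1-S5 (not the patented code).
    image: float tensor (3, H, W) already cropped to the target person;
    model: returns thermodynamic diagrams (1, n, H/stride, W/stride) and
           offsets (1, 2n, H/stride, W/stride) with interleaved dx/dy channels."""
    with torch.no_grad():
        heatmaps, offsets = model(image.unsqueeze(0))       # S2: feature extraction + decoding
    heatmaps, offsets = heatmaps[0], offsets[0]
    n, h, w = heatmaps.shape
    keypoints = []
    for k in range(n):
        y, x = divmod(torch.argmax(heatmaps[k]).item(), w)  # S3: maximum of the k-th heatmap
        dx = offsets[2 * k, y, x].item()                    # S4: learned offset correction
        dy = offsets[2 * k + 1, y, x].item()
        keypoints.append((x * stride + dx, y * stride + dy))
    return keypoints                                        # S5: the pose is the set of key points
```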
In addition, the embodiment of the application also provides steps of model training and testing, specifically:
the thermodynamic diagram is trained by adopting a classical mean square error loss function, and it is noted that the loss is calculated only in the probability value of the region within a distance R near the key point, that is, only those points near the key point are trained, so that the convergence of the network is facilitated, and the loss function is as shown.
L_h(θ) = Σ_k Σ_{x_i : ||x_i − l_k|| ≤ R} ( ĥ_k(x_i) − h_k(x_i) )²   (1), where ĥ_k(x_i) is the probability predicted by the network and h_k(x_i) is the Gaussian target defined above.
For training the offset vectors, inspired by the regression of detection-box coordinates in the object detection domain, the present application employs a smooth L1 loss function to penalize the gap between the true offset and the predicted offset, as shown in Equation (2).
L_o(θ) = Σ_k Σ_{x_i : ||x_i − l_k|| ≤ R} smooth_L1( ô_k(x_i) − o_k(x_i) )   (2), where ô_k(x_i) is the predicted offset and o_k(x_i) is the true offset.
This loss function makes the training more robust to abnormal outliers, thereby letting the gradient propagate backwards through the network more stably. Likewise, the present application only computes this loss at positions that are no more than R away from the key point. After fusing the two losses, the final loss function is shown in Equation (3), where λ_h and λ_o are the weights of the two losses, their ratio is 4:1, and the optimizer used for training the model is Adam.
L(θ) = λ_h·L_h(θ) + λ_o·L_o(θ)   (3)
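A sketch of how Equations (1)-(3) could be computed in PyTorch, assuming a mask that marks the positions within radius R of each key point and the interleaved offset channel order used above; the function name and tensor layout are assumptions:

```python
import torch
import torch.nn.functional as F

def total_loss(pred_hm, gt_hm, pred_off, gt_off, mask, lambda_h=4.0, lambda_o=1.0):
    """Sketch of Equations (1)-(3). `mask` is (B, n, h, w) and is 1 only at positions
    within radius R of a key point; lambda_h : lambda_o = 4 : 1 as stated above."""
    m = mask.float()
    l_h = (((pred_hm - gt_hm) ** 2) * m).sum() / m.sum().clamp(min=1)             # Eq. (1)
    m2 = m.repeat_interleave(2, dim=1)                                             # match the 2n offset channels
    l_o = ((F.smooth_l1_loss(pred_off, gt_off, reduction='none') * m2).sum()
           / m2.sum().clamp(min=1))                                                # Eq. (2)
    return lambda_h * l_h + lambda_o * l_o                                         # Eq. (3)

# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # Adam, as stated in the description
```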
In addition, the present application selects three test data sets disclosed in the field of attitude estimation to perform experimental measurements, to further illustrate the advantages of the present application over the prior art:
the operation environment of the embodiment of the application is as follows: 6 cores, Intel Xeon E5-2620 processor, 64GB memory, Titan X display card, Ubuntu 16.04 operating system.
The three data sets are: (1) MSCOCO: the MSCOCO data set can be applied to tasks such as object detection, semantic segmentation and key point detection. This patent mainly uses the 2017 COCO data set, in which the training set contains 118287 pictures and the test set contains 5000 pictures, many of which carry annotations of multiple people.
(2) MPII: the MPII human pose data set is the standard benchmark for evaluating articulated human pose estimation. The data set includes approximately 25K images containing over 40K people with annotated body joints. The images were collected according to a taxonomy of everyday human activities; the whole data set covers 410 human activities, and each image carries an activity label. Each image was extracted from a YouTube video. The data contains approximately 25000 pictures with over 40000 annotated person instances, of which 28000 are used for network training and the remaining 12000 samples are used for testing.
(3) CrowdPose: we also evaluate our approach on the CrowdPose data set, which contains 20,000 pictures and 80,000 human instances. The CrowdPose data set is designed to improve performance in crowded situations, making the model suitable for different scenarios.
In order to evaluate the effectiveness of the algorithm, the experiments in this embodiment use the AP and PCK performance metrics: AP is used as the evaluation metric on the COCO and CROWDPOSE data sets, and PCK is used as the evaluation metric on the MPII data set. Object Keypoint Similarity (OKS), which measures the similarity between predicted key points and annotated key points, is formulated as follows:
OKS = [ Σ_i exp(−D_i² / (2·s²·k_i²))·δ(v_i > 0) ] / [ Σ_i δ(v_i > 0) ]
where D_i represents the Euclidean distance between the i-th predicted key point and the corresponding labeled key point, s is the scale of the object, k_i is a per-keypoint constant controlling the attenuation, and v_i indicates whether the key point is visible. Given an OKS threshold T, the average precision over the test set can then be calculated from the following equation:
AP = [ Σ_p δ(OKS_p > T) ] / [ Σ_p 1 ], where p ranges over the predicted person instances in the test set.
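For illustration, a small NumPy sketch of the OKS and single-threshold AP computations defined above (array shapes and function names are assumptions):

```python
import numpy as np

def oks(pred, gt, visible, s, k):
    """Object Keypoint Similarity for one person.
    pred, gt: (K, 2) key point arrays; visible: (K,) booleans; s: object scale;
    k: (K,) per-keypoint attenuation constants."""
    d2 = np.sum((pred - gt) ** 2, axis=1)
    e = np.exp(-d2 / (2.0 * s ** 2 * k ** 2))
    return float(e[visible].mean()) if visible.any() else 0.0

def average_precision(oks_values, threshold):
    """AP at one OKS threshold: the fraction of predictions whose OKS exceeds it."""
    return float((np.asarray(oks_values) > threshold).mean())
```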
another important criterion for keypoints is PCK, which indicates the proportion of all predicted keypoints that fall within a certain standardized distance around the corresponding labeled keypoint. This normalized distance is often related to the longest distance of the human torso in the picture. Generally, the normalized distance is represented as PCK @ σ, where σ is a decimal between intervals [0,1], and the normalized distance in the evaluation index is obtained by multiplying σ by the longest trunk distance, and the specific calculation method is as follows:
PCK_k@σ = (1/N)·Σ_{i=1}^{N} δ( d_k^i / D^i ≤ σ )
where N represents the total number of samples, k indexes the k-th key point, d_k^i is the distance between the predicted and labeled k-th key point of sample i, and D^i is the normalization distance of sample i; the overall PCK is therefore:
PCK@σ = (1/K)·Σ_{k=1}^{K} PCK_k@σ, where K is the total number of key points.
the evaluation index used on the MPII dataset is PCKh, and unlike PCK, it replaces the longest torso distance used in normalizing the distance with the longest head distance.
In the embodiment of the application, the AP and PCK values are compared with those of other algorithms on the three data sets MSCOCO, MPII and CROWDPOSE. These methods include the simple baseline for human pose estimation and tracking (SB), towards accurate multi-person pose estimation in the wild (G-RMI), the cascaded pyramid network for multi-person pose estimation (CPN), the stacked hourglass network for human pose estimation (Hourglass), the feature pyramid network for heatmap regression (FPN), and a quantized densely connected network for human pose estimation. The algorithm of the present application is abbreviated as HOPE.
FIG. 5 shows the results of the present application and other algorithms on the MSCOCO data set; FIG. 6 shows the results on the MPII data set; and FIG. 7 shows the results on the CROWDPOSE data set.
As can be seen from fig. 5, 6 and 7, the AP and PCK values of the present application are superior to those of the other algorithms in both sparse and crowded scenes. In addition, fig. 8 shows the detection results of HOPE on the MSCOCO data set, fig. 9 shows the detection results of HOPE on the MPII data set, and fig. 10 shows the detection results of HOPE on the CROWDPOSE data set. As can be seen from fig. 8 to fig. 10, the key points returned by the present application match the persons in the images closely, which further illustrates that the present application achieves a good effect in key point detection.
The embodiment of the invention also provides a posture identification system based on thermodynamic diagrams and offset vectors, which comprises the following steps:
the acquisition module is used for acquiring a target image to be identified;
the characteristic extraction module is used for extracting the characteristics of the target image to be identified;
the key point prediction module is used for predicting the position of a key point according to the extracted features;
the key point correction module is used for correcting the predicted key points and determining the final positions of the key points; and
and the gesture determining module is used for determining gesture information of the target to be recognized according to the key points.
The embodiment of the invention also provides a posture identification system based on thermodynamic diagrams and offset vectors, which comprises the following steps:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method.
Embodiments of the present invention also provide a storage medium having stored therein processor-executable instructions, which when executed by a processor, are configured to perform the method.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. The gesture recognition method based on the thermodynamic diagram and the offset vector is characterized by comprising the following steps:
acquiring a target image to be identified;
extracting the characteristics of the target image to be recognized;
predicting the positions of key points according to the extracted features;
correcting the predicted key points and determining the final positions of the key points; and
and determining the attitude information of the target to be recognized according to the key points.
2. The method for gesture recognition based on thermodynamic diagrams and offset vectors according to claim 1, wherein the step of feature extraction of the target image to be recognized comprises:
cutting the obtained target image to be recognized;
inputting each image obtained by cutting into a residual error network; and
and carrying out coding processing through the residual error network to obtain a first characteristic diagram.
3. The thermodynamic diagram and offset vector based pose recognition method of claim 2, wherein the residual network comprises five convolutional layers;
in addition, the step of obtaining the feature map by performing the encoding process through the residual error network includes the steps of:
carrying out variable-dimension processing on each channel of the feature map through convolution kernel, wherein the variable-dimension processing comprises ascending and descending dimensions;
carrying out normalization processing on each channel; and
and carrying out nonlinear activation processing on the result after the normalization processing.
4. The thermodynamic diagram and offset vector-based pose recognition method according to claim 2, wherein the step of performing feature extraction on the target image to be recognized further comprises a decoding step, and the decoding step comprises:
inputting the obtained first feature map into a deconvolution structure;
decoding the first feature map by a deconvolution structure; and
and acquiring a characteristic response graph of each channel.
5. The thermodynamic diagram and offset vector based gesture recognition method of claim 4, wherein the predicting keypoint locations from the extracted features comprises:
acquiring thermodynamic diagrams from output results of the channels;
calculating the maximum value of each thermodynamic diagram to obtain the position information of each key point on the thermodynamic diagram; and
and mapping the position information of the key points to the target image to be recognized according to the size relation between the target image to be recognized and the thermodynamic diagram.
6. The thermodynamic diagram and offset vector based pose recognition method according to claim 4, wherein the step of modifying the predicted keypoints and determining the final positions of the keypoints comprises the steps of:
determining the offset vector of the key point according to the output result of each channel; and
and adding the offset vector to the maximum value of the thermodynamic diagram according to the offset vector to determine the final position of the key point.
7. The thermodynamic diagram and offset vector based gesture recognition method of claim 6, further comprising the steps of:
training a thermodynamic diagram by adopting a mean square error loss function; and
in training the offset vector, a smooth penalty function is used to handle the gap between the true offset and the predicted offset.
8. A system for gesture recognition based on thermodynamic diagrams and offset vectors, comprising:
the acquisition module is used for acquiring a target image to be identified;
the characteristic extraction module is used for extracting the characteristics of the target image to be identified;
the key point prediction module is used for predicting the position of a key point according to the extracted features;
the key point correction module is used for correcting the predicted key points and determining the final positions of the key points; and
and the gesture determining module is used for determining gesture information of the target to be recognized according to the key points.
9. A system for gesture recognition based on thermodynamic diagrams and offset vectors, comprising:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method of any one of claims 1-7.
10. A storage medium having stored therein processor-executable instructions, which when executed by a processor, are for performing the method of any one of claims 1-7.
CN202010006031.3A 2020-01-03 2020-01-03 Gesture recognition method, system and storage medium based on thermodynamic diagram and offset vector Active CN111191622B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010006031.3A CN111191622B (en) 2020-01-03 2020-01-03 Gesture recognition method, system and storage medium based on thermodynamic diagram and offset vector

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010006031.3A CN111191622B (en) 2020-01-03 2020-01-03 Gesture recognition method, system and storage medium based on thermodynamic diagram and offset vector

Publications (2)

Publication Number Publication Date
CN111191622A true CN111191622A (en) 2020-05-22
CN111191622B CN111191622B (en) 2023-05-26

Family

ID=70708632

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010006031.3A Active CN111191622B (en) 2020-01-03 2020-01-03 Gesture recognition method, system and storage medium based on thermodynamic diagram and offset vector

Country Status (1)

Country Link
CN (1) CN111191622B (en)

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111680623A (en) * 2020-06-05 2020-09-18 北京百度网讯科技有限公司 Attitude conversion method and apparatus, electronic device, and storage medium
CN111695519A (en) * 2020-06-12 2020-09-22 北京百度网讯科技有限公司 Key point positioning method, device, equipment and storage medium
CN111814593A (en) * 2020-06-19 2020-10-23 浙江大华技术股份有限公司 Traffic scene analysis method and device, and storage medium
CN111860276A (en) * 2020-07-14 2020-10-30 咪咕文化科技有限公司 Human body key point detection method, device, network equipment and storage medium
CN111860300A (en) * 2020-07-17 2020-10-30 广州视源电子科技股份有限公司 Key point detection method and device, terminal equipment and storage medium
CN111914639A (en) * 2020-06-30 2020-11-10 吴�荣 Driving action recognition method of lightweight convolution space-time simple cycle unit model
CN111967406A (en) * 2020-08-20 2020-11-20 高新兴科技集团股份有限公司 Method, system, equipment and storage medium for generating human body key point detection model
CN111985556A (en) * 2020-08-19 2020-11-24 南京地平线机器人技术有限公司 Key point identification model generation method and key point identification method
CN112101490A (en) * 2020-11-20 2020-12-18 支付宝(杭州)信息技术有限公司 Thermodynamic diagram conversion model training method and device
CN112132131A (en) * 2020-09-22 2020-12-25 深兰科技(上海)有限公司 Measuring cylinder liquid level identification method and device
CN112417972A (en) * 2020-10-23 2021-02-26 奥比中光科技集团股份有限公司 Heat map decoding method, human body joint point estimation method and system
CN112446302A (en) * 2020-11-05 2021-03-05 杭州易现先进科技有限公司 Human body posture detection method and system, electronic equipment and storage medium
CN112528858A (en) * 2020-12-10 2021-03-19 北京百度网讯科技有限公司 Training method, device, equipment, medium and product of human body posture estimation model
CN112597955A (en) * 2020-12-30 2021-04-02 华侨大学 Single-stage multi-person attitude estimation method based on feature pyramid network
CN112651316A (en) * 2020-12-18 2021-04-13 上海交通大学 Two-dimensional and three-dimensional multi-person attitude estimation system and method
CN112837336A (en) * 2021-02-23 2021-05-25 浙大宁波理工学院 Method and system for estimating and acquiring room layout based on heat map correction of key points
CN112862920A (en) * 2021-02-18 2021-05-28 清华大学 Human body image generation method and system based on hand-drawn sketch
CN112926648A (en) * 2021-02-24 2021-06-08 北京优创新港科技股份有限公司 Method and device for detecting abnormality of tobacco leaf tip in tobacco leaf baking process
CN113011402A (en) * 2021-04-30 2021-06-22 中国科学院自动化研究所 System and method for estimating postures of primates based on convolutional neural network
CN113076891A (en) * 2021-04-09 2021-07-06 华南理工大学 Human body posture prediction method and system based on improved high-resolution network
CN113128436A (en) * 2021-04-27 2021-07-16 北京百度网讯科技有限公司 Method and device for detecting key points
CN113159198A (en) * 2021-04-27 2021-07-23 上海芯物科技有限公司 Target detection method, device, equipment and storage medium
CN113343762A (en) * 2021-05-07 2021-09-03 北京邮电大学 Human body posture estimation grouping model training method, posture estimation method and device
CN113537234A (en) * 2021-06-10 2021-10-22 浙江大华技术股份有限公司 Quantity counting method and device, electronic device and computer equipment
CN114359974A (en) * 2022-03-08 2022-04-15 广东履安实业有限公司 Human body posture detection method and device and storage medium
CN114463534A (en) * 2021-12-28 2022-05-10 佳都科技集团股份有限公司 Target key point detection method, device, equipment and storage medium
CN114863237A (en) * 2022-03-25 2022-08-05 中国人民解放军国防科技大学 Method and system for recognizing swimming postures
CN115272992A (en) * 2022-09-30 2022-11-01 松立控股集团股份有限公司 Vehicle attitude estimation method
CN115331153A (en) * 2022-10-12 2022-11-11 山东省第二人民医院(山东省耳鼻喉医院、山东省耳鼻喉研究所) Posture monitoring method for assisting vestibule rehabilitation training
CN116645699A (en) * 2023-07-27 2023-08-25 杭州华橙软件技术有限公司 Key point detection method, device, terminal and computer readable storage medium
CN117437433A (en) * 2023-12-07 2024-01-23 苏州铸正机器人有限公司 Sub-pixel level key point detection method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109033946A (en) * 2018-06-08 2018-12-18 东南大学 Merge the estimation method of human posture of directional diagram
CN109657631A (en) * 2018-12-25 2019-04-19 上海智臻智能网络科技股份有限公司 Human posture recognition method and device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109033946A (en) * 2018-06-08 2018-12-18 东南大学 Merge the estimation method of human posture of directional diagram
CN109657631A (en) * 2018-12-25 2019-04-19 上海智臻智能网络科技股份有限公司 Human posture recognition method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘唐波; 杨锐; 王文伟; 何楚: "Research on driver hand movement detection method based on pose estimation" *

Cited By (51)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111680623B (en) * 2020-06-05 2023-04-21 北京百度网讯科技有限公司 Gesture conversion method and device, electronic equipment and storage medium
CN111680623A (en) * 2020-06-05 2020-09-18 北京百度网讯科技有限公司 Attitude conversion method and apparatus, electronic device, and storage medium
CN111695519A (en) * 2020-06-12 2020-09-22 北京百度网讯科技有限公司 Key point positioning method, device, equipment and storage medium
EP3869402A1 (en) * 2020-06-12 2021-08-25 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for positioning key point, device, storage medium and computer program product
JP2021197157A (en) * 2020-06-12 2021-12-27 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Key point specification method, device, apparatus, and storage media
CN111695519B (en) * 2020-06-12 2023-08-08 北京百度网讯科技有限公司 Method, device, equipment and storage medium for positioning key point
US11610389B2 (en) 2020-06-12 2023-03-21 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for positioning key point, device, and storage medium
JP7194215B2 (en) 2020-06-12 2022-12-21 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド KEYPOINT IDENTIFICATION METHOD AND DEVICE, DEVICE, STORAGE MEDIUM
CN111814593A (en) * 2020-06-19 2020-10-23 浙江大华技术股份有限公司 Traffic scene analysis method and device, and storage medium
CN111914639A (en) * 2020-06-30 2020-11-10 吴�荣 Driving action recognition method of lightweight convolution space-time simple cycle unit model
CN111860276A (en) * 2020-07-14 2020-10-30 咪咕文化科技有限公司 Human body key point detection method, device, network equipment and storage medium
CN111860276B (en) * 2020-07-14 2023-04-11 咪咕文化科技有限公司 Human body key point detection method, device, network equipment and storage medium
CN111860300A (en) * 2020-07-17 2020-10-30 广州视源电子科技股份有限公司 Key point detection method and device, terminal equipment and storage medium
CN111985556A (en) * 2020-08-19 2020-11-24 南京地平线机器人技术有限公司 Key point identification model generation method and key point identification method
CN111967406A (en) * 2020-08-20 2020-11-20 高新兴科技集团股份有限公司 Method, system, equipment and storage medium for generating human body key point detection model
CN112132131A (en) * 2020-09-22 2020-12-25 深兰科技(上海)有限公司 Measuring cylinder liquid level identification method and device
CN112132131B (en) * 2020-09-22 2024-05-03 深兰科技(上海)有限公司 Measuring cylinder liquid level identification method and device
CN112417972A (en) * 2020-10-23 2021-02-26 奥比中光科技集团股份有限公司 Heat map decoding method, human body joint point estimation method and system
CN112446302A (en) * 2020-11-05 2021-03-05 杭州易现先进科技有限公司 Human body posture detection method and system, electronic equipment and storage medium
CN112446302B (en) * 2020-11-05 2023-09-19 杭州易现先进科技有限公司 Human body posture detection method, system, electronic equipment and storage medium
CN112101490A (en) * 2020-11-20 2020-12-18 支付宝(杭州)信息技术有限公司 Thermodynamic diagram conversion model training method and device
CN112101490B (en) * 2020-11-20 2021-03-02 支付宝(杭州)信息技术有限公司 Thermodynamic diagram conversion model training method and device
CN112528858A (en) * 2020-12-10 2021-03-19 北京百度网讯科技有限公司 Training method, device, equipment, medium and product of human body posture estimation model
CN112651316A (en) * 2020-12-18 2021-04-13 上海交通大学 Two-dimensional and three-dimensional multi-person attitude estimation system and method
CN112597955B (en) * 2020-12-30 2023-06-02 华侨大学 Single-stage multi-person gesture estimation method based on feature pyramid network
CN112597955A (en) * 2020-12-30 2021-04-02 华侨大学 Single-stage multi-person attitude estimation method based on feature pyramid network
CN112862920A (en) * 2021-02-18 2021-05-28 清华大学 Human body image generation method and system based on hand-drawn sketch
CN112837336A (en) * 2021-02-23 2021-05-25 浙大宁波理工学院 Method and system for estimating and acquiring room layout based on heat map correction of key points
CN112837336B (en) * 2021-02-23 2022-02-22 浙大宁波理工学院 Method and system for estimating and acquiring room layout based on heat map correction of key points
CN112926648B (en) * 2021-02-24 2021-11-16 北京优创新港科技股份有限公司 Method and device for detecting abnormality of tobacco leaf tip in tobacco leaf baking process
CN112926648A (en) * 2021-02-24 2021-06-08 北京优创新港科技股份有限公司 Method and device for detecting abnormality of tobacco leaf tip in tobacco leaf baking process
CN113076891B (en) * 2021-04-09 2023-08-22 华南理工大学 Human body posture prediction method and system based on improved high-resolution network
CN113076891A (en) * 2021-04-09 2021-07-06 华南理工大学 Human body posture prediction method and system based on improved high-resolution network
CN113128436B (en) * 2021-04-27 2022-04-01 北京百度网讯科技有限公司 Method and device for detecting key points
CN113159198A (en) * 2021-04-27 2021-07-23 上海芯物科技有限公司 Target detection method, device, equipment and storage medium
CN113128436A (en) * 2021-04-27 2021-07-16 北京百度网讯科技有限公司 Method and device for detecting key points
CN113011402A (en) * 2021-04-30 2021-06-22 中国科学院自动化研究所 System and method for estimating postures of primates based on convolutional neural network
CN113343762A (en) * 2021-05-07 2021-09-03 北京邮电大学 Human body posture estimation grouping model training method, posture estimation method and device
CN113343762B (en) * 2021-05-07 2022-03-29 北京邮电大学 Human body posture estimation grouping model training method, posture estimation method and device
CN113537234A (en) * 2021-06-10 2021-10-22 浙江大华技术股份有限公司 Quantity counting method and device, electronic device and computer equipment
CN114463534A (en) * 2021-12-28 2022-05-10 佳都科技集团股份有限公司 Target key point detection method, device, equipment and storage medium
CN114359974A (en) * 2022-03-08 2022-04-15 广东履安实业有限公司 Human body posture detection method and device and storage medium
CN114359974B (en) * 2022-03-08 2022-06-07 广东履安实业有限公司 Human body posture detection method and device and storage medium
CN114863237B (en) * 2022-03-25 2023-07-14 中国人民解放军国防科技大学 Method and system for recognizing swimming gesture
CN114863237A (en) * 2022-03-25 2022-08-05 中国人民解放军国防科技大学 Method and system for recognizing swimming postures
CN115272992A (en) * 2022-09-30 2022-11-01 松立控股集团股份有限公司 Vehicle attitude estimation method
CN115331153A (en) * 2022-10-12 2022-11-11 山东省第二人民医院(山东省耳鼻喉医院、山东省耳鼻喉研究所) Posture monitoring method for assisting vestibule rehabilitation training
CN116645699A (en) * 2023-07-27 2023-08-25 杭州华橙软件技术有限公司 Key point detection method, device, terminal and computer readable storage medium
CN116645699B (en) * 2023-07-27 2023-09-29 杭州华橙软件技术有限公司 Key point detection method, device, terminal and computer readable storage medium
CN117437433A (en) * 2023-12-07 2024-01-23 苏州铸正机器人有限公司 Sub-pixel level key point detection method and device
CN117437433B (en) * 2023-12-07 2024-03-19 苏州铸正机器人有限公司 Sub-pixel level key point detection method and device

Also Published As

Publication number Publication date
CN111191622B (en) 2023-05-26

Similar Documents

Publication Publication Date Title
CN111191622B (en) Gesture recognition method, system and storage medium based on thermodynamic diagram and offset vector
CN112597941B (en) Face recognition method and device and electronic equipment
Sapp et al. Parsing human motion with stretchable models
JP6639123B2 (en) Image processing apparatus, image processing method, and program
Yan et al. Crowd counting via perspective-guided fractional-dilation convolution
CN107886069A (en) A kind of multiple target human body 2D gesture real-time detection systems and detection method
Zhu et al. Convolutional relation network for skeleton-based action recognition
Ma et al. Ppt: token-pruned pose transformer for monocular and multi-view human pose estimation
CN110555481A (en) Portrait style identification method and device and computer readable storage medium
CN112784810B (en) Gesture recognition method, gesture recognition device, computer equipment and storage medium
CN111539941B (en) Parkinson's disease leg flexibility task evaluation method and system, storage medium and terminal
CN110874865A (en) Three-dimensional skeleton generation method and computer equipment
CN112257526A (en) Action identification method based on feature interactive learning and terminal equipment
CN112036260A (en) Expression recognition method and system for multi-scale sub-block aggregation in natural environment
CN117238034A (en) Human body posture estimation method based on space-time transducer
Nguyen et al. Combined YOLOv5 and HRNet for high accuracy 2D keypoint and human pose estimation
CN114612545A (en) Image analysis method and training method, device, equipment and medium of related model
Dong et al. ADORE: An adaptive holons representation framework for human pose estimation
CN112329662B (en) Multi-view saliency estimation method based on unsupervised learning
CN112199994B (en) Method and device for detecting interaction of 3D hand and unknown object in RGB video in real time
CN116664677B (en) Sight estimation method based on super-resolution reconstruction
CN116434010A (en) Multi-view pedestrian attribute identification method
CN113343762B (en) Human body posture estimation grouping model training method, posture estimation method and device
CN114841887A (en) Image restoration quality evaluation method based on multi-level difference learning
CN118119971A (en) Electronic device and method for determining height of person using neural network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant