CN112101314A - Human body posture recognition method and device based on mobile terminal - Google Patents

Human body posture recognition method and device based on mobile terminal

Info

Publication number
CN112101314A
Authority
CN
China
Prior art keywords
image
human body
recognized
feature map
key points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011281717.XA
Other languages
Chinese (zh)
Other versions
CN112101314B (en)
Inventor
裘实
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Health Hope (beijing) Technology Co ltd
Original Assignee
Health Hope (beijing) Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Health Hope (beijing) Technology Co ltd filed Critical Health Hope (beijing) Technology Co ltd
Priority to CN202011281717.XA
Publication of CN112101314A
Application granted
Publication of CN112101314B
Active legal status
Anticipated expiration legal status

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a human body posture recognition method and device based on a mobile terminal. An image to be recognized is acquired, and feature extraction is performed on it using a neural network comprising a plurality of bottleneck modules and an SPP module to obtain the position parameters of the human body in the image; key point information of the human body is then determined from the image and the position parameters; and the result of human body posture recognition is determined from that key point information. Because feature extraction is performed by a neural network built from bottleneck modules and an SPP module, which is better suited to a mobile terminal in both operation speed and accuracy than a neural network composed of VGG and ResNet modules, the human body posture recognition algorithm can be deployed on devices with limited computing power, such as mobile terminals.

Description

Human body posture recognition method and device based on mobile terminal
Technical Field
The invention relates to the technical field of computer vision, and in particular to a human body posture recognition method and device based on a mobile terminal.
Background
At present, in the field of computer vision, deep learning enables machines to understand multimedia information at a high level, particularly information related to people, such as face recognition and pedestrian recognition. However, most prior art focuses only on human faces; attention to the body or human body posture is still at an early stage, especially for algorithms running on terminal devices such as mobile terminals. Owing to the lack of high-performance algorithms suitable for terminal devices, many applications based on human body posture recognition cannot be truly realized.
In summary, existing human body posture recognition algorithms cannot be deployed on devices with limited computing power, such as mobile terminals, and cannot support applications that take such an algorithm as their core. Even where deployment on some terminal devices with limited computing power is possible, the posture recognition accuracy is low and cannot meet application requirements.
In addition, no related application based on posture recognition has been realized on terminal devices in the prior art.
Disclosure of Invention
The invention aims to solve the technical problem that existing human body posture recognition algorithms cannot be deployed on devices with limited computing power, such as terminal devices, and, in view of the defects in the prior art, provides a human body posture recognition method and device based on a mobile terminal.
To solve this technical problem, the invention provides a human body posture recognition method based on a mobile terminal, comprising the following steps:
acquiring an image to be recognized;
performing feature extraction on the image to be recognized by using a neural network comprising a plurality of bottleneck modules and an SPP module to obtain position parameters of a human body in the image to be recognized, wherein the position parameters comprise the position coordinates and the length and width of the area where the human body is located in the image to be recognized;
determining key point information of the human body in the image to be recognized according to the image to be recognized and the position parameters;
and determining the result of human body posture recognition in the image to be recognized according to the key point information of the human body in the image to be recognized.
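The four claimed steps can be sketched as a simple pipeline. All helper functions below are hypothetical placeholders standing in for the claimed neural networks, and the dummy box and keypoint values are illustrative only.

```python
# Hypothetical end-to-end sketch of the four claimed steps; the function
# bodies are placeholders, not the patent's actual networks.

def detect_person(image):
    """Stand-in for the bottleneck + SPP detection network: returns the
    position coordinates plus length and width of the person's region."""
    x, y, w, h = 10, 20, 64, 128   # dummy position parameters
    return (x, y, w, h)

def extract_keypoints(image, box):
    """Stand-in for the key point network: returns dummy keypoint
    coordinates relative to the full image."""
    x, y, w, h = box
    return [(x + w // 2, y + h // 8),   # e.g. a head keypoint
            (x + w // 2, y + h // 2)]   # e.g. a torso keypoint

def classify_posture(keypoints):
    """Stand-in for the posture classifier (image origin at top-left,
    so a smaller y is higher in the image)."""
    head, torso = keypoints
    return "upright" if head[1] < torso[1] else "other"

image = [[0] * 128 for _ in range(128)]   # toy image
box = detect_person(image)
pose = classify_posture(extract_keypoints(image, box))
print(box, pose)
```

The real method replaces each placeholder with the networks described in the claims below.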
In a possible implementation manner, performing feature extraction on the image to be recognized by using a neural network comprising a plurality of bottleneck modules and an SPP module to obtain the position parameters of the human body in the image to be recognized includes:
performing feature extraction on the image to be recognized by using a first bottleneck module to obtain a first low-dimensional feature map;
performing feature extraction on the first low-dimensional feature map by using a second bottleneck module to obtain a first medium-dimensional feature map;
performing feature extraction on the first medium-dimensional feature map by using the SPP module to obtain a first high-dimensional feature map;
performing feature fusion on the first low-dimensional feature map, the first medium-dimensional feature map and the first high-dimensional feature map to obtain a fused feature map;
and determining the position parameters of the human body in the image to be recognized according to the fused feature map.
In a possible implementation manner, performing feature fusion on the first low-dimensional feature map, the first medium-dimensional feature map and the first high-dimensional feature map to obtain a fused feature map includes:
sequentially processing the first high-dimensional feature map with a third bottleneck module, a 1 × 1 convolution kernel and an up-sampling module, and performing feature fusion on the processed feature map and the first medium-dimensional feature map to obtain a second medium-dimensional feature map;
sequentially processing the second medium-dimensional feature map with a fourth bottleneck module, a 1 × 1 convolution kernel and an up-sampling module, and performing feature fusion on the processed feature map and the first low-dimensional feature map to obtain a second low-dimensional feature map;
sequentially processing the second low-dimensional feature map with a fifth bottleneck module and a 3 × 3 convolution kernel, and performing feature fusion on the resulting feature map and the feature map obtained by processing the second medium-dimensional feature map with a 1 × 1 convolution kernel to obtain a third medium-dimensional feature map;
sequentially processing the third medium-dimensional feature map with a sixth bottleneck module and a 3 × 3 convolution kernel, and performing feature fusion on the resulting feature map and the feature map obtained by sequentially processing the first high-dimensional feature map with the third bottleneck module and a 1 × 1 convolution kernel to obtain a second high-dimensional feature map;
wherein the second low-dimensional feature map, the third medium-dimensional feature map and the second high-dimensional feature map constitute the fused feature map.
In a possible implementation manner, determining the position parameters of the human body in the image to be recognized according to the fused feature map includes:
processing the second low-dimensional feature map with the fifth bottleneck module and a 1 × 1 convolution kernel to obtain a first output feature map;
processing the third medium-dimensional feature map with the sixth bottleneck module and a 1 × 1 convolution kernel to obtain a second output feature map;
processing the second high-dimensional feature map with a seventh bottleneck module and a 1 × 1 convolution kernel to obtain a third output feature map;
and determining the position parameters of the human body in the image to be recognized according to the first, second and third output feature maps.
In a possible implementation manner, determining the result of human body posture recognition in the image to be recognized according to the key point information of the human body includes:
inputting the key point information into a classifier whose structure comprises at least a two-channel pooling stage, a first fully connected network, a nonlinear activation function, a second fully connected network and a normalization function, wherein the number of input neurons of the first fully connected network is the input dimension, the number of its output neurons is a parameter value obtained by training, the number of input neurons of the second fully connected network is the output dimension of the preceding network, and the number of its output neurons is the preset number of posture categories;
performing dimensionality reduction on the key point information through the two-channel pooling stage using two preset pooling modes, splicing the two dimensionality-reduced features, passing the spliced features sequentially through the first fully connected network, the nonlinear activation function and the second fully connected network to obtain a posture score for each preset posture category for the human body in the image to be recognized, normalizing the posture scores to a preset value range with the normalization function, and outputting the posture recognition result of the image to be recognized according to the normalized posture scores.
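The classifier structure described above can be sketched as follows. The patent does not name the two pooling modes, the activation, or the normalization function, so max/average pooling, ReLU and softmax are assumptions for illustration, as are all sizes and weights.

```python
import numpy as np

def softmax(x):
    """Assumed normalization function mapping scores into (0, 1)."""
    e = np.exp(x - x.max())
    return e / e.sum()

def classify_pose(keypoints, w1, b1, w2, b2):
    """keypoints: (N, 2) array of keypoint coordinates."""
    # Two-channel pooling: reduce the keypoint set with two preset pooling
    # modes (max and average assumed), then splice (concatenate) the results.
    pooled_max = keypoints.max(axis=0)                   # (2,)
    pooled_avg = keypoints.mean(axis=0)                  # (2,)
    features = np.concatenate([pooled_max, pooled_avg])  # (4,)
    # First fully connected network + nonlinear activation (ReLU assumed).
    hidden = np.maximum(0.0, features @ w1 + b1)
    # Second fully connected network: one output neuron per pose category.
    scores = hidden @ w2 + b2
    return softmax(scores)

rng = np.random.default_rng(0)
kp = rng.random((17, 2))      # e.g. 17 body keypoints (count is illustrative)
probs = classify_pose(kp, rng.standard_normal((4, 8)), np.zeros(8),
                      rng.standard_normal((8, 5)), np.zeros(5))
print(probs.shape, float(probs.sum()))
```

The category with the highest normalized score would be reported as the recognized posture.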
In one possible implementation, the key point information includes position information of at least two key points in the human body;
and determining the result of human body posture recognition in the image to be recognized according to the key point information of the human body includes:
determining the relative position relationship between the at least two key points according to their position information;
and determining the result of human body posture recognition in the image to be recognized according to that relative position relationship.
In one possible implementation manner, the key point information includes position information of at least two head key points, at least two torso key points, at least two left-hand key points, at least two right-hand key points, at least two left-leg key points and at least two right-leg key points in the human body;
and determining the relative position relationship between the at least two key points according to their position information includes:
acquiring a first coordinate of the at least two head key points in the image to be recognized according to their position information, wherein the first coordinate is the average of the horizontal and vertical coordinates of the at least two head key points in the image to be recognized;
acquiring a second coordinate of the at least two torso key points in the image to be recognized according to their position information, wherein the second coordinate is the average of the horizontal and vertical coordinates of the at least two torso key points in the image to be recognized;
acquiring a third coordinate of the at least two left-hand key points in the image to be recognized according to their position information, wherein the third coordinate is the average of the horizontal and vertical coordinates of the at least two left-hand key points in the image to be recognized;
acquiring a fourth coordinate of the at least two right-hand key points in the image to be recognized according to their position information, wherein the fourth coordinate is the average of the horizontal and vertical coordinates of the at least two right-hand key points in the image to be recognized;
acquiring a fifth coordinate of the at least two left-leg key points in the image to be recognized according to their position information, wherein the fifth coordinate is the average of the horizontal and vertical coordinates of the at least two left-leg key points in the image to be recognized;
acquiring a sixth coordinate of the at least two right-leg key points in the image to be recognized according to their position information, wherein the sixth coordinate is the average of the horizontal and vertical coordinates of the at least two right-leg key points in the image to be recognized;
and determining the height relationship among the first to sixth coordinates and/or the included-angle relationship among their connecting lines as the relative position relationship between the at least two key points.
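The per-part coordinate averaging and the height/angle relations described above can be sketched in a few lines. The keypoint values, part grouping and the "upright" rule are illustrative assumptions, not the patent's exact criteria.

```python
import math

# Illustrative sketch of the claimed per-part averaging; keypoints are (x, y)
# with the image origin at the top-left, so a smaller y means "higher".

def part_center(points):
    """Average the horizontal and vertical coordinates of a part's keypoints."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

head = [(50, 10), (54, 12)]    # hypothetical head keypoints
torso = [(52, 40), (52, 60)]   # hypothetical torso keypoints

head_c, torso_c = part_center(head), part_center(torso)

# Height relationship: head above torso suggests an upright posture.
upright = head_c[1] < torso_c[1]

# Included-angle relationship of the head-to-torso connecting line.
angle = math.degrees(math.atan2(torso_c[1] - head_c[1],
                                torso_c[0] - head_c[0]))
print(head_c, torso_c, upright, angle)
```

The same averaging would be repeated for the hand and leg keypoint groups to obtain all six coordinates.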
The invention also provides a human body posture recognition device based on the mobile terminal, which comprises:
the acquisition module is used for acquiring an image to be identified;
the position parameter determining module is used for performing feature extraction on the image to be recognized by using a neural network comprising a plurality of bottleneck modules and an SPP module to obtain position parameters of a human body in the image to be recognized, wherein the position parameters comprise the position coordinates and the length and width of the area where the human body is located in the image to be recognized;
the key point information determining module is used for determining key point information of a human body in the image to be identified according to the image to be identified and the position parameter;
and the recognition result determining module is used for determining the result of human posture recognition in the image to be recognized according to the key point information of the human body in the image to be recognized.
The invention also provides a human body posture recognition device based on a mobile terminal, comprising: at least one memory and at least one processor;
the at least one memory is used to store a machine-readable program;
and the at least one processor is configured to invoke the machine-readable program to perform the method described above.
The invention also provides a computer readable medium having stored thereon computer instructions which, when executed by a processor, cause the processor to perform the method as described above.
The human body posture recognition method and device based on a mobile terminal according to the invention have the following beneficial effects:
feature extraction on the image to be recognized is performed by a neural network comprising a plurality of bottleneck modules and an SPP module. Compared with a neural network composed of VGG and ResNet modules, this network is better suited to a mobile terminal in both operation speed and accuracy, so the human body posture recognition algorithm can be deployed on devices with limited computing power, such as mobile terminals.
Drawings
Fig. 1 is a flowchart of a human body posture recognition method based on a mobile terminal according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a device where a human body posture recognition apparatus based on a mobile terminal according to an embodiment of the present invention is located;
fig. 3 is a schematic diagram of a human body posture recognition device based on a mobile terminal according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
In the process of implementing the invention, the inventor found that in the solutions provided by the related art, existing human body posture recognition algorithms cannot be deployed on devices with limited computing power, such as mobile terminals, and cannot support applications that take such an algorithm as their core. Even where deployment on some terminal devices with limited computing power is possible, the posture recognition accuracy is low and cannot meet application requirements.
On this basis, embodiments of the invention provide a human body posture recognition method and device that allow the human body posture recognition algorithm to be deployed on devices with limited computing power, such as mobile terminals.
As shown in fig. 1, a human body posture recognition method based on a mobile terminal according to an embodiment of the present invention includes:
Step 101, acquiring an image to be recognized.
The image to be recognized is generated by shooting a plurality of persons, so that subsequent human body posture recognition is performed on an image containing a plurality of persons.
The image to be recognized may come from an image acquired by the mobile terminal in real time (for example, a smartphone configured with a camera), or may be an image pre-stored on the mobile terminal.
In other words, a human body posture recognition device deployed on the mobile terminal may acquire images in real time and perform posture recognition on them in real time, or may acquire images captured during a historical period and perform posture recognition when few processing tasks are pending or under an operator's instruction; this embodiment does not specifically limit the source.
Further, if the camera assembly of the mobile terminal can act as an independent device, such as a camera or video recorder, several such assemblies can be arranged around the environment in which the persons are located to shoot them from different angles, yielding images to be recognized that reflect the persons from different angles and helping to ensure the accuracy of subsequent posture recognition.
It should be noted that shooting may be single or continuous: a single shot yields one picture, while continuous shooting yields a video containing a plurality of images to be recognized. Therefore, in the embodiments of the present invention, the image used for human body posture recognition may be a single picture or one image taken from a continuously shot video; the present invention does not specifically limit this.
When human body posture recognition is performed on the mobile terminal, a neural network for recognizing the position parameters of the human body and a neural network for recognizing the human body posture are preset in the mobile terminal. Correspondingly, these networks are trained on the human body and posture key point information preset for the mobile terminal. Specifically, more than 200,000 pictures of various human actions and postures were collected and labelled with key point information, covering teenagers, young people, middle-aged people and the elderly; indoor scenes account for 45% and outdoor scenes for 55%, the male-to-female ratio is 1:1.25, and the ratio of children to young people to middle-aged people to the elderly is 2:2.3:2.7:3.2. The pictures were subjected to translation, flipping, grayscale conversion, sharpening and similar operations using image processing tools such as OpenCV to increase the generalization ability of both networks. For example, an SGD optimizer was used to train the network for recognizing position parameters for 2 weeks and the network for recognizing posture for 3 weeks, so that the loss function was minimized and the models reached a fitted state. After training, 50,000 pictures were used for testing: the mAP (mean Average Precision) on the mobile terminal is 0.65 and the average frame rate is 22 fps, which basically meets the real-time and accuracy requirements.
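The augmentation operations mentioned above (translation, flipping, grayscale) can be sketched with plain NumPy. The patent uses tools such as OpenCV; NumPy stands in here to keep the example self-contained, and the toy image, shift amount and naive grayscale are illustrative assumptions (sharpening would typically use something like `cv2.filter2D` and is omitted).

```python
import numpy as np

rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)  # toy RGB image

flipped = img[:, ::-1]                   # horizontal flip
shifted = np.roll(img, shift=2, axis=1)  # crude 2-pixel translation (wraps)
gray = img.mean(axis=2)                  # naive grayscale conversion

print(flipped.shape, shifted.shape, gray.shape)
```

Each transformed copy is added to the training set so the networks see the same pose under varied framing and colour conditions.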
Step 102, performing feature extraction on the image to be recognized by using a neural network comprising a plurality of bottleneck modules and an SPP module to obtain the position parameters of the human body in the image to be recognized.
In this step, the position parameters comprise the position coordinates and the length and width of the area where the human body is located in the image to be recognized. That is, feature extraction performed on the image to be recognized with a neural network comprising a plurality of bottleneck modules and an SPP module yields the bounding box of the human body in the image to be recognized.
It should be noted that the bottleneck module and the SPP module are well known to those skilled in the art, and their detailed structures are not described here. The bottleneck module fuses high-dimensional and low-dimensional features of the input image to increase the fit to nonlinear problems. In addition, when performing target detection on the human body, the overall appearance features are most important and a large receptive field (RF) is required; therefore, in this scheme, bottleneck modules are used to deepen the complexity and depth of the neural network. At the same time, the depthwise separable (DW) convolution kernels in the bottleneck modules reduce the parameter count of the human body posture recognition model and improve operation speed, further reducing the computation time of the neural network on a mobile terminal. In experimental tests, computation time was reduced by 35.4% and the parameter count by 83%, with an accuracy loss of no more than 2.3%.
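The parameter saving from depthwise separable convolution can be checked with simple arithmetic. The kernel and channel sizes below are illustrative, not taken from the patent's network.

```python
# Parameter count of a standard k x k convolution vs. a depthwise separable
# one (depthwise k x k per channel, then pointwise 1 x 1), ignoring biases.

def standard_conv_params(k, c_in, c_out):
    return k * k * c_in * c_out

def dw_separable_params(k, c_in, c_out):
    return k * k * c_in + c_in * c_out  # depthwise + pointwise

k, c_in, c_out = 3, 128, 128            # hypothetical layer sizes
std = standard_conv_params(k, c_in, c_out)
dws = dw_separable_params(k, c_in, c_out)
print(std, dws, f"{1 - dws / std:.1%} fewer parameters")
```

For these sizes the separable form uses roughly an order of magnitude fewer parameters, which is consistent in spirit with the 83% reduction the patent reports for the whole model.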
In addition, the SPP module uses channel splitting to separate three branches, each passing through a max-pooling layer that reduces the feature dimensionality. This reduces the parameter count of the neural network while extracting the most representative features and preserving the position and rotation invariance of the features, which is especially effective for global features. The branches pass through 5 × 5, 9 × 9 and 13 × 13 max-pooling layers because, under CPU conditions, achieving the same receptive field with 3 × 3 convolutions requires multiple layers, and such stacked 3 × 3 convolution is 23.6% slower than a single 5 × 5, 9 × 9 or 13 × 13 layer. By using max-pooling layers of these sizes, computation time is reduced by 3.6% compared with 3 × 3 max-pooling layers, without loss of accuracy.
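SPP-style parallel max pooling can be sketched as below. Stride-1 pooling with "same" padding (as in YOLO-style SPP blocks) is an assumption, since the patent does not state strides or padding; the 4 × 4 single-channel feature map is a toy input.

```python
import numpy as np

def maxpool_same(x, k):
    """Stride-1 k x k max pooling that keeps the input's spatial size."""
    pad = k // 2
    # Pad with -inf so padded cells never win the max.
    xp = np.pad(x, pad, mode="constant", constant_values=-np.inf)
    h, w = x.shape
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = xp[i:i + k, j:j + k].max()
    return out

feat = np.arange(16.0).reshape(4, 4)   # toy single-channel feature map
branches = [maxpool_same(feat, k) for k in (5, 9, 13)]
# SPP then fuses the input with the pooled branches along the channel axis.
spp = np.stack([feat] + branches)
print(spp.shape)
```

Because the windows keep the spatial size, the three branches can be concatenated with the original map directly, matching the multi-scale fusion role described above.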
In an embodiment of the present invention, step 102 may specifically include:
performing feature extraction on the image to be recognized by using a first bottleneck module to obtain a first low-dimensional feature map;
performing feature extraction on the first low-dimensional feature map by using a second bottleneck module to obtain a first medium-dimensional feature map;
performing feature extraction on the first medium-dimensional feature map by using the SPP module to obtain a first high-dimensional feature map;
performing feature fusion on the first low-dimensional feature map, the first medium-dimensional feature map and the first high-dimensional feature map to obtain a fused feature map;
and determining the position parameters of the human body in the image to be recognized according to the fused feature map.
In this embodiment, the high-dimensional features represent the global features of the human body in the image to be recognized, while the low-dimensional features represent its local features and can capture image details. Fusing the high-dimensional and low-dimensional feature maps therefore effectively combines the global features of the human body with local features focused on it, greatly enhancing the feature robustness of the fused feature map.
To further improve the feature robustness of the fused feature map, the fused feature map can be obtained as follows:
sequentially processing the first high-dimensional feature map with a third bottleneck module, a 1 × 1 convolution kernel and an up-sampling module, and performing feature fusion on the processed feature map and the first medium-dimensional feature map to obtain a second medium-dimensional feature map;
sequentially processing the second medium-dimensional feature map with a fourth bottleneck module, a 1 × 1 convolution kernel and an up-sampling module, and performing feature fusion on the processed feature map and the first low-dimensional feature map to obtain a second low-dimensional feature map;
sequentially processing the second low-dimensional feature map with a fifth bottleneck module and a 3 × 3 convolution kernel, and performing feature fusion on the resulting feature map and the feature map obtained by processing the second medium-dimensional feature map with a 1 × 1 convolution kernel to obtain a third medium-dimensional feature map;
sequentially processing the third medium-dimensional feature map with a sixth bottleneck module and a 3 × 3 convolution kernel, and performing feature fusion on the resulting feature map and the feature map obtained by sequentially processing the first high-dimensional feature map with the third bottleneck module and a 1 × 1 convolution kernel to obtain a second high-dimensional feature map;
the second low-dimensional feature map, the third medium-dimensional feature map and the second high-dimensional feature map together constitute the fused feature map.
In this embodiment, the bottleneck modules, combined with 1 × 1 convolution kernels and up-sampling modules, fuse feature maps of different dimensions, so that global features and local features of the human body with different emphases are merged. This enriches the fused feature map with more global and local features, which helps to improve the recognition accuracy of the neural network.
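The up-sample-then-fuse step above can be sketched on single-channel toy maps. Nearest-neighbour 2× up-sampling and element-wise addition are assumptions (the patent does not specify the up-sampling method or whether fusion is addition or concatenation), and on a single channel a 1 × 1 convolution reduces to a scalar weight.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x up-sampling, standing in for the up-sampling module."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

high = np.ones((2, 2))                # coarse high-dimensional map (toy)
mid = np.arange(16.0).reshape(4, 4)   # finer medium-dimensional map (toy)

w_1x1 = 0.5                           # scalar stand-in for a 1 x 1 convolution
fused_mid = mid + upsample2x(high * w_1x1)  # element-wise fusion (assumed)
print(fused_mid.shape)
```

After up-sampling, the coarse map matches the finer map's resolution, so detail from the low-dimensional path and context from the high-dimensional path combine cell by cell.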
To improve the recognition accuracy of the position parameters of the human body in the image to be recognized, feature maps of multiple dimensions may be output.
In one embodiment of the present invention, determining the position parameters of the human body in the image to be recognized according to the fused feature map includes:
processing the second low-dimensional feature map with the fifth bottleneck module and a 1 × 1 convolution kernel to obtain a first output feature map;
processing the third medium-dimensional feature map with the sixth bottleneck module and a 1 × 1 convolution kernel to obtain a second output feature map;
processing the second high-dimensional feature map with a seventh bottleneck module and a 1 × 1 convolution kernel to obtain a third output feature map;
and determining the position parameters of the human body in the image to be recognized according to the first, second and third output feature maps.
In the embodiment of the invention, the fifth bottleneck module performs feature extraction on the second low-dimensional feature map, so that the first output feature map contains more global features; the sixth bottleneck module performs feature extraction on the third medium-dimensional feature map, so that the second output feature map contains more global features; and the seventh bottleneck module performs feature extraction on the second high-dimensional feature map, so that the third output feature map contains more global features. By further extracting features from the fused feature maps, the recognition accuracy of the position parameters of the human body in the image to be recognized is improved.
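The 1 × 1 convolution kernels used in these output branches amount to a per-pixel weighted sum across channels, changing the channel count without touching spatial resolution. A minimal sketch (the weight values are illustrative, not from the patent):

```python
def conv1x1(fmaps, weights):
    """1x1 convolution: each output channel is a per-pixel weighted sum
    over the input channels. fmaps: C_in maps of size HxW (nested lists);
    weights: C_out x C_in matrix."""
    h, w = len(fmaps[0]), len(fmaps[0][0])
    return [[[sum(wc * fm[i][j] for wc, fm in zip(wrow, fmaps))
              for j in range(w)] for i in range(h)]
            for wrow in weights]

# Two 2x2 input channels reduced to a single output channel
# with illustrative integer weights.
channels = [[[1, 2], [3, 4]],
            [[10, 20], [30, 40]]]
reduced = conv1x1(channels, [[2, 1]])
```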
And 103, determining key point information of the human body in the image to be recognized according to the image to be recognized and the position parameter.
In one embodiment, the neural network for recognizing the human body gesture may adopt a neural network model as disclosed in the patent with publication number CN110298257A, which is not described herein.
And step 104, determining the result of human posture recognition in the image to be recognized according to the key point information of the human body in the image to be recognized.
In this step, the result of the human body posture recognition in the image to be recognized may be determined in either of the following two embodiments.
To reduce the loss caused by feature dimensionality reduction and to remove redundant features, in one embodiment of the present invention, step 104 comprises:
inputting the key point information into a classifier, wherein the structure of the classifier at least comprises a dual-channel pooling mode, a first fully-connected network, a nonlinear activation function, a second fully-connected network and a normalization function; the number of input neurons of the first fully-connected network is the input dimension and its number of output neurons is a parameter value obtained by training, while the number of input neurons of the second fully-connected network is the dimension output by the upper network and its number of output neurons is the preset number of posture categories;
performing dimensionality reduction on the key point information with two preset pooling modes through the dual-channel pooling in the classifier, splicing the features reduced by the two preset pooling modes, sequentially performing feature extraction on the spliced features through the first fully-connected network, the nonlinear activation function and the second fully-connected network to obtain posture scores of the human body in the image to be recognized corresponding to each preset posture category, normalizing the posture scores into a preset value range through the normalization function, and outputting a posture recognition result of the image to be recognized according to the normalized posture scores.
In the embodiment of the present invention, the dual-channel pooling is used to remove redundant features while reducing the loss caused by dimensionality reduction. The basic pooling method called by the dual-channel pooling mode may be mean (Mean) pooling, maximum (MAX) pooling, or mean-maximum (Mean-MAX) pooling, where Mean-MAX pooling calls MAX pooling and Mean pooling respectively to implement the two channels.
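The Mean-MAX dual-channel pooling described here can be sketched directly: mean pooling and max pooling are applied in parallel and the two reduced results are spliced. The feature layout (one vector per key point) is an assumption for illustration.

```python
def dual_channel_pool(features):
    """Mean-MAX dual-channel pooling: reduce each feature vector by mean
    pooling and by max pooling, then splice (concatenate) the results."""
    mean_branch = [sum(f) / len(f) for f in features]
    max_branch = [max(f) for f in features]
    return mean_branch + max_branch  # spliced feature vector

# Two illustrative key-point feature vectors.
feats = [[1.0, 3.0], [2.0, 4.0]]
pooled = dual_channel_pool(feats)
```

The spliced vector is then fed through the first fully-connected network.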
The number of input neurons of the first fully-connected network is determined by the input dimension; its number of output neurons is a hyper-parameter whose value is selected by optimization during the training stage. The number of input neurons of the second fully-connected network is the dimension of the upper-layer network output, and its number of output neurons is the preset number of posture categories.
The nonlinear activation function introduces a non-linear relationship between network layers; for example, a rectified linear unit (ReLU) may be adopted, which is not limited in the embodiment of the present invention.
The normalization function normalizes the posture scores into a preset value range, so that the score belonging to each posture category can be evaluated more intuitively; for example, a SoftMax function may be adopted to normalize the posture scores to between 0 and 1.
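The SoftMax normalization mentioned above maps the raw posture scores into (0, 1) so that they sum to 1; the score values below are illustrative:

```python
import math

def softmax(scores):
    """Normalize posture scores into (0, 1), summing to 1 (SoftMax)."""
    m = max(scores)                    # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative raw scores for three posture categories.
probs = softmax([2.0, 1.0, 0.1])
```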
It should be noted that the preset posture categories may include, for example, the following: squatting, raising legs, kneeling, offering, crawling, lying prone, lying down, bending down, standing and sitting, which are not limited herein.
In another embodiment of the present invention, if the keypoint information comprises position information of at least two keypoints in the human body, step 104 comprises:
determining the relative position relationship between at least two key points according to the position information of the at least two key points;
and determining the result of human body posture recognition in the image to be recognized according to the relative position relationship between at least two key points.
In the embodiment of the invention, when the posture of the human body changes, the relative positions of the key points of each part of the human body also change, and the relative positions of the key points corresponding to a given human body posture satisfy certain conditions. For example, when the human body is standing, the head key points are higher than the trunk key points, and the connecting line between a head key point and a trunk key point forms a particular angle with the image axes. Based on this principle, the scheme provided by the invention determines the human body posture of the target human body according to the positional relationships among the key points of the target human body.
In another embodiment of the present invention, the key point information includes position information of at least two head key points, position information of at least two torso key points, position information of at least two left-hand key points, position information of at least two right-hand key points, position information of at least two left-leg key points, and position information of at least two right-leg key points in the human body;
determining a relative position relationship between at least two key points according to the position information of the at least two key points, including:
acquiring first coordinates of the at least two head key points in the image to be recognized according to the position information of the at least two head key points, wherein the first coordinates are average values of horizontal coordinates and vertical coordinates of the at least two head key points in the image to be recognized;
acquiring second coordinates of the at least two trunk key points in the image to be recognized according to the position information of the at least two trunk key points, wherein the second coordinates are average values of horizontal coordinates and vertical coordinates of the at least two trunk key points in the image to be recognized;
acquiring third coordinates of the at least two left-hand key points in the image to be recognized according to the position information of the at least two left-hand key points, wherein the third coordinates are average values of horizontal coordinates and vertical coordinates of the at least two left-hand key points in the image to be recognized;
acquiring fourth coordinates of the at least two right-hand key points in the image to be recognized according to the position information of the at least two right-hand key points, wherein the fourth coordinates are average values of horizontal coordinates and vertical coordinates of the at least two right-hand key points in the image to be recognized;
acquiring a fifth coordinate of the at least two left leg key points in the image to be recognized according to the position information of the at least two left leg key points, wherein the fifth coordinate is an average value of horizontal coordinates and vertical coordinates of the at least two left leg key points in the image to be recognized;
acquiring a sixth coordinate of the at least two right leg key points in the image to be recognized according to the position information of the at least two right leg key points, wherein the sixth coordinate is an average value of horizontal coordinates and vertical coordinates of the at least two right leg key points in the image to be recognized;
and determining the height relationship among the first coordinate, the second coordinate, the third coordinate, the fourth coordinate, the fifth coordinate and the sixth coordinate, and/or the included-angle relationship among the lines connecting them, as the relative position relationship between the at least two key points.
In the embodiment of the invention, the relative position relationship between at least two key points can be more accurately determined by the height relationship among the coordinates of the head key point, the trunk key point, the left-hand key point, the right-hand key point, the left-leg key point and the right-leg key point and/or the included angle relationship among the connecting lines, so that the determination of the human body posture of the target human body is facilitated.
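Each of the six part coordinates above is simply the mean of the member key points' horizontal and vertical coordinates; a minimal sketch with illustrative pixel coordinates:

```python
def part_coordinate(keypoints):
    """Represent a body part by the mean of its key points' coordinates,
    as done above for the head, trunk, hand and leg key points."""
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

# Illustrative (x, y) pixel coordinates of detected key points.
head = part_coordinate([(100, 40), (120, 44)])
trunk = part_coordinate([(105, 90), (115, 110)])
```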
For example, since the vertical direction in an image captured by the image capturing component is generally the same as the vertical direction in the actual three-dimensional space, in a coordinate system that takes the vertical direction of the image to be recognized as the ordinate axis, when the first coordinate of the head key points of the target human body is higher than the second coordinate of the trunk key points, it indicates that the head of the target human body is above the trunk, and the human body posture of the target human body may be considered to be a standing posture.
For another example, when the human body is in a falling posture, the head, shoulders and crotch of the human body are all close to the ground, so the heights of the head, shoulder and crotch on the same side of the human body are substantially equal. Correspondingly, the included angle between the line connecting the head, shoulder and crotch on the same side and the vertical direction of the image to be recognized is large, while its included angle with the horizontal direction is small. Accordingly, if the included angle with the vertical direction is larger than a preset angle threshold, the human body posture of the target human body is determined to be a falling posture; otherwise, if the obtained included angle is not larger than the preset angle threshold, the human body posture of the target human body is determined to be a non-falling posture.
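The falling-posture test above reduces to an angle check between a connecting line (e.g. head to crotch) and the vertical direction of the image; the 60° threshold and the coordinates below are illustrative assumptions, since the patent only specifies "a preset angle threshold".

```python
import math

def line_angle_to_vertical_deg(p1, p2):
    """Angle (degrees) between the line p1-p2 and the image's vertical
    (ordinate-axis) direction; image coordinates, y grows downward."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    return math.degrees(math.atan2(abs(dx), abs(dy)))

def is_falling(head, crotch, threshold_deg=60.0):
    # threshold_deg is an illustrative value, not taken from the patent.
    return line_angle_to_vertical_deg(head, crotch) > threshold_deg

lying = is_falling((50, 100), (150, 105))     # nearly horizontal line
standing = is_falling((100, 40), (100, 140))  # vertical line
```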
It should be noted that the at least two head key points may include any two or more of the following 8 key points: a left eye key point, a right eye key point, a left ear key point, a right ear key point, a nose key point, a neck key point, a left shoulder key point and a right shoulder key point.
In summary, in the human body posture recognition method based on a mobile terminal provided in the embodiment of the present invention, an image to be recognized is acquired, and a neural network comprising a plurality of bottleneck modules (a first bottleneck module, a second bottleneck module, etc.) and an SPP module performs feature extraction on the image to be recognized to obtain the position parameters of the human body in the image to be recognized, where the position parameters include the position coordinates and the length and width of the area where the human body is located in the image to be recognized; the key point information of the human body in the image to be recognized is then determined according to the image to be recognized and the position parameters; and the result of the human body posture recognition in the image to be recognized is determined according to the key point information of the human body in the image to be recognized. Because the feature extraction is performed by a neural network comprising a plurality of bottleneck modules and an SPP module, which is better suited to a mobile terminal in terms of operation speed and precision than a neural network composed of VGG or ResNet modules, the human body posture recognition algorithm can be deployed on devices with limited computing power, such as mobile terminals.
As shown in fig. 2 and fig. 3, the embodiment of the invention provides a human body posture recognition apparatus based on a mobile terminal and the device in which it is located. The apparatus embodiments may be implemented by software, by hardware, or by a combination of hardware and software. From a hardware level, fig. 2 is a hardware structure diagram of the device in which the human body posture recognition apparatus based on a mobile terminal according to an embodiment of the present invention is located; in addition to the processor, the memory, the network interface, and the nonvolatile memory shown in fig. 2, the device in the embodiment may further include other hardware, such as a forwarding chip responsible for processing packets. Taking a software implementation as an example, as shown in fig. 3, the apparatus is a logical apparatus formed by the CPU of the device in which it is located reading the corresponding computer program instructions from the non-volatile memory into the memory for execution.
As shown in fig. 3, the human body gesture recognition apparatus based on a mobile terminal provided in this embodiment includes:
an obtaining module 301, configured to obtain an image to be identified;
a position parameter determining module 302, configured to perform feature extraction on the image to be recognized by using a neural network comprising a plurality of bottleneck modules and an SPP module, to obtain position parameters of a human body in the image to be recognized, where the position parameters include position coordinates, and a length and a width of an area where the human body is located in the image to be recognized;
a key point information determining module 303, configured to determine key point information of a human body in the image to be recognized according to the image to be recognized and the position parameter;
and the recognition result determining module 304 is configured to determine a result of human posture recognition in the image to be recognized according to the key point information of the human body in the image to be recognized.
In this embodiment of the present invention, the obtaining module 301 may be configured to perform step 101 in the foregoing method embodiment, and the location parameter determining module 302 may be configured to perform step 102 in the foregoing method embodiment; the key point information determining module 303 may be configured to perform step 103 in the above method embodiment; the recognition result determination module 304 may be used to perform step 104 in the above-described method embodiments.
In an embodiment of the present invention, the location parameter determining module 302 is configured to perform the following operations:
performing feature extraction on the image to be recognized by using a first bottleneck module to obtain a first low-dimensional feature map;
performing feature extraction on the first low-dimensional feature map by using a second bottleneck module to obtain a first medium-dimensional feature map;
performing feature extraction on the first medium-dimensional feature map by using the SPP module to obtain a first high-dimensional feature map;
performing feature fusion on the first low-dimensional feature map, the first medium-dimensional feature map and the first high-dimensional feature map to obtain a fused feature map;
and determining the position parameters of the human body in the image to be recognized according to the fused feature map.
In an embodiment of the present invention, when performing the feature fusion on the first low-dimensional feature map, the first medium-dimensional feature map and the first high-dimensional feature map to obtain a fused feature map, the position parameter determining module 302 is configured to perform the following operations:
sequentially processing the first high-dimensional feature map by using a third bottleneck module, a 1 × 1 convolution kernel and an up-sampling module, and performing feature fusion on the processed feature map and the first medium-dimensional feature map to obtain a second medium-dimensional feature map;
sequentially processing the second medium-dimensional feature map by using a fourth bottleneck module, a 1 × 1 convolution kernel and an up-sampling module, and performing feature fusion on the processed feature map and the first low-dimensional feature map to obtain a second low-dimensional feature map;
sequentially processing the second low-dimensional feature map by using a fifth bottleneck module and a 3 × 3 convolution kernel, and performing feature fusion between the resulting feature map and the feature map obtained by processing the second medium-dimensional feature map with a 1 × 1 convolution kernel, to obtain a third medium-dimensional feature map;
sequentially processing the third medium-dimensional feature map by using a sixth bottleneck module and a 3 × 3 convolution kernel, and performing feature fusion between the resulting feature map and the feature map obtained by sequentially processing the first high-dimensional feature map with the third bottleneck module and a 1 × 1 convolution kernel, to obtain a second high-dimensional feature map;
wherein the second low-dimensional feature map, the third medium-dimensional feature map, and the second high-dimensional feature map are the fused feature map.
In an embodiment of the present invention, the location parameter determining module 302 is configured to perform the following operations:
processing the second low-dimensional feature map by using the fifth bottleneck module and a 1 × 1 convolution kernel to obtain a first output feature map;
processing the third medium-dimensional feature map by using the sixth bottleneck module and a 1 × 1 convolution kernel to obtain a second output feature map;
processing the second high-dimensional feature map by using a seventh bottleneck module and a 1 × 1 convolution kernel to obtain a third output feature map;
and determining the position parameters of the human body in the image to be recognized according to the first output feature map, the second output feature map and the third output feature map.
In an embodiment of the present invention, the recognition result determining module 304 is configured to perform the following operations:
inputting the key point information into a classifier, wherein the structure of the classifier at least comprises a dual-channel pooling mode, a first fully-connected network, a nonlinear activation function, a second fully-connected network and a normalization function; the number of input neurons of the first fully-connected network is the input dimension and its number of output neurons is a parameter value obtained by training, while the number of input neurons of the second fully-connected network is the dimension output by the upper network and its number of output neurons is the preset number of posture categories;
performing dimensionality reduction on the key point information with two preset pooling modes through the dual-channel pooling in the classifier, splicing the features reduced by the two preset pooling modes, sequentially performing feature extraction on the spliced features through the first fully-connected network, the nonlinear activation function and the second fully-connected network to obtain posture scores of the human body in the image to be recognized corresponding to each preset posture category, normalizing the posture scores into a preset value range through the normalization function, and outputting a posture recognition result of the image to be recognized according to the normalized posture scores.
In one embodiment of the present invention, the key point information includes position information of at least two key points in the human body;
the recognition result determining module 304 is configured to perform the following operations:
determining the relative position relationship between the at least two key points according to the position information of the at least two key points;
and determining the result of human body posture recognition in the image to be recognized according to the relative position relationship between the at least two key points.
In one embodiment of the present invention, the key point information includes position information of at least two head key points, position information of at least two torso key points, position information of at least two left-hand key points, position information of at least two right-hand key points, position information of at least two left-leg key points, and position information of at least two right-leg key points in the human body;
the recognition result determining module 304 is configured to, when performing the determining of the relative position relationship between the at least two key points according to the position information of the at least two key points, perform the following operations:
acquiring first coordinates of the at least two head key points in the image to be recognized according to the position information of the at least two head key points, wherein the first coordinates are average values of horizontal coordinates and vertical coordinates of the at least two head key points in the image to be recognized;
acquiring second coordinates of the at least two trunk key points in the image to be recognized according to the position information of the at least two trunk key points, wherein the second coordinates are average values of horizontal coordinates and vertical coordinates of the at least two trunk key points in the image to be recognized;
acquiring third coordinates of the at least two left-hand key points in the image to be recognized according to the position information of the at least two left-hand key points, wherein the third coordinates are average values of horizontal coordinates and vertical coordinates of the at least two left-hand key points in the image to be recognized;
acquiring fourth coordinates of the at least two right-hand key points in the image to be recognized according to the position information of the at least two right-hand key points, wherein the fourth coordinates are average values of horizontal coordinates and vertical coordinates of the at least two right-hand key points in the image to be recognized;
acquiring a fifth coordinate of the at least two left leg key points in the image to be recognized according to the position information of the at least two left leg key points, wherein the fifth coordinate is an average value of a horizontal coordinate and a vertical coordinate of the at least two left leg key points in the image to be recognized;
acquiring a sixth coordinate of the at least two right leg key points in the image to be recognized according to the position information of the at least two right leg key points, wherein the sixth coordinate is an average value of a horizontal coordinate and a vertical coordinate of the at least two right leg key points in the image to be recognized;
and determining the height relationship among the first coordinate, the second coordinate, the third coordinate, the fourth coordinate, the fifth coordinate and the sixth coordinate, and/or the included-angle relationship among the lines connecting them, as the relative position relationship between the at least two key points.
It is understood that the structure illustrated in the embodiment of the present invention does not constitute a specific limitation to the human body posture recognition device based on the mobile terminal. In other embodiments of the present invention, the mobile terminal based human body gesture recognition apparatus may include more or fewer components than those shown, or combine some components, or split some components, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Because the content of information interaction, execution process, and the like among the modules in the device is based on the same concept as the method embodiment of the present invention, specific content can be referred to the description in the method embodiment of the present invention, and is not described herein again.
The embodiment of the invention also provides a human body posture recognition device based on the mobile terminal, which comprises: at least one memory and at least one processor;
the at least one memory to store a machine readable program;
the at least one processor is configured to call the machine-readable program to execute the human body posture recognition method based on the mobile terminal in any embodiment of the present invention.
Embodiments of the present invention also provide a computer-readable medium storing instructions for causing a computer to execute the method for recognizing a human body posture based on a mobile terminal as described herein. Specifically, a method or an apparatus equipped with a storage medium on which a software program code that realizes the functions of any of the above-described embodiments is stored may be provided, and a computer (or a CPU or MPU) of the method or the apparatus is caused to read out and execute the program code stored in the storage medium.
In this case, the program code itself read from the storage medium can realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code constitute a part of the present invention.
Examples of the storage medium for supplying the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD + RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communications network.
Further, it should be clear that the functions of any one of the above-described embodiments can be implemented not only by executing the program code read out by the computer, but also by performing a part or all of the actual operations by an operation method or the like operating on the computer based on instructions of the program code.
Further, it is to be understood that the program code read out from the storage medium is written to a memory provided in an expansion board inserted into the computer or to a memory provided in an expansion unit connected to the computer, and then causes a CPU or the like mounted on the expansion board or the expansion unit to perform part or all of the actual operations based on instructions of the program code, thereby realizing the functions of any of the above-described embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A human body posture recognition method based on a mobile terminal, characterized by comprising:
acquiring an image to be recognized;
performing feature extraction on the image to be recognized by using a neural network comprising a plurality of bottleneck modules and an SPP module to obtain position parameters of a human body in the image to be recognized, wherein the position parameters comprise position coordinates and the length and width of an area where the human body is located in the image to be recognized;
determining key point information of a human body in the image to be recognized according to the image to be recognized and the position parameters;
and determining the result of human body posture recognition in the image to be recognized according to the key point information of the human body in the image to be recognized.
2. The method according to claim 1, wherein the performing feature extraction on the image to be recognized by using a neural network comprising a plurality of bottleneck modules and an SPP module to obtain position parameters of the human body in the image to be recognized comprises:
performing feature extraction on the image to be recognized by using a first bottleneck module to obtain a first low-dimensional feature map;
performing feature extraction on the first low-dimensional feature map by using a second bottleneck module to obtain a first medium-dimensional feature map;
performing feature extraction on the first medium-dimensional feature map by using the SPP module to obtain a first high-dimensional feature map;
performing feature fusion on the first low-dimensional feature map, the first medium-dimensional feature map and the first high-dimensional feature map to obtain a fused feature map;
and determining the position parameters of the human body in the image to be recognized according to the fused feature map.
3. The method of claim 2, wherein the performing feature fusion on the first low-dimensional feature map, the first medium-dimensional feature map and the first high-dimensional feature map to obtain a fused feature map comprises:
sequentially processing the first high-dimensional feature map with a third bottleneck module, a 1 × 1 convolution kernel and an up-sampling module, and fusing the processed feature map with the first medium-dimensional feature map to obtain a second medium-dimensional feature map;
sequentially processing the second medium-dimensional feature map with a fourth bottleneck module, a 1 × 1 convolution kernel and an up-sampling module, and fusing the processed feature map with the first low-dimensional feature map to obtain a second low-dimensional feature map;
sequentially processing the second low-dimensional feature map with a fifth bottleneck module and a 3 × 3 convolution kernel, and fusing the resulting feature map with the feature map obtained by processing the second medium-dimensional feature map with a 1 × 1 convolution kernel, to obtain a third medium-dimensional feature map;
sequentially processing the third medium-dimensional feature map with a sixth bottleneck module and a 3 × 3 convolution kernel, and fusing the resulting feature map with the feature map obtained by sequentially processing the first high-dimensional feature map with the third bottleneck module and a 1 × 1 convolution kernel, to obtain a second high-dimensional feature map;
wherein the second low-dimensional feature map, the third medium-dimensional feature map and the second high-dimensional feature map constitute the fused feature map.
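The top-down half of claim 3 (project, up-sample, fuse with the next finer scale) can be sketched as follows. All shapes, the nearest-neighbour up-sampling, the random 1×1 projections and the choice of channel concatenation as the fusion operation are assumptions for illustration; the claim does not pin any of them down.

```python
import numpy as np

rng = np.random.default_rng(0)

def upsample2x(x):
    """Nearest-neighbour up-sampling, doubling height and width."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def project(x, out_ch):
    """Random 1x1 projection standing in for a 1x1 convolution kernel."""
    proj = rng.standard_normal((out_ch, x.shape[0])) * 0.01
    return np.tensordot(proj, x, axes=1)

def fuse(a, b):
    """Feature fusion by channel-wise concatenation of same-resolution maps."""
    return np.concatenate([a, b], axis=0)

# Assumed (channels, height, width) for the three input scales.
high = rng.random((256, 16, 16))   # first high-dimensional feature map
mid = rng.random((64, 32, 32))     # first medium-dimensional feature map
low = rng.random((32, 64, 64))     # first low-dimensional feature map

# Top-down path: coarse map -> 1x1 projection -> 2x up-sample -> fuse.
mid2 = fuse(upsample2x(project(high, 64)), mid)   # second medium-dimensional map
low2 = fuse(upsample2x(project(mid2, 32)), low)   # second low-dimensional map
```

The bottom-up half of the claim then walks back down with 3 × 3 convolutions, re-fusing at each scale; it follows the same project-resize-concatenate pattern.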
4. The method according to claim 3, wherein the determining the position parameters of the human body in the image to be recognized according to the fused feature map comprises:
processing the second low-dimensional feature map with the fifth bottleneck module and a 1 × 1 convolution kernel to obtain a first output feature map;
processing the third medium-dimensional feature map with the sixth bottleneck module and a 1 × 1 convolution kernel to obtain a second output feature map;
processing the second high-dimensional feature map with a seventh bottleneck module and a 1 × 1 convolution kernel to obtain a third output feature map;
and determining the position parameters of the human body in the image to be recognized according to the first output feature map, the second output feature map and the third output feature map.
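Claim 4 leaves open how an output feature map becomes position parameters (position coordinates plus length and width of the region, per claim 1). A common convention in multi-scale detection heads, shown here purely as an assumed illustration, is to give each grid cell an objectness channel and four box channels, and read the box from the most confident cell:

```python
import numpy as np

def decode_box(out_map):
    """Toy decoding of one output feature map into position parameters
    (cx, cy, w, h): assume 5 channels per cell (objectness + 4 box values)
    and read the box from the most confident cell."""
    _, h, w = out_map.shape
    i, j = np.unravel_index(out_map[0].argmax(), (h, w))
    bx, by, bw, bh = out_map[1:5, i, j]
    # Convert the cell index plus in-cell offset to normalized image coords.
    return ((j + bx) / w, (i + by) / h, bw, bh)

out = np.zeros((5, 8, 8))
out[0, 3, 4] = 1.0                      # highest objectness at row 3, col 4
out[1:5, 3, 4] = [0.5, 0.5, 0.2, 0.6]   # in-cell offset and box width/height
cx, cy, bw, bh = decode_box(out)
```

With three output maps at different resolutions, the same decoding runs per map and the best-scoring box is kept.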
5. The method according to any one of claims 1 to 4, wherein the determining the result of human body posture recognition in the image to be recognized according to the key point information of the human body comprises:
inputting the key point information into a classifier, wherein the classifier comprises at least a dual-channel pooling stage, a first fully connected network, a nonlinear activation function, a second fully connected network and a normalization function; the number of input neurons of the first fully connected network equals the input dimension, the number of its output neurons is a parameter value obtained by training, the number of input neurons of the second fully connected network equals the output dimension of the preceding layer, and the number of its output neurons equals the preset number of posture categories;
in the dual-channel pooling stage of the classifier, reducing the dimensionality of the key point information with two preset pooling modes respectively, and concatenating the two dimension-reduced features; extracting features from the concatenated features sequentially through the first fully connected network, the nonlinear activation function and the second fully connected network to obtain posture scores of the human body in the image to be recognized for each preset posture category; normalizing the posture scores to a preset value range with the normalization function; and outputting the posture recognition result of the image to be recognized according to the normalized posture scores.
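The classifier structure in claim 5 maps naturally onto a few lines of NumPy. The two pooling modes, the hidden width, the key-point layout and the number of categories are not specified in the claim, so max/average pooling, a width of 8, a 17 × (x, y, confidence) layout and 5 categories are assumed here for illustration; ReLU stands in for the nonlinear activation and softmax for the normalization function.

```python
import numpy as np

rng = np.random.default_rng(0)

def classify(keypoints, W1, b1, W2, b2):
    """Toy classifier following the claim's structure: dual-channel pooling
    (max and average over the key points), concatenation, a first fully
    connected layer, a ReLU nonlinearity, a second fully connected layer,
    and softmax normalization of the posture scores."""
    pooled = np.concatenate([keypoints.max(axis=0), keypoints.mean(axis=0)])
    hidden = np.maximum(0.0, W1 @ pooled + b1)   # first FC + activation
    scores = W2 @ hidden + b2                    # one score per posture category
    e = np.exp(scores - scores.max())
    return e / e.sum()                           # normalized: non-negative, sums to 1

keypoints = rng.random((17, 3))  # assumed: 17 key points x (x, y, confidence)
W1 = rng.standard_normal((8, 6)); b1 = np.zeros(8)  # hidden width 8 (assumed)
W2 = rng.standard_normal((5, 8)); b2 = np.zeros(5)  # 5 posture categories (assumed)
probs = classify(keypoints, W1, b1, W2, b2)
```

Note how the input dimension of the first layer (6) is fixed by the pooling, while its output width is the trainable hyperparameter the claim mentions.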
6. The method according to any one of claims 1-4, wherein the key point information comprises position information of at least two key points of the human body;
the determining the result of human body posture recognition in the image to be recognized according to the key point information of the human body comprises:
determining the relative position relationship between the at least two key points according to the position information of the at least two key points;
and determining the result of human body posture recognition in the image to be recognized according to the relative position relationship between the at least two key points.
7. The method of claim 6, wherein the key point information comprises position information of at least two head key points, at least two torso key points, at least two left-hand key points, at least two right-hand key points, at least two left-leg key points and at least two right-leg key points of the human body;
the determining a relative position relationship between the at least two key points according to the position information of the at least two key points comprises:
obtaining a first coordinate of the at least two head key points in the image to be recognized according to their position information, wherein the first coordinate is the average of the horizontal coordinates and the average of the vertical coordinates of the at least two head key points in the image to be recognized;
obtaining a second coordinate of the at least two torso key points in the image to be recognized according to their position information, wherein the second coordinate is the average of the horizontal coordinates and the average of the vertical coordinates of the at least two torso key points in the image to be recognized;
obtaining a third coordinate of the at least two left-hand key points in the image to be recognized according to their position information, wherein the third coordinate is the average of the horizontal coordinates and the average of the vertical coordinates of the at least two left-hand key points in the image to be recognized;
obtaining a fourth coordinate of the at least two right-hand key points in the image to be recognized according to their position information, wherein the fourth coordinate is the average of the horizontal coordinates and the average of the vertical coordinates of the at least two right-hand key points in the image to be recognized;
obtaining a fifth coordinate of the at least two left-leg key points in the image to be recognized according to their position information, wherein the fifth coordinate is the average of the horizontal coordinates and the average of the vertical coordinates of the at least two left-leg key points in the image to be recognized;
obtaining a sixth coordinate of the at least two right-leg key points in the image to be recognized according to their position information, wherein the sixth coordinate is the average of the horizontal coordinates and the average of the vertical coordinates of the at least two right-leg key points in the image to be recognized;
and determining the height relationship among the first to sixth coordinates and/or the angle relationship between their connecting lines as the relative position relationship between the at least two key points.
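The per-part averaging and the height/angle relationships of claim 7 reduce to elementary geometry. The pixel coordinates below are hypothetical examples; note that in image coordinates the vertical axis grows downward, so a part being "higher" means a smaller y value.

```python
import math

def part_center(points):
    """Average the horizontal and vertical coordinates of a body part's
    key points, as claim 7 does for head, torso, hands and legs."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def line_angle_deg(a, b):
    """Angle of the connecting line from a to b against the horizontal, in degrees."""
    return math.degrees(math.atan2(b[1] - a[1], b[0] - a[0]))

# Hypothetical pixel coordinates for two body parts in an image.
head = part_center([(50, 10), (54, 12)])    # "first coordinate"
torso = part_center([(50, 40), (54, 60)])   # "second coordinate"
head_above_torso = head[1] < torso[1]       # height relationship
head_torso_angle = line_angle_deg(head, torso)  # angle relationship
```

A posture rule can then be stated directly on these quantities, e.g. an upright posture requires the head centre above the torso centre and a near-vertical head-torso line.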
8. A human body posture recognition device based on a mobile terminal, characterized by comprising:
an acquisition module, configured to acquire an image to be recognized;
a position parameter determining module, configured to perform feature extraction on the image to be recognized by using a neural network comprising a plurality of bottleneck modules and an SPP module to obtain position parameters of a human body in the image to be recognized, wherein the position parameters comprise the position coordinates and the length and width of the area where the human body is located in the image to be recognized;
a key point information determining module, configured to determine key point information of the human body in the image to be recognized according to the image to be recognized and the position parameters;
and a recognition result determining module, configured to determine the result of human body posture recognition in the image to be recognized according to the key point information of the human body.
9. A human body posture recognition device based on a mobile terminal, characterized by comprising: at least one memory and at least one processor;
the at least one memory is configured to store a machine-readable program;
the at least one processor is configured to invoke the machine-readable program to perform the method of any one of claims 1-7.
10. A computer readable medium having stored thereon computer instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1-7.
CN202011281717.XA 2020-11-17 2020-11-17 Human body posture recognition method and device based on mobile terminal Active CN112101314B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011281717.XA CN112101314B (en) 2020-11-17 2020-11-17 Human body posture recognition method and device based on mobile terminal


Publications (2)

Publication Number Publication Date
CN112101314A (en) 2020-12-18
CN112101314B (en) 2021-03-09

Family

ID=73785715


Country Status (1)

Country Link
CN (1) CN112101314B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113111939A (en) * 2021-04-12 2021-07-13 Aviation Combat Service Academy, Naval Aviation University of the Chinese People's Liberation Army Aircraft flight action identification method and device
CN116503933A (en) * 2023-05-24 2023-07-28 Beijing Wanlihong Technology Co., Ltd. Periocular feature extraction method and device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103390154A (en) * 2013-07-31 2013-11-13 National University of Defense Technology Face recognition method based on extraction of multiple evolution features
CN107909041A (en) * 2017-11-21 2018-04-13 Tsinghua University Video recognition method based on a spatio-temporal pyramid network
CN109740552A (en) * 2019-01-09 2019-05-10 Shanghai University Target tracking method based on a parallel feature pyramid neural network
CN109886190A (en) * 2019-02-20 2019-06-14 Harbin Engineering University Facial expression and posture bimodal fusion expression recognition method based on deep learning
CN110489955A (en) * 2019-08-23 2019-11-22 Industrial and Commercial Bank of China Image processing method and apparatus applied to electronic equipment, computing device and medium
CN111881705A (en) * 2019-09-29 2020-11-03 Shenzhen Digital Life Institute Data processing, training and recognition method, device and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Seung-Wook Kim et al.: "Parallel Feature Pyramid Network for Object Detection", ECCV *
Mogu Hongzhaji (蘑菇轰炸机): "[CV Paper Notes] Cascaded Pyramid Network for Multi-Person Pose Estimation (understanding the CPN pose-estimation network)", https://www.jianshu.com *
Deng Yinong et al.: "A Survey of Deep-Learning-Based Human Pose Estimation Methods", Computer Engineering and Applications *


Similar Documents

Publication Publication Date Title
US11830230B2 (en) Living body detection method based on facial recognition, and electronic device and storage medium
CN110738101B (en) Behavior recognition method, behavior recognition device and computer-readable storage medium
CN109558832B (en) Human body posture detection method, device, equipment and storage medium
KR102147052B1 (en) Emotional recognition system and method based on face images
CN110147721B (en) Three-dimensional face recognition method, model training method and device
CN112633144A (en) Face occlusion detection method, system, device and storage medium
EP3885965B1 (en) Image recognition method based on micro facial expressions, apparatus and related device
CN112101314B (en) Human body posture recognition method and device based on mobile terminal
CN112101326A (en) Multi-person posture recognition method and device
CN110163211B (en) Image recognition method, device and storage medium
CN110705566B (en) Multi-mode fusion significance detection method based on spatial pyramid pool
CN112381061B (en) Facial expression recognition method and system
CN111091075A (en) Face recognition method and device, electronic equipment and storage medium
CN109117753A (en) Position recognition methods, device, terminal and storage medium
CN113011253B (en) Facial expression recognition method, device, equipment and storage medium based on ResNeXt network
WO2024001095A1 (en) Facial expression recognition method, terminal device and storage medium
CN112836625A (en) Face living body detection method and device and electronic equipment
CN113837065A (en) Image processing method and device
CN113887494A (en) Real-time high-precision face detection and recognition system for embedded platform
CN113228105A (en) Image processing method and device and electronic equipment
CN109948483A (en) Person interaction relationship recognition method based on movement and facial expression
CN112149517A (en) Face attendance checking method and system, computer equipment and storage medium
CN112766065A (en) Mobile terminal examinee identity authentication method, device, terminal and storage medium
CN112232221A (en) Method, system and program carrier for processing human image
CN112069943A (en) Online multi-person posture estimation and tracking method based on top-down framework

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant