CN112861723A - Physical exercise recognition counting method and device based on human body posture recognition and computer readable storage medium - Google Patents


Info

Publication number
CN112861723A
Authority
CN
China
Prior art keywords
feature layer
layer
human body
feature
bit line
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110175688.7A
Other languages
Chinese (zh)
Other versions
CN112861723B (en)
Inventor
叶佳林
延瑾瑜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sinoits Tech Co ltd
Original Assignee
Beijing Sinoits Tech Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sinoits Tech Co ltd filed Critical Beijing Sinoits Tech Co ltd
Priority to CN202110175688.7A priority Critical patent/CN112861723B/en
Publication of CN112861723A publication Critical patent/CN112861723A/en
Application granted granted Critical
Publication of CN112861723B publication Critical patent/CN112861723B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63BAPPARATUS FOR PHYSICAL TRAINING, GYMNASTICS, SWIMMING, CLIMBING, OR FENCING; BALL GAMES; TRAINING EQUIPMENT
    • A63B71/00Games or sports accessories not covered in groups A63B1/00 - A63B69/00
    • A63B71/06Indicating or scoring devices for games or players, or for other sports activities
    • A63B71/0605Decision makers and devices using detection means facilitating arbitration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63BAPPARATUS FOR PHYSICAL TRAINING, GYMNASTICS, SWIMMING, CLIMBING, OR FENCING; BALL GAMES; TRAINING EQUIPMENT
    • A63B2220/00Measuring of physical parameters relating to sporting activity
    • A63B2220/17Counting, e.g. counting periodical movements, revolutions or cycles, or including further data processing to determine distances or speed


Abstract

The application relates to a physical exercise recognition and counting method and device based on human body posture recognition, and a computer-readable storage medium, which belong to the field of image processing technology. The recognition and counting method comprises: drawing a high bit line, a low bit line and a horizontal bar line on an acquired real-time video stream image of a subject performing pull-ups; storing frames of the real-time video stream at intervals to obtain original images; preprocessing each original image and sending it into a human skeleton point detection network to obtain a human skeleton point bitmap; and judging from the skeleton point coordinates whether one pull-up has been completed, so as to count pull-ups. The recognition and counting device comprises an original image acquisition module, a detection image acquisition module, a point bitmap acquisition module and a judgment counting module. A computer-readable storage medium is also provided. Compared with the related art, the method and device alleviate the problem of low counting accuracy.

Description

Physical exercise recognition counting method and device based on human body posture recognition and computer readable storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method and an apparatus for recognizing and counting physical exercise based on human body gesture recognition, and a computer-readable storage medium.
Background
With the development of information and internet technology, many things in life, study and work are becoming intelligent; for example, for convenience of counting, there are pull-up counting devices capable of counting automatically. Such a device counts automatically only on the basis that the posture meets the standard, so manual counting is not needed and only standard repetitions are counted, which helps the exerciser complete exercise targets. Existing pull-up counting devices mainly use three counting modes: infrared counting based on a single-chip microcomputer, pull-up counting based on ultrasonic waves, and pull-up counting based on machine vision.
The pull-up counting mode based on single-chip-microcomputer infrared mainly comprises a horizontal bar body and a control box: the transmitter and receiver of an infrared sensor are fixed at the two ends of the bar and at the upper part of the bar body, a pressure sensor is fixed at the upper part of the bar's side wall, and an ultrasonic sensor is fixed at the lower part of the side wall; pull-up counting is realized through the infrared sensor and the pressure sensor.
The pull-up counting mode based on ultrasonic waves mainly arranges an ultrasonic probe on the horizontal bar, so that when a person performing a pull-up is detected, the distance between the top of the head and the ultrasonic probe changes, and whether the pull-up action is standard is judged according to that distance.
The machine-vision pull-up counting mode constructs Haar-like-feature face and hand classifiers using the AdaBoost algorithm, obtains the face region and the centroid coordinates of the hands by computing image invariant moments, and takes the face area and the vertical distance between the face and hand centroids as thresholds. It then extracts the motion foreground with a Gaussian mixture background model, counts the gray-level changes of an ROI (region of interest) over the image sequence, performs elliptical skin-color detection, computes image invariant moments to obtain the face centroid coordinates in the sequence, analyzes the gray value and centroid coordinate change of each frame, and compares them with the set thresholds to obtain the number of pull-ups.
In view of the above related technologies, the inventor believes that the infrared single-chip-microcomputer counting mode is prone to damage in rainy or snowy weather, and its construction and maintenance are difficult; the ultrasonic counting mode easily over-counts or fails to count once the person raises the head or performs the action irregularly, and its construction is likewise affected by severe weather; and the machine-vision counting mode counts through traditional image operators, whose robustness is relatively poor, so that in complex environments, such as scenes with multiple people or partial occlusion, it is difficult to locate the hands and face accurately, leading to low counting accuracy.
Disclosure of Invention
In order to solve the problem of low counting accuracy, the application provides a physical exercise recognition counting method and device based on human body posture recognition and a computer readable storage medium.
In a first aspect, the application provides a physical education action recognition counting method based on human body posture recognition, which adopts the following technical scheme:
a physical exercise motion recognition counting method based on human body posture recognition comprises the steps of drawing a high-bit line, a low-bit line and a horizontal bar line on an obtained real-time video stream image of a tested person with upward pull, storing the real-time video stream image at intervals to obtain an original image, preprocessing the original image, sending the original image into a human body bone point detection network to obtain a human body bone point bitmap, judging whether one pull-up is completed according to bone point coordinates to conduct pull-up counting, wherein the bone points comprise a chin, a left shoulder, a right shoulder, a left hand and a right hand, the high-bit line represents the lowest pull-up line of the pull-up motion, and the low-bit line represents the highest restoring line of the pull-up motion.
By adopting this technical scheme, after the video stream image of the subject performing pull-ups is obtained, original images with the high bit line, the low bit line and the horizontal bar line drawn on them are obtained and sent into the human skeleton point detection network to obtain human skeleton point bitmaps, so that the positional relationship between the skeleton points and the three lines can be judged for counting. Because whether a pull-up action is completed is judged from the skeleton point positions, the influence of the environment can be excluded to a certain extent, the accuracy of pull-up counting is improved, and the problem of low counting accuracy is alleviated; moreover, no detection devices need to be installed on the horizontal bar, so construction and maintenance costs can be reduced.
Optionally, the identification counting method specifically includes,
acquiring an original image, namely acquiring a real-time video stream image of a subject performing pull-ups, drawing a high bit line, a low bit line and a horizontal bar line on the video stream image, parsing the video stream, and storing frames at intervals to obtain original images of the subject;
acquiring a detection image, namely correcting the original image and adjusting its resolution to a certain size to obtain a detection image;
acquiring a point bitmap, namely sending the detection image into a human skeleton point detection network to obtain a human skeleton point bitmap; and,
judging and counting, namely mapping the human skeleton point bitmap back to the detection image to obtain the skeleton point positions, and judging whether one pull-up is completed by combining the high bit line, the low bit line and the horizontal bar line on the detection image, so as to count.
By adopting this technical scheme, the original image is obtained from the video stream image, then corrected and resized to obtain the detection image, which carries the drawn high bit line, low bit line and horizontal bar line. The detection image is sent into the human skeleton point detection network to obtain the human skeleton point bitmap, and mapping the bitmap back to the detection image yields the skeleton point positions, from which the positional relationship between the skeleton points and the three lines is easily obtained. It is therefore convenient to judge whether a pull-up is completed so as to count, which alleviates the problem of low counting accuracy.
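The frame-interval storage step above can be sketched minimally in Python; the interval value and the frame source are assumptions for illustration, since the application does not fix a specific interval:

```python
def sample_frames(frames, interval=5):
    """Keep every `interval`-th frame of a video stream (frame-interval storage).

    `frames` is any iterable of frames; `interval=5` is an assumed example
    value, not one specified by the application.
    """
    originals = []
    for i, frame in enumerate(frames):
        if i % interval == 0:  # store one frame, then skip the next interval-1
            originals.append(frame)
    return originals

# e.g. 20 frames sampled at interval 5 keep frames 0, 5, 10, 15
kept = sample_frames(range(20), interval=5)
```

Each kept frame would then be corrected and resized into a detection image before entering the skeleton point detection network.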
Optionally, the human skeleton point detection network includes a human body detection network and a skeleton point detection network, the human body detection network obtains human body contour coordinates by taking a detection map as input, and the skeleton point detection network obtains a human skeleton point bitmap by combining the human body contour coordinates and part of feature information of the human body detection network.
By adopting the technical scheme, the human body detection network extracts the characteristics according to the input detection graph to obtain the human body contour coordinates of the tested person, and the human body contour coordinates are sent to the skeleton point detection network to obtain the human body skeleton point bitmap, so that the human body skeleton point bitmap can be conveniently obtained.
Optionally, the human body detection network includes a backbone network, a feature fusion layer, and an output layer, the backbone network includes a resnet101 network, the method for obtaining the human body contour coordinate by the human body detection network includes,
the detection graph enters a backbone network, and a feature layer P1, a feature layer P2 and a feature layer P3 with different resolutions are generated;
the feature fusion layer takes feature layer P1, feature layer P2 and feature layer P3 as input, and obtains a feature result through multiple layers of convolution, down-sampling, up-sampling, pooling and connection; and,
the output layer connects the feature results and performs non-maximum suppression to obtain the human body contour coordinates;
the feature layer P1 is partial feature information, the feature layer P1 is a feature layer output by the 21 st layer of convolution, the feature layer P2 is a feature layer output by the 90 th layer of convolution, and the feature layer P3 is a feature layer output by the 99 th layer of convolution.
By adopting this technical scheme, the resnet101 network extracts deep features from the detection image to obtain feature layer P1, feature layer P2 and feature layer P3; the feature fusion layer takes the three feature layers as input and outputs the feature result; and the output layer performs connection and non-maximum suppression on the feature result, so as to obtain better human body contour coordinates.
Optionally, the skeleton point detection network includes a coordinate mapping layer and a CPM network; the coordinate mapping layer multiplexes the feature information of feature layer P1 and maps the human body contour coordinates back onto feature layer P1 to obtain a single feature map, and the CPM network takes the single feature map as input and regresses the human skeleton point bitmap.
By adopting this technical scheme, the feature information of feature layer P1 is multiplexed and the human body contour coordinates are mapped onto feature layer P1 to obtain a single feature map; the CPM network performs regression on the single feature map, continuously extracting position information to refine the position of each skeleton point, so as to obtain the human skeleton point bitmap.
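The coordinate mapping layer described above amounts to translating a contour box from detection-image coordinates into feature-layer coordinates. A minimal sketch follows; the downsampling factor of 4 is an assumption inferred from the feature sizes given later in this embodiment (352 → 88), not something the application states:

```python
def map_box_to_feature(box, stride=4):
    """Map a human-contour box (x1, y1, x2, y2) from detection-image
    coordinates onto feature layer P1, assuming P1 is downsampled from
    the detection image by `stride` (an assumed factor of 4)."""
    x1, y1, x2, y2 = box
    return (x1 // stride, y1 // stride, x2 // stride, y2 // stride)

# a 160x272 contour box on a 352x352 detection image lands on P1 as:
p1_box = map_box_to_feature((40, 80, 200, 352))
```

The region of P1 selected by this mapped box would then be cropped out as the single feature map fed to the CPM network.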
Optionally, the method by which the feature fusion layer obtains the feature result from feature layer P1, feature layer P2 and feature layer P3 comprises the following steps:
a first convolution process, in which feature layer P1, feature layer P2 and feature layer P3 are each passed through four 3×3 convolutions with stride 2 to generate the corresponding feature layer P1_1, feature layer P2_1 and feature layer P3_1;
a first fusion, in which a feature layer A1 obtained by applying a 3×3 convolution with stride 1 to feature layer P1_1, a feature layer B1 generated by up-sampling feature layer P2_1 once, and a feature layer C1 generated by up-sampling feature layer P3_1 twice are connected to generate feature layer P1_2;
a second fusion, in which a feature layer A2 obtained by applying a 3×3 convolution with stride 2 to feature layer P1_1, a feature layer B2 generated by applying a 3×3 convolution with stride 1 to feature layer P2_1, and a feature layer C2 generated by up-sampling feature layer P3_1 once are connected to generate feature layer P2_2;
a third fusion, in which a feature layer A3 generated by passing feature layer P1_2 through a 3×3 convolution with stride 2 and then a pooling layer, a feature layer B3 generated by down-sampling feature layer P2_2 once, and a feature layer C3 generated by applying a 3×3 convolution with stride 1 to feature layer P3_1 are connected to generate feature layer P3_2;
a fourth fusion, in which a feature layer A4 generated by applying three 3×3 convolutions with stride 2 to feature layer P1_1, a feature layer B4 generated by applying two 3×3 convolutions with stride 2 to feature layer P2_1, and a feature layer C4 generated by applying one 3×3 convolution with stride 1 to feature layer P3_1 are connected to generate feature layer P4_2; and,
a second convolution process, in which feature layer P1_2, feature layer P2_2, feature layer P3_2 and feature layer P4_2 are each passed through three 3×3 convolutions with stride 1 to generate the corresponding feature layer P1_3, feature layer P2_3, feature layer P3_3 and feature layer P4_3; the feature result comprises feature layer P1_3, feature layer P2_3, feature layer P3_3 and feature layer P4_3.
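The resolution bookkeeping behind these fusion steps can be sketched as follows: a stride-2 convolution (or a down-sampling) halves the spatial size, an up-sampling doubles it, and a stride-1 convolution with same-padding preserves it; branches can only be connected (concatenated) when their spatial sizes match. The sizes 88, 44 and 22 for P1, P2 and P3 are taken from the embodiment below, and the exact halving via integer division is an assumption:

```python
def after_conv(size, stride, times=1):
    """Spatial size after `times` 3x3 convolutions: stride 1 preserves the
    size (assuming same-padding), stride 2 halves it (assuming integer halving)."""
    for _ in range(times):
        size = size if stride == 1 else size // 2
    return size

def after_upsample(size, times=1):
    """Spatial size after `times` 2x up-sampling steps."""
    return size * 2 ** times

# Assumed spatial sizes of feature layers P1, P2, P3 (88, 44, 22).
P1, P2, P3 = 88, 44, 22
# One stride-2 convolution takes P1's size to P2's, one up-sampling takes
# P3's size to P2's, and a stride-1 convolution changes nothing.
```

With these rules, each fusion step above is just a matter of checking that its A, B and C branches arrive at the same spatial size before connection.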
Optionally, the method by which the output layer obtains the human body contour coordinates from the feature result through connection and non-maximum suppression includes,
a convolution process, in which the feature result is fed into a 3×3 convolution with stride 1 to generate feature layer P1_4, feature layer P2_4, feature layer P3_4 and feature layer P4_4 with different resolutions; and,
a connection and suppression process, in which feature layer P1_4, feature layer P2_4, feature layer P3_4 and feature layer P4_4 are connected to generate a coordinate vector, and non-maximum suppression is performed on the coordinate vector to obtain the human body contour coordinates.
Optionally, the method for judging whether one pull-up is completed includes,
judging a preparation state: when the distances from both the left hand and the right hand to the horizontal bar line do not exceed a proportion threshold of the original image height, a timing preparation state is entered;
counting: when the chin rises above the high bit line, the pull-up phase is registered; when the chin then falls below the low bit line, the restoring phase is registered; one consecutive pull-up and restoration counts as one pull-up; and
stopping counting: when the distance from the left hand or the right hand to the horizontal bar line exceeds the proportion threshold of the original image height, counting stops.
By adopting this technical scheme, whether the pulling action is performed is judged from the positional relationship between the chin and the high bit line, and whether the restoring action is performed is judged from the positional relationship between the chin and the low bit line; one consecutive pull and restoration counts once. Whether the pull-up exercise has finished or is about to start is judged from the distances between the hands and the horizontal bar line, so that counting proceeds on the premise that, to a certain extent, the subject's movement meets the standard.
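The judgment logic above can be sketched as a small per-frame state machine. The coordinate convention (image y grows downward, so "above the high bit line" means a smaller y), the threshold value, and all names are assumptions for illustration; the application does not specify them:

```python
class PullUpCounter:
    """Counts pull-ups from per-frame skeleton point y-coordinates and the
    drawn lines. Image y is assumed to grow downward."""

    def __init__(self, high_line_y, low_line_y, bar_y, img_h, ratio=0.05):
        self.high_y = high_line_y
        self.low_y = low_line_y
        self.bar_y = bar_y
        # proportion threshold of the original image height (assumed value)
        self.max_hand_dist = ratio * img_h
        self.ready = False    # timing-preparation state
        self.pulled = False   # chin has crossed the high bit line
        self.count = 0

    def update(self, chin_y, left_hand_y, right_hand_y):
        hands_on_bar = (abs(left_hand_y - self.bar_y) <= self.max_hand_dist
                        and abs(right_hand_y - self.bar_y) <= self.max_hand_dist)
        if not hands_on_bar:          # hands left the bar: stop counting
            self.ready = False
            self.pulled = False
            return self.count
        self.ready = True             # enter timing-preparation state
        if chin_y < self.high_y:      # chin above high bit line: pull-up phase
            self.pulled = True
        elif chin_y > self.low_y and self.pulled:
            self.count += 1           # one consecutive pull-up + restoration
            self.pulled = False
        return self.count
```

For example, with the bar line at y=100, the high bit line at y=150 and the low bit line at y=250 on a 352-pixel-high image, a chin trajectory that twice rises above 150 and falls below 250 while both hands stay near the bar yields a count of 2.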
In a second aspect, the present application provides a sports motion recognition counting device based on human body posture recognition, which adopts the following technical solution:
a physical exercise motion recognition counting device based on human body posture recognition comprises an original image acquisition module, a detection image acquisition module, a point bitmap acquisition module and a judgment counting module; wherein,
the original image acquisition module is used for acquiring a real-time video stream image of a subject performing pull-ups, drawing a high bit line, a low bit line and a horizontal bar line on the video stream image, parsing the video stream, and storing frames at intervals to obtain original images of the subject, wherein the high bit line represents the lowest line a pull-up must reach, and the low bit line represents the highest line the restoring movement must drop below;
the detection image acquisition module is used for correcting the original image and adjusting the resolution ratio to a certain size to obtain a detection image;
the point bitmap acquisition module is used for sending the detection map into a human skeleton point detection network to obtain a human skeleton point bitmap;
and the judgment counting module is used for mapping the human skeleton point bitmap back to the detection image to obtain the skeleton point positions, and judging whether one pull-up is completed by combining the high bit line, the low bit line and the horizontal bar line on the detection image, so as to count.
By adopting this technical scheme, the original image acquisition module obtains original images from the video stream image; the detection image acquisition module corrects and resizes each original image to obtain a detection image carrying the drawn high bit line, low bit line and horizontal bar line; the point bitmap acquisition module sends the detection image into the human skeleton point detection network to obtain the human skeleton point bitmap; and the judgment counting module maps the bitmap back to the detection image to obtain the skeleton point positions, from which the positional relationship between the skeleton points and the three lines is easily obtained. It is therefore convenient to judge whether a pull-up is completed so as to count, which alleviates the problem of low counting accuracy.
In a third aspect, the present application provides a computer-readable storage medium, which adopts the following technical solutions:
a computer readable storage medium storing a computer program capable of being loaded by a processor and performing a method as in any one of the first aspects.
Drawings
Fig. 1 is a first flowchart of a sports motion recognition counting method based on human body gesture recognition according to an embodiment of the present application.
Fig. 2 is a second flowchart of a sports motion recognition counting method based on human body gesture recognition according to an embodiment of the present application.
FIG. 3 is a schematic diagram of the relationship between the horizontal bar and the high and low bit lines according to the embodiment of the present application.
Fig. 4 is a schematic structural diagram of a human skeletal point detection network according to an embodiment of the present application.
Fig. 5 is a third flowchart of a sports motion recognition counting method based on human body gesture recognition according to an embodiment of the present application.
Fig. 6 is a fourth flowchart of a sports motion recognition counting method based on human body gesture recognition according to an embodiment of the present application.
Fig. 7 is a fifth flowchart of a sports motion recognition counting method based on human body gesture recognition according to an embodiment of the present application.
Fig. 8 is a sixth flowchart of a sports motion recognition counting method based on human body gesture recognition according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Any feature disclosed in this specification (including any accompanying drawings) may be replaced by alternative features serving equivalent or similar purposes, unless expressly stated otherwise. That is, unless expressly stated otherwise, each feature is only an example of a generic series of equivalent or similar features.
The present application is described in further detail below with reference to figures 1-8.
The embodiment of the application discloses a physical exercise recognition counting method based on human body posture recognition. Referring to fig. 1, 2 and 3, the recognition counting method includes drawing a high bit line, a low bit line and a horizontal bar line on an acquired real-time video stream image of a subject performing pull-ups, storing frames of the real-time video stream at intervals to obtain original images, preprocessing each original image and sending it into a human skeleton point detection network to obtain a human skeleton point bitmap, and judging from the skeleton point coordinates whether one pull-up is completed, so as to count pull-ups.
The bone points comprise a chin, a left shoulder, a right shoulder, a left hand and a right hand, the high bit line represents a pull-up lowest line of the pull-up action, and the low bit line represents a restoring highest line of the pull-up action.
One complete pull-up action includes a pull-up and a restoration. The high bit line and the low bit line can be defined according to a unified pull-up action standard: when the chin rises above the high bit line, the pull-up phase meets the standard, and when the chin falls below the low bit line, the restoring phase meets the standard. Therefore, after the human skeleton point bitmap is obtained, the skeleton point coordinates can be read, and whether a complete pull-up has been finished can be judged by combining the skeleton point coordinates with the positions of the high bit line, the low bit line and the horizontal bar line.
Referring to fig. 1 and 2, the specific method of the above-mentioned identification and counting method includes the following embodiments:
acquiring an original image 101, namely acquiring a real-time video stream image of a subject performing pull-ups, drawing a high bit line, a low bit line and a horizontal bar line on the video stream image, parsing the video stream, and storing frames at intervals to obtain original images of the subject;
acquiring a detection image 102, namely correcting the original image and adjusting its resolution to a certain size to obtain a detection image;
acquiring a point bitmap 103, namely sending the detection image into a human skeleton point detection network to obtain a human skeleton point bitmap; and,
judging and counting 104, namely mapping the human skeleton point bitmap back to the detection image to obtain the skeleton point positions, and judging whether one pull-up is completed by combining the high bit line, the low bit line and the horizontal bar line on the detection image, so as to count.
In the original image acquisition 101, a plurality of original images are continuously obtained from the acquired video stream image. In the detection image acquisition 102, the original image may be corrected to any required shape; for example, in the present embodiment, it is corrected to a rectangle. The resolution may likewise be adjusted to any required size; in the present embodiment, it is adjusted to 352 × 352. Each original image corresponds to one detection image, and all detection images share the same shape and resolution.
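The 352 × 352 adjustment can be sketched with a plain nearest-neighbour resize; the target size matches this embodiment, but the resampling method itself is an assumption, since the application does not say how the resolution is adjusted:

```python
def resize_nearest(img, out_h=352, out_w=352):
    """Nearest-neighbour resize of an image given as a list of rows.

    The 352x352 default matches this embodiment; nearest-neighbour
    resampling is an assumed choice for illustration.
    """
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)]
            for r in range(out_h)]

# a 2x2 image doubled to 4x4: each source pixel covers a 2x2 block
big = resize_nearest([[0, 1], [2, 3]], out_h=4, out_w=4)
```

In practice an image library's resize routine with interpolation would be used instead; the sketch only fixes the shape contract that every detection image ends up the same size.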
In the judgment counting 104, the continuously obtained human skeleton point bitmaps are mapped back to their corresponding detection images to obtain a sequence of skeleton point positions, whether changing or not, so that whether one pull-up is completed can be judged from the positional relationships between the skeleton points and the high bit line, the low bit line and the horizontal bar line.
Referring to fig. 4, as an embodiment of a human skeleton point detection network, the human skeleton point detection network includes a human body detection network and a skeleton point detection network, the human body detection network obtains human body contour coordinates with a detection map as an input, and the skeleton point detection network obtains a human skeleton point bitmap by combining the human body contour coordinates and partial feature information of the human body detection network.
Referring to fig. 4, as an embodiment of the human body detection network, the human body detection network includes a backbone network, a feature fusion layer, and an output layer, and the backbone network includes a resnet101 network.
The resnet101 network, i.e. the 101-layer residual network, is a convolutional neural network whose internal residual blocks use skip connections, which alleviates the vanishing-gradient problem caused by increasing depth in deep neural networks; the resnet101 network has 99 convolution layers. It is easy to optimize and can improve accuracy by adding considerable depth.
Referring to fig. 5, based on the human body detection network, as an embodiment of obtaining human body contour coordinates by the human body detection network, the method includes steps 201, 202 and 203:
201. the detection graph enters a backbone network, and a feature layer P1, a feature layer P2 and a feature layer P3 with different resolutions are generated.
The feature layer P1 is partial feature information, the feature layer P1 is a feature layer output by the 21 st layer of convolution, the feature layer P2 is a feature layer output by the 90 th layer of convolution, and the feature layer P3 is a feature layer output by the 99 th layer of convolution. In this embodiment, the resolution of feature layer P1 is 64 × 88, the resolution of feature layer P2 is 128 × 44, the resolution of feature layer P3 is 256 × 22, and the feature size of feature layer P1 is greater than the feature sizes of feature layers P2 and P3.
202. The feature fusion layer takes a feature layer P1, a feature layer P2 and a feature layer P3 as input, and obtains a feature result through multilayer convolution, down sampling, up sampling, pooling layers and connection.
203. And the output layer connects the characteristic results and inhibits the non-maximum value to obtain the human body contour coordinate.
The feature layer P1 is part of the feature information. The output of the main network is used as the input of the characteristic fusion layer, the output of the characteristic fusion layer is used as the input of the output layer, and finally the human body contour coordinate is obtained.
Non-maximum suppression, namely NMS (Non-Maximum Suppression), suppresses elements that are not local maxima; it can be understood as a local maximum search, where the locality is a neighborhood with two variable parameters: the dimension of the neighborhood and its size.
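As a sketch of how non-maximum suppression works on detection boxes, the following is a greedy IoU-based NMS; the box format, threshold and sample values are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def nms(boxes, scores, iou_thr=0.5):
    # Greedy NMS: repeatedly keep the highest-scoring box and drop every
    # remaining box whose IoU with it exceeds iou_thr.
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        # Intersection of box i with all remaining boxes (x1, y1, x2, y2).
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area = lambda b: (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
        iou = inter / (area(boxes[i:i + 1])[0] + area(boxes[order[1:]]) - inter)
        order = order[1:][iou <= iou_thr]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2] -- the near-duplicate box 1 is suppressed
```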
Referring to fig. 6, as a further embodiment of step 202, step 202 comprises:
a first convolution process 2021, which respectively convolves feature layer P1, feature layer P2, and feature layer P3 by four times of 3 × 3 convolution with a step size of 2 to generate corresponding feature layer P1_1, feature layer P2_1, and feature layer P3_ 1;
the first fusion 2022, connecting a feature layer a1 obtained by convolving the feature layer P1_1 by 3 × 3 with the step size of 1, a feature layer B1 generated by upsampling the feature layer P2_1, and a feature layer C1 generated by performing secondary upsampling on the feature layer P3_1 to generate a feature layer P1_ 2;
the second fusion 2023, connecting a feature layer a2 obtained by convolving the feature layer P1_1 by 3 × 3 with the step size of 2, a feature layer B2 generated by convolving the feature layer P2_1 by 3 × 3 with the step size of 1, and a feature layer C2 generated by performing one-time upsampling on the feature layer P3_1 to generate a feature layer P2_ 2;
the third fusion 2024, connecting feature layer A3, generated from feature layer P1_2 by a 3 × 3 convolution with step size 2 followed by a pooling layer, feature layer B3, generated by downsampling feature layer P2_2 once, and feature layer C3, generated from feature layer P3_1 by a 3 × 3 convolution with step size 1, to generate feature layer P3_2;
the fourth fusion 2025, connecting feature layer A4, generated from feature layer P1_1 by three 3 × 3 convolutions with step size 2, feature layer B4, generated from feature layer P2_1 by two 3 × 3 convolutions with step size 2, and feature layer C4, generated from feature layer P3_1 by one 3 × 3 convolution with step size 1, to generate feature layer P4_2; and
the second convolution process 2026, convolving each of feature layer P1_2, feature layer P2_2, feature layer P3_2 and feature layer P4_2 three times by 3 × 3 with step size 1 to generate the corresponding feature layer P1_3, feature layer P2_3, feature layer P3_3 and feature layer P4_3.
Note that the feature result includes a feature layer P1_3, a feature layer P2_3, a feature layer P3_3, and a feature layer P4_ 3.
The first convolution processing 2021, the first fusion 2022, the second fusion 2023, the third fusion 2024, the fourth fusion 2025 and the second convolution processing 2026 are all performed by the feature fusion layer.
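The resolution bookkeeping behind these fusion connections can be illustrated with simple stand-ins for the convolution and sampling operations. The 88/44/22 spatial sizes and the stand-in operators here are assumptions for illustration only; the real layers are produced by learned convolutions:

```python
import numpy as np

def down2(x):
    # Stand-in for one 3 x 3 convolution with step size 2 (halves H and W).
    return x[::2, ::2]

def up2(x):
    # Stand-in for one 2x upsampling (doubles H and W).
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

# Illustrative spatial sizes only; channel dimensions are omitted.
P1_1 = np.zeros((88, 88))
P2_1 = np.zeros((44, 44))
P3_1 = np.zeros((22, 22))

# First fusion: A1 keeps P1_1's size, B1 upsamples P2_1 once,
# C1 upsamples P3_1 twice, so all three can be connected at 88 x 88.
A1, B1, C1 = P1_1, up2(P2_1), up2(up2(P3_1))
print(A1.shape == B1.shape == C1.shape == (88, 88))  # True

# Second fusion: A2 downsamples P1_1 once, C2 upsamples P3_1 once,
# so all three align at 44 x 44.
A2, B2, C2 = down2(P1_1), P2_1, up2(P3_1)
print(A2.shape == B2.shape == C2.shape == (44, 44))  # True
```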
As an embodiment of the skeleton point detection network, the skeleton point detection network comprises a coordinate mapping layer and a CPM network; the coordinate mapping layer multiplexes the feature information of feature layer P1 and maps the human body contour coordinates back onto feature layer P1 to obtain a single-person feature map, and the CPM network takes the single-person feature map as input and regresses the human skeleton point bitmap.
It should be noted that the feature scale of the feature layer P1 is more suitable for bone point detection than the feature layer P2 and the feature layer P3.
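A minimal sketch of the coordinate mapping step, assuming the contour box is given in detection-map pixels and that feature layer P1 sits at a fixed stride relative to the detection map. The stride value, array sizes and function name here are hypothetical:

```python
import numpy as np

def crop_single_person(feature_p1, box, stride):
    # Map a body contour box (detection-map pixels) onto feature layer P1
    # by dividing by the assumed backbone stride, then crop the
    # single-person feature map for the CPM network.
    x1, y1, x2, y2 = [int(v / stride) for v in box]
    return feature_p1[y1:y2, x1:x2]

P1 = np.zeros((88, 88, 64))  # assumed spatial size x channels
crop = crop_single_person(P1, box=(32, 16, 128, 160), stride=4)
print(crop.shape)  # (36, 24, 64)
```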
CPM networks, namely Convolutional Pose Machines (CPMs), integrate convolutional networks into the pose machine framework to learn both image features and image-dependent spatial models for estimating human body posture.
Referring to fig. 7, as a further embodiment of step 203, step 203 includes a convolution process 2031 and a connection suppression process 2032.
The convolution process 2031 sends the feature result through a 3 × 3 convolution with step size 1 to generate feature layer P1_4, feature layer P2_4, feature layer P3_4 and feature layer P4_4, which have different resolutions.
In the present embodiment, the resolution of the feature layer P1_4 is 12 × 88, the resolution of the feature layer P2_4 is 12 × 44, the resolution of the feature layer P3_4 is 12 × 22, and the resolution of the feature layer P4_4 is 12 × 11.
The connection suppression processing 2032 is to generate coordinate vectors by connecting the feature layer P1_4, the feature layer P2_4, the feature layer P3_4, and the feature layer P4_4, and suppress the non-maximum values of the coordinate vectors to obtain human contour coordinates.
The human body skeleton point detection network, namely the human body detection network and the skeleton point detection network, are obtained by training. The convolution processing 2031 and the connection suppression processing 2032 are both performed in the output layer, that is, the output layer performs the convolution processing 2031 and the connection suppression processing 2032.
The human body detection network before training comprises a backbone network, a feature fusion layer and an output layer, wherein the backbone network is a resnet101 network. The untrained skeleton point detection network comprises a mapping feature layer and a CPM network, and the skeleton point detection network takes the output of the human body detection network and partial feature information of the backbone network as input, namely the human body detection network and the skeleton point detection network share the backbone network.
The training method comprises the following steps:
acquiring a plurality of pull-up videos and pictures by means such as a crawler or downloading, and mixing the video frames obtained by cutting the videos frame by frame with the pictures to obtain training set pictures;
marking the human body contour in the training set picture to manufacture a pedestrian detection data set (VOC data set);
fixing the initial weights of resnet101, randomly initializing the weights of the feature fusion layer and the output layer, feeding the training set pictures into the human body detection network as input, taking the corresponding pictures in the pedestrian detection data set (VOC data set) as labels, and training with a combination of the GIoU loss function and the focal loss function until the obtained loss no longer decreases, then exporting the human body detection network, which at this point is the trained human body detection network;
cutting the training set pictures to obtain human body pictures, and labelling skeleton points on the human body pictures to produce true confidence maps; and
randomly initializing the skeleton point detection network, feeding the human body pictures into the skeleton point detection network as input, using the corresponding true confidence maps as labels, and constructing a Gaussian distribution near each true (labelled) skeleton point to obtain the heat-map loss value f_t; when the loss value f_t no longer decreases, the skeleton point detection network is exported, and this is the trained skeleton point detection network.

Wherein

f_t = \sum_{p=1}^{P} \sum_{z=1}^{Z} \left\| \hat{b}_z^p - b_z^p \right\|_2^2

where P represents the number of human body pictures, Z represents the number of skeleton points in each human body picture, b_z^p represents the coordinates of skeleton point z in the true confidence map of human body picture p, and \hat{b}_z^p represents the coordinates of the corresponding skeleton point predicted by the skeleton point detection network from the input human body picture.
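Treating f_t as a sum of squared distances between predicted and labelled skeleton point coordinates, it might be computed as follows. This is a sketch consistent with the definitions above, ignoring the Gaussian confidence-map construction; names and values are illustrative:

```python
import numpy as np

def skeleton_loss(pred, true):
    # Sum of squared coordinate differences over P pictures and Z
    # skeleton points per picture; arrays have shape (P, Z, 2).
    return float(np.sum((pred - true) ** 2))

true = np.array([[[10.0, 20.0], [30.0, 40.0]]])  # P=1 picture, Z=2 points
pred = np.array([[[11.0, 20.0], [30.0, 38.0]]])
print(skeleton_loss(pred, true))  # 5.0
```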
In this embodiment of the training method, the initial weights of the resnet101 network and the initialized weights of the feature fusion layer and the output layer are random; when labelling the human body contour and the skeleton points, labelme software can be used, and training proceeds by minimizing the loss function. In other embodiments, the loss functions used in training the human body detection network and the skeleton point detection network may be replaced with other loss functions.
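The GIoU measure underlying the GIoU loss used for the human body detection network can be sketched as follows (box format (x1, y1, x2, y2); in training, the loss term is typically 1 − GIoU). The function and sample boxes are illustrative:

```python
def giou(a, b):
    # GIoU of two boxes: IoU minus the fraction of the smallest enclosing
    # box that is covered by neither box.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    cx1, cy1 = min(a[0], b[0]), min(a[1], b[1])   # enclosing box
    cx2, cy2 = max(a[2], b[2]), max(a[3], b[3])
    c = (cx2 - cx1) * (cy2 - cy1)
    return inter / union - (c - union) / c

print(giou((0, 0, 2, 2), (0, 0, 2, 2)))  # 1.0 for identical boxes
print(giou((0, 0, 1, 1), (2, 2, 3, 3)))  # negative for disjoint boxes
```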
Referring to fig. 8, as an embodiment of determining whether to finish the pull-up process, the method includes:
and (3) judging a preparation state 301, and entering a timing preparation state when the distances between the left hand and the horizontal bar line do not exceed the proportion threshold value of the height proportion of the original image.
In the present embodiment, the point 0 represents the chin, the point 1 represents the left hand, the point 2 represents the right hand, and the proportional threshold is 0.05.
Counting 302: when the position of the chin is higher than the high bit line, the state is pulled-up; when the position of the chin is lower than the low bit line, the state is restored; one pull-up followed consecutively by one restore counts as one pull-up.
In the present embodiment, the left shoulder is indicated by point 3 and the right shoulder is indicated by point 4.
Counting stop 303: when the distance from the left hand or the right hand to the horizontal bar line exceeds the proportional threshold of the original image height, counting stops.
In the above embodiment of judging whether one pull-up is completed, preparation, counting and stopping can all be carried out according to the positional relations between the left hand, right hand, left shoulder, right shoulder and chin and the high bit line, low bit line and horizontal bar line.
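The counting rules above can be sketched as a small state machine. This is a hedged illustration: the names are invented, and image y-coordinates are assumed to grow downward, so being above a line means a smaller y value:

```python
class PullUpCounter:
    # Chin above the high bit line -> pulled-up phase; chin below the
    # low bit line -> restored phase; one pull-up followed by one restore
    # counts as one repetition.

    def __init__(self, high_y, low_y):
        self.high_y, self.low_y = high_y, low_y
        self.pulled = False
        self.count = 0

    def update(self, chin_y):
        if chin_y < self.high_y:                   # chin above high bit line
            self.pulled = True
        elif chin_y > self.low_y and self.pulled:  # restored below low line
            self.pulled = False
            self.count += 1
        return self.count

c = PullUpCounter(high_y=100, low_y=200)
for y in [250, 90, 150, 250, 80, 240]:  # two full pull-and-restore cycles
    c.update(y)
print(c.count)  # 2
```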
The application provides a physical exercise recognition counting method based on human body posture recognition which, with adaptive adjustment, is also applicable to recognizing and counting other physical exercises, such as sit-ups and push-ups.
The embodiment of the application also discloses a sports motion recognition counting device based on human body posture recognition. The identification counting device comprises an original image acquisition module, a detection image acquisition module, a point bitmap acquisition module and a judgment counting module.
The original image acquisition module is used for acquiring a real-time video stream of the subject performing pull-ups, drawing a high bit line, a low bit line and a horizontal bar line on the video stream image, parsing the video stream, and saving frames at intervals to obtain the original images of the subject performing pull-ups.
The high bit line represents the lowest line of the pulled-up position of the pull-up action, and the low bit line represents the highest line of the restored position of the pull-up action.
As an embodiment of acquiring the video stream image, a camera is included; the lens of the camera is installed obliquely downward, facing the horizontal bar, and the interface type of the camera may be GigE, USB, Camera Link, or the like.
And the detection image acquisition module is used for correcting the original image and adjusting the resolution ratio to a certain size to obtain a detection image.
And the point bitmap acquisition module is used for sending the detection map into a human skeleton point detection network to obtain a human skeleton point bitmap.
And the judgment counting module is used for mapping the human body skeleton point bitmap back to the detection graph to obtain the position of the skeleton point, and judging whether the pull-up is finished once or not by combining a high bit line, a low bit line and a horizontal bar line on the detection graph so as to count.
The physical exercise recognition counting device based on human body posture recognition provided by the application requires relatively simple field configuration, is unlikely to generate false reports caused by failures of detection equipment such as sensors, and is unlikely to suffer instrument damage due to weather. In addition, the application judges whether the pull-up action is completed through the detected skeleton points, so the detection accuracy is higher and the anti-interference capability is stronger.
The embodiment of the application also discloses a computer readable storage medium which stores a computer program capable of being loaded by a processor and executing any one of the sports motion recognition counting methods based on human body gesture recognition.
The computer-readable storage medium includes, for example: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Claims (10)

1. A physical exercise recognition counting method based on human body posture recognition, characterized in that: a high bit line, a low bit line and a horizontal bar line are drawn on an acquired real-time video stream image of a subject performing pull-ups, frames of the real-time video stream image are saved at intervals to obtain original images, the original images are preprocessed and sent into a human skeleton point detection network to obtain a human skeleton point bitmap, and whether one pull-up is completed is judged according to the skeleton point coordinates so as to count pull-ups, wherein the skeleton points comprise the chin, the left shoulder, the right shoulder, the left hand and the right hand, the high bit line represents the lowest line of the pulled-up position of the pull-up action, and the low bit line represents the highest line of the restored position of the pull-up action.
2. A sports motion recognition counting method based on human body gesture recognition according to claim 1, characterized in that: the identification and counting method specifically comprises the following steps,
acquiring an original image (101), acquiring a real-time video stream image of a subject performing pull-ups, drawing a high bit line, a low bit line and a horizontal bar line on the video stream image, parsing the video stream image, and saving frames at intervals to obtain the original images of the subject performing pull-ups;
acquiring a detection image (102), correcting the original image, and adjusting the resolution to a certain size to obtain the detection image;
acquiring a point bitmap (103), sending the detection map into a human skeleton point detection network to obtain a human skeleton point bitmap; and
and judging counting (104), mapping the human body skeleton point bitmap back to the detection graph to obtain the positions of skeleton points, and judging whether the pull-up is finished once or not by combining a high bit line, a low bit line and a horizontal bar line on the detection graph so as to count.
3. A sports motion recognition counting method based on human body gesture recognition according to claim 1 or 2, characterized in that: the human body skeleton point detection network comprises a human body detection network and a skeleton point detection network, the human body detection network obtains human body contour coordinates by taking a detection graph as input, and the skeleton point detection network obtains a human body skeleton point bitmap by combining the human body contour coordinates and partial characteristic information of the human body detection network.
4. A sports motion recognition counting method based on human body gesture recognition according to claim 3, characterized in that: the human body detection network comprises a backbone network, a feature fusion layer and an output layer, the backbone network comprises a resnet101 network, the method for obtaining the human body contour coordinate by the human body detection network comprises the following steps,
the detection graph enters a backbone network, and a feature layer P1, a feature layer P2 and a feature layer P3 with different resolutions are generated;
the feature fusion layer takes feature layer P1, feature layer P2 and feature layer P3 as input, and obtains a feature result through multilayer convolution, downsampling, upsampling, pooling layers and connection; and
the output layer connects the feature results and performs non-maximum suppression to obtain the human body contour coordinates;
the feature layer P1 is partial feature information, the feature layer P1 is a feature layer output by the 21 st layer of convolution, the feature layer P2 is a feature layer output by the 90 th layer of convolution, and the feature layer P3 is a feature layer output by the 99 th layer of convolution.
5. A sports motion recognition counting method based on human body gesture recognition according to claim 4, characterized in that: the skeleton point detection network comprises a coordinate mapping layer and a CPM network, wherein the coordinate mapping layer is used for multiplexing characteristic information of a characteristic layer P1 and mapping human body contour coordinates back to the characteristic layer P1 to obtain a single characteristic diagram, and the CPM network takes the single characteristic diagram as input to return a human body skeleton point bitmap.
6. A sports motion recognition counting method based on human body gesture recognition according to claim 4, characterized in that: the method for obtaining the feature result by using the feature layer P1, the feature layer P2 and the feature layer P3 as input by the feature fusion layer comprises the following steps,
a first convolution process (2021) for respectively convolving feature layer P1, feature layer P2, and feature layer P3 by 3 × 3 with a step size of 2 four times to generate corresponding feature layer P1_1, feature layer P2_1, and feature layer P3_ 1;
first fusing (2022), connecting a feature layer A1 obtained by convolving a feature layer P1_1 by 3 × 3 with the step size of 1, a feature layer B1 generated by upsampling the feature layer P2_1 and a feature layer C1 generated by performing secondary upsampling on the feature layer P3_1 to generate a feature layer P1_ 2;
second fusing (2023), connecting a feature layer A2 obtained by convolving the feature layer P1_1 by 3 × 3 with the step size of 2, a feature layer B2 generated by convolving the feature layer P2_1 by 3 × 3 with the step size of 1, and a feature layer C2 generated by performing one-time upsampling on the feature layer P3_1 to generate a feature layer P2_ 2;
performing third fusion (2024), connecting feature layer A3, generated from feature layer P1_2 by a 3 × 3 convolution with step size 2 followed by a pooling layer, feature layer B3, generated by downsampling feature layer P2_2 once, and feature layer C3, generated from feature layer P3_1 by a 3 × 3 convolution with step size 1, to generate feature layer P3_2;
performing fourth fusion (2025), connecting feature layer A4, generated from feature layer P1_1 by three 3 × 3 convolutions with step size 2, feature layer B4, generated from feature layer P2_1 by two 3 × 3 convolutions with step size 2, and feature layer C4, generated from feature layer P3_1 by one 3 × 3 convolution with step size 1, to generate feature layer P4_2; and
performing a second convolution process (2026), convolving each of feature layer P1_2, feature layer P2_2, feature layer P3_2 and feature layer P4_2 three times by 3 × 3 with step size 1 to generate the corresponding feature layer P1_3, feature layer P2_3, feature layer P3_3 and feature layer P4_3.
7. A sports motion recognition counting method based on human body gesture recognition according to claim 4, characterized in that: the method for connecting and suppressing the non-maximum value of the characteristic result by the output layer to obtain the human body outline coordinate comprises the following steps,
convolution processing (2031), sending the feature result into a 3 × 3 convolution with step size 1 to generate feature layer P1_4, feature layer P2_4, feature layer P3_4 and feature layer P4_4 with different resolutions; and
and a connection suppression process (2032) for generating coordinate vectors by connecting the feature layer P1_4, the feature layer P2_4, the feature layer P3_4 and the feature layer P4_4 and suppressing the non-maximum values of the coordinate vectors to obtain human body contour coordinates.
8. A sports motion recognition counting method based on human body gesture recognition according to claim 1 or 2, characterized in that: the method for judging whether the one-time pull-up is finished comprises the following steps,
a preparation state judgment 301, entering a timing preparation state when the distances from the left hand and the right hand to the horizontal bar line each do not exceed the proportional threshold of the original image height;
counting 302, wherein when the position of the chin is higher than the high bit line the state is pulled-up, when the position of the chin is lower than the low bit line the state is restored, and one pull-up followed consecutively by one restore counts as one pull-up; and
a counting stop 303, stopping counting when the distance from the left hand or the right hand to the horizontal bar line exceeds the proportional threshold of the original image height.
9. The utility model provides a sports action discernment counting assembly based on human posture discernment which characterized in that: the device comprises an original image acquisition module, a detection image acquisition module, a point bitmap acquisition module and a judgment counting module; wherein,
the original image acquisition module is used for acquiring a real-time video stream image of a testee with an upward lead, drawing a high bit line, a low bit line and a horizontal bar line on the video stream image, analyzing the video stream image, and storing the video stream image at intervals of frames to obtain an original image of the testee with the upward lead, wherein the high bit line represents a pull-up lowest line of the upward motion of the lead, and the low bit line represents a restoring highest line of the upward motion of the lead;
the detection image acquisition module is used for correcting the original image and adjusting the resolution ratio to a certain size to obtain a detection image;
the point bitmap acquisition module is used for sending the detection map into a human skeleton point detection network to obtain a human skeleton point bitmap;
and the judgment counting module is used for mapping the human skeleton point bitmap back to the detection map to obtain the positions of the skeleton points, and judging whether one pull-up is completed in combination with the high bit line, low bit line and horizontal bar line on the detection map so as to count.
10. A computer-readable storage medium, characterized in that: it stores a computer program which can be loaded by a processor to perform the method according to any one of claims 1 to 8.
CN202110175688.7A 2021-02-07 2021-02-07 Sports action recognition counting method and device based on human body gesture recognition and computer readable storage medium Active CN112861723B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110175688.7A CN112861723B (en) 2021-02-07 2021-02-07 Sports action recognition counting method and device based on human body gesture recognition and computer readable storage medium


Publications (2)

Publication Number Publication Date
CN112861723A true CN112861723A (en) 2021-05-28
CN112861723B CN112861723B (en) 2023-09-01

Family

ID=75989348

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110175688.7A Active CN112861723B (en) 2021-02-07 2021-02-07 Sports action recognition counting method and device based on human body gesture recognition and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112861723B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114681890A (en) * 2022-03-16 2022-07-01 北京普世吉科技有限公司 Human body action counting method
CN115738219A (en) * 2022-11-02 2023-03-07 科大讯飞股份有限公司 Pull-up evaluation method and device, electronic equipment and storage medium
CN116306766A (en) * 2023-03-23 2023-06-23 北京奥康达体育产业股份有限公司 Wisdom horizontal bar pull-up examination training system based on skeleton recognition technology

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108939418A (en) * 2018-09-06 2018-12-07 西北工业大学 A kind of VR body-building interactive system based on unity 3D
CN110348321A (en) * 2019-06-18 2019-10-18 杭州电子科技大学 Human motion recognition method based on bone space-time characteristic and long memory network in short-term
US20190347826A1 (en) * 2018-05-11 2019-11-14 Samsung Electronics Co., Ltd. Method and apparatus for pose processing
CN111368810A (en) * 2020-05-26 2020-07-03 西南交通大学 Sit-up detection system and method based on human body and skeleton key point identification
CN111507184A (en) * 2020-03-11 2020-08-07 杭州电子科技大学 Human body posture detection method based on parallel cavity convolution and body structure constraint
CN111507182A (en) * 2020-03-11 2020-08-07 杭州电子科技大学 Skeleton point fusion cyclic cavity convolution-based littering behavior detection method
CN111950412A (en) * 2020-07-31 2020-11-17 陕西师范大学 Hierarchical dance action attitude estimation method with sequence multi-scale depth feature fusion


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
FAN YANG 等: "MSB-FCN: Multi-Scale Bidirectional FCN for Object Skeleton Extraction", 《IEEE TRANSACTIONS ON IMAGE PROCESSING》, vol. 30, pages 2301 - 2312, XP011834714, DOI: 10.1109/TIP.2020.3038483 *
王威 等: "久坐人群无器械训练动作识别与计数算法研究", 《电子测量技术》, vol. 44, no. 2, pages 109 - 114 *
米娜_: "人体骨骼关键点检测(姿态估计)综述2019", pages 1 - 8, Retrieved from the Internet <URL:https://blog.csdn.net/weixin_43915208/article/details/92002900> *
许政: "基于深度学习的人体骨架点检测", 《中国优秀硕士学位论文全文数据库 信息科技辑》, pages 138 - 1603 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114681890A (en) * 2022-03-16 2022-07-01 北京普世吉科技有限公司 Human body action counting method
CN115738219A (en) * 2022-11-02 2023-03-07 科大讯飞股份有限公司 Pull-up evaluation method and device, electronic equipment and storage medium
CN116306766A (en) * 2023-03-23 2023-06-23 北京奥康达体育产业股份有限公司 Wisdom horizontal bar pull-up examination training system based on skeleton recognition technology
CN116306766B (en) * 2023-03-23 2023-09-22 北京奥康达体育产业股份有限公司 Wisdom horizontal bar pull-up examination training system based on skeleton recognition technology

Also Published As

Publication number Publication date
CN112861723B (en) 2023-09-01

Similar Documents

Publication Publication Date Title
CN112861723A (en) Physical exercise recognition counting method and device based on human body posture recognition and computer readable storage medium
CN109815826B (en) Method and device for generating face attribute model
CN109934848B (en) Method for accurately positioning moving object based on deep learning
CN106548182B (en) Pavement crack detection method and device based on deep learning and main cause analysis
CN108182397B (en) Multi-pose multi-scale human face verification method
CN104715238A (en) Pedestrian detection method based on multi-feature fusion
JPWO2012077286A1 (en) Object detection apparatus and object detection method
CN110532850B (en) Fall detection method based on video joint points and hybrid classifier
CN104318603A (en) Method and system for generating 3D model by calling picture from mobile phone photo album
CN110032940B (en) Method and system for re-identifying pedestrians through video
CN106446779A (en) Method and apparatus for identifying identity
CN102184016B (en) Noncontact type mouse control method based on video sequence recognition
KR20220024494A (en) Method and system for human monocular depth estimation
CN105069745A (en) face-changing system based on common image sensor and enhanced augmented reality technology and method
CN109359577A (en) A kind of Complex Background number detection system based on machine learning
CN109492588A (en) A kind of rapid vehicle detection and classification method based on artificial intelligence
US11335027B2 (en) Generating spatial gradient maps for a person in an image
CN115713794A (en) Image-based sight line drop point estimation method and device
CN113780220A (en) Child sitting posture detection method and system based on child face recognition
CN117423134A (en) Human body target detection and analysis multitasking cooperative network and training method thereof
CN114882553B (en) Micro-expression recognition method and system based on deep learning
CN110826495A (en) Body left and right limb consistency tracking and distinguishing method and system based on face orientation
CN115331304A (en) Running identification method
CN115171189A (en) Fatigue detection method, device, equipment and storage medium
WO2021054217A1 (en) Image processing device, image processing method and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant