CN115731608A

CN115731608A - Physical exercise training method and system based on human body posture estimation

Info

Publication number: CN115731608A
Application number: CN202210979776.7A
Authority: CN
Inventors: 赵小虎; 有鹏; 石传寿; 胡明
Original assignee: China University of Mining and Technology CUMT
Current assignee: China University of Mining and Technology CUMT
Priority date: 2022-08-16
Filing date: 2022-08-16
Publication date: 2023-03-03

Abstract

The invention discloses a physical exercise training method based on human body posture estimation, which comprises the steps of collecting physical exercise video stream data of athletes through a camera; uploading the obtained video stream data to a background server; analyzing and processing video stream data by using a sports motion intelligent analysis module deployed in a background server; pushing an evaluation result obtained after analysis processing to an interaction platform; the coach and the athlete can check the evaluation result and the action suggestion by themselves.

Description

Physical exercise training method and system based on human body posture estimation

Technical Field

The invention relates to the technical field of intelligent training and evaluation, in particular to a physical exercise training method and system based on human body posture estimation.

Background

In recent years, as living standards in China have continued to rise, people have paid more and more attention to sports, and a nationwide fitness boom has emerged.

However, whatever the sport, teaching still relies on one-to-one or one-to-many face-to-face instruction by a coach or physical education teacher. One-to-one coaching is expensive and coaches are in short supply. In physical education classes at primary schools, middle schools and colleges, teaching quality is uneven and it is difficult for teachers to take each student's physical condition into account, so teaching tends to be undifferentiated, with every student treated the same way. In addition, many ball sports involve fast movements, and the naked eye cannot accurately judge in an instant whether an athlete's action is standard. To address this, coaches of professional sports teams analyze still images taken from match videos, but this manual approach is difficult to popularize in ordinary classroom teaching.

In the prior art, Sun et al. developed a MEMS-based human motion information acquisition system. The system collects athletes' motion information through a wearable motion detection system composed of multiple sensors, sends the data to a computer over a wireless link for analysis, and gives corresponding motion assessment opinions. However, this approach requires many sensors to be attached to the athlete, which may affect the athlete's performance.

In view of this, many researchers have worked on developing full-featured sports training aid systems. Using wearable sensors, Kinect depth cameras and virtual-space technologies, they have made steady progress in intelligent training for various sports. However, these systems still have problems: wearable sensors interfere with the athlete's movement, Kinect cameras are expensive and perform poorly at pose recognition, and the effectiveness of the resulting action evaluation is limited.

Disclosure of Invention

Aiming at the problems in the prior art, the invention aims to provide a physical exercise training method and system based on human body posture estimation, and particularly comprises the following steps:

The first step in analyzing an athlete's movement is to acquire the athlete's posture information. To avoid the inconvenient deployment of wearable sensors and the high cost and poor pose estimation of Kinect depth cameras, the invention obtains the source video with an ordinary camera and extracts posture information with a human body posture estimation model. However, existing posture estimation models are too large to deploy easily, and because sports data sets are scarce, they recognize sports actions poorly. The first feature of the invention is therefore a lightweight human body posture estimation model. Analysis of the feature extraction network and internal convolution structure of OpenPose, an open-source human body posture estimation model that currently performs well, shows that the original feature extraction network, VGG19, is too deep and has too many parameters, which degrades network performance, and that the large number of 7 × 7 convolution kernels used inside the network slows convergence. The invention replaces the VGG19 feature extraction network with a lightweight MobileNet network and converts each 7 × 7 convolution kernel structure inside the model into a series structure of four micro convolutions, making the model lightweight. The improved model is trained on a self-built sports data set, BSD, and then compared on public data sets with other strong models and with the original model.

The second step in analyzing an athlete's movement is to encode and describe the skeleton information output by the previous step. Existing human body posture description models are either coarse or fine. Coarse descriptions carry simple posture information and ignore the relative relationships between joints; fine descriptions cover limb relationships in every dimension, but their large number of parameters makes them slow. The second feature of the invention is to improve the speed of the model without losing important posture information. Based on an analysis of sports movements, a 14-point sparse human body posture description model is proposed, in which the 4 irrelevant joint points (left ear, right ear, left eye and right eye) are removed, the 6 joint points of the arms are described finely, and the other 8 main joint points of the body are described coarsely. By consulting professional sports literature and quantitatively analyzing the body joint angles and heights of each action, standardized indexes for 8 racket swing actions are defined.

The third characteristic of the invention is to realize the sports training system with convenient deployment and simple interaction.

In order to achieve the purpose, the technical scheme adopted by the invention is as follows:

a method of physical exercise training based on human body pose estimation, the method comprising:

acquiring sports video stream data of athletes through a camera; uploading the obtained video stream data to a background server; analyzing and processing video stream data by using a sports intelligent analysis module deployed in a background server; pushing an evaluation result obtained after analysis processing to an interaction platform; the coach and the athlete can check the evaluation result and the action suggestion by themselves.

It should be noted that the sports intelligent analysis module includes an improved-OpenPose human body posture estimation sub-module and a sports action standard degree evaluation sub-module.

It should be noted that the feature extraction network of the improved-OpenPose human body posture estimation sub-module is a lightweight MobileNet network, and its convolution kernel structure is a series structure formed by a plurality of micro convolutions.

It should be noted that an input image passes through the MobileNet feature extraction network to produce a set of feature maps F. These enter two parallel branches, Branch1 and Branch2, which predict the limb confidence maps and the part affinity vector fields respectively. One such stage yields a set of limb confidence maps and a set of part affinity vector fields; the outputs of that stage are then merged with the original feature maps F and fed to the next stage. This process is executed six times in succession, and the human skeleton information in the image is finally output.
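The staged refinement described above can be sketched in a few lines. This is only an illustration of the data flow under stated assumptions, not the patented implementation: `run_stages`, `toy_first` and `toy_refine` are hypothetical names, and plain Python lists stand in for stacks of feature maps.

```python
from typing import Callable, List, Tuple

Maps = List[float]  # hypothetical stand-in for a stack of feature maps


def run_stages(
    features: Maps,
    first_stage: Callable[[Maps], Tuple[Maps, Maps]],
    refine_stage: Callable[[Maps], Tuple[Maps, Maps]],
    num_stages: int = 6,
) -> Tuple[Maps, Maps]:
    """Run the two-branch predictor for num_stages stages.

    Each stage returns (confidence maps, part affinity fields). From the
    second stage on, the input is the original features F merged with the
    two outputs of the previous stage.
    """
    conf, paf = first_stage(features)
    for _ in range(num_stages - 1):
        stage_input = features + conf + paf  # list concatenation as "merge"
        conf, paf = refine_stage(stage_input)
    return conf, paf


# Toy branches (assumption): they only record how much input they received.
calls: List[int] = []


def toy_first(f: Maps) -> Tuple[Maps, Maps]:
    calls.append(len(f))
    return [0.0], [0.0, 0.0]


def toy_refine(x: Maps) -> Tuple[Maps, Maps]:
    calls.append(len(x))
    return [0.0], [0.0, 0.0]


conf_maps, pafs = run_stages([1.0, 2.0, 3.0], toy_first, toy_refine)
```

In a real network the merge would be a channel-wise concatenation of tensors, and each stage would be a small convolutional sub-network rather than a stub.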

It should be noted that the micro convolutions form a series structure of a 1 × 1 convolution, a 3 × 3 depthwise separable convolution and a dilated (hole) convolution with a dilation coefficient of 2. One 1 × 1 convolution kernel and two 3 × 3 convolution kernels are used to preserve the correlation of local information; then, following the idea of depthwise separable convolution, the middle 3 × 3 standard convolution kernel is decomposed into one 3 × 3 depthwise convolution and one 1 × 1 pointwise (channel) convolution; and the last layer uses a 3 × 3 dilated convolution with a dilation coefficient of 2 to compensate for the loss of receptive field.

It should be noted that the version of the MobileNet network is MobileNet V3-small.

It should be noted that the sports action standard degree evaluation sub-module includes human body tracking, sports action standardization, a sparse representation model of human body posture, and sports action scoring. A particle filter algorithm is applied to the posture estimation results output by the improved OpenPose to track the human body; eight sports action standards and their evaluation indexes are defined; and the similarity between the athlete's action and the standard action is judged according to the sparse representation model of human body posture.

It should be noted that the sparse representation model according to the human body posture includes a fine description and a rough description, wherein, when the sports motion is characterized, the eyes, ears or other useless joint points of the athlete in the obtained image data are removed, the number of the described joint points is 14, and the two-dimensional coordinates of all the bone points are described as one

The skeleton data are sorted according to the serial numbers and then normalized by the unit of row and column to obtain a uniform skeleton data format of

According to the standardized parameters of the sport, the requirements on the angular positions of the athlete's upper limbs are higher than those on the lower limbs, so the upper limb joints are described finely and the other joint points are described coarsely.
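A minimal sketch of how an 18-point skeleton might be reduced to the 14-point sparse description is given here. It assumes that the eyes and ears occupy labels 14 to 17 of the 18-point COCO-style layout (as in the COCO column of Table 5) and that "normalization by row and column" means dividing x by the image width and y by the image height; both points are assumptions, and `to_sparse_pose` is a hypothetical name.

```python
from typing import List, Tuple

Point = Tuple[float, float]

# Assumption: eyes and ears are labels 14-17 in the 18-point layout.
FACE_INDICES = {14, 15, 16, 17}


def to_sparse_pose(keypoints: List[Point], width: float, height: float) -> List[Point]:
    """Drop eye/ear points and scale the remaining 14 points to [0, 1]."""
    if len(keypoints) != 18:
        raise ValueError("expected 18 keypoints")
    if width <= 0 or height <= 0:
        raise ValueError("image size must be positive")
    # Keep the original label order so rows stay sorted by serial number.
    kept = [p for i, p in enumerate(keypoints) if i not in FACE_INDICES]
    # Assumed normalization: x by image width, y by image height.
    return [(x / width, y / height) for x, y in kept]


# Toy skeleton: point i sits at pixel (i, 2 * i) in a 100 x 200 image.
pose18 = [(float(i), float(2 * i)) for i in range(18)]
pose14 = to_sparse_pose(pose18, width=100.0, height=200.0)
```

The result is a list of 14 normalized (x, y) pairs, which can be viewed as a 14-row, 2-column matrix.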

It should be noted that, when the similarity between the athlete's action and the standard action is judged, the similarity is calculated from the posture distance, using the Mahalanobis distance formula.

It should be noted that the method also includes sports action scoring. The score is defined from the posture distance: the lower the distance, the higher the score of the action being scored. The value obtained with the Mahalanobis distance lies between 0 and 1, so the distance between postures is subtracted from 1, and the larger the result, the higher the score. The score is defined as (1 minus the Mahalanobis distance) × 100, i.e.:

Score = (1 - D_M) × 100;

where D_M represents the Mahalanobis distance between the swing of the athlete being scored and the standard action.
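The scoring rule can be illustrated with a short sketch. The text does not say how the covariance matrix is estimated or how D_M is kept within 0 and 1, so the inverse covariance passed in below and the clamping step are assumptions; `mahalanobis` and `swing_score` are hypothetical names.

```python
import math
from typing import Sequence


def mahalanobis(
    x: Sequence[float],
    y: Sequence[float],
    inv_cov: Sequence[Sequence[float]],
) -> float:
    """Mahalanobis distance sqrt((x - y)^T * inv_cov * (x - y))."""
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    quad = sum(
        diff[i] * inv_cov[i][j] * diff[j] for i in range(n) for j in range(n)
    )
    return math.sqrt(max(quad, 0.0))


def swing_score(distance: float) -> float:
    """Score = (1 - D_M) * 100, with D_M clamped to [0, 1] (assumption)."""
    d = min(max(distance, 0.0), 1.0)
    return (1.0 - d) * 100.0


# Toy example: identity inverse covariance, so D_M equals Euclidean distance.
identity = [[1.0, 0.0], [0.0, 1.0]]
d_m = mahalanobis([0.5, 0.5], [0.5, 0.3], identity)
score = swing_score(d_m)
```

With these toy inputs the distance is about 0.2, so the score is about 80.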

The invention also provides a system for implementing the physical exercise training method based on human body posture estimation. The system comprises a motion video acquisition and transmission module, a database storage module, a sports intelligent analysis module and an evaluation result display module. The motion video acquisition and transmission module acquires motion video images; it comprises network cameras deployed on the roof around an indoor sports ground, together with 5G transmission equipment that transmits the video stream in real time.

The invention has the beneficial effects that:

1. To avoid the inconvenient deployment of wearable sensors and the high cost and poor pose estimation of Kinect depth cameras, the invention obtains the source video with an ordinary camera and extracts posture information with a human body posture estimation model. A lightweight human body posture estimation model is developed: on top of the improved feature extraction network, the convolution mechanism inside the model is also redesigned to be lightweight, and the performance of the model is improved by 3 times. The improved model is trained on the self-built sports data set and then compared on public data sets with other strong models and with the original model. The results show that the accuracy of recognizing the skeleton points of the human arm improves by 3.57% over the original model, while the model is also made lightweight.

2. To address the current lack of an effective evaluation method for sports actions, the invention provides a new evaluation route. First, the human body in the video is tracked with a particle filter algorithm. Second, the posture is encoded: for sports action evaluation, the invention provides a 14-point sparse human body posture description model that removes the 4 irrelevant joint points (left ear, right ear, left eye and right eye), finely describes the 6 joint points of the arms, and coarsely describes the other 8 main joint points of the body. By consulting professional sports literature and quantitatively analyzing the body joint angles and heights of each action, standardized indexes for 8 swing actions are defined. Experiments show that, compared with the most accurate 18-point fine description model, the proposed posture description model differs in precision by 1.3% while its speed is increased by 2.98 times. Practice shows that the technical route for standard action analysis provided by the invention is effective and feasible, and can inform action analysis in other sports such as golf, tennis and table tennis.

3. Through requirements analysis of the physical exercise training system based on human body posture estimation, a system architecture comprising a video image acquisition and transmission layer, an infrastructure layer, a platform analysis layer and an application service layer is designed, and the system is developed. Finally, the function, performance and reliability of the system are tested, verifying that all modules of the physical exercise training system based on human body posture estimation run smoothly and stably.

Drawings

FIG. 1 is a schematic flow diagram of a basic system of the present invention;

FIG. 2 is a schematic diagram of a badminton motion classification decision of the present invention;

FIG. 3 is an internal structure of the OpenPose model according to the present invention;

FIG. 4 is a diagram of the present invention for compensating for missing receptive field hole convolutions during convolution refinement;

FIG. 5 is a schematic diagram of the improved flow of the convolution structure of the present invention;

FIG. 6 is a schematic diagram of the novel convolution structure of the present invention;

FIG. 7 is a diagram illustrating the recognition result of the model in the present invention.

Detailed Description

The present invention will be further described below, and it should be noted that the following examples are provided to illustrate the detailed embodiments and specific procedures based on the technical solution, but the scope of the present invention is not limited to the examples.

As shown in fig. 1, the present invention is a physical exercise training method based on human body posture estimation, the method includes:

Furthermore, the sports intelligent analysis module comprises an improved-OpenPose human body posture estimation sub-module and a sports action standard degree evaluation sub-module.

Furthermore, the feature extraction network of the improved-OpenPose human body posture estimation sub-module is a lightweight MobileNet network, and its convolution kernel structure is a series structure formed by a plurality of micro convolutions.

Furthermore, an input image passes through the MobileNet feature extraction network to produce a set of feature maps F. These enter two parallel branches, Branch1 and Branch2, which predict the limb confidence maps and the part affinity vector fields respectively. One such stage yields a set of limb confidence maps and a set of part affinity vector fields; the outputs of that stage are then merged with the original feature maps F and fed to the next stage. This process is executed six times in succession, and the human skeleton information in the image is finally output.

Further, the plurality of micro convolutions form a series structure of a 1 × 1 convolution, a 3 × 3 depthwise separable convolution and a dilated (hole) convolution with a dilation coefficient of 2. One 1 × 1 convolution kernel and two 3 × 3 convolution kernels are used to preserve the correlation of local information; then, following the idea of depthwise separable convolution, the middle 3 × 3 standard convolution kernel is decomposed into one 3 × 3 depthwise convolution and one 1 × 1 pointwise (channel) convolution; and the last layer uses a 3 × 3 dilated convolution with a dilation coefficient of 2 to compensate for the loss of receptive field.

Further, the version of the MobileNet network is MobileNet V3-small.

Further, the sports action standard degree evaluation sub-module comprises human body tracking, sports action standardization, a sparse representation model of human body posture, and sports action scoring. A particle filter algorithm is applied to the posture estimation results output by the improved OpenPose to track the human body; eight sports action standards and their evaluation indexes are defined; and the similarity between the athlete's action and the standard action is judged according to the sparse representation model of human body posture.

Further, the sparse representation model according to the human body posture comprises a fine description and a rough description, wherein eyes, ears or other useless joint points of the athlete in the obtained image data are removed when the sports motion is characterized, the number of the described joint points is 14, and two-dimensional coordinates of all bone points are described as one

The skeleton data are sorted according to the serial numbers and then normalized by the unit of row and column respectively to obtain a uniform skeleton data format of

According to the standardized parameters of sports, the requirements of the angular positions of the upper limbs of the athlete on the actions are higher than those of the lower limbs, the upper limb joints are described in a fine mode, and the other joint points are described roughly.

Further, when the similarity between the athlete's action and the standard action is judged, the similarity is calculated from the posture distance, using the Mahalanobis distance formula.

Furthermore, the invention also includes sports action scoring. The score is defined from the posture distance: the lower the distance, the higher the score of the action being scored. The value obtained with the Mahalanobis distance lies between 0 and 1, so the distance between postures is subtracted from 1, and the larger the result, the higher the score. The score is defined as (1 minus the Mahalanobis distance) × 100, i.e.:

Score = (1 - D_M) × 100;

where D_M represents the Mahalanobis distance between the swing of the athlete being scored and the standard swing.

The invention also provides a system for implementing the physical exercise training method based on human body posture estimation. The system comprises a motion video acquisition and transmission module, a database storage module, a sports intelligent analysis module and an evaluation result display module. The motion video acquisition and transmission module acquires motion video images; it comprises network cameras deployed on the roof around an indoor sports ground, together with 5G transmission equipment that transmits the video stream in real time.

Example 1

To further illustrate the practical effects of the present invention, the present embodiment will select one representative sport with multiple motion types as an introduction.

Badminton is the most popular sport in China. People of all ages can be seen swinging badminton rackets everywhere. On campus, badminton is also popular, and many schools offer badminton courses.

Badminton is a small-ball sport played across a net that requires the coordination of the body's motor, respiratory and circulatory systems. It is popular in China because it involves no physical contact between players, offers flexible and varied forms of play including singles and doubles, is not limited by age or sex, and allows exercise intensity to be controlled. Badminton is also a competitive sport, and it is very important for beginners to master the basic actions: without professional guidance, it is hard to improve one's competitive level, and physical injuries of varying severity can result. One study of badminton injuries found that the injury rate among students majoring in badminton reached 94.12%, and identified deformed movements and poor mastery of technical actions as the main causes of injury. In summary, the slow improvement and physical injuries that beginners suffer for lack of scientific guidance urgently need to be addressed.

Traditional badminton teaching relies on one-to-one or one-to-many face-to-face instruction by a coach or physical education teacher. One-to-one coaching is expensive and coaches are in short supply. In physical education classes at primary schools, middle schools and colleges, students' badminton skill levels differ and class time is limited, so it is difficult for teachers to take each student's actual situation into account, and teaching tends to be undifferentiated. In addition, a badminton racket is swung very fast, and the naked eye cannot accurately judge in an instant whether an athlete's action is standard. To address this, coaches of professional sports teams analyze still images taken from match videos, but this manual approach is difficult to popularize in ordinary classroom teaching. On this basis, this embodiment implements a badminton training method based on human body posture estimation.

Badminton racket swinging action data set construction

The badminton swing action data set is built by collecting the swing actions of professional athletes. It contains eight common swing actions, classified as shown in FIG. 2. Badminton postures are more complex than simple postures such as standing or falling, and as a small-ball sport badminton involves the upper limbs very heavily. To make the model better meet the requirements of practical application, to improve its robustness in detecting athletes' action postures during movement, and to verify the effectiveness of the proposed algorithm, the invention builds a Badminton Swing motion Data set (BSD) containing 8000 badminton swing action images. With BSD as the target data set, the images are annotated manually using the object keypoints annotation format, and a JSON file is finally generated.

Feature extraction network improvements

The invention replaces VGG19 with a MobileNet network, and the parameter count of the feature extraction network is about one ninth of the original.

Based on an analysis of the parameter counts of MobileNet and VGG19, the invention improves the feature extraction network of the original OpenPose model. To identify the type of badminton swing action, a classifier is cascaded at the back end of the improved OpenPose network. The classifier is a decision tree-based fast support vector machine (ST-SVM) model. Classifying badminton swing actions is a small-sample multi-class problem, and the decision tree-based ST-SVM algorithm decomposes the nonlinear optimization problem into several linear SVM problems, which is simple and easy to implement. As a classifier for badminton swing actions it is therefore faster than traditional nonlinear methods while still classifying well. The eight types of swing action are first organized into a decision tree. During classification, a category is selected before each descent to the next level and the unselected subtrees are discarded, which effectively reduces the number of candidate classes in the next step. This search step is repeated until a leaf node is reached. The decision tree construction is shown in FIG. 2.
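The tree search described above can be sketched as a walk from the root to a leaf, with one linear decision function per internal node. The actual tree shape is given in FIG. 2 and is not reproduced here, so the balanced three-level tree, the 3-dimensional feature vector and the labels `swing_0` to `swing_7` below are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass
class Node:
    """Internal node holding one linear decision function w . x + b."""
    weights: List[float]
    bias: float
    left: Union["Node", str]   # followed when the decision value is < 0
    right: Union["Node", str]  # followed when the decision value is >= 0


def classify(node: Union[Node, str], x: Sequence[float]) -> str:
    """Descend the tree; the unselected subtree is dropped at each level."""
    while isinstance(node, Node):
        value = sum(w * xi for w, xi in zip(node.weights, x)) + node.bias
        node = node.right if value >= 0 else node.left
    return node


def build(level: int, index: int) -> Union[Node, str]:
    """Build a toy balanced tree with 8 leaves (assumption, not FIG. 2)."""
    if level == 3:
        return f"swing_{index}"
    weights = [0.0, 0.0, 0.0]
    weights[level] = 1.0  # toy rule: level k tests the sign of feature k
    return Node(weights, 0.0, build(level + 1, 2 * index), build(level + 1, 2 * index + 1))


tree = build(0, 0)
label = classify(tree, [1.0, -1.0, 1.0])
```

Only three linear decisions are evaluated to separate eight classes in this toy tree, which is where the speed advantage over evaluating every class comes from. In practice each node's weights would come from a trained linear SVM.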

The internal structure of OpenPose is shown in FIG. 3. From stage 2 onward, each of the two branches performs five 7 × 7 convolutions, which are computationally expensive, and the OpenPose model contains a large number of them. A 7 × 7 convolution kernel has a larger receptive field than a 3 × 3 kernel but also requires more computation. Considering the real-time requirement of tracking badminton players' actions, the convolution structure is improved without reducing the receptive field, and dilated convolution (also called hole convolution) is introduced to make up for the receptive field lost during the improvement, as shown in FIG. 4.

Dilated convolution does not increase the amount of computation but enlarges the effective receptive field; its drawback is that stacking multiple dilated layers loses local information. The invention therefore replaces the original 7 × 7 convolution kernel with a three-layer convolution structure. One 1 × 1 convolution kernel and two 3 × 3 convolution kernels are used to preserve the correlation of local information; following the idea of depthwise separable convolution, the middle 3 × 3 standard convolution kernel is decomposed into one 3 × 3 depthwise convolution and one 1 × 1 pointwise (channel) convolution; and finally a 3 × 3 dilated convolution with a dilation coefficient of 2 is used to make up for the loss of receptive field. The improvement process is shown in FIG. 5.

Finally, each 7 × 7 convolution kernel structure in the OpenPose model is converted into a series structure consisting of a 1 × 1 convolution, a 3 × 3 depthwise separable convolution and a dilated convolution with a dilation coefficient of 2, as shown in FIG. 6.
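The effect of this replacement can be checked with simple arithmetic. The sketch below compares weight counts and receptive fields for one 7 × 7 convolution and for the series structure, read here as four layers (1 × 1, 3 × 3 depthwise, 1 × 1 pointwise, 3 × 3 dilated). The channel count of 128, stride 1 and the omission of biases are assumptions made for illustration.

```python
from typing import List, Tuple


def conv_params(k: int, c_in: int, c_out: int, depthwise: bool = False) -> int:
    """Weight count of a k x k convolution (biases ignored)."""
    if depthwise:
        return k * k * c_in  # one k x k filter per input channel
    return k * k * c_in * c_out


def receptive_field(layers: List[Tuple[int, int]]) -> int:
    """Receptive field of stacked stride-1 layers given (kernel, dilation)."""
    rf = 1
    for kernel, dilation in layers:
        rf += (kernel - 1) * dilation
    return rf


C = 128  # assumed number of input and output channels

original_params = conv_params(7, C, C)
replacement_params = (
    conv_params(1, C, C)                    # 1 x 1 convolution
    + conv_params(3, C, C, depthwise=True)  # 3 x 3 depthwise convolution
    + conv_params(1, C, C)                  # 1 x 1 pointwise convolution
    + conv_params(3, C, C)                  # 3 x 3 convolution, dilation 2
)

rf_original = receptive_field([(7, 1)])
rf_replacement = receptive_field([(1, 1), (3, 1), (1, 1), (3, 2)])
```

Under these assumptions the series structure keeps the 7 × 7 receptive field while using 181,376 weights instead of 802,816, roughly a 4.4-fold reduction.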

Simulation test

1. Training environment

The improved model was trained on the badminton swing data set, with the training environment shown in the table below.

Category	Environment parameter
Operating system	Windows 10
Memory	16 GB
CPU	Intel Core i7-4720HQ
GPU	GTX 1080 Ti
Scripting language	Python

2. Details and results of the training

The network is trained on the BSD data set, and the ratio of the training set, the validation set and the test set is 6. The initial learning rate is 0.001; it is reduced to 0.0001 at round 50 and to 0.00001 at round 150, for 300 training rounds in total. Specific parameter settings are shown in the following table.

Parameter name	Meaning	Parameter value
type	Optimizer	Adam
ilr	Initial learning rate	0.001
d	Input image height	500
h	Input image width	480
datasets	Data set	BSD
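The stepwise learning rate described above can be written as a small function. This sketch assumes that training rounds are counted from zero and that each new rate takes effect at the start of rounds 50 and 150; the text does not state either point, and `learning_rate` is a hypothetical name.

```python
TOTAL_ROUNDS = 300  # total number of training rounds stated in the text


def learning_rate(round_index: int) -> float:
    """Piecewise-constant schedule: 0.001, then 0.0001, then 0.00001."""
    if not 0 <= round_index < TOTAL_ROUNDS:
        raise ValueError("round_index must be in [0, 300)")
    if round_index < 50:
        return 0.001
    if round_index < 150:
        return 0.0001
    return 0.00001


# One learning rate per training round.
schedule = [learning_rate(r) for r in range(TOTAL_ROUNDS)]
```

A function of this shape can be passed to the learning-rate scheduling hook of most deep learning frameworks.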

The model recognition results are shown in FIG. 7. The upper row shows the human skeleton recognition results, and the lower row shows the model's attention to the skeleton points: the brighter a point, the more attention it receives. The model's attention to the upper limbs is clearly higher than its attention to the joint points of other parts. Compared with the original model, the estimation accuracy for the joint points of other parts drops slightly, but the increased attention to the upper limbs plays a key role in analyzing badminton swing actions.

Feature extraction network experiment

1. Data set and evaluation index

The data set used in the experiments is the self-built BSD data set. Precision P, recall R and the F₁ score are used as the model evaluation indexes.

From equations (3-8), it can be seen that P and R affect each other. A trade-off between P and R is therefore required, and the common way is to calculate the F_β score:

F_β = (1 + β²) · P · R / (β² · P + R)

where, when β = 1, it is called the F₁ score, one of the most commonly used indexes.
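For reference, the three indexes can be computed from true positive, false positive and false negative counts using their standard definitions, P = TP / (TP + FP), R = TP / (TP + FN) and F_β = (1 + β²)PR / (β²P + R). The function name and the counts in the example are hypothetical and are not taken from the experiments.

```python
from typing import Tuple


def precision_recall_fbeta(
    tp: int, fp: int, fn: int, beta: float = 1.0
) -> Tuple[float, float, float]:
    """Return (precision, recall, F_beta) from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    f_beta = (1.0 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_beta


# Hypothetical counts for one swing class.
p, r, f1 = precision_recall_fbeta(tp=90, fp=10, fn=30)
```

With β = 1 the result is the harmonic mean of P and R, so F₁ always lies between the two values and is pulled toward the smaller one.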

2. Experimental Environment

The experimental environment is shown in the following table:

Category	Environment parameter
Operating system	Windows 10
Deep learning framework	TensorFlow
Programming language	Python
Memory	16 GB
Processor	Intel Core i7-4720HQ
GPU	GTX 1080 Ti

3. Results of the experiment

The results of the swing action classification experiment using the VGG19 feature extraction network are shown in Table 1. To verify how well the three versions of MobileNet fit the OpenPose model, the OpenPose feature extraction network is replaced in turn by MobileNet V1, MobileNet V2 and MobileNet V3-small, and the precision P, recall R and F₁ score of each resulting model are analyzed. The results are shown in Tables 2, 3 and 4.

TABLE 1

TABLE 2

TABLE 3

TABLE 4

As the tables show, in terms of accuracy, the swing action classification algorithm recognizes swing actions with an accuracy of over 88%, and MobileNet V3-small has the highest average recognition accuracy, 92.25%. In terms of processing speed, VGG19 is the slowest, with an average of only 6 to 9 FPS, and MobileNet V3-small is the fastest, with an average of 24 to 27 FPS. The experiments show that MobileNet V3-small fits the OpenPose model best.

To further analyze how the different feature extraction networks classify each type of badminton swing action, 100 samples of each type of action are selected. According to the experimental results, VGG19 correctly identifies 88.25 swing actions on average; MobileNet V1, 89.13; MobileNet V2, 90.88; and MobileNet V3-small, 92.25. Among the four feature extraction networks, MobileNet V3-small therefore has the highest recognition accuracy for badminton swing action types.

Standardization of badminton racket swinging action

Before the athlete's swing is analyzed, the standard swing actions and the evaluation indexes are defined. According to the technical characteristics of the eight badminton swing actions, standard evaluation indexes are defined and used as the basis for swing action suggestions. The parameters of the evaluation indexes are named according to the COCO data set definition of human skeleton points (Table 5), wherein ∠567 denotes the included angle formed by the left shoulder joint, the left elbow joint, and the left wrist joint, and 5, 6, and 7 denote the left shoulder joint, the left elbow joint, and the left wrist joint, respectively, in the 18-point skeleton labeling. The formula h₃ ≈ h₀ indicates that the right elbow joint should be at approximately the same height as the head. The badminton swing motion criteria are defined as shown in Table 6 below.

TABLE 5

Index	MPII labeling	COCO labeling	COCO + Foot labeling
0	Right ankle joint	Nose	Nose
1	Right knee joint	Neck	Neck
2	Right hip joint	Left shoulder joint	Left shoulder joint
3	Left hip joint	Left elbow joint	Left elbow joint
4	Left knee joint	Left wrist joint	Left wrist joint
5	Left ankle joint	Right shoulder joint	Right shoulder joint
6	Pelvis	Right elbow joint	Right elbow joint
7	Chest	Right wrist joint	Right wrist joint
8	Upper neck	Left hip joint	Lower abdomen
9	Head top	Left knee joint	Left hip joint
10	Right wrist joint	Left ankle joint	Left knee joint
11	Right elbow joint	Right hip joint	Left ankle joint
12	Right shoulder joint	Right knee joint	Right hip joint
13	Left shoulder joint	Right ankle joint	Right knee joint
14	Left elbow joint	Left eye	Right ankle joint
15	Left wrist joint	Right eye	Left eye
16	--	Left ear	Right eye
17	--	Right ear	Left ear
18	--	--	Right ear
19	--	--	Medial aspect of right foot
20	--	--	Lateral side of right foot
21	--	--	Right heel
22	--	--	Medial aspect of left foot
23	--	--	Lateral side of left foot
24	--	--	Left heel

TABLE 6
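An angle index such as ∠567 and a height condition such as h₃ ≈ h₀ can be evaluated directly from the two-dimensional keypoint coordinates. The sketch below is one possible way to do so; the function names, the pixel tolerance, and the sample coordinates are assumptions made for illustration, and image coordinates are assumed to have y increasing downward.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at vertex b formed by the points a, b, c (each an (x, y) pair)."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("coincident keypoints: angle is undefined")
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos_t))


def heights_match(keypoints, i, j, tol):
    """True if keypoints i and j differ in height (y) by at most tol pixels."""
    return abs(keypoints[i][1] - keypoints[j][1]) <= tol


# Hypothetical keypoints (pixel coordinates), indexed as in the text:
# 0 head, 3 right elbow, 5 left shoulder, 6 left elbow, 7 left wrist.
keypoints = {
    0: (200.0, 80.0),
    3: (260.0, 85.0),
    5: (150.0, 150.0),
    6: (150.0, 200.0),
    7: (200.0, 200.0),
}

angle_567 = joint_angle(keypoints[5], keypoints[6], keypoints[7])
elbow_at_head_height = heights_match(keypoints, 3, 0, tol=10.0)  # h3 ≈ h0
```

A measured angle or height difference can then be compared with the corresponding standard value in order to generate an action suggestion.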

Finally, the embodiment above demonstrates that the present invention has application prospects. It should be noted, however, that owing to space limitations not all movements are demonstrated one by one; the remaining movements are handled in the same way as in the above embodiment with respect to information collection, data analysis, and result analysis. The above embodiment should therefore not be construed as limiting the present invention.

Various corresponding changes and modifications can be made by those skilled in the art based on the above technical solutions and concepts, and all such changes and modifications should be included in the protection scope of the present invention.

Claims

1. A physical exercise training method based on human body posture estimation is characterized by comprising the following steps:

2. The physical exercise training method based on human body posture estimation as claimed in claim 1, wherein the physical exercise intelligent analysis module comprises a human body posture estimation sub-module based on improved OpenPose and a physical exercise action standard degree evaluation sub-module.

3. The physical exercise training method based on human body posture estimation as claimed in claim 1, wherein the feature extraction network of the human body posture estimation submodule of the improved OpenPose is a lightweight MobileNet network, and the convolution kernel structure of the network is a series structure formed by a plurality of micro convolutions.

4. A physical exercise training method based on human posture estimation as claimed in claim 3, characterized in that the input image undergoes feature extraction by the MobileNet network to generate a set of feature maps F; the feature maps then enter Branch1 and Branch2 in parallel, which predict the limb confidence maps and the part affinity vector fields, respectively; after one such stage, a set of limb confidence maps and a set of part affinity vector fields are obtained; the output of the previous stage is then merged with the original feature maps F and fed into the next stage; this process is performed six times in succession, and the human skeleton information in the image is finally output.
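The six-stage data flow of claim 4 can be sketched as follows. This is only an illustration of how the inputs of each stage are assembled: the branches are stubs, each map is represented by a placeholder label, and the channel counts (128 feature channels, 19 confidence maps, 38 affinity-field channels) are assumptions that are not stated in the claim.

```python
def run_stages(features, branch1, branch2, n_stages=6):
    """Run the two parallel branches for n_stages stages.

    Stage 1 receives only the feature maps F. Every later stage receives F
    concatenated with the confidence maps and affinity fields of the previous stage.
    """
    stage_input = list(features)
    conf_maps, pafs = [], []
    for _ in range(n_stages):
        conf_maps = branch1(stage_input)  # limb confidence maps
        pafs = branch2(stage_input)       # part affinity vector fields
        stage_input = list(features) + list(conf_maps) + list(pafs)
    return conf_maps, pafs


# Stub branches that only record how many input channels each stage sees.
seen_input_sizes = []


def stub_branch1(channels):
    seen_input_sizes.append(len(channels))
    return ["S"] * 19  # assumed number of confidence maps


def stub_branch2(channels):
    return ["L"] * 38  # assumed number of affinity-field channels


features = ["F"] * 128  # assumed number of feature channels
conf_maps, pafs = run_stages(features, stub_branch1, stub_branch2)
```

Under these assumed channel counts the first stage sees 128 input channels and each of the five later stages sees 128 + 19 + 38 = 185.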

5. A method for human posture estimation based sports training as claimed in claim 3, wherein said several micro convolutions form a series structure of a 1 × 1 convolution, a 3 × 3 depthwise separable convolution, and a 3 × 3 dilated convolution with a dilation rate of 2; one 1 × 1 convolution kernel and two 3 × 3 convolution kernels are used to take account of the correlation of local information; then, following the concept of depthwise separable convolution, the middle-layer 3 × 3 standard convolution kernel is decomposed into one 3 × 3 depthwise convolution and one 1 × 1 pointwise convolution; and the last layer uses a 3 × 3 dilated convolution with a dilation rate of 2 to compensate for the loss of receptive field.
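The effect of the structure in claim 5 can be checked with simple arithmetic. The sketch below compares the weight count of a standard 3 × 3 convolution with its depthwise separable decomposition and computes the receptive field of the 1 × 1, 3 × 3, dilated 3 × 3 series. The channel count of 128 is an assumption, biases are ignored, and all layers are assumed to have stride 1.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out


def effective_kernel(k, dilation):
    """Spatial extent covered by a k x k kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


def stacked_receptive_field(layers):
    """Receptive field of stride-1 layers given as (kernel, dilation) pairs."""
    rf = 1
    for k, dilation in layers:
        rf += effective_kernel(k, dilation) - 1
    return rf


channels = 128  # assumed channel count, for illustration only
standard = standard_conv_params(3, channels, channels)         # 9 * 128 * 128
separable = depthwise_separable_params(3, channels, channels)  # 9 * 128 + 128 * 128

# 1 x 1 convolution, 3 x 3 depthwise separable convolution, 3 x 3 convolution with dilation 2.
series = [(1, 1), (3, 1), (3, 2)]
receptive_field = stacked_receptive_field(series)
```

With these numbers the separable form needs 17,536 weights instead of 147,456, and the series covers a 7 × 7 receptive field, the same extent as a single 7 × 7 kernel.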

6. A method for athletic movement training based on human body posture estimation according to claim 3 or 4, wherein the version of the MobileNet network is MobileNet V3-small.

7. The physical exercise training method based on human posture estimation as claimed in claim 1, wherein the physical exercise action standard degree evaluation submodule includes human tracking, physical exercise action standardization, a sparse representation model of human posture, and physical exercise action scoring; a particle filter algorithm is applied to the output of the improved OpenPose human body posture estimation to realize human tracking; the standards and evaluation indexes of the physical exercise actions are defined; and the similarity between the athlete's action and the standard action is judged according to the sparse representation model of human posture.
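The claim does not specify the details of the particle filter. As a generic illustration only, the sketch below tracks a single two-dimensional point (for example the neck keypoint output by pose estimation) with a bootstrap particle filter; the random-walk motion model, the Gaussian observation likelihood, and all numeric parameters are assumptions.

```python
import math
import random


def particle_filter_step(particles, observation, motion_std=0.5, obs_std=1.0, rng=random):
    """One predict / weight / resample cycle for 2-D point tracking."""
    # Predict: diffuse each particle with a random-walk motion model.
    moved = [(x + rng.gauss(0, motion_std), y + rng.gauss(0, motion_std))
             for x, y in particles]
    # Weight: Gaussian likelihood of the observed keypoint given each particle.
    weights = [math.exp(-((x - observation[0]) ** 2 + (y - observation[1]) ** 2)
                        / (2 * obs_std ** 2))
               for x, y in moved]
    total = sum(weights)
    if total == 0:
        weights = [1.0 / len(moved)] * len(moved)  # fall back to uniform weights
    else:
        weights = [w / total for w in weights]
    # Estimate: weighted mean of the particle positions.
    estimate = (sum(w * p[0] for w, p in zip(weights, moved)),
                sum(w * p[1] for w, p in zip(weights, moved)))
    # Resample: draw a new particle set in proportion to the weights.
    resampled = rng.choices(moved, weights=weights, k=len(moved))
    return resampled, estimate


# Hypothetical usage: a stationary keypoint observed at (3, 4) for ten frames.
rng = random.Random(0)
particles = [(rng.gauss(0, 5), rng.gauss(0, 5)) for _ in range(500)]
target = (3.0, 4.0)
for _ in range(10):
    particles, estimate = particle_filter_step(particles, target, rng=rng)
```

In a multi-person scene, one such filter per tracked person is a common arrangement, which lets each skeleton keep a consistent identity across frames.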

8. A method as claimed in claim 7, wherein said sparse representation model includes a fine description and a rough description, wherein the eyes, ears or other useless joint points of the athlete are removed from the obtained image data when the sports motion is characterized, the number of the described joint points is 14, and the two-dimensional coordinates of all the skeleton points are described as one

According to the sports motion standardization parameters, the angle and position requirements for the athlete's upper limb motion are stricter than those for the lower limb motion; the upper limb joints are therefore described finely, and the other joint points are described roughly.
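The reduction from 18 skeleton points to 14 described joint points can be sketched as follows. The interleaved layout (x0, y0, x1, y1, ...) of the resulting vector is an assumption, and the function name is hypothetical; the dropped indices 14 to 17 are the eye and ear points of the 18-point COCO labeling in Table 5.

```python
# Indices of the eye and ear points in the 18-point COCO labeling of Table 5.
DROPPED = {14, 15, 16, 17}


def pose_vector(keypoints):
    """Flatten 18 (x, y) keypoints into a 28-element vector of the 14 retained joints."""
    if len(keypoints) != 18:
        raise ValueError("expected 18 keypoints")
    vector = []
    for index, (x, y) in enumerate(keypoints):
        if index in DROPPED:
            continue  # eyes and ears are not used to describe the action
        vector.extend((x, y))
    return vector


# Hypothetical keypoints: point i is placed at (i, i + 0.5).
keypoints = [(float(i), i + 0.5) for i in range(18)]
vector = pose_vector(keypoints)
```

A finer description of the upper limb joints could then be layered on top of this vector, for example by adding joint angles for the shoulders, elbows, and wrists.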

9. A physical training method according to claim 8, wherein the similarity between the athlete's movement and the standard movement is determined from the pose distance, which is calculated using the Mahalanobis distance formula.

10. A method for training physical exercise based on human posture estimation as claimed in claim 7, further comprising scoring the physical exercise movement, wherein a score is defined using the posture distance such that the lower the distance value, the higher the score of the movement to be scored; the value obtained with the Mahalanobis distance measure lies between 0 and 1, so the distance between postures is subtracted from 1, and the larger the resulting value, the higher the score; the score is defined as (1 minus the Mahalanobis distance) × 100, i.e.:

Score = (1 − D_M) × 100;

wherein D_M represents the Mahalanobis distance between the movement of the athlete to be scored and the standard movement.
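The scoring rule can be illustrated with a short sketch. The general Mahalanobis distance uses a full covariance matrix; for brevity this example assumes a diagonal covariance (one variance per coordinate), assumes the pose vectors are normalized so that the distance falls between 0 and 1 as the claim states, and clamps values outside that range. The function names and sample values are hypothetical.

```python
import math


def mahalanobis_diag(x, y, variances):
    """Mahalanobis distance between x and y under a diagonal covariance (assumption)."""
    if not (len(x) == len(y) == len(variances)):
        raise ValueError("x, y and variances must have the same length")
    return math.sqrt(sum((a - b) ** 2 / v for a, b, v in zip(x, y, variances)))


def action_score(distance):
    """Score = (1 - D_M) * 100, with the distance clamped to [0, 1] (assumption)."""
    d = min(max(distance, 0.0), 1.0)
    return (1.0 - d) * 100.0


# Hypothetical normalized pose vectors (two coordinates only, for brevity).
athlete = [0.50, 0.40]
standard = [0.47, 0.44]
variances = [1.0, 1.0]

distance = mahalanobis_diag(athlete, standard, variances)
score = action_score(distance)
```

In this example the distance is approximately 0.05, which gives a score of approximately 95.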

11. A system for implementing the physical exercise training method based on human posture estimation according to any one of claims 1 to 8, wherein the system comprises a motion video acquisition and transmission module, a database storage module, a physical exercise motion intelligent analysis module, and an evaluation result presentation module; the system, which is used for acquiring motion video images, comprises network cameras deployed on the roof around an indoor sports ground and 5G transmission equipment for transmitting video streams in real time.