CN115272796A - Behavior recognition method, behavior recognition device, behavior recognition equipment and storage medium - Google Patents

Behavior recognition method, behavior recognition device, behavior recognition equipment and storage medium Download PDF

Info

Publication number
CN115272796A
Authority
CN
China
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210905892.4A
Other languages
Chinese (zh)
Inventor
王宪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan Boguan Intelligent Technology Co Ltd
Original Assignee
Jinan Boguan Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan Boguan Intelligent Technology Co Ltd filed Critical Jinan Boguan Intelligent Technology Co Ltd
Priority to CN202210905892.4A priority Critical patent/CN115272796A/en
Publication of CN115272796A publication Critical patent/CN115272796A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06V40/20 Movements or behaviour, e.g. gesture recognition


Abstract

The application discloses a behavior recognition method, a behavior recognition device, equipment and a storage medium, which relate to the field of computer vision, the method comprising the following steps: detecting a target video by using a preset detection method to obtain key points; filtering the key points based on a preset filtering rule to obtain abandoned key points and filtered key points; establishing a linear model based on the abandoned key points and the filtered key points, and performing regression prediction on the abandoned key points by using the linear model to obtain regressed key points; correcting the abandoned key points according to a preset correction rule and the regressed key points to obtain corrected key points; and performing corresponding analysis and identification operation based on the corrected key points. The abandoned key points are obtained through filtering, and a linear model established on them is used to correct them, so the accuracy and robustness of behavior recognition are improved even when the key point detection is inaccurate.

Description

Behavior recognition method, behavior recognition device, behavior recognition equipment and storage medium
Technical Field
The present invention relates to the field of computer vision, and in particular, to a behavior recognition method, apparatus, device, and storage medium.
Background
Many scenes in real life impose various application requirements on behavior recognition algorithms, such as recognizing falling behavior. A typical pipeline extracts the coordinates of human body key points in a video image through a feature extraction network; common feature extraction networks for skeleton key points include OpenPose and AlphaPose. A graph neural network or a convolutional network then learns the positional relationships between the different key points, and finally the action behavior of the human body is recognized. This two-stage behavior recognition method generally comprises a feature extraction network and a behavior classification network, and has the advantage that background information irrelevant to the human body in an RGB (Red-Green-Blue) image can be filtered out through skeleton feature extraction, but it faces an important problem in practical application: the precision of the second-stage behavior classification network depends heavily on the first-stage feature extraction network, and when the human body key points extracted by the feature extraction network are inaccurate or incomplete, the behavior classification network often cannot accurately identify the behavior action of the human body. Real scenes, however, often contain incomplete pictures or occlusions, so feature point extraction over the whole behavior action of the human body becomes inaccurate due to target occlusion or to the poor robustness of the feature extraction network in complex scenes.
The key points can be regarded as feature values of the data, and for missing or drifting key points, a common engineering solution is to apply methods for handling missing feature values, such as discarding the data or substituting the feature average. Discarding the data means that when key point data are missing, the data are simply dropped in practical use; substituting the feature average means that when a certain key point is missing, the average of that key point over all data is used to replace the missing skeleton key point value. The disadvantages of these methods are: because actual application scenarios are complex and varied, a large amount of incomplete skeleton data exists, and discarding data with missing feature points greatly reduces data utilization and limits the generalization ability of the model during training; during inference, when the key points regressed in the application scene are inaccurate, the model fails. The feature-average substitution method ignores the positional relationship between the missing key point and the other key points, so the key point data are inaccurate, which ultimately degrades the performance of the recognition model.
Another practical method is to regress the missing key points with a curve fitting algorithm over adjacent frames. For key points whose confidence is smaller than a certain value, this method uses the relationship between preceding and following frames and fits a curve through the adjacent-frame positions to complete the calculation of the key point coordinates and confidence. The disadvantages of this method are: it needs information from preceding and following frames, so an algorithm operating on a single captured picture cannot regress the key points at all, which limits the applicable range of the algorithm. Moreover, the curve fitting must ensure that the key points lie within the image range; key points beyond the image range cannot be regressed, and the algorithm fails when the adjacent points in the neighboring frames all lie outside the image area, so it cannot solve the regression correction of the key points of an incomplete human body.
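The adjacent-frame curve-fitting baseline described above can be sketched as follows; the polynomial degree and the simple trajectory are illustrative assumptions, not part of the patent:

```python
import numpy as np

def fit_missing_keypoint(frame_ids, xs, ys, missing_frame, degree=2):
    # Fit the keypoint trajectory over adjacent frames, then evaluate the
    # fitted curve at the frame where the detection confidence was too low.
    px = np.polyfit(frame_ids, xs, degree)
    py = np.polyfit(frame_ids, ys, degree)
    return float(np.polyval(px, missing_frame)), float(np.polyval(py, missing_frame))

# Keypoint observed in frames 0-4; regress its position in frame 5.
frames = [0, 1, 2, 3, 4]
xs = [10.0, 12.0, 14.0, 16.0, 18.0]   # moving right at 2 px/frame
ys = [50.0, 50.5, 51.0, 51.5, 52.0]   # drifting down at 0.5 px/frame
x5, y5 = fit_missing_keypoint(frames, xs, ys, missing_frame=5, degree=1)
```

As the text points out, nothing constrains this extrapolation to stay inside the image, which is why the method fails for truncated bodies.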
Disclosure of Invention
In view of this, the present invention provides a behavior recognition method, apparatus, device and storage medium, which can improve the accuracy and robustness of behavior recognition. The specific scheme is as follows:
in a first aspect, the present application discloses a behavior recognition method, including:
detecting a target video by using a preset detection method to obtain key points;
filtering the key points based on a preset filtering rule to obtain abandoned key points and filtered key points;
establishing a linear model based on the abandoned key points and the filtered key points, and performing regression prediction on the abandoned key points by using the linear model to obtain regression key points;
correcting the abandoned key points according to a preset correction rule and the regressed key points to obtain corrected key points;
and performing corresponding analysis and identification operation based on the corrected key points.
Optionally, the detecting the target video by using a preset detection method to obtain the key point includes:
detecting each frame of video frame of a target video by using a preset detection method so as to obtain a preset number of key points, coordinates of the key points and confidence degrees of the key points on each frame of the video frame.
Optionally, the filtering the key points based on the preset filtering rule to obtain discarded key points and filtered key points includes:
comparing the confidence degrees of all the key points with a preset confidence degree threshold value;
determining the key points corresponding to the confidence degrees of the key points which are greater than the preset confidence degree threshold value as the filtered key points;
and determining the key points corresponding to the confidence degrees of the key points smaller than the preset confidence degree threshold value as the abandoned key points.
Optionally, after the linear model is built based on the discarded keypoints and the filtered keypoints, the method further includes:
obtaining standard key point data to obtain a standard data set;
all data in the standard data set are subjected to normalization operation to obtain a normalized data set;
and executing preset parameter solving operation based on the linear model and the normalized data set so as to obtain a target formula after calculating unknown parameters in the linear model.
Optionally, the performing regression prediction on the discarded keypoints by using the linear model to obtain the post-regression keypoints includes:
calculating based on the target formula and the abandoned key points to obtain target key points;
and changing the coordinate scale corresponding to the coordinates of the target key point into the original coordinate scale so as to obtain the coordinates of the corresponding regressed key point.
Optionally, after the discarding the keypoints are corrected according to a preset correction rule and the regression-performed keypoints to obtain corrected keypoints, the method further includes:
judging whether a regression coordinate value corresponding to the regression key point exceeds a preset pixel range or not;
if the regression coordinate value exceeds the preset pixel range, carrying out secondary coordinate correction on the corrected coordinate of the key point through a preset centralization correction rule to obtain a centralized coordinate;
and correcting the confidence coefficient of the abandoned key point by using a preset confidence coefficient correction formula to obtain a target confidence coefficient.
Optionally, the performing, based on the corrected key points, corresponding analysis and identification operations includes:
forming a target key point set by the filtered key points and the corrected key points;
generating a target matrix based on the coordinates corresponding to the target key point set and the confidence;
and inputting the target matrix into a preset recognition network for training so as to perform corresponding analysis recognition operation on the video by using the trained recognition network.
In a second aspect, the present application discloses a behavior recognition apparatus, comprising:
the key point detection module is used for detecting the target video by using a preset detection method to obtain key points;
the key point filtering module is used for filtering the key points based on a preset filtering rule to obtain abandoned key points and filtered key points;
a model building module for building a linear model based on the discarded key points and the filtered key points;
the key point regression module is used for carrying out regression prediction on the abandoned key points by utilizing the linear model so as to obtain regressed key points;
the key point correction module is used for correcting the abandoned key points according to a preset correction rule and the regressed key points to obtain corrected key points;
and the analysis and identification module is used for carrying out corresponding analysis and identification operation on the corrected key points.
In a third aspect, the present application discloses an electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program for implementing the steps of the behavior recognition method as disclosed in the foregoing.
In a fourth aspect, the present application discloses a computer readable storage medium for storing a computer program; wherein the computer program, when executed by a processor, implements a behavior recognition method as disclosed in the preceding.
As can be seen, the present application provides a behavior recognition method, comprising: detecting a target video by using a preset detection method to obtain key points; filtering the key points based on a preset filtering rule to obtain abandoned key points and filtered key points; establishing a linear model based on the abandoned key points and the filtered key points, and performing regression prediction on the abandoned key points by using the linear model to obtain regression key points; correcting the abandoned key points according to a preset correction rule and the regressed key points to obtain corrected key points; and performing corresponding analysis and identification operation based on the corrected key points. Therefore, the abandoned key points are obtained through filtering, the linear model is built based on the abandoned key points to correct the abandoned key points, namely, the accurate corrected key points corresponding to the abandoned key points which are inaccurate to detect are obtained, and then the analysis and identification operation is carried out according to the corrected key points, so that the accuracy and the robustness of behavior identification are improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description show only embodiments of the present invention, and those skilled in the art can obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a flow chart of a behavior recognition method disclosed herein;
FIG. 2 is a flow chart of a particular behavior recognition method disclosed herein;
FIG. 3 is a flow chart of a particular behavior recognition method disclosed herein;
FIG. 4 is a flow chart of a particular behavior recognition method disclosed herein;
fig. 5 is a schematic structural diagram of a behavior recognition device provided in the present application;
fig. 6 is a block diagram of an electronic device provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
At present, the precision of the second-stage behavior classification network depends heavily on the first-stage feature extraction network, and when the human body key points extracted by the feature extraction network are inaccurate or incomplete, the behavior classification network often cannot accurately identify the behavior actions of the human body. Real scenes, however, often contain incomplete pictures or occlusions, and feature point extraction becomes inaccurate due to target occlusion or to the poor robustness of the feature extraction network in complex scenes. Therefore, the present behavior identification method is provided, which can improve the accuracy and robustness of behavior identification.
The embodiment of the invention discloses a behavior identification method, which is shown in figure 1 and comprises the following steps:
step S11: and detecting the target video by using a preset detection method to obtain key points.
In this embodiment, a preset detection method is used to detect a target video to obtain key points. Specifically, each video frame of the target video is detected using the preset detection method, so as to obtain a preset number of key points, the coordinates of the key points, and the confidences of the key points on each video frame. The preset detection method may adopt a detection method commonly used in the industry; in practical applications, a multi-person key point detection algorithm is used, including bottom-up methods such as OpenPose and AlphaPose, or top-down methods such as Faster R-CNN (a target detection algorithm) combined with HRNet (a pose estimation network). Each frame in the video frame sequence is input into the algorithm, which regresses the coordinates of a preset number of key points and the corresponding confidence for each target. For example, the coordinates of 17 key points of each person and the confidence corresponding to each key point are regressed, and the correspondence between key point index and body position may be set as follows: 1-nose, 2-left eye, 3-right eye, 4-left ear, 5-right ear, 6-left shoulder, 7-right shoulder, 8-left elbow, 9-right elbow, 10-left wrist, 11-right wrist, 12-left hip, 13-right hip, 14-left knee, 15-right knee, 16-left ankle, 17-right ankle.
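For illustration, the 17-point layout above can be encoded as a lookup table; this sketch uses 0-based indices mirroring the 1-based list in the text, and the zeroed detection values are placeholders:

```python
KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

# A detector such as OpenPose or AlphaPose would return, per person and per
# frame, one (x, y, confidence) triple for each of these 17 key points.
person = {name: (0.0, 0.0, 0.0) for name in KEYPOINT_NAMES}
```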
It should be noted that, besides AlphaPose, OpenPose, and Faster R-CNN plus HRNet, other key point coordinate extraction networks can also complete the whole technical solution process.
Step S12: and filtering the key points based on a preset filtering rule to obtain abandoned key points and filtered key points.
In this embodiment, after the target video is detected by using a preset detection method and the key points are obtained, the key points are filtered based on a preset filtering rule to obtain the discarded key points and the filtered key points. It can be understood that, the detected key points include the key points with more accurate detection and the key points with inaccurate detection, so the key points with inaccurate detection in the detection need to be determined through the filtering operation. The discarded key points are the key points which do not meet the filtering condition at the moment, and the key points after filtering are the remaining key points which meet the filtering condition at the moment. In a specific embodiment, filtering is performed according to the confidence of the detected keypoints.
Step S13: and establishing a linear model based on the abandoned key points and the filtered key points, and performing regression prediction on the abandoned key points by using the linear model to obtain regression key points.
In this embodiment, after filtering the key points based on a preset filtering rule to obtain rejected key points and filtered key points, a linear model is established based on the rejected key points and the filtered key points, and regression prediction is performed on the rejected key points by using the linear model to obtain regressed key points. It is to be understood that the linear model assumes that each rejected keypoint has a linear relationship with a known set of keypoints, and therefore a linear model is built based on each rejected keypoint and the filtered keypoints, and then the rejected keypoints are subjected to regression prediction using the linear model to obtain the post-regression keypoints. Namely predicting the accurate regression key points corresponding to the discarded key points with inaccurate detection.
Step S14: and correcting the abandoned key points according to a preset correction rule and the regression key points to obtain corrected key points.
In this embodiment, after the linear model is used to perform regression prediction on the discarded key points to obtain the regressed key points, the discarded key points are corrected according to a preset correction rule and the regressed key points to obtain corrected key points. It can be understood that a regressed key point predicted by the linear model is only a calculated estimate, not a real key point; therefore the coordinates of the discarded key point are replaced by the coordinates of the new regressed key point to obtain replaced coordinates, and the discarded key point is corrected according to the preset correction rule and the replaced coordinates to obtain the corrected key point, wherein the corrected key point is obtained by weighting the original coordinates of the discarded key point and the predicted coordinates of the regressed key point. Note that the corrected key points are key points satisfying the current filtering condition.
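A minimal sketch of the weighting step described above; reusing the detection confidence as the blend weight is an illustrative assumption, not the patent's preset correction rule:

```python
def correct_keypoint(original_xy, regressed_xy, confidence, w=None):
    # Weighted blend of the detector's original coordinate and the
    # linear-model regression. The blend weight defaults to the detection
    # confidence (an assumed choice): a barely-trusted detection leans
    # almost entirely on the regressed position.
    if w is None:
        w = confidence
    x = w * original_xy[0] + (1 - w) * regressed_xy[0]
    y = w * original_xy[1] + (1 - w) * regressed_xy[1]
    return (x, y)
```

For example, `correct_keypoint((10.0, 10.0), (20.0, 20.0), confidence=0.5)` blends the two positions evenly.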
Step S15: and performing corresponding analysis and identification operation based on the corrected key points.
In this embodiment, after the discarded keypoints are corrected according to a preset correction rule and the regression keypoints, corresponding analysis and identification operations are performed based on the corrected keypoints. It can be understood that a target key point set is formed based on the corrected key points and the original filtered key points, and the target key point set is the key point set acquired under the condition of accurate detection. And then, carrying out corresponding analysis and identification operation on the target key point set obtained in the video by using the trained identification network.
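The merge into a target key point set can be sketched as follows; the dict-of-index representation and the (17, 3) row layout of (x, y, confidence) are assumptions for illustration:

```python
import numpy as np

def build_target_matrix(filtered, corrected, num_keypoints=17):
    # filtered / corrected: {keypoint_index: (x, y, confidence)}.
    # The corrected entries fill exactly the indices that were discarded,
    # so the union covers all 17 key points; stack them in index order.
    merged = {**filtered, **corrected}
    return np.array([merged[i] for i in range(num_keypoints)], dtype=np.float32)
```

The resulting (17, 3) matrix, per person and per frame, is the kind of input that would be fed to the recognition network.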
It can be understood that, when behavior recognition is performed using the key points, a low confidence for a regressed human body key point usually means the coordinate position of that key point is inaccurate, which reduces the accuracy of behavior recognition; and actual application scenes are complex and varied, so inaccurate key points occur easily. Therefore, as shown in fig. 2, low-confidence key points (i.e., the discarded key points) are filtered out: threshold filtering is performed on the detected human body key points according to their confidence, and detected key points with lower confidence are regarded as discarded key points that need to be corrected. Linear regression modeling is then performed on the discarded key points, the model parameters are solved using known, accurate and complete human body key point data, and the positions of the low-confidence key points are regressed. Normalization correction is then performed on all the regressed key points to obtain accurate and complete human body key point data. This finally improves the accuracy of the human body behavior recognition model when the human body is occluded or incomplete, i.e., the accuracy and robustness of behavior recognition are improved even when the key point detection is inaccurate.
As can be seen, the present application provides a behavior recognition method, comprising: detecting a target video by using a preset detection method to obtain key points; filtering the key points based on a preset filtering rule to obtain abandoned key points and filtered key points; establishing a linear model based on the abandoned key points and the filtered key points, and performing regression prediction on the abandoned key points by using the linear model to obtain regression key points; correcting the abandoned key points according to a preset correction rule and the regressed key points to obtain corrected key points; and performing corresponding analysis and identification operation based on the corrected key points. Therefore, the abandoned key points are obtained through filtering, the linear model is built based on the abandoned key points to correct the abandoned key points, namely, the accurate corrected key points corresponding to the abandoned key points which are detected inaccurately are obtained, and then the analysis and identification operations are carried out according to the corrected key points, so that the accuracy and robustness of behavior identification are improved under the condition that the key point detection is inaccurate.
Referring to fig. 3, the embodiment of the present invention discloses a behavior recognition method, and the embodiment further describes and optimizes the technical solution with respect to the previous embodiment.
Step S21: and detecting the target video by using a preset detection method to obtain key points.
Step S22: and comparing the confidence degrees of all the key points with a preset confidence degree threshold value to obtain abandoned key points and filtered key points.
In this embodiment, the confidences of all the key points are compared with a preset confidence threshold to obtain the discarded key points and the filtered key points: key points whose confidence is greater than the preset confidence threshold are determined as the filtered key points, and key points whose confidence is smaller than the preset confidence threshold are determined as the discarded key points. It can be understood that the key points detected by the key point detection model are filtered according to confidence, and key points below the threshold are filtered out. In practical applications the filtering threshold is set to, for example, 0.6, and can be adjusted according to the specific scenario; the filtered-out key points are determined as discarded key points.
Specifically, the key points before filtering are:
P_A = {p_1, p_2, …, p_17} (equation 1);
the key points below the threshold (i.e., the discarded key points) are:
P_L = {p_l1, p_l2, …, p_lM} (equation 2);
wherein M < 8, i.e., fewer than 8 key points are discarded, and P_L ⊆ P_A.
The key points after filtering are:
P_R = {p_r1, p_r2, …, p_rN} (equation 3);
wherein P_R = P_A − P_L and N = 17 − M.
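The partition of P_A into P_R and P_L can be sketched as follows; keeping ties (confidence exactly at the threshold) in the filtered set is an assumed convention:

```python
def split_by_confidence(keypoints, threshold=0.6):
    # keypoints: the full set P_A, a list of 17 (x, y, confidence) triples.
    # Returns the index sets of P_R (kept) and P_L (discarded).
    kept, discarded = [], []
    for i, (_x, _y, conf) in enumerate(keypoints):
        (kept if conf >= threshold else discarded).append(i)
    return kept, discarded
```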
Step S23: and establishing a linear model based on the abandoned key points and the filtered key points.
In this embodiment, a linear model is established based on the discarded keypoints and the filtered keypoints. The model assumes that each rejected keypoint has a linear relationship with the set of known keypoints, and builds a linear model for each rejected keypoint and the filtered keypoints.
The concrete formulas are as follows:
p_l1 = x_1^T · p_R (equation 4);
p_l2 = x_2^T · p_R (equation 5);
…
p_lM = x_M^T · p_R (equation 6);
wherein p_R = [p_r1, p_r2, …, p_rN]^T is an N-dimensional column vector and x_1, x_2, …, x_M ∈ R^N, where R^N is the N-dimensional real vector space.
From the above relationships, the matrix form can be obtained:
p_L = X · p_R (equation 7);
wherein p_L = [p_l1, p_l2, …, p_lM]^T is an M-dimensional column vector, X = [x_1, x_2, …, x_M]^T is an M×N matrix, and the superscript T denotes the transpose.
Step S24: and acquiring standard key point data to obtain a standard data set.
In this embodiment, standard key point data are acquired to obtain a standard data set. It can be understood that, after the linear model is built, the parameters in each linear model formula must be solved one by one, that is, X in equation 7 must be obtained. First the parameters in equation 4 are solved; the solving process consists of three parts: data collection and filtering, data normalization and division, and parameter solving. Specifically, for data collection and filtering: complete and accurate human body key point data are collected. Public data sets such as COCO (Common Objects in Context) provide key point annotations; alternatively, a key point extraction network is applied to normal data with simple scenes and complete human bodies, and data whose coordinates are accurately extracted and that contain key points at all positions are selected through manual screening. Finally, a data set D_0 ∈ R^(S×17) is obtained, wherein S is the number of data pieces used for extracting parameters after screening, generally S > 1000.
Step S25: and carrying out normalization operation on all data in the standard data set to obtain a normalized data set.
In this embodiment, a normalization operation is performed on all data in the standard data set to obtain a normalized data set. The data normalization and division specifically operate as follows:
Maximum-minimum normalization is performed on the key point coordinates, i.e., for all d_0 ∈ D_0, with d_0 ∈ R^17:
d = (d_0 − min(d_0)) / (max(d_0) − min(d_0)) (equation 8);
finally obtaining the normalized data D ∈ R^(S×17).
The column d_l1 ∈ R^(S×1) corresponding to the unknown key point in equation 4 is taken out of the data set, and the columns D_R ∈ R^(S×N) corresponding to the known key points in equation 4 are taken out, wherein d_l1 and D_R both belong to the column space of D.
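The row-wise maximum-minimum normalization above can be sketched as follows, keeping each row's minimum and range so the scaling can be undone later; applying it per row follows the d_0 ∈ R^17 statement, and the small test matrix is illustrative:

```python
import numpy as np

def minmax_normalize(D0):
    # D0: (S, 17) matrix of key point coordinates.
    # Normalize each row to [0, 1]: d = (d0 - min(d0)) / (max(d0) - min(d0)).
    mn = D0.min(axis=1, keepdims=True)
    mx = D0.max(axis=1, keepdims=True)
    return (D0 - mn) / (mx - mn), mn, mx   # keep mn/mx to invert later
```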
Step S26: executing a preset parameter solving operation based on the linear model and the normalized data set, so as to obtain a target formula after calculating the unknown parameters in the linear model.
In this embodiment, a preset parameter solving operation is performed based on the linear model and the normalized data set, so as to obtain a target formula after calculating the unknown parameters in the linear model. Specifically, according to modeling formula 4 and the obtained data:

d_l1 = D_R · x_1  (Equation 9)
Assuming the loss function is the square of the two-norm:

L = ||d_l1 - D_R · x_1||_2^2  (Equation 10)
Substituting Equation 9 into Equation 10 and expanding the loss function, it finally simplifies to:

L = x_1^T D_R^T D_R x_1 - 2 d_l1^T D_R x_1 + d_l1^T d_l1  (Equation 11)
l is with respect to x1And (4) solving the derivative of the convex function to directly obtain an optimal solution.
Let

∂L/∂x_1 = 0  (Equation 12)

from which the following is obtained:

x_1 = (D_R^T D_R)^(-1) D_R^T d_l1  (Equation 13)
According to the same procedure, the other parameters can be calculated:

x_i = (D_R^T D_R)^(-1) D_R^T d_li  (Equation 14)

for each of the remaining unknown key points i = 2, ..., M.
The parameter matrix X is thereby obtained, yielding the final, fully known linear model.
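The closed-form least-squares solution of Equations 12-13 can be sketched with NumPy as follows. This is a hedged illustration (names are assumptions): `np.linalg.lstsq` computes the same minimizer as the explicit normal-equation inverse, but more stably:

```python
import numpy as np

def solve_parameters(D_R, d_l):
    """Least-squares parameters x = (D_R^T D_R)^(-1) D_R^T d_l.

    D_R: (S, N) matrix of known-key-point columns of the normalized data set.
    d_l: (S,)   column of the key point to be predicted.
    """
    x, *_ = np.linalg.lstsq(D_R, d_l, rcond=None)
    return x

# Predicting an unknown key point from known ones (the role of formula 7)
# is then just the matrix product: d_l_pred = D_R_new @ x
```

Solving each column of the parameter matrix X this way, once offline, is what makes the later per-frame correction a cheap matrix multiplication.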
Step S27: calculating based on the target formula and the abandoned key points to obtain target key points.
In this embodiment, the target key points are obtained by calculation based on the target formula and the abandoned key points. Specifically, with the solved parameters, the target key points corresponding to the abandoned key points produced by threshold filtering can be computed directly: the M target key points P_L' are calculated according to formula 7, completing the linear regression prediction from the known points to the unknown points.
Step S28: changing the coordinate scale corresponding to the coordinates of the target key points to the original coordinate scale, so as to obtain the coordinates of the corresponding regressed key points.
In this embodiment, after the target key points are obtained by calculation based on the target formula and the abandoned key points, the coordinate scale corresponding to the target key point coordinates is changed back to the original coordinate scale, so as to obtain the corresponding regressed key point coordinates. Specifically, the regressed key point coordinates are transformed back to the original coordinate scale (inverting the normalization of Equation 8):

P_L* = P_L' · (max - min) + min  (Equation 15)

where P_L is the key point that was filtered out, P_L' is the target key point in the normalized scale, and P_L* is the regressed key point after the scale transformation.
Step S29: correcting the abandoned key points according to a preset correction rule and the regressed key points to obtain corrected key points.
In this embodiment, the abandoned key points are corrected according to a preset correction rule and the regressed key points to obtain corrected key points. After the scale transformation, the regressed coordinates have been transformed back to the scale of the original coordinates P_L. In practical applications, to obtain a more accurate predicted value, a prediction correction formula can be adopted to correct the low-confidence key points, using both the coordinates regressed by the key point detection network and the linear regression prediction coordinates:

P_corr = λ · P_L* + (1 - λ) · P_L  (Equation 16)

where P_L* is the transformed linear regression prediction coordinate, P_L is the filtered low-confidence coordinate value, and λ is an adjustable real parameter ranging between 0 and 1; a larger λ gives the linear regression prediction a larger weight in the result.
It can be understood that the flow of coordinate correction is: the low-confidence coordinates and the modeled regression coordinates are weighted, normalized, and centered to finally obtain the corrected key point coordinates. The confidence correction method is: a corrected key point confidence value is obtained through linear interpolation according to the weighting parameter and the filtering threshold.
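The weighted coordinate correction described above can be sketched as follows; this is a minimal illustration under the reading of Equation 16 given earlier, and the function name is an assumption:

```python
import numpy as np

def correct_coordinates(p_regressed, p_low_conf, lam=0.5):
    """Weighted correction of a low-confidence key point coordinate.

    lam in [0, 1]: larger lam gives the linear-regression prediction more
    weight; lam = 0 keeps the detector output unchanged.
    """
    assert 0.0 <= lam <= 1.0
    return lam * np.asarray(p_regressed) + (1.0 - lam) * np.asarray(p_low_conf)
```

With lam = 0.5 the corrected point is simply the midpoint between the detector output and the regression prediction.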
Step S210: performing corresponding analysis and identification operations based on the corrected key points.
For the specific content of the above steps S21 and S210, reference may be made to the corresponding content disclosed in the foregoing embodiments, and details are not repeated here.
Therefore, the target video is detected by using a preset detection method to obtain key points; the confidence degrees of all the key points are compared with a preset confidence threshold to obtain abandoned key points and filtered key points; a linear model is established based on the abandoned key points and the filtered key points; standard key point data is obtained to form a standard data set; a normalization operation is performed on all data in the standard data set to obtain a normalized data set; a preset parameter solving operation is performed based on the linear model and the normalized data set to obtain a target formula after calculating the unknown parameters of the linear model; target key points are calculated based on the target formula and the abandoned key points; the coordinate scale of the target key point coordinates is changed back to the original coordinate scale to obtain the corresponding regressed key point coordinates; and the abandoned key points are corrected according to a preset correction rule and the regressed key points to obtain corrected key points. In this embodiment, the linear relationships among the key points are exploited: through linear modeling and coordinate-point regression prediction, the coordinates of low-confidence key points are corrected more accurately, and the corresponding analysis and identification operations are then performed on the corrected key points. This improves the precision and robustness of behavior recognition; through the preset correction rules, the behavior recognition algorithm based on human body key points receives more accurate input data and achieves better metrics.
Referring to fig. 4, the embodiment of the present invention discloses a behavior recognition method, and the embodiment further describes and optimizes the technical solution with respect to the previous embodiment.
Step S31: detecting the target video by using a preset detection method to obtain key points.
Step S32: filtering the key points based on a preset filtering rule to obtain abandoned key points and filtered key points.
Step S33: establishing a linear model based on the abandoned key points and the filtered key points, and performing regression prediction on the abandoned key points by using the linear model to obtain regressed key points.
Step S34: correcting the abandoned key points according to a preset correction rule and the regressed key points to obtain corrected key points.
Step S35: judging whether the regression coordinate value corresponding to the regressed key point exceeds a preset pixel range.
In this embodiment, after the abandoned key points are corrected according to the preset correction rule and the regressed key points to obtain the corrected key points, it is judged whether the regression coordinate value corresponding to the regressed key point exceeds a preset pixel range. Because coordinate values produced by the scale transformation of Equation 15 may fall outside the pixel range, the modeling method can regress low-confidence coordinate values that lie outside the image, thereby completing the regression of key points beyond the image boundary.
Step S36: if the regression coordinate value exceeds the preset pixel range, performing a secondary coordinate correction on the corrected key point coordinates through a preset centering correction rule to obtain centered coordinates.
In this embodiment, if the regression coordinate value exceeds the preset pixel range, a secondary coordinate correction is performed on the corrected key point coordinates through a preset centering correction rule to obtain centered coordinates. Similarly to the above, the corrected coordinates may contain points outside the image. To maintain the relative coordinate relationships, these can be used directly as input to subsequent algorithms, or a centering correction can be applied that repositions all pixels toward the center of the image, with the specific formula:

P_cent = P_corr - mean(P_corr) + λ_center · P_max  (Equation 17)

where λ_center is the proportion of the image center relative to the whole pixel range, P_max is the largest pixel value, P_corr is the coordinate before centering, and P_cent is the centered coordinate.
Step S37: correcting the confidence of the abandoned key points by using a preset confidence correction formula to obtain a target confidence.
In this embodiment, a preset confidence correction formula is used to correct the confidence of the corrected key points, so as to obtain the target confidence. It can be understood that, in addition to the regression correction of the low-confidence coordinates, a confidence correction of these key points is also needed: since the corrected key point coordinates fuse the coordinate information of the known key points, these corrected key points deserve a higher confidence. The confidence correction formula adopted in practice is:

θ_corr = θ_th + λ · (1 - θ_th)  (Equation 18)

where θ_th is the key point confidence filtering threshold, λ is the coordinate correction parameter of Equation 16, and θ_corr is the confidence of the corrected coordinates; obviously θ_corr ≥ θ_th. The corrected confidence value is larger when λ is larger, that is, when the regression coordinate is weighted more heavily.
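The linear interpolation between the filtering threshold and full confidence can be sketched in one line; this assumes the interpolation form stated above (the original equation is an image), and the function name is illustrative:

```python
def correct_confidence(theta_th, lam):
    """Confidence correction by linear interpolation.

    Interpolates between the filtering threshold theta_th and full
    confidence 1.0 using the coordinate-correction weight lam, so the
    corrected value is never below the threshold and grows with lam.
    """
    return theta_th + lam * (1.0 - theta_th)
```

For example, with a threshold of 0.3, lam = 0 leaves the confidence at the threshold and lam = 1 raises it to 1.0.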
Step S38: performing corresponding analysis and identification operations based on the corrected key points.
In this embodiment, corresponding analysis and identification operations are performed based on the corrected key points. The regression-corrected key points and the key points P_R remaining after threshold filtering together form the complete set of human body key points again. It should be noted that the filtered key points and the corrected key points form a target key point set; a target matrix is then generated based on the coordinates and confidences corresponding to the target key point set; finally, the target matrix is input to a preset recognition network for training, so that the trained recognition network can perform the corresponding analysis and identification operations on video.
Specifically, all the reconstructed human body key points are labeled according to behavior type and sent as input to recognition network training. For example, a recognition network that can be used in practical applications is POSEC3D; a graph network or another basic network structure may also be used. Taking POSEC3D as an example, the 17 key points must be mapped into a heatmap of 17 channels, and 48 consecutive frames are taken as the network input. POSEC3D fuses the spatio-temporal information of the key points. When generating the heatmap, each channel of the input data is generated from a key point coordinate P and a key point confidence θ: a sparse two-dimensional Gaussian matrix is generated with the coordinate P as the center point and the confidence θ as the peak value. For high-confidence key points, the output of the key point regression network is used directly; for low-confidence key points, the corrected key point coordinates and confidences produced by the correction formulas above are used.
the initial learning rate of training is set to be 0.01, parameters are optimized by adopting a random gradient descent algorithm, and for all data training rounds of 240, corresponding parameters can be adjusted according to data scale and scenes.
During inference, the trained parameters of the recognition network are loaded. Under limited computing resources, the parameter matrix X of the correction stage can be calculated offline, and the regression correction of the key point coordinates is then completed directly in the inference stage using formula 7 and formula 15. On real scene data with heavy occlusion, the different schemes under different λ values obtain the results shown in Table 1:
TABLE 1

(The contents of Table 1 are rendered as an image in the original publication.)
Under similar precision, adding the regression correction algorithm noticeably improves the recall of the key-point-based behavior algorithm. The correction parameter λ should be set neither too high nor too low; different values can be chosen for different scenarios, and the model metrics can be improved significantly. It should be noted that, besides POSEC3D, other key-point-based behavior recognition networks, including but not limited to ST-GCN, can also complete the whole technical solution flow.
For the details of the steps S31 to S34, reference may be made to the corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.
Therefore, the target video is detected by using a preset detection method to obtain key points; filtering the key points based on a preset filtering rule to obtain abandoned key points and filtered key points; establishing a linear model based on the abandoned key points and the filtered key points, and performing regression prediction on the abandoned key points by using the linear model to obtain regression key points; correcting the abandoned key points according to a preset correction rule and the regressed key points to obtain corrected key points; judging whether a regression coordinate value corresponding to the regression key point exceeds a preset pixel range or not; if the regression coordinate value exceeds the preset pixel range, carrying out secondary coordinate correction on the corrected coordinate of the key point through a preset centralization correction rule to obtain a centralized coordinate; correcting the confidence coefficient of the abandoned key point by using a preset confidence coefficient correction formula to obtain a target confidence coefficient; and performing corresponding analysis and identification operation based on the corrected key points, so that the precision and robustness of behavior identification are improved.
Referring to fig. 5, an embodiment of the present application further discloses a behavior recognition apparatus, which includes:
the key point detection module 11 is configured to detect a target video by using a preset detection method to obtain a key point;
the key point filtering module 12 is configured to filter the key points based on a preset filtering rule to obtain discarded key points and filtered key points;
a model building module 13, configured to build a linear model based on the discarded keypoints and the filtered keypoints;
a key point regression module 14, configured to perform regression prediction on the discarded key points by using the linear model to obtain regression key points;
a key point correction module 15, configured to correct the discarded key points according to a preset correction rule and the regressed key points, so as to obtain corrected key points;
and the analysis and identification module 16 is used for performing corresponding analysis and identification operations based on the corrected key points.
As can be seen, the present application includes: detecting a target video by using a preset detection method to obtain key points; filtering the key points based on a preset filtering rule to obtain abandoned key points and filtered key points; establishing a linear model based on the abandoned key points and the filtered key points, and performing regression prediction on the abandoned key points by using the linear model to obtain regression key points; correcting the abandoned key points according to a preset correction rule and the regressed key points to obtain corrected key points; and performing corresponding analysis and identification operation based on the corrected key points. Therefore, the abandoned key points are obtained through filtering, the linear model is built based on the abandoned key points to correct the abandoned key points, namely, the accurate corrected key points corresponding to the abandoned key points which are detected inaccurately are obtained, and then the analysis and identification operations are carried out according to the corrected key points, so that the accuracy and robustness of behavior identification are improved under the condition that the key point detection is inaccurate.
In some specific embodiments, the key point detecting module 11 specifically includes:
the key point detection unit is used for detecting each frame of video frame of the target video by using a preset detection method so as to obtain a preset number of key points, coordinates of the key points and confidence degrees of the key points on each frame of the video frame.
In some specific embodiments, the key point filtering module 12 specifically includes:
the confidence coefficient comparison unit is used for comparing the confidence coefficients of all the key points with a preset confidence coefficient threshold value;
a filtered keypoint determining unit, configured to determine, as the filtered keypoint, a keypoint corresponding to the confidence of the keypoint that is greater than the preset confidence threshold;
a discarded keypoint determining unit, configured to determine, as the discarded keypoint, a keypoint corresponding to the confidence of the keypoint that is smaller than the preset confidence threshold.
In some specific embodiments, the model building module 13 specifically includes:
a linear model establishing unit, configured to establish a linear model based on the discarded keypoints and the filtered keypoints;
the standard key point data acquisition unit is used for acquiring standard key point data to obtain a standard data set;
the normalization unit is used for performing normalization operation on all data in the standard data set to obtain a normalized data set;
and the parameter solving unit is used for executing preset parameter solving operation based on the linear model and the normalized data set so as to obtain a target formula after calculating unknown parameters in the linear model.
In some embodiments, the keypoint regression module 14 specifically includes:
a target key point obtaining unit, configured to calculate based on the target formula and the discarded key points to obtain target key points;
and the coordinate scale changing unit is used for changing the coordinate scale corresponding to the coordinates of the target key point into the original coordinate scale so as to obtain the coordinates of the corresponding regressed key point.
In some embodiments, the keypoint correction module 15 specifically includes:
a pixel range judging unit, configured to judge whether a regression coordinate value corresponding to the regressed key point exceeds a preset pixel range;
the centralized correction unit is used for carrying out secondary coordinate correction on the coordinates of the corrected key points through a preset centralized correction rule to obtain centralized coordinates if the regression coordinate values exceed the preset pixel range;
and the confidence coefficient correction unit is used for correcting the confidence coefficient of the abandoned key point by using a preset confidence coefficient correction formula so as to obtain the target confidence coefficient.
In some embodiments, the analysis and identification module 16 specifically includes:
the analysis and identification unit is used for carrying out corresponding analysis and identification operation based on the corrected key points;
a target key point set composition unit, configured to combine the filtered key points and the corrected key points into a target key point set;
a matrix generating unit, configured to generate a target matrix based on the coordinates and the confidence degrees corresponding to the target key point set;
and the training unit is used for inputting the target matrix into a preset recognition network for training so as to perform corresponding analysis recognition operation on the video by using the trained recognition network.
Furthermore, the embodiment of the application also provides electronic equipment. Fig. 6 is a block diagram illustrating an electronic device 20 according to an exemplary embodiment, which should not be construed as limiting the scope of the application in any way.
Fig. 6 is a schematic structural diagram of an electronic device 20 according to an embodiment of the present disclosure. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input output interface 25, and a communication bus 26. Wherein the memory 22 is used for storing a computer program, which is loaded and executed by the processor 21 to implement the relevant steps in the behavior recognition method disclosed in any of the foregoing embodiments. In addition, the electronic device 20 in the present embodiment may be specifically an electronic computer.
In this embodiment, the power supply 23 is configured to provide a working voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and a communication protocol followed by the communication interface is any communication protocol applicable to the technical solution of the present application, and is not specifically limited herein; the input/output interface 25 is configured to obtain external input data or output data to the outside, and a specific interface type thereof may be selected according to specific application requirements, which is not specifically limited herein.
In addition, the memory 22, as a carrier for resource storage, may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like; the resources stored thereon may include an operating system 221, a computer program 222, and the like, and the storage manner may be transient or persistent.
The operating system 221 is used for managing and controlling each hardware device on the electronic device 20 and the computer program 222, and may be Windows Server, Netware, Unix, Linux, or the like. In addition to the computer program that implements the behavior recognition method performed by the electronic device 20 disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs for performing other specific tasks.
Further, an embodiment of the present application further discloses a storage medium, in which a computer program is stored, and when the computer program is loaded and executed by a processor, the steps of the behavior recognition method disclosed in any of the foregoing embodiments are implemented.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other. The device disclosed in the embodiment corresponds to the method disclosed in the embodiment, so that the description is simple, and the relevant points can be referred to the description of the method part.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises the element.
The behavior recognition method, apparatus, device and storage medium provided by the present invention are described in detail above, and the principle and implementation of the present invention are explained herein by applying specific examples, and the description of the above examples is only used to help understanding the method and core ideas of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (10)

1. A method of behavior recognition, comprising:
detecting a target video by using a preset detection method to obtain key points;
filtering the key points based on a preset filtering rule to obtain abandoned key points and filtered key points;
establishing a linear model based on the abandoned key points and the filtered key points, and performing regression prediction on the abandoned key points by using the linear model to obtain regression key points;
correcting the abandoned key points according to a preset correction rule and the regressed key points to obtain corrected key points;
and performing corresponding analysis and identification operation based on the corrected key points.
2. The behavior recognition method according to claim 1, wherein the detecting the target video by using a preset detection method to obtain the key points comprises:
detecting each frame of video frame of a target video by using a preset detection method so as to obtain a preset number of key points, coordinates of the key points and confidence degrees of the key points on each frame of the video frame.
3. The behavior recognition method according to claim 2, wherein the filtering the key points based on a preset filtering rule to obtain discarded key points and filtered key points comprises:
comparing the confidence degrees of all the key points with a preset confidence degree threshold value;
determining the key points corresponding to the confidence degrees of the key points which are greater than the preset confidence degree threshold value as the filtered key points;
and determining the key points corresponding to the confidence degrees of the key points smaller than the preset confidence degree threshold value as the abandoned key points.
4. The behavior recognition method according to claim 2, wherein after the building a linear model based on the discarded keypoints and the filtered keypoints, further comprising:
obtaining standard key point data to obtain a standard data set;
all data in the standard data set are subjected to normalization operation to obtain a normalized data set;
and executing preset parameter solving operation based on the linear model and the normalized data set so as to obtain a target formula after calculating unknown parameters in the linear model.
5. The behavior recognition method according to claim 4, wherein the performing regression prediction on the discarded keypoints by using the linear model to obtain regression keypoints comprises:
calculating based on the target formula and the abandoned key points to obtain target key points;
and changing the coordinate scale corresponding to the coordinates of the target key point into the original coordinate scale so as to obtain the coordinates of the corresponding regressed key point.
6. The behavior recognition method according to claim 5, wherein after the discarding key points are corrected according to a preset correction rule and the regression key points to obtain corrected key points, the method further comprises:
judging whether a regression coordinate value corresponding to the regression key point exceeds a preset pixel range or not;
if the regression coordinate value exceeds the preset pixel range, carrying out secondary coordinate correction on the corrected coordinate of the key point through a preset centralization correction rule to obtain a centralized coordinate;
and correcting the confidence coefficient of the abandoned key point by using a preset confidence coefficient correction formula to obtain a target confidence coefficient.
7. The behavior recognition method according to any one of claims 2 to 6, wherein the performing corresponding analysis recognition operations based on the corrected key points comprises:
forming a target key point set by the filtered key points and the corrected key points;
generating a target matrix based on the coordinates and the confidence corresponding to the target key point set;
and inputting the target matrix into a preset recognition network for training so as to perform corresponding analysis recognition operation on the video by using the trained recognition network.
8. A behavior recognition apparatus, comprising:
the key point detection module is used for detecting the target video by using a preset detection method to obtain key points;
the key point filtering module is used for filtering the key points based on a preset filtering rule to obtain abandoned key points and filtered key points;
a model building module for building a linear model based on the discarded key points and the filtered key points;
the key point regression module is used for carrying out regression prediction on the abandoned key points by utilizing the linear model so as to obtain regressed key points;
the key point correction module is used for correcting the abandoned key points according to a preset correction rule and the regressed key points to obtain corrected key points;
and the analysis and identification module is used for carrying out corresponding analysis and identification operation based on the corrected key points.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program for carrying out the steps of the behavior recognition method according to any one of claims 1 to 7.
10. A computer-readable storage medium for storing a computer program; wherein the computer program when executed by a processor implements a method of behaviour recognition according to any of claims 1 to 7.
CN202210905892.4A 2022-07-29 2022-07-29 Behavior recognition method, behavior recognition device, behavior recognition equipment and storage medium Pending CN115272796A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210905892.4A CN115272796A (en) 2022-07-29 2022-07-29 Behavior recognition method, behavior recognition device, behavior recognition equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210905892.4A CN115272796A (en) 2022-07-29 2022-07-29 Behavior recognition method, behavior recognition device, behavior recognition equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115272796A true CN115272796A (en) 2022-11-01

Family

ID=83771069

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210905892.4A Pending CN115272796A (en) 2022-07-29 2022-07-29 Behavior recognition method, behavior recognition device, behavior recognition equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115272796A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116934557A (en) * 2023-09-15 2023-10-24 中关村科学城城市大脑股份有限公司 Behavior prediction information generation method, device, electronic equipment and readable medium
CN116934557B (en) * 2023-09-15 2023-12-01 中关村科学城城市大脑股份有限公司 Behavior prediction information generation method, device, electronic equipment and readable medium

Similar Documents

Publication Publication Date Title
CN108960211B (en) Multi-target human body posture detection method and system
US5479575A (en) Self-organizing neural network for pattern classification
CN111914727B (en) Small target human body detection method based on balance sampling and nonlinear feature fusion
CN110858286A (en) Image processing method and device for target recognition
CN110991380A (en) Human body attribute identification method and device, electronic equipment and storage medium
KR20010080219A (en) Image processing apparatus, image processing method, and recording medium
EP3617934A1 (en) Image recognition method and device, electronic apparatus, and readable storage medium
CN109360179B (en) Image fusion method and device and readable storage medium
JP7086878B2 (en) Learning device, learning method, program and recognition device
CN113724379B (en) Three-dimensional reconstruction method and device for fusing image and laser point cloud
CN115272796A (en) Behavior recognition method, behavior recognition device, behavior recognition equipment and storage medium
CN113888501A (en) Non-reference image quality evaluation method based on attention positioning network
CN111881740A (en) Face recognition method, face recognition device, electronic equipment and medium
CN115457176A (en) Image generation method and device, electronic equipment and storage medium
CN111274965A (en) Face recognition method and device, computer equipment and storage medium
CN112767440A (en) Target tracking method based on SIAM-FC network
CN106874843A (en) A kind of method for tracking target and equipment
CN115830342A (en) Method and device for determining detection frame, storage medium and electronic device
CN115966006A (en) Cross-age face recognition system based on deep learning model
CN112101303B (en) Image data processing method and device and computer readable storage medium
JP4298283B2 (en) Pattern recognition apparatus, pattern recognition method, and program
CN114445881A (en) Face detection method and related equipment
CN111582177A (en) Image detection method and related device
CN113506332B (en) Target object identification method, electronic device and storage medium
CN116309699B (en) Method, device and equipment for determining associated reaction degree of target object

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination