CN117593792A - Abnormal gesture detection method and device based on video frame

Info

Publication number: CN117593792A
Application number: CN202311575924.XA
Authority: CN
Inventors: 李斌; 冯雪涛
Original assignee: Zhejiang Shenxiang Intelligent Technology Co ltd
Current assignee: Zhejiang Shenxiang Intelligent Technology Co ltd
Priority date: 2023-11-23
Filing date: 2023-11-23
Publication date: 2024-02-23

Abstract

The application discloses an abnormal gesture detection method and device based on video frames. The method comprises the following steps: determining, by a preset preliminary judging method, whether a target object meets a first suspected abnormal posture standard according to a posture detection frame constructed for the target object in sequence video frame images; acquiring, from the sequence video frame images, node sequence data in three-dimensional space of the target object that meets the first suspected abnormal posture standard; determining, by a preset dynamic judging method and according to the node sequence data, whether the target object that meets the first suspected abnormal posture standard meets a second suspected abnormal posture standard; for the target object that meets the second suspected abnormal posture standard, determining, by a preset static judging method and according to the node sequence data, whether the target object meets a third abnormal posture standard; and if so, judging that an abnormal gesture of the target object has occurred. The accuracy of abnormal gesture detection is thereby improved.

Description

Abnormal gesture detection method and device based on video frame

Technical Field

The application relates to the technical field of computer image processing, in particular to a method and a device for detecting abnormal gestures based on video frames, and a method and a device for detecting abnormal gestures of a human body.

Background

Gesture detection is an important technology, and has wide application value in many fields. For example, accurate detection of human body gestures can help us understand and analyze human body motions, gestures, and actions, thereby playing an important role in physical training, human body health monitoring, safety monitoring, and the like. For example, in the field of physical training, accurate feedback and guidance can be provided by real-time monitoring and analysis of the athlete's stance to improve the training effect. In the field of healthcare, human posture detection can be used to track and analyze the posture and activity of patients, assist in rehabilitation and health management. In addition, in virtual reality and augmented reality applications, human body gesture detection may be used to track the actions of a user in real time, enabling a more immersive user experience.

In attitude monitoring, monitoring abnormal postures is particularly important. For example, detecting abnormal postures such as falls of the elderly matters greatly in daily life, because it allows rescue measures to be taken in time.

In the prior art, various gesture detection technologies generally adopt sensors attached to a monitored object such as wearable equipment to acquire data, and then perform gesture analysis according to the data to realize the discovery of abnormal gestures.

However, the above prior-art solutions generally require a series of demanding technical conditions to be met. For example, the monitored subject must wear a sensor or camera at a specific location, and these devices must maintain good contact or positioning with the subject. This depends heavily on the cooperation of the monitored subject; because the devices may be inconvenient to wear, compliance is often poor, which severely limits the application range and scalability of the technology. In particular, many monitoring scenarios cannot impose any device-wearing requirement on the monitored subject, for example human fall monitoring in public places, typically the monitoring of abnormal postures such as falls of elderly people. In such scenarios the purpose of abnormal gesture monitoring cannot be achieved.

Some prior-art human body gesture detection technologies instead rely on monitoring equipment: video equipment such as a depth camera collects field images, and image recognition technologies such as deep neural networks identify abnormal human body gestures. However, these identification methods are limited in accuracy. For example, in complex environments, factors such as lighting, occlusion and a limited field of view may lead to poor accuracy in recognizing abnormal gestures.

In view of the limitations of the prior art described above, a new technical solution is needed to achieve more accurate abnormal gesture detection. Such a solution should identify abnormal gestures rapidly and reliably under existing technical conditions, so as to meet the requirements of different fields, in particular the need to monitor abnormal gestures such as human falls in real-life scenes.

Disclosure of Invention

The application provides an abnormal gesture monitoring method based on video frames, which aims to solve the prior-art problems that, in common abnormal gesture monitoring scenes, fall detection has low accuracy and is prone to misjudgment, whether it is based on wearable equipment or on neural network image processing.

The application provides an abnormal gesture detection method based on video frames, which comprises the following steps:

determining whether the target object accords with a first suspected abnormal posture standard or not according to a posture detection frame constructed for the target object in a sequence video frame image by a preset preliminary judging method;

acquiring node sequence data of a target object meeting the first suspected abnormal posture standard in a three-dimensional space from the sequence video frame image;

determining whether the target object conforming to the first suspected abnormal posture standard conforms to the second suspected abnormal posture standard or not according to the node sequence data by a preset dynamic judging method;

determining whether the target object accords with a third abnormal posture standard according to the node sequence data by a preset static judging method aiming at the target object which accords with the second suspected abnormal posture standard; if yes, judging that the abnormal gesture of the target object occurs.

In some embodiments, the determining, according to the gesture detection frame constructed for the target object in the sequential video frame image and by a preset preliminary determination method, whether the target object meets the first suspected abnormal gesture criterion includes:

tracking the track of the target object according to the acquired video information, and acquiring a video frame image of the target object tracking sequence;

defining the gesture detection frame of the target object in the frame image according to the target object tracking sequence video frame image;

and judging whether the target object accords with the first suspected abnormal gesture standard or not by a preset preliminary judgment method according to the gesture detection frame and the reference gesture detection frame.

In some embodiments, the determining, according to the gesture detection frame and the reference gesture detection frame, whether the target object meets the first suspected abnormal gesture criterion according to a preset preliminary determination method includes:

determining an aspect ratio of the gesture detection frame and a reference aspect ratio of the reference gesture detection frame;

and comparing the reference aspect ratio with the aspect ratio of the gesture detection frame, and judging whether the target object meets the first suspected abnormal gesture standard.

In some embodiments, the determining whether the target object meets the first suspected abnormal pose criteria comprises:

counting the number of frames of the sequence video frame images of which the aspect ratio of the gesture detection frame is larger than the reference aspect ratio;

and when the frame number is greater than or equal to an abnormal frame number reference value, determining the target object to accord with the first suspected abnormal posture standard.

In some embodiments, the acquiring node sequence data of the target object meeting the first suspected abnormal gesture standard in the sequence video frame image in three-dimensional space includes:

according to the 3D gesture estimation of the target object which accords with the first suspected abnormal gesture standard in the sequence video frame image, node coordinate data and confidence coefficient of the target object in a three-dimensional space are obtained;

and determining the node coordinate data and the node confidence coefficient thereof as the node sequence data.
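As an illustration only, the node sequence data described above could be held as one list of 3D nodes per frame, each node carrying its coordinates and confidence. The names `Node3D`, `NodeSequence` and `build_node_sequence`, and the input layout (parallel lists of coordinates and confidences), are assumptions of this sketch; the application does not prescribe a data layout or a particular 3D pose estimator.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Node3D:
    """One node (joint) of the target object in three-dimensional space."""
    x: float
    y: float
    z: float
    confidence: float


# One inner list per video frame, with nodes kept in a fixed order.
NodeSequence = List[List[Node3D]]


def build_node_sequence(
    coords_per_frame: Sequence[Sequence[Tuple[float, float, float]]],
    conf_per_frame: Sequence[Sequence[float]],
) -> NodeSequence:
    """Pair per-frame node coordinates with their confidences (hypothetical helper)."""
    if len(coords_per_frame) != len(conf_per_frame):
        raise ValueError("frame counts differ")
    sequence: NodeSequence = []
    for coords, confs in zip(coords_per_frame, conf_per_frame):
        if len(coords) != len(confs):
            raise ValueError("node counts differ within a frame")
        sequence.append(
            [Node3D(x, y, z, c) for (x, y, z), c in zip(coords, confs)]
        )
    return sequence
```

Keeping the confidence next to each coordinate lets later judgments decide for themselves how to treat low-confidence nodes.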

In some embodiments, the determining, according to the node sequence data, whether the target object meeting the first suspected abnormal gesture criterion meets the second suspected abnormal gesture criterion according to a preset dynamic judgment method includes:

extracting first key node sequence data and second key node sequence data from the node sequence data;

calculating the position relation data between the target object and a gesture comparison reference in each sequence video frame image according to the first key node sequence data and the second key node sequence data;

determining the change rate of the position relation data according to the time relation among the video frame images of each sequence;

and comparing the change rate of the position relation data with a preset dynamic threshold value, and determining whether the target object conforming to the first suspected abnormal gesture standard conforms to the second suspected abnormal gesture standard.
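As an illustration only, the change-rate comparison above might be sketched as follows. The function names, the use of timestamps in seconds, the choice of the largest absolute frame-to-frame rate, and the "greater than or equal to" comparison are all assumptions of this sketch; the application only states that the change rate is compared with a preset dynamic threshold.

```python
from typing import Sequence


def max_change_rate(values: Sequence[float], timestamps: Sequence[float]) -> float:
    """Largest absolute change per unit time between consecutive frames."""
    if len(values) != len(timestamps):
        raise ValueError("values and timestamps must have the same length")
    rates = []
    for i in range(1, len(values)):
        dt = timestamps[i] - timestamps[i - 1]
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        rates.append(abs(values[i] - values[i - 1]) / dt)
    # With fewer than two frames no rate can be computed; treat it as zero.
    return max(rates, default=0.0)


def meets_second_standard(
    values: Sequence[float],
    timestamps: Sequence[float],
    dynamic_threshold: float,
) -> bool:
    """Assumed rule: a sufficiently fast change suggests an abnormal gesture."""
    return max_change_rate(values, timestamps) >= dynamic_threshold
```

For example, with position relation data `[5.0, 6.0, 60.0, 85.0]` sampled every 0.5 seconds, the largest rate is 108 units per second, so a threshold of 90 would be met while a slowly varying sequence would not.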

In some embodiments, the calculating, according to the first key node sequence data and the second key node sequence data, positional relationship data between the target object and a gesture comparison reference in each of the sequence video frame images includes:

determining a first key node sequence position mean value according to position coordinate data of each node in the first key node sequence data;

determining a second key node sequence position mean value according to the position coordinate data of each node in the second key node sequence data;

establishing a connection line between the first key node sequence position average value and the second key node sequence position average value;

and taking an included angle formed between the connecting line and the gesture comparison reference as position relation data between the connecting line and the gesture comparison reference.
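The angle computation above might be sketched as follows. This is a minimal illustration: treating the first and second key nodes as, say, shoulder and hip nodes, representing the gesture comparison reference as a direction vector (here a vertical y-axis), and all function names are assumptions of the sketch rather than details given by the application.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def mean_position(nodes: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a group of key nodes."""
    if not nodes:
        raise ValueError("at least one node is required")
    n = len(nodes)
    return (
        sum(p[0] for p in nodes) / n,
        sum(p[1] for p in nodes) / n,
        sum(p[2] for p in nodes) / n,
    )


def angle_to_reference(
    first_nodes: Sequence[Point],
    second_nodes: Sequence[Point],
    reference: Point = (0.0, 1.0, 0.0),  # assumed: vertical axis as the reference
) -> float:
    """Angle in degrees between the line joining the two means and the reference."""
    a = mean_position(first_nodes)
    b = mean_position(second_nodes)
    line = (a[0] - b[0], a[1] - b[1], a[2] - b[2])
    dot = sum(l * r for l, r in zip(line, reference))
    norm_line = math.sqrt(sum(l * l for l in line))
    norm_ref = math.sqrt(sum(r * r for r in reference))
    if norm_line == 0.0 or norm_ref == 0.0:
        raise ValueError("line and reference must have non-zero length")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (norm_line * norm_ref)))
    return math.degrees(math.acos(cos_angle))
```

Under these assumptions an upright body yields an angle near 0 degrees and a body lying flat yields an angle near 90 degrees.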

In some embodiments, the determining, according to the node sequence data, whether the target object meets a third abnormal gesture standard according to a preset static judgment method, for the target object meeting the second suspected abnormal gesture standard includes:

acquiring a height average value in third key node sequence data in the node sequence data;

and determining whether the target object accords with the third abnormal gesture standard according to whether the height average value is smaller than or equal to a preset height reference value.

In some embodiments, the determining whether the target object meets the third abnormal gesture criterion according to whether the height average value is less than or equal to a preset height reference value includes:

selecting the frame images with the height average value smaller than or equal to the height reference value in the third key node sequence data in each sequence video frame image;

judging whether the frame number of the frame image with the height average value smaller than or equal to the height reference value is larger than or equal to a preset reference frame number, if so, determining that the target object accords with the third abnormal gesture standard.
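The static judgment above can be sketched as a per-frame mean followed by a frame count. In this minimal illustration each frame is given as a list of the heights of its third key nodes; the function name, this input layout, and the decision to skip frames with no nodes are assumptions of the sketch.

```python
from typing import Sequence


def meets_third_standard(
    heights_per_frame: Sequence[Sequence[float]],
    height_reference: float,
    reference_frame_count: int,
) -> bool:
    """Count frames whose mean key-node height is at or below the reference."""
    low_frames = 0
    for heights in heights_per_frame:
        if not heights:
            # Assumed handling: a frame without key nodes is not counted.
            continue
        mean_height = sum(heights) / len(heights)
        if mean_height <= height_reference:
            low_frames += 1
    return low_frames >= reference_frame_count
```

Requiring several low frames rather than one makes the judgment less sensitive to a single noisy pose estimate.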

The application also provides an abnormal gesture detection device based on the video frame, which comprises:

the first determining unit is used for determining whether the target object accords with a first suspected abnormal posture standard or not according to a posture detection frame constructed for the target object in the sequence video frame image by a preset preliminary judging method;

the acquisition unit is used for acquiring node sequence data of the target object meeting the first suspected abnormal posture standard in a three-dimensional space from the sequence video frame images;

the second determining unit is used for determining whether the target object conforming to the first suspected abnormal gesture standard conforms to a second suspected abnormal gesture standard or not according to the node sequence data by a preset dynamic judging method;

the third determining unit is used for determining whether the target object accords with a third abnormal gesture standard according to the node sequence data and a preset static judging method aiming at the target object which accords with the second suspected abnormal gesture standard; if yes, judging that the abnormal gesture of the target object occurs.

The application also provides a human body abnormal posture detection method, which comprises the following steps:

determining whether the target human body accords with a first suspected abnormal posture standard or not according to a posture detection frame constructed for the target human body in a target human body sequence video frame image by a preset preliminary judging method;

acquiring node sequence data of a target human body in a three-dimensional space, wherein the node sequence data accords with the first suspected abnormal posture standard, from a target human body sequence video frame image;

determining whether the target human body conforming to the first suspected abnormal posture standard conforms to the second suspected abnormal posture standard or not according to the node sequence data by a preset dynamic judging method;

aiming at a target human body conforming to the second suspected abnormal posture standard, determining whether the target human body conforms to a third abnormal posture standard or not according to the node sequence data by a preset static judging method; if yes, judging that the abnormal posture of the target human body occurs.

The application also provides a human body abnormal gesture detection device, including:

the first determining unit is used for determining whether the target human body accords with a first suspected abnormal posture standard or not according to a posture detection frame constructed for the target human body in a target human body sequence video frame image by a preset preliminary judging method;

the acquisition unit is used for acquiring node sequence data of a target human body in a three-dimensional space, wherein the node sequence data accords with the first suspected abnormal posture standard, from a target human body sequence video frame image;

the second determining unit is used for determining whether the target human body conforming to the first suspected abnormal posture standard conforms to a second suspected abnormal posture standard or not according to the node sequence data by a preset dynamic judging method;

the third determining unit is used for determining whether the target human body accords with a third abnormal posture standard according to the node sequence data and a preset static judging method aiming at the target human body which accords with the second suspected abnormal posture standard; if yes, judging that the abnormal posture of the target human body occurs.

The application also provides a computer storage medium for storing network platform generated data and a program for processing the network platform generated data;

the program, when read and executed by the processor, performs the abnormal posture detection method based on the video frame as described above, or performs the human body abnormal posture detection method as described above.

The application also provides an electronic device comprising:

a processor;

and a memory for storing a program for processing the network platform generation data, which when read and executed by the processor, performs the above-described abnormal posture detection method based on video frames or performs the above-described human body abnormal posture detection method.

Compared with the prior art, the application has the following advantages:

According to the abnormal gesture detection method based on video frames provided by the application, first, the process by which the target object develops an abnormal gesture is split into several stages that are filtered in sequence. That is, the posture change of the target object is judged in three stages, namely preliminary judgment, dynamic judgment and static judgment, and a different judging method at each stage screens whether the target object meets the corresponding suspected abnormal posture standard, so that the accuracy of abnormal gesture detection is improved. Second, because the identification at each stage depends on the result of the preceding stage, the basic data of an earlier stage can be reused in later detection stages without being re-acquired, which avoids wasting computing resources. Third, in this embodiment both the dynamic judgment and the static judgment use the node sequence data of the target object, which overcomes the difficulty a single judging mode has with particular kinds of noise interference, such as the scene, environment, background, distance, size and orientation of the target object, so that the judging modes complement one another.

Drawings

FIG. 1 is a flowchart of an embodiment of a method for detecting abnormal gestures based on video frames provided in the present application;

FIG. 2 is a schematic diagram of 3D pose estimation in an embodiment of a video frame based anomaly pose detection method provided herein;

FIG. 3 is a node information schematic diagram of the 3D pose estimation of FIG. 2;

FIG. 4 is a schematic diagram of a first suspected abnormal gesture standard judgment in an embodiment of a video frame-based abnormal gesture detection method provided in the present application;

FIG. 5 is a schematic diagram of second suspected abnormal gesture standard judgment in an embodiment of a video frame-based abnormal gesture detection method provided in the present application;

FIG. 6 is a schematic diagram of third suspected abnormal gesture standard judgment in an embodiment of a video frame-based abnormal gesture detection method provided in the present application;

FIG. 7 is a schematic structural diagram of an embodiment of an abnormal gesture detection apparatus based on video frames provided in the present application;

FIG. 8 is a flowchart of an embodiment of a method for detecting abnormal human body gestures provided in the present application;

FIG. 9 is a schematic structural view of an embodiment of a human body abnormal posture detection device provided in the present application;

FIG. 10 is a schematic structural diagram of an embodiment of an electronic device provided in the present application.

Detailed Description

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. This application is, however, susceptible of embodiment in many other ways than those herein described and similar generalizations can be made by those skilled in the art without departing from the spirit of the application and the application is therefore not limited to the specific embodiments disclosed below.

The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the present application. Singular forms used in this application and in the appended claims, such as "a", "an" and "the", do not limit number or order; they are used to distinguish information of the same type from each other.

As described in the background, the inventive concept of the abnormal gesture detection method based on video frames originates from crowd safety detection in public environments. The prior-art approaches to this problem, from wearable equipment to depth cameras to deep learning, each have corresponding defects. For example, wearable equipment must contain a sensor that moves with the human body in order to detect a fall. Depth-camera approaches typically use binocular or monocular cameras for image acquisition and predict falls through deep learning. Other approaches detect human falls through infrared equipment, electromagnetic wave reflection and similar means.

The depth-camera-based approaches above need to collect a large amount of positive and negative sample data and train a model in order to recognize abnormal behaviors such as falls by deep learning. Because falls are abnormal behaviors, the collection of positive sample data is limited, which unavoidably affects the performance and prediction accuracy of the model. There is also a domain adaptation problem: when the fall detection scene differs greatly from the scenes in the model's training set, model performance degrades further, so the output predictions deviate considerably. When human falls are detected by wearable equipment, infrared equipment, electromagnetic wave reflection and the like, the related equipment must be installed in the detection scene, which raises cost and limits the environments in which detection is possible.

Given the prior art of human fall detection, the inventive concept arises from the question of how to detect human falls accurately and at low cost without being affected by factors such as scene, distance, size and orientation. The same need exists in any scene that requires abnormal gesture detection of an article. For example, detecting the posture of parked bicycles can provide a reference for their subsequent maintenance and deployment; detecting abnormal gestures of unmanned aerial vehicles and similar aircraft can support subsequent functions such as drone delivery services, so that logistics anomalies can be promptly identified, followed up and resolved. Therefore, the abnormal gesture detection method based on video frames is not limited to detecting abnormal gestures of a human body, and is applicable to any scene requiring abnormal gesture detection. To facilitate understanding of the technical scheme, the embodiments of the application are described with reference to a typical abnormal posture monitoring scene, namely a human fall. The abnormal gesture detection method based on video frames provided in the present application is described in detail below.

As shown in fig. 1, fig. 1 is a flowchart of an embodiment of a method for detecting an abnormal gesture based on a video frame provided in the present application. The method embodiment can comprise the following steps:

step S101: determining whether the target object accords with a first suspected abnormal posture standard or not according to a posture detection frame constructed for the target object in a sequence video frame image by a preset preliminary judging method;

step S102: acquiring node sequence data of a target object meeting the first suspected abnormal posture standard in a three-dimensional space from the sequence video frame image;

step S103: determining whether the target object conforming to the first suspected abnormal posture standard conforms to the second suspected abnormal posture standard or not according to the node sequence data by a preset dynamic judging method;

step S104: determining whether the target object accords with a third abnormal posture standard according to the node sequence data by a preset static judging method aiming at the target object which accords with the second suspected abnormal posture standard; if yes, judging that the abnormal gesture of the target object occurs.

In general, in this embodiment of the abnormal gesture detection method based on video frames, whether the target object has an abnormal gesture is judged in three successive stages, so that the determination proceeds layer by layer, which improves the accuracy and precision of detection and reduces misjudgment. Steps S101 to S104 are described in detail below, in order.
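The layer-by-layer flow of steps S101 to S104 can be pictured as a short-circuiting cascade. The sketch below is illustrative only: the function name and the idea of passing each judgment in as a callable are assumptions, and the callables stand in for whatever preliminary, dynamic and static judging methods an implementation actually uses.

```python
from typing import Any, Callable, Sequence


def detect_abnormal_pose(
    frames: Sequence[Any],
    preliminary_check: Callable[[Sequence[Any]], bool],
    estimate_nodes: Callable[[Sequence[Any]], Any],
    dynamic_check: Callable[[Any], bool],
    static_check: Callable[[Any], bool],
) -> bool:
    """Run the three judgments in order, stopping at the first one that fails."""
    # S101: screening based on the pose detection frames.
    if not preliminary_check(frames):
        return False
    # S102: node sequence data is only computed for objects that passed S101.
    nodes = estimate_nodes(frames)
    # S103: dynamic judgment on the node sequence data.
    if not dynamic_check(nodes):
        return False
    # S104: static judgment reuses the same node sequence data.
    return static_check(nodes)
```

Because the 3D estimation is only reached after the preliminary judgment passes, and its output is shared by the two later judgments, the sketch reflects the reuse of earlier-stage data described in the application.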

Regarding step S101: determining, by a preset preliminary judgment method, whether the target object accords with a first suspected abnormal posture standard according to a posture detection frame constructed for the target object in the sequence video frame images.

The purpose of step S101 is to determine whether the target object in the sequential video frame images meets the first suspected abnormal gesture criteria.

In this embodiment, the sequential video frame images may be acquired by acquisition devices arranged in a public scene or a private scene, and of course, the arrangement of the acquisition devices may be determined according to requirements. The sequence of video frame images may be frame images in video information derived from a plurality of acquisition devices, i.e. the target object is tracked and acquired by the acquisition devices. The sequence of video frame images includes a plurality of frame images, and the frame images are all in a time sequential relationship.

The target object may be an object requiring gesture detection, and may be a human body, an aircraft, a bicycle, or the like.

The first suspected abnormal gesture standard may be understood as a representation of a suspected abnormal condition occurring in a frame selection range of the target object in the frame image.

The preset preliminary judgment method can be understood as a judgment mode of whether the target object accords with the first suspected abnormal gesture standard, and the preliminary judgment is realized based on the gesture detection frame, in other words, the preliminary judgment is firstly carried out from the frame selection range of the target object in the frame image.

As shown in FIGS. 2, 3 and 4, the specific implementation procedure of step S101 may include:

step S101-1: tracking the track of the target object according to the acquired video information, and acquiring a video frame image of the target object tracking sequence;

step S101-2: defining the gesture detection frame of the target object in the frame image according to the target object tracking sequence video frame image;

step S101-3: and judging whether the target object accords with the first suspected abnormal gesture standard or not by a preset preliminary judgment method according to the gesture detection frame and the reference gesture detection frame.

The video information in step S101-1 may be a video acquired by an acquisition device, and the target object in the video is tracked by a target tracking algorithm. Target tracking algorithms may include generative algorithms and discriminative algorithms.

A generative algorithm models a given target area in an initial frame and searches subsequent frames for the part most similar to the model, which gives the predicted target position.

The discriminant algorithm regards the target tracking problem as a target detection task in each frame, trains a classifier by using the image features of the tracked target, takes a target area in the image as a positive sample, takes a background area as a negative sample, and searches an optimal solution in a subsequent frame by using the trained classifier.

A discriminative algorithm may be used in this embodiment, for example a tracking-by-detection strategy, that is, tracking on the basis of target detection. Tracking is an important task in computer vision that aims to follow and identify target objects in a video sequence over time. Target object tracking can also be realized by SORT (Simple Online and Realtime Tracking), DeepSORT (a deep-learning-based multi-target tracking algorithm built on SORT), JDE (a model that jointly learns the detector and the embedding), FairMOT (a multi-target tracking method), and the like.

When the target object is a human body, the ReID characteristic can be extracted through pedestrian re-identification (ReID), and according to the similarity of the ReID characteristic and the position information of a human body frame, the detection frames of the target human body belonging to one identity in different video frames are associated, and a complete motion track, namely a sequence video frame image, is formed. Since pedestrian re-recognition belongs to the prior art, no excessive description is made here.

The purpose of step S101-2 is to construct the gesture detection frame. It will be appreciated that the sequence video frame images comprise frame images that show different poses of the target object along a time dimension or an environment dimension (also referred to as a background dimension). The target object differs from frame to frame, and the difference may be large or small. Whether the target object has changed between frame images can be obtained intuitively by defining a gesture detection frame and setting a reference gesture detection frame.

As shown in FIG. 3, in this embodiment the gesture detection frame is, for example, a rectangular frame. The gesture detection frame of the i-th target object in the t-th frame is defined as:

$$b_i^t = (x, y, w, h, p)$$

wherein x and y represent the coordinates of the upper left corner of the gesture detection frame; w and h represent the width and height of the gesture detection frame; and p represents the confidence of the gesture detection frame.
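For illustration, the rectangular gesture detection frame defined by x, y, w, h and p could be represented as a small record type. The class name `PoseBox` and the `aspect_ratio` helper are assumptions of this sketch; the helper computes the width-to-height ratio r = w/h that step S101-31 relies on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoseBox:
    """Gesture detection frame of one target object in one video frame."""
    x: float  # upper-left corner, horizontal coordinate
    y: float  # upper-left corner, vertical coordinate
    w: float  # frame width
    h: float  # frame height
    p: float  # confidence of the detection frame

    def aspect_ratio(self) -> float:
        """Width divided by height, r = w / h."""
        if self.h <= 0:
            raise ValueError("frame height must be positive")
        return self.w / self.h
```

With this representation, an upright person in a 50 by 100 frame has a ratio of 0.5, while the same person lying in a 150 by 50 frame has a ratio of 3.0.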

The specific implementation process of the step S101-3 may include:

step S101-31: determining an aspect ratio of the gesture detection frame and a reference aspect ratio of the reference gesture detection frame. Specifically, the gesture detection frame in each frame image of the sequence video frame images is traversed and its aspect ratio is calculated as r = w/h. The reference aspect ratio may be the aspect ratio of a reference gesture detection frame defined directly for the normal posture, or the aspect ratio of a reference gesture detection frame defined from the normal posture plus a redundancy (margin) value. The reference aspect ratio may also be set from empirical values, in which case it can be regarded as a data range whose end point serves as the reference threshold. The reference gesture detection frame has different specific meanings and values for different monitoring purposes and monitored objects. For the specific application scene of a human fall, the aspect ratio is small under normal conditions and large after a fall, so a specific value can be used as a threshold: if the aspect ratio is larger than the threshold, it is determined that the abnormal gesture of falling may have occurred.

Step S101-32: comparing the reference aspect ratio with the aspect ratio of the gesture detection frame to judge whether the target object meets the first suspected abnormal gesture standard. Specifically, it is determined whether the aspect ratio of the gesture detection frame in the sequence video frame images is larger than the reference aspect ratio of the reference gesture detection frame. If yes, the target object is determined to meet the first suspected abnormal gesture standard; if not, the target object does not meet the first suspected abnormal gesture standard, and a prompt may be output, or the process may return to step S101-1 or step S101-2.

In order to improve the accuracy of the preliminary judging method and thereby ensure the accuracy of subsequent detection, step S101-32 may further include:

Step S101-321: counting the number of frames of the sequence video frame images in which the aspect ratio of the gesture detection frame is larger than the reference aspect ratio. Specifically, the count may be $N = \sum_{t=1}^{T} I(r^{t} > thres)$, where $r^{t} = w^{t}/h^{t}$ is the aspect ratio of the gesture detection frame in frame t of the T frames, $I(\cdot)$ is the indicator function, taking a value of 1 when the condition inside the brackets is true and a value of 0 otherwise, and thres is the preset reference aspect ratio or reference threshold, e.g. 5.

Step S101-322: when the frame number is greater than or equal to an abnormal frame number reference value, determining that the target object meets the first suspected abnormal gesture standard. For example: the sequence video frame images form a sliding window of 8 frames, the abnormal frame number reference value is 4 frames, and the counted number of frames whose aspect ratio is larger than the reference aspect ratio is 5; the target object can then be determined to meet the first suspected abnormal gesture standard. The above is merely an example, and the abnormal frame number reference value, the reference aspect ratio, etc. may be set and adjusted according to the actual detection requirement.
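The frame counting described above can be sketched in a few lines. The function name `meets_first_standard`, its parameter names and the sample aspect ratios are hypothetical; the threshold of 1.5 is chosen only so that the toy data reproduces the "5 of 8 frames" example and is not a value taken from the application.

```python
from typing import Sequence


def meets_first_standard(
    ratios: Sequence[float], thres: float, min_abnormal_frames: int
) -> bool:
    """Count frames whose aspect ratio r = w / h exceeds thres and compare the count."""
    # N = sum over t of I(r_t > thres), with I the indicator function.
    n = sum(1 for r in ratios if r > thres)
    return n >= min_abnormal_frames


# Illustrative 8-frame sliding window: the ratio grows as the frame becomes wider.
window = [0.3, 0.4, 0.9, 1.6, 2.1, 2.4, 2.6, 2.5]

# Five frames exceed 1.5, which reaches the reference value of 4 frames.
suspected = meets_first_standard(window, thres=1.5, min_abnormal_frames=4)
```

Both the threshold and the reference frame count are left as parameters because the text states that they are to be tuned to the actual detection requirement.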

Counting frames in this way is a relatively simple judging method. It reflects the fact that, when video frames are used for judging, a series of frame images is examined and each yields its own judgment result, so all of the frame images must be considered together to reach an overall conclusion. In addition, frame images at different time points may be given different significance. For example, with the frame images arranged in time order, each frame image is assigned a weight coefficient according to its time point, later frame images receiving higher weights, and the weighted sum of all frame judgment results is compared with a preset threshold. For example, in a sequence of 32 images, the weight of the first 8 images is set to 0.8, that of the second 8 images to 0.9, that of the third 8 images to 1, and that of the last 8 images to 1.2, and the threshold is set to 16. The judgment result of each frame is 1 if abnormal and 0 if normal; each result is multiplied by its weight coefficient, and if the sum is larger than 16, the target object is judged to be abnormal.
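The weighted variant can be worked through with the numbers given in the example (32 frames, weights 0.8, 0.9, 1 and 1.2 per group of 8, threshold 16). The function name `weighted_vote` and the two sample result sequences are assumptions made for this sketch.

```python
from typing import List, Sequence, Tuple


def weighted_vote(
    results: Sequence[int], weights: Sequence[float], threshold: float
) -> Tuple[float, bool]:
    """Return the weighted sum of per-frame results (1 abnormal, 0 normal) and the verdict."""
    if len(results) != len(weights):
        raise ValueError("one weight is required per frame")
    score = sum(w * r for w, r in zip(weights, results))
    return score, score > threshold


# Weights from the example: later groups of 8 frames count more.
weights: List[float] = [0.8] * 8 + [0.9] * 8 + [1.0] * 8 + [1.2] * 8

# Hypothetical case A: the last 20 frames are abnormal.
late_results = [0] * 12 + [1] * 20
late_score, late_abnormal = weighted_vote(late_results, weights, threshold=16)

# Hypothetical case B: only the first 16 frames are abnormal.
early_results = [1] * 16 + [0] * 16
early_score, early_abnormal = weighted_vote(early_results, weights, threshold=16)
```

In case A the score is 4 x 0.9 + 8 x 1 + 8 x 1.2 = 21.2, which exceeds 16; in case B it is 8 x 0.8 + 8 x 0.9 = 13.6, which does not, even though half of the frames are abnormal in both directions of comparison. This is the intended effect of giving later frames more weight.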

Based on the above, in the embodiment, preliminary judgment on the suspected abnormal gesture of the target object is realized in the first stage, and a basis is provided for subsequent continuous screening.

Regarding step S102: and acquiring node sequence data of the target object meeting the first suspected abnormal posture standard in a three-dimensional space from the sequence video frame images.

The node sequence data in step S102 may be understood as node data from which changes in the posture of the target object can be observed. The sequence video frame images include a plurality of frame images, and the node sequence data includes the node data in each frame image.

The purpose of step S102 is to perform 3D pose estimation on the target object to obtain position information of the target object in the three-dimensional space. As shown in fig. 4, the specific implementation procedure of step S102 may include:

Step S102-1: performing 3D gesture estimation on the target object which meets the first suspected abnormal gesture standard in the sequence video frame images, to obtain node coordinate data of the target object in three-dimensional space and the confidence thereof;

Step S102-2: determining the node coordinate data and the confidence thereof as the node sequence data.

The 3D pose estimation in step S102-1 may use the SimpleBaseline algorithm, i.e. a keypoint detection framework, or the videoPose algorithm, i.e. an algorithm that converts the input video into the three-dimensional positions of each key point of the human body relative to the root joint, etc. Fig. 2 is a schematic diagram of 3D pose estimation in an embodiment of the video frame-based abnormal pose detection method provided in the present application. The 3D pose estimation algorithm outputs the node information of the target object. Taking a human body as the target object, the algorithm can output the information of the bone joint points of the whole body, expressed as a sequence. This embodiment is illustrated with a right-hand coordinate system and may include 18 joint elements, each containing joint point information (x, y, z, conf), where x, y and z are the coordinates of the bone joint point in three-dimensional space and conf is the detection confidence of the bone joint point, with a value between 0 and 1. Fig. 3 is a schematic diagram of the node information of the 3D pose estimation in fig. 2. In order, the 18 joint points are: nose (Nose), neck (Neck), right shoulder (RShoulder), right elbow (RElbow), right wrist (RWrist), left shoulder (LShoulder), left elbow (LElbow), left wrist (LWrist), right hip (RHip), right knee (RKnee), right ankle (RAnkle), left hip (LHip), left knee (LKnee), left ankle (LAnkle), right eye (REye), left eye (LEye), right ear (REar) and left ear (LEar). This node information provides a reusable basis for the subsequent dynamic judgment and static judgment, avoiding a waste of computing resources. Each frame image comprises the information of 18 nodes, and the node information over the sequence video constitutes the node sequence data.
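One possible in-memory layout for the node sequence data is sketched below: one list of 18 (x, y, z, conf) records per frame, in the joint order listed in this embodiment. The type names `Joint` and `Frame`, the helper `make_frame` and the placeholder values are assumptions; the output format of an actual 3D pose estimator may differ.

```python
from typing import List, NamedTuple, Sequence

# Joint order as listed in this embodiment.
JOINT_NAMES = (
    "Nose", "Neck", "RShoulder", "RElbow", "RWrist", "LShoulder",
    "LElbow", "LWrist", "RHip", "RKnee", "RAnkle", "LHip",
    "LKnee", "LAnkle", "REye", "LEye", "REar", "LEar",
)
JOINT_INDEX = {name: i for i, name in enumerate(JOINT_NAMES)}


class Joint(NamedTuple):
    """One bone joint point: 3D coordinates plus detection confidence."""

    x: float
    y: float
    z: float
    conf: float


Frame = List[Joint]


def make_frame(values: Sequence[Sequence[float]]) -> Frame:
    """Validate and wrap the 18 (x, y, z, conf) records of one frame."""
    if len(values) != len(JOINT_NAMES):
        raise ValueError("expected 18 joints per frame")
    frame = [Joint(*(float(v) for v in record)) for record in values]
    for joint in frame:
        if not 0.0 <= joint.conf <= 1.0:
            raise ValueError("confidence must lie between 0 and 1")
    return frame


# Placeholder data standing in for estimator output over an 8-frame window.
dummy_frame = make_frame([(0.0, 0.0, 1.0, 0.9)] * 18)
node_sequence: List[Frame] = [dummy_frame for _ in range(8)]
```

Keeping the data in this per-frame form lets the dynamic and static judgments read the same structure without running pose estimation again.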

Regarding step S103: and determining whether the target object conforming to the first suspected abnormal posture standard conforms to the second suspected abnormal posture standard or not according to the node sequence data by a preset dynamic judging method.

The purpose of step S103 is to determine, by a dynamic determination method, whether the target object meets a second suspected abnormal gesture criterion.

The dynamic judgment method can be understood as judgment based on the data related to the dynamic motion of the target object.

As shown in fig. 5, the specific implementation procedure of step S103 may include:

step S103-1: extracting first key node sequence data and second key node sequence data from the node sequence data;

step S103-2: calculating the position relation between the target object and a gesture comparison reference in each sequence video frame image according to the first key node sequence data and the second key node sequence data;

step S103-3: determining the change rate of the position relation data according to the time relation among the video frame images of each sequence;

step S103-4: and comparing the change rate of the position relation data with a preset dynamic threshold value, and determining whether the target object conforming to the first suspected abnormal gesture standard conforms to the second suspected abnormal gesture standard.

In step S103-1, the first key node sequence data and the second key node sequence data are extracted from the node sequence data. As described above, the nodes in this embodiment are exemplified by human joint nodes, and the node sequence data may include 18 nodes: nose (Nose), neck (Neck), right shoulder (RShoulder), right elbow (RElbow), right wrist (RWrist), left shoulder (LShoulder), left elbow (LElbow), left wrist (LWrist), right hip (RHip), right knee (RKnee), right ankle (RAnkle), left hip (LHip), left knee (LKnee), left ankle (LAnkle), right eye (REye), left eye (LEye), right ear (REar) and left ear (LEar). The left shoulder, the right shoulder, the neck and the head are extracted as first key nodes, and the corresponding node sequence data are acquired; the left foot and the right foot are extracted as second key nodes, and the corresponding node sequence data are acquired.
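A minimal sketch of the key node extraction is given below. The text names the head and the feet, while the 18-joint list contains no joints with those names; mapping "head" to Nose and "feet" to the two ankles is an assumption of this sketch, as are the names `FIRST_KEY`, `SECOND_KEY` and `select_joints`.

```python
from typing import List, Sequence, Tuple

JOINT_NAMES = (
    "Nose", "Neck", "RShoulder", "RElbow", "RWrist", "LShoulder",
    "LElbow", "LWrist", "RHip", "RKnee", "RAnkle", "LHip",
    "LKnee", "LAnkle", "REye", "LEye", "REar", "LEar",
)

# Assumed mapping: "head" -> Nose, "left/right foot" -> LAnkle/RAnkle.
FIRST_KEY = ("LShoulder", "RShoulder", "Neck", "Nose")
SECOND_KEY = ("LAnkle", "RAnkle")

Point = Tuple[float, float, float]


def select_joints(
    frame: Sequence[Sequence[float]], names: Sequence[str]
) -> List[Point]:
    """Pick the (x, y, z) coordinates of the named joints from one 18-joint frame."""
    index = {name: i for i, name in enumerate(JOINT_NAMES)}
    return [tuple(frame[index[name]][:3]) for name in names]


# Toy frame in which the x coordinate equals the joint index, for easy checking.
toy_frame = [(float(i), 0.0, 0.0, 1.0) for i in range(18)]
upper = select_joints(toy_frame, FIRST_KEY)
lower = select_joints(toy_frame, SECOND_KEY)
```

Applying `select_joints` to every frame of a window would yield the first and second key node sequence data.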

Based on the extracted first key node sequence data and the second key node sequence data, the specific implementation process of step S103-2 may include:

Step S103-21: determining a first key node sequence position mean value according to the position coordinate data of each node in the first key node sequence data; namely: calculating the position mean value $P_a = (x_a, y_a, z_a)$ of the left shoulder, right shoulder, neck and head nodes.

Step S103-22: determining a second key node sequence position mean value according to the position coordinate data of each node in the second key node sequence data; namely: calculating the position mean value $P_b = (x_b, y_b, z_b)$ of the left foot and right foot nodes.

Step S103-23: establishing a connecting line between the first key node sequence position mean value and the second key node sequence position mean value; namely: for the human body posture, the first key nodes may be used as nodes representing the human head, and the second key nodes as nodes representing the human feet. The first key node position mean value $P_a = (x_a, y_a, z_a)$ and the second key node position mean value $P_b = (x_b, y_b, z_b)$ are connected to form the connecting line.

Step S103-24: taking the included angle formed between the connecting line and the gesture comparison reference as the positional relationship data between the target object and the gesture comparison reference. Namely: the direction of the z axis is taken as the reference line, and the included angle $\Theta$ between the connecting line and the z axis is calculated. The connecting line between the first key node position mean value and the second key node position mean value may be represented by the vector joining $P_b$ and $P_a$, and the reference line may be represented by the vector (0, 0, 1), i.e. the z axis; $\Theta$ is then obtained by applying to these two vectors the operator $\langle\cdot,\cdot\rangle$, which represents the angle between two vectors. This calculation is carried out on each frame image in the sequence video in order to obtain the average change rate of the included angle $\Theta$. Namely: $\{\Theta^{t}\}, t = 1, \dots, T$ is defined as the set of angles between the key node connecting line and the z axis over the T frames within the sequence video sliding window. The average change rate $\Delta$ is computed from the difference $\max(\{\Theta^{t}\}) - \min(\{\Theta^{t}\})$ over the window, and $\Delta$ is used as the data in the dynamic judging method.
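Steps S103-21 to S103-24 can be sketched for a single frame as follows. The function names and sample coordinates are hypothetical, and the vector is taken from the second mean to the first (feet towards head) as an assumption, so that an upright body gives an angle near 0 degrees.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def mean_point(points: Sequence[Point]) -> Point:
    """Position mean of a group of key nodes."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def angle_to_z_axis(upper: Sequence[Point], lower: Sequence[Point]) -> float:
    """Angle in degrees between the line joining the two means and the z axis (0, 0, 1)."""
    pa = mean_point(upper)  # first key node position mean
    pb = mean_point(lower)  # second key node position mean
    v = tuple(a - b for a, b in zip(pa, pb))  # assumed direction: from pb to pa
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0:
        raise ValueError("the two means coincide, so the line is undefined")
    # Dot product with (0, 0, 1) is just the z component; clamp against rounding.
    cos_theta = max(-1.0, min(1.0, v[2] / norm))
    return math.degrees(math.acos(cos_theta))


# Illustrative coordinates in metres (assumed unit).
upright = angle_to_z_axis(
    upper=[(0.5, 0.0, 1.5), (-0.5, 0.0, 1.5), (0.0, 0.0, 1.5), (0.0, 0.0, 1.75)],
    lower=[(0.5, 0.0, 0.0), (-0.5, 0.0, 0.0)],
)
lying = angle_to_z_axis(
    upper=[(1.5, 0.0, 0.25)] * 4,
    lower=[(0.0, 0.0, 0.25)] * 2,
)
```

With these toy coordinates the upright body yields an angle of about 0 degrees and the lying body about 90 degrees.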

In step S103-3, the change rate of the positional relationship data is determined according to the temporal relationship between the sequence video frame images. For example, it may be determined with reference to an empirical value of the change rate of the angle between the z axis and the line joining the first key node position mean value and the second key node position mean value when a human body goes from a normal posture into an unbalanced posture.

In step S103-4, when the average change rate $\Delta$ is greater than or equal to the preset dynamic threshold, it may be determined that the target object which meets the first suspected abnormal gesture standard also meets the second suspected abnormal gesture standard. For example: when the sequence video frame images form a video time sliding window of 8 frames and the average change rate within that window is greater than or equal to 7.5 (the unit may be, for example, degrees per adjacent frame), the target object can be determined to meet the second suspected abnormal gesture standard.
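The dynamic judgment can be sketched from a list of per-frame angles. The text fixes the numerator of the change rate as the difference between the largest and smallest angle in the window; dividing by the number of adjacent-frame intervals (T - 1) is an assumption of this sketch, suggested by the unit "degrees per adjacent frame", and the function names and sample angles are hypothetical.

```python
from typing import Sequence


def average_change_rate(angles: Sequence[float]) -> float:
    """Range of the angles in the window divided by the number of frame intervals.

    The normalisation by (T - 1) is an assumption; only max - min is given in the text.
    """
    if len(angles) < 2:
        raise ValueError("at least two frames are needed")
    return (max(angles) - min(angles)) / (len(angles) - 1)


def meets_second_standard(angles: Sequence[float], dynamic_thres: float) -> bool:
    """Compare the average change rate with the preset dynamic threshold."""
    return average_change_rate(angles) >= dynamic_thres


# Illustrative 8-frame windows of angles to the z axis, in degrees.
falling = [2.0, 3.0, 5.0, 12.0, 30.0, 55.0, 78.0, 86.0]
steady = [3.0, 4.0, 3.0, 5.0, 4.0, 3.0, 4.0, 5.0]

falling_rate = average_change_rate(falling)  # (86 - 2) / 7
falling_flag = meets_second_standard(falling, dynamic_thres=7.5)
steady_flag = meets_second_standard(steady, dynamic_thres=7.5)
```

Under this assumed normalisation a threshold of 7.5 over 8 frames corresponds to a swing of 52.5 degrees within the window, so ordinary swaying stays well below it.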

The detection against the first suspected abnormal gesture standard followed by the detection against the second suspected abnormal gesture standard yields the change of the gesture of the target object across the sequence video frame images. Going from the preliminary judgment to the dynamic judgment, whether the target object meets the second suspected abnormal gesture standard is determined only where it already meets the first suspected abnormal gesture standard, after which the static judgment is performed.

Regarding the step S104: determining whether the target object accords with a third abnormal posture standard according to the node sequence data by a preset static judging method aiming at the target object which accords with the second suspected abnormal posture standard; if yes, judging that the abnormal gesture of the target object occurs.

The purpose of the step S104 is to determine, by a static determination method, whether the target object that meets the second suspected abnormal gesture criterion meets a third abnormal gesture criterion.

As shown in fig. 6, the specific implementation procedure of the step S104 may include:

Step S104-1: acquiring a height average value from third key node sequence data in the node sequence data. Namely: third key node sequence data are extracted from the node sequence data. Continuing with the human joint nodes of the above embodiment, the third key nodes may include 5 joint nodes such as the left shoulder, right shoulder, spine, left hip and right hip, and the mean $\bar{z}^{t}$ of their z coordinates in frame t is calculated. In this embodiment a right-hand coordinate system is taken as an example, so the z-axis direction is the height direction, and the mean $\bar{z}^{t}$ is the height average value.

Step S104-2: determining whether the target object meets the third abnormal gesture standard according to whether the height average value is smaller than or equal to a preset height reference value; if not, prompt information indicating that the posture is normal may be output, or the process may return to step S101, step S102 or step S103 for re-execution. The height reference value can be adjusted in real time; it may be set and adjusted with reference to the height average value of the 5 joint nodes in three-dimensional space under the normal posture of the human body, or it may be an empirical value for judging the abnormal posture.

In order to improve the accuracy of the target object posture determination, the step S104-2 may further include:

Step S104-21: selecting, among the sequence video frame images, the frame images in which the height average value of the third key node sequence data is smaller than or equal to the height reference value;

Step S104-22: judging whether the number of frame images whose height average value is smaller than or equal to the height reference value is greater than or equal to a preset reference frame number, and if so, determining that the target object meets the third abnormal gesture standard. Namely: $N = \sum_{t=1}^{T} I(\bar{z}^{t} \le thres)$, where $\bar{z}^{t}$ is the height average value in frame t, $I(\cdot)$ is the indicator function, and thres is the height reference value, which may also be referred to as the height reference threshold. The number of frames N in the sequence video frame images whose height average value is smaller than or equal to the height reference value is counted, and it is determined whether N is greater than or equal to the reference frame number. For example: in a sliding window of 8 sequence video frame images with a reference frame number of 4, if the height average value of 4 frames is smaller than or equal to the height reference value, the target object can be determined to meet the third abnormal gesture standard, so that it can be determined that the abnormal gesture of the target object has occurred.
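The static judgment can be sketched in the same style as the frame counting of the preliminary judgment. The function name `meets_third_standard`, the height reference value of 0.5 and the sample z coordinates (assumed to be in metres) are illustrative assumptions.

```python
from typing import Sequence


def meets_third_standard(
    frames_z: Sequence[Sequence[float]], height_thres: float, min_frames: int
) -> bool:
    """Count frames whose mean key-node height is at or below the reference value.

    frames_z holds, per frame, the z coordinates of the third key nodes.
    """
    n = 0
    for zs in frames_z:
        mean_z = sum(zs) / len(zs)  # height average value of this frame
        if mean_z <= height_thres:
            n += 1  # I(mean_z <= thres) = 1
    return n >= min_frames


# Illustrative 8-frame window: torso joints high for 4 frames, then near the floor.
torso_up = [1.4, 1.4, 1.2, 1.0, 1.0]
torso_down = [0.3, 0.3, 0.25, 0.2, 0.2]
window_z = [torso_up] * 4 + [torso_down] * 4

abnormal = meets_third_standard(window_z, height_thres=0.5, min_frames=4)
```

Here 4 of the 8 frames have a height average value at or below the reference value, which reaches the reference frame number of 4 used in the example.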

Similar to the preceding preliminary judging method, the frame images may also be weighted in time order; specifically, the later a frame image lies in time, the higher the weight of its judgment result, so that the judgment result better reflects the final actual situation.

The above is a specific description of an embodiment of the video frame-based abnormal gesture detection method provided in the present application. First, the embodiment splits the process by which an abnormal gesture of a target object arises into multiple stages and filters them in sequence. Namely: the gesture change of the target object is judged in three stages, i.e. preliminary judgment, dynamic judgment and static judgment, and each stage screens, by a different judging method, whether the target object meets the corresponding suspected abnormal gesture standard, so that the accuracy of abnormal gesture detection is improved. Second, since the detection of each stage depends on the result of the preceding stage, the data already obtained before a subsequent detection stage can be reused without re-acquisition, so that a waste of computing resources can be avoided. Third, in this embodiment the dynamic judgment and the static judgment are performed using the node sequence data of the target object, which partially overcomes the excessive sensitivity of any single judging mode to noise such as the background, distance, size and orientation of the scene or environment in which the target object is located.
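The sequential filtering summarised above can be sketched as a short cascade in which each later stage runs only if the earlier one passed, and the node data are estimated once and shared by the dynamic and static stages. The function and parameter names are hypothetical, and the stage functions are passed in as callables so that the sketch stays independent of any particular judging rule.

```python
from typing import Any, Callable, List, Sequence


def detect_abnormal_posture(
    boxes: Sequence[Any],
    frames: Sequence[Any],
    preliminary: Callable[[Sequence[Any]], bool],
    estimate_pose: Callable[[Sequence[Any]], Any],
    dynamic: Callable[[Any], bool],
    static: Callable[[Any], bool],
) -> bool:
    """Run the three judgments in order, stopping at the first one that fails."""
    if not preliminary(boxes):
        return False  # first suspected abnormal standard not met
    nodes = estimate_pose(frames)  # 3D estimation runs only for suspected targets
    if not dynamic(nodes):
        return False  # second suspected abnormal standard not met
    return static(nodes)  # third standard decides the final result


pose_calls: List[int] = []


def fake_pose(frames: Sequence[Any]) -> Any:
    """Stand-in for a 3D pose estimator; records each call for demonstration."""
    pose_calls.append(len(frames))
    return frames


rejected = detect_abnormal_posture(
    [], [1, 2], lambda b: False, fake_pose, lambda n: True, lambda n: True
)
calls_after_reject = len(pose_calls)
accepted = detect_abnormal_posture(
    [], [1, 2], lambda b: True, fake_pose, lambda n: True, lambda n: True
)
```

When the preliminary judgment rejects a window, the stand-in estimator is never called, which illustrates how the cascade avoids spending the more expensive 3D estimation on targets that are not suspected.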

The foregoing is a specific description of an embodiment of the video frame-based abnormal gesture detection method. Corresponding to that method embodiment, the application further discloses an embodiment of a video frame-based abnormal gesture detection apparatus; please refer to fig. 7. Since the apparatus embodiment is substantially similar to the method embodiment, its description is relatively brief, and for relevant points reference may be made to the corresponding part of the description of the method embodiment. The apparatus embodiments described below are merely illustrative.

As shown in fig. 7, fig. 7 is a schematic structural diagram of an embodiment of an abnormal gesture detection apparatus based on a video frame provided in the present application, where the embodiment of the apparatus may include:

a first determining unit 701, configured to determine, according to a gesture detection frame constructed for a target object in a sequential video frame image, whether the target object meets a first suspected abnormal gesture criterion by using a preset preliminary determination method;

an obtaining unit 702, configured to obtain, from the sequence video frame images, node sequence data in three-dimensional space of the target object that meets the first suspected abnormal gesture standard;

a second determining unit 703, configured to determine, according to the node sequence data and by a preset dynamic determination method, whether the target object meeting the first suspected abnormal gesture standard meets a second suspected abnormal gesture standard;

a third determining unit 704, configured to determine, for the target object that meets the second suspected abnormal gesture standard, according to the node sequence data and by a preset static determination method, whether the target object meets a third abnormal gesture standard; if yes, judging that the abnormal gesture of the target object occurs.

In this embodiment, the first determining unit 701 includes: an acquisition subunit, a defining subunit and a judging subunit;

The acquisition subunit is used for tracking the track of the target object according to the acquired video information and acquiring the video frame image of the target object tracking sequence;

the defining subunit is used for defining the gesture detection frame of the target object in the frame image according to the target object tracking sequence video frame image;

the judging subunit is configured to judge whether the target object meets the first suspected abnormal gesture standard according to the gesture detection frame and the reference gesture detection frame by using a preset preliminary judging method.

The judging subunit may include: a ratio determining subunit and a comparison judging subunit;

the ratio determining subunit is configured to determine an aspect ratio of the gesture detection frame and a reference aspect ratio of the reference gesture detection frame;

the comparison judging subunit is configured to compare the reference aspect ratio with the aspect ratio of the gesture detection frame, and judge whether the target object meets the first suspected abnormal gesture standard.

The comparison judging subunit may specifically include: a statistics subunit for counting the number of frames of the sequence video frame images for which the aspect ratio of the gesture detection frame is greater than the reference aspect ratio; the comparison judging subunit is specifically configured to determine that the target object meets the first suspected abnormal gesture standard when the frame number is greater than or equal to an abnormal frame number reference value.

The acquisition unit 702 may include: a first acquisition subunit and a second determination subunit; the first obtaining subunit is configured to obtain node coordinate data and a confidence coefficient thereof of a target object in a three-dimensional space according to 3D pose estimation of the target object in the sequence video frame image, where the target object meets the first suspected abnormal pose standard; the second determining subunit is configured to determine the node sequence data according to the node coordinate data and the confidence level thereof.

The second determining unit 703 may specifically include: an extraction subunit, a calculation subunit, a change rate determination subunit and a comparison subunit;

the extraction subunit is used for extracting first key node sequence data and second key node sequence data from the node sequence data;

the calculating subunit is used for calculating the position relation data between the target object and the gesture comparison reference in each sequence video frame image according to the first key node sequence data and the second key node sequence data;

the change rate determining subunit is used for determining the change rate of the position relation data according to the time relation among the video frame images of each sequence;

The comparing subunit is configured to compare the rate of change of the positional relationship data with a preset dynamic threshold to determine whether the target object that meets the first suspected abnormal gesture standard meets the second suspected abnormal gesture standard.

The calculating subunit may include: a first average value determining subunit, a second average value determining subunit and an establishing subunit;

the first average value determining subunit is configured to determine a first key node sequence position average value according to position coordinate data of each node in the first key node sequence data;

the second average value determining subunit is configured to determine a second key node sequence position average value according to position coordinate data of each node in the second key node sequence data;

the establishing subunit is configured to establish a connection between the first key node sequence position average value and the second key node sequence position average value;

the calculating subunit is specifically configured to use an included angle formed between the connecting line and the gesture comparison reference as positional relationship data between the connecting line and the gesture comparison reference.

The third determining unit 704 may specifically include: an obtaining subunit and a determining subunit;

The obtaining subunit is configured to obtain a height average value in third key node sequence data in the node sequence data;

the determining subunit is configured to determine whether the target object meets the third abnormal gesture standard according to whether the height average value is smaller than or equal to a preset height reference value.

The determining subunit may include: selecting a subunit and a judging subunit; the selecting subunit is configured to select the frame images in the third key node sequence data in each of the sequence video frame images, where the height average value is less than or equal to the height reference value; the judging subunit is configured to judge whether the frame number of the frame image with the height average value smaller than or equal to the height reference value is greater than or equal to a preset reference frame number, and if yes, determine that the target object meets the third abnormal gesture standard.

The foregoing is a description of an embodiment of a video frame based abnormal gesture detection apparatus provided in the present application, and reference may be made to the contents of steps S101 to S104 in the foregoing method embodiment for specific contents of the apparatus embodiment, which are not described in detail herein.

Based on the foregoing, the present application further provides a method for detecting an abnormal posture of a human body, as shown in fig. 8, fig. 8 is a flowchart of an embodiment of a method for detecting an abnormal posture of a human body, which may include:

step S801: determining whether the target human body accords with a first suspected abnormal posture standard or not according to a posture detection frame constructed for the target human body in a target human body sequence video frame image by a preset preliminary judging method;

step S802: acquiring, from the target human body sequence video frame images, node sequence data in three-dimensional space of the target human body that meets the first suspected abnormal posture standard;

step S803: determining whether the target human body conforming to the first suspected abnormal posture standard conforms to the second suspected abnormal posture standard or not according to the node sequence data by a preset dynamic judging method;

step S804: aiming at a target human body conforming to the second suspected abnormal posture standard, determining whether the target human body conforms to a third abnormal posture standard or not according to the node sequence data by a preset static judging method; if yes, judging that the abnormal posture of the target human body occurs.

For the specific contents of the steps S801 to S804, reference may be made to the contents of the steps S101 to S104 described above, and detailed description thereof will be omitted.

Correspondingly, the application further provides a human body abnormal posture detection device, as shown in fig. 9, fig. 9 is a schematic structural diagram of an embodiment of a human body abnormal posture detection device provided by the application, and the embodiment of the device may include:

a first determining unit 901, configured to determine, according to a gesture detection frame constructed for a target human body in a target human body sequence video frame image, whether the target human body meets a first suspected abnormal gesture standard by using a preset preliminary judgment method;

an obtaining unit 902, configured to obtain, from the target human body sequence video frame images, node sequence data in three-dimensional space of the target human body that meets the first suspected abnormal posture standard;

a second determining unit 903, configured to determine, according to the node sequence data and by a preset dynamic determination method, whether the target human body that meets the first suspected abnormal posture standard meets a second suspected abnormal posture standard;

a third determining unit 904, configured to determine, for the target human body that meets the second suspected abnormal posture standard, according to the node sequence data and by a preset static determination method, whether the target human body meets a third abnormal posture standard; if yes, judging that the abnormal posture of the target human body occurs.

For the specific content of the embodiment of the body abnormal posture detection apparatus, reference may be made to the above-mentioned embodiment of the body abnormal posture detection method and the content of the embodiment of the abnormal posture detection method based on video frames, and details thereof will not be described herein.

Based on the above, the present application further provides a computer storage medium, configured to store network platform generated data, and a program for processing the network platform generated data;

the program, when read by a processor for execution, performs the steps as in the above-described video frame-based abnormal posture detection method embodiment, or performs the steps as in the above-described human body abnormal posture detection method embodiment.

Based on the foregoing, the present application further provides an electronic device, as shown in fig. 10, and fig. 10 is a schematic structural diagram of an embodiment of an electronic device provided in the present application, where the embodiment of the electronic device includes:

a processor 1001;

a memory 1002 for storing a program for processing network platform generated data, wherein the program, when read and executed by the processor, performs the steps of the above-described video frame-based abnormal gesture detection method embodiment, or performs the steps of the above-described human body abnormal gesture detection method embodiment.

It should be noted that the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) involved in the present application are information and data authorized by the user or fully authorized by all parties. The collection, use and processing of the relevant data must comply with the relevant laws, regulations and standards of the relevant countries and regions, and corresponding operation entries are provided for the user to choose to authorize or refuse.

In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read only memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.

1. Computer readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium which can be used to store information that can be accessed by a computing device. Computer readable media, as defined herein, do not include transitory computer readable media (transitory media), such as modulated data signals and carrier waves.

2. It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

While the present application has been described above by way of preferred embodiments, these embodiments are not intended to limit the application. Any person skilled in the art may make variations and modifications without departing from the spirit and scope of the present application, so the scope of protection of the present application shall be defined by the claims of the present application.
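As a non-limiting illustration, the staged judgment summarized in the abstract (a preliminary judgment on the detection frames, then a dynamic judgment and a static judgment on the three-dimensional node sequence data) might be sketched as follows. This is only a sketch under stated assumptions and is not the implementation of the application. The function names, the node layout `(x, y, z)` with `z` taken as height, and the use of a mean over all key nodes in all frames are hypothetical choices made for illustration. The preliminary and dynamic judgments are passed in as placeholder callables.

```python
from statistics import mean
from typing import Callable, Sequence, Tuple

# Hypothetical layout: one frame is a list of (x, y, z) key-node coordinates,
# with z assumed to be the height of the node above the ground.
Node = Tuple[float, float, float]
Frame = Sequence[Node]


def static_check(frames: Sequence[Frame], height_ref: float) -> bool:
    """Assumed static judgment: the mean node height over the sequence
    is less than or equal to a preset height reference value."""
    heights = [node[2] for frame in frames for node in frame]
    # With no node data there is nothing to judge, so report "not abnormal".
    return bool(heights) and mean(heights) <= height_ref


def detect_abnormal_posture(
    boxes: Sequence[object],
    load_nodes: Callable[[], Sequence[Frame]],
    preliminary: Callable[[Sequence[object]], bool],
    dynamic: Callable[[Sequence[Frame]], bool],
    height_ref: float,
) -> bool:
    """Run the three judgments in order, stopping at the first one that fails."""
    # Stage 1: preliminary judgment on the detection frames only.
    if not preliminary(boxes):
        return False
    # Node sequence data is acquired only for targets that pass stage 1.
    frames = load_nodes()
    # Stage 2: dynamic judgment on the node sequence data.
    if not dynamic(frames):
        return False
    # Stage 3: static judgment on the same node sequence data.
    return static_check(frames, height_ref)
```

In this arrangement the comparatively expensive acquisition of three-dimensional node data is deferred until the inexpensive detection-frame judgment has already flagged the target object.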

Claims

1. An abnormal gesture detection method based on a video frame is characterized by comprising the following steps:

2. The method for detecting abnormal gesture based on video frame according to claim 1, wherein the determining whether the target object meets the first suspected abnormal gesture standard according to the gesture detection frame constructed for the target object in the sequential video frame image by a preset preliminary determination method comprises:

3. The method for detecting abnormal gesture based on video frame according to claim 2, wherein the determining whether the target object meets the first suspected abnormal gesture standard according to the gesture detection frame and the reference gesture detection frame by a preset preliminary determination method comprises:

4. The method for detecting abnormal gesture based on video frame according to claim 3, wherein the determining whether the target object meets the first suspected abnormal gesture criterion comprises:

5. The method for detecting abnormal gesture based on video frame according to claim 1, wherein the acquiring node sequence data of the target object in the three-dimensional space, which meets the first suspected abnormal gesture standard, in the sequence video frame image comprises:

6. The method for detecting abnormal gesture based on video frame according to claim 1, wherein determining whether the target object meeting the first suspected abnormal gesture standard meets the second suspected abnormal gesture standard according to the node sequence data by a preset dynamic judgment method comprises:

7. The video frame-based abnormal gesture detection method of claim 6, wherein calculating positional relationship data between the target object and gesture comparison references in each of the sequential video frame images from the first key node sequence data and the second key node sequence data comprises:

8. The method for detecting abnormal gesture based on video frame according to claim 1, wherein the determining, according to the node sequence data, whether the target object meets a third abnormal gesture standard by a preset static determination method, for the target object meeting the second suspected abnormal gesture standard includes:

9. The video frame-based abnormal gesture detection method of claim 8, wherein determining whether the target object meets the third abnormal gesture criterion according to whether the height average is less than or equal to a preset height reference value comprises:

10. An abnormal gesture detection apparatus based on a video frame, comprising:

11. A human body abnormal posture detection method, characterized by comprising:

12. A human body abnormal posture detection device, characterized by comprising:

13. A computer storage medium for storing network platform generated data and a program for processing the network platform generated data;

the program, when read by a processor and executed, performs the abnormal posture detection method based on video frames as set forth in any one of the preceding claims 1 to 9, or performs the abnormal posture detection method of a human body as set forth in claim 11.

14. An electronic device, comprising:

a processor;

a memory for storing a program for processing network platform generated data, wherein the program, when read and executed by the processor, performs the video frame-based abnormal posture detection method according to any one of claims 1 to 9, or performs the human body abnormal posture detection method according to claim 11.