CN118116084A - Behavior information detection method, apparatus and storage medium

Behavior information detection method, apparatus and storage medium

Info

Publication number
CN118116084A
Authority
CN
China
Prior art keywords: human body, preset, objects, information, processed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410481066.0A
Other languages
Chinese (zh)
Other versions
CN118116084B (en)
Inventor
冯昊
冯雪涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Shenxiang Intelligent Technology Co ltd
Original Assignee
Zhejiang Shenxiang Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Shenxiang Intelligent Technology Co., Ltd.
Priority to CN202410481066.0A
Publication of CN118116084A
Application granted
Publication of CN118116084B
Active legal status
Anticipated expiration

Landscapes

  • Image Analysis (AREA)

Abstract

The application provides a behavior information detection method, apparatus, and storage medium. The method includes: acquiring a video frame to be processed; identifying a plurality of target human objects having a preset interaction in the video frame to be processed, and determining positioning information of the plurality of target human objects; acquiring trajectory information of the plurality of target human objects in the video frame to be processed; judging, according to the trajectory information and the positioning information, whether a preset behavior exists among the plurality of target human objects; and if the preset behavior exists among the plurality of target human objects, issuing a prompt. The method and apparatus automatically recognize the actions of human objects in the video frame to be processed and promptly issue a prompt when a preset dangerous behavior is present in the video frame, so that the dangerous behavior can be intervened in and stopped in time, improving the data processing efficiency and intelligence of the surveillance system.

Description

Behavior information detection method, apparatus and storage medium
Technical Field
The present application relates to the field of information processing technologies, and in particular, to a behavior information detection method, apparatus, and storage medium.
Background
A video surveillance system is a common type of security system: remotely controlled cameras and auxiliary equipment observe the monitored site directly and transmit its images and sound to a monitoring center, so that conditions at the site are visible at a glance and abnormal situations can be discovered, recorded, and handled in time.
Taking a campus surveillance system as an example, violence and bullying are risks currently faced by campuses, threatening campus safety and student development. Timely discovery and active intervention are effective means of prevention, but remain a pain point and difficulty for schools.
In practice, however, surveillance systems process data inefficiently: after violent behavior occurs on campus, the incident is reviewed by manually tracing back through the recorded video, so the information lags severely, which is not conducive to timely control of campus violence.
Disclosure of Invention
The main object of the embodiments of the present application is to provide a behavior information detection method, apparatus, and storage medium that automatically recognize the actions of human objects in video frames to be processed and issue a timely prompt when a preset dangerous behavior is present in the video frames, so that the dangerous behavior can be intervened in and stopped in time, improving the data processing efficiency and intelligence of the surveillance system.
In a first aspect, an embodiment of the present application provides a behavior information detection method, including: acquiring a video frame to be processed; identifying a plurality of target human objects having a preset interaction in the video frame to be processed, and determining positioning information of the plurality of target human objects; acquiring trajectory information of the plurality of target human objects in the video frame to be processed; judging, according to the trajectory information and the positioning information, whether a preset behavior exists among the plurality of target human objects; and if the preset behavior exists among the plurality of target human objects, issuing a prompt.
In an embodiment, the preset interaction includes mutual limb-contact actions among the plurality of human objects; and the identifying a plurality of target human objects having a preset interaction in the video frame to be processed and determining positioning information of the plurality of target human objects includes: inputting the video frame to be processed into a preset posture detector, so that the posture detector recognizes the actions of human objects in the video frame to be processed and outputs the positioning information of the plurality of target human objects having mutual limb-contact actions in the video frame to be processed.
In an embodiment, before the inputting the video frame to be processed into the preset posture detector, the method further includes: acquiring a first sample image in which different human objects having mutual limb-contact actions are annotated as a whole; and training an image detector with the first sample image to obtain the posture detector, where the posture detector locates and outputs, as a whole, different target human objects having mutual limb-contact actions.
In an embodiment, the preset interaction includes a mutual chasing action among a plurality of human objects; and the identifying a plurality of target human objects having a preset interaction in the video frame to be processed includes: identifying a plurality of first candidate objects having a running posture in the video frame to be processed; determining the motion speed and motion direction of the first candidate objects according to the video frame to be processed; determining, from the plurality of first candidate objects, a plurality of second candidate objects exhibiting running behavior according to the running posture, the motion speed, and the motion direction; and determining, from the plurality of second candidate objects, the plurality of target human objects exhibiting mutual chasing behavior according to the motion direction and motion speed of the second candidate objects.
In an embodiment, the determining, from the plurality of first candidate objects, a plurality of second candidate objects exhibiting running behavior according to the running posture, the motion speed, and the motion direction includes: determining the number of running postures of the plurality of first candidate objects within a first preset duration; and determining a first candidate object whose number of running postures is greater than or equal to a preset number and whose motion speed and motion direction satisfy preset conditions as a second candidate object exhibiting running behavior.
In an embodiment, the determining, from the plurality of second candidate objects, the plurality of target human objects exhibiting mutual chasing behavior according to the motion direction and motion speed of the second candidate objects includes: determining the motion direction deviation and motion speed deviation between different second candidate objects; and determining different second candidate objects whose motion direction deviation is smaller than a preset direction threshold and whose motion speed deviation is smaller than a preset speed threshold as the target human objects.
In an embodiment, the video frame to be processed includes a plurality of sub-video frames originating from different acquisition devices; and the acquiring trajectory information of the plurality of target human objects in the video frame to be processed includes: acquiring human-body features and tracking trajectories of a plurality of human objects in the plurality of sub-video frames; merging, according to the human-body features, the tracking trajectories of the same human object from different sub-video frames to generate complete trajectory information for each of the plurality of human objects; and selecting the trajectory information of the plurality of target human objects from the complete trajectory information of the plurality of human objects.
In an embodiment, the trajectory information includes a single-person bounding box and identity information of the target human object, and the positioning information includes a whole bounding box of the plurality of target human objects; and the judging, according to the trajectory information and the positioning information, whether a preset behavior exists among the plurality of target human objects includes: if the single-person bounding box lies within the whole bounding box, determining that the target human object marked by the single-person bounding box exhibits the preset behavior, and determining the identity information bound to the single-person bounding box as the identity information of the target human object exhibiting the preset behavior.
In one embodiment, the method further includes: recording the identity information of target human objects exhibiting the preset behavior, counting the frequency with which the human object bound to the identity information exhibits the preset behavior, and issuing a prompt when the frequency exceeds a preset frequency.
In an embodiment, the issuing a prompt if the preset behavior exists among the plurality of target human objects includes: issuing a prompt if at least two target human objects in the video frame to be processed exhibit the preset behavior and the duration of the preset behavior is longer than a second preset duration.
In an embodiment, the identifying a plurality of first candidate objects having a running posture in the video frame to be processed includes: acquiring a human-body image in the video frame to be processed; and inputting the human-body image into a preset posture classifier, so that the posture classifier outputs a plurality of first candidate objects having a running posture in the human-body image, where the posture classifier is trained on a preset second sample image in which human objects having a running posture are annotated, and the posture classifier classifies the posture information of human objects in the human-body image into running postures and non-running postures.
In a second aspect, an embodiment of the present application provides a behavior information detection apparatus, including:
a first acquisition module, configured to acquire a video frame to be processed;
an identification module, configured to identify a plurality of target human objects having a preset interaction in the video frame to be processed and determine positioning information of the plurality of target human objects;
a second acquisition module, configured to acquire trajectory information of the plurality of target human objects in the video frame to be processed;
a judging module, configured to judge, according to the trajectory information and the positioning information, whether a preset behavior exists among the plurality of target human objects;
and a prompt module, configured to issue a prompt if the preset behavior exists among the plurality of target human objects.
In an embodiment, the preset interaction includes mutual limb-contact actions among the plurality of human objects; and the identification module is configured to input the video frame to be processed into a preset posture detector, so that the posture detector recognizes the actions of human objects in the video frame to be processed and outputs the positioning information of the plurality of target human objects having mutual limb-contact actions in the video frame to be processed.
In an embodiment, the apparatus further includes a training module, configured to, before the video frame to be processed is input into the preset posture detector, acquire a first sample image in which different human objects having mutual limb-contact actions are annotated as a whole, and train an image detector with the first sample image to obtain the posture detector, where the posture detector locates and outputs, as a whole, different target human objects having mutual limb-contact actions.
In an embodiment, the preset interaction includes a mutual chasing action among a plurality of human objects; and the identification module is configured to identify a plurality of first candidate objects having a running posture in the video frame to be processed; determine the motion speed and motion direction of the first candidate objects according to the video frame to be processed; determine, from the plurality of first candidate objects, a plurality of second candidate objects exhibiting running behavior according to the running posture, the motion speed, and the motion direction; and determine, from the plurality of second candidate objects, the plurality of target human objects exhibiting mutual chasing behavior according to the motion direction and motion speed of the second candidate objects.
In an embodiment, the identification module is specifically configured to determine the number of running postures of the plurality of first candidate objects within a first preset duration, and determine a first candidate object whose number of running postures is greater than or equal to a preset number and whose motion speed and motion direction satisfy preset conditions as a second candidate object exhibiting running behavior.
In an embodiment, the identification module is specifically configured to determine the motion direction deviation and motion speed deviation between different second candidate objects, and determine different second candidate objects whose motion direction deviation is smaller than a preset direction threshold and whose motion speed deviation is smaller than a preset speed threshold as the target human objects.
In an embodiment, the video frame to be processed includes a plurality of sub-video frames originating from different acquisition devices; and the second acquisition module is configured to acquire the human-body features and tracking trajectories of a plurality of human objects in the plurality of sub-video frames; merge, according to the human-body features, the tracking trajectories of the same human object from different sub-video frames to generate complete trajectory information for each of the plurality of human objects; and select the trajectory information of the plurality of target human objects from the complete trajectory information of the plurality of human objects.
In an embodiment, the trajectory information includes a single-person bounding box and identity information of the target human object, and the positioning information includes a whole bounding box of the plurality of target human objects; and the judging module is configured to, if the single-person bounding box lies within the whole bounding box, determine that the target human object marked by the single-person bounding box exhibits the preset behavior, and determine the identity information bound to the single-person bounding box as the identity information of the target human object exhibiting the preset behavior.
In one embodiment, the apparatus further includes an analysis module, configured to record the identity information of target human objects exhibiting the preset behavior, count the frequency with which the human object bound to the identity information exhibits the preset behavior, and issue a prompt when the frequency exceeds a preset frequency.
In an embodiment, the prompt module is configured to issue a prompt if at least two target human objects in the video frame to be processed exhibit the preset behavior and the duration of the preset behavior is longer than the second preset duration.
In an embodiment, the identification module is further specifically configured to acquire a human-body image in the video frame to be processed, and input the human-body image into a preset posture classifier, so that the posture classifier outputs a plurality of first candidate objects having a running posture in the human-body image, where the posture classifier is trained on a preset second sample image in which human objects having a running posture are annotated, and classifies the posture information of human objects in the human-body image into running postures and non-running postures.
In a third aspect, an embodiment of the present application provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor to cause the electronic device to perform the method of any one of the above aspects.
In a fourth aspect, an embodiment of the present application provides a cloud device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor to cause the cloud device to perform the method of any one of the above aspects.
In a fifth aspect, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the method of any one of the above aspects.
In a sixth aspect, embodiments of the present application provide a computer program product including a computer program that, when executed by a processor, implements the method of any one of the above aspects.
According to the behavior information detection method, apparatus, and storage medium, a plurality of target human objects having a preset interaction, such as human objects in limb contact or chasing one another, are determined by recognizing the video frame to be processed, and their positioning information is determined; the action trajectories of the target human objects are then acquired, and whether the target human objects exhibit a preset dangerous behavior, such as violent behavior, is judged according to the action trajectories and the positioning information. If the preset behavior is determined to occur, a prompt can be issued in time, facilitating timely intervention and prevention of the dangerous behavior and improving the data processing efficiency and intelligence of the surveillance system.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and, together with the description, serve to explain the principles of the application. It will be apparent to those of ordinary skill in the art that the drawings in the following description show some embodiments of the application and that other drawings may be derived from them without inventive effort.
Fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
Fig. 2 is a schematic diagram of an application scenario of a behavior information detection system according to an embodiment of the present application;
Fig. 3 is a flow chart of a behavior information detection method according to an embodiment of the present application;
Fig. 4 is a schematic diagram of posture detection according to an embodiment of the present application;
Fig. 5 is a schematic diagram of a behavior information detection system according to an embodiment of the present application;
Fig. 6 is a flow chart of a behavior information detection method according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a behavior information detection apparatus according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a cloud device according to an embodiment of the present application.
Specific embodiments of the present application are shown in the drawings above and are described in more detail below. The drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application.
The term "and/or" is used herein to describe association of associated objects, and specifically indicates that three relationships may exist, for example, a and/or B may indicate: a exists alone, A and B exist together, and B exists alone.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region, and provide corresponding operation entries for the user to select authorization or rejection.
In order to clearly describe the technical solution of the embodiments of the present application, firstly, the terms involved in the present application are explained:
YOLOv5: you Only Look Once version 5, an efficient object detection algorithm that can quickly and accurately identify and locate objects in a video or image.
ResNet: Residual Network, a deep convolutional neural network architecture.
ResNet-18: a residual network with a depth of 18 layers.
The behavior information detection method of the embodiments of the present application can be applied to any field in which human behavior needs to be recognized.
To solve at least one of the above problems, an embodiment of the present application provides a behavior information detection scheme. By recognizing the video frame to be processed, a plurality of target human objects having a preset interaction, such as human objects in limb contact or chasing one another, are determined, and their positioning information is obtained; the action trajectories of the target human objects are then acquired, and whether the target human objects exhibit a preset dangerous behavior, such as violent behavior, is judged according to the action trajectories and the positioning information. If the preset behavior is determined to occur, a prompt can be issued in time, facilitating timely intervention and prevention of the dangerous behavior and improving the data processing efficiency and intelligence of the surveillance system.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. Where there is no conflict, the following embodiments and their features may be combined with each other. In addition, the order of steps in the method embodiments below is only an example and is not strictly limited.
As shown in fig. 1, the present embodiment provides an electronic device 1 including at least one processor 11 and a memory 12; one processor is taken as an example in fig. 1. The processor 11 and the memory 12 are connected by a bus 10. The memory 12 stores instructions executable by the processor 11, and the instructions are executed by the processor 11 so that the electronic device 1 can perform all or part of the methods in the embodiments described below, that is, automatically recognize the actions of human objects in the video frame to be processed and issue a timely prompt when a preset dangerous behavior is detected in the video frame, thereby facilitating timely intervention and prevention of the dangerous behavior and improving the data processing efficiency and intelligence of the surveillance system.
In an embodiment, the electronic device 1 may be a mobile phone, a tablet computer, a notebook computer, a desktop computer, or a large computing system composed of a plurality of computers.
Fig. 2 is a schematic diagram of an application scenario 200 of a behavior information detection system according to an embodiment of the present application. As shown in fig. 2, the system includes: server 210 and terminal 220, wherein:
The server 210 may be a data platform that provides a behavior information detection service, such as a video surveillance cloud platform. In an actual scenario, a video surveillance cloud platform may have multiple servers 210; one server 210 is shown in fig. 2 as an example.
The terminal 220 may be a computer, mobile phone, tablet, or other device used when a user logs in to the video surveillance cloud platform; there may be a plurality of terminals 220, and two terminals 220 are shown in fig. 2 as an example.
Information may be transmitted between the terminal 220 and the server 210 through the Internet, so that the terminal 220 can access data on the server 210. The terminal 220 and/or the server 210 may be implemented by the electronic device 1.
The behavior information detection scheme of the embodiments of the present application may be deployed on the server 210, on the terminal 220, or on both. The deployment may be chosen based on the actual requirements of the scenario, which is not limited in this embodiment.
When the behavior information detection scheme is deployed wholly or partly on the server 210, an invocable interface may be opened to the terminal 220 to provide algorithmic support to the terminal 220.
The method provided by the embodiments of the present application can be implemented by the electronic device 1 executing corresponding software code, in combination with data interaction with a server. The electronic device 1 may be a local terminal device. When the method runs on a server, it can be implemented and executed based on a cloud interaction system, which includes the server and a client device.
In a possible implementation, the method provided by the embodiments of the present application presents a graphical user interface through a terminal device, where the terminal device may be the aforementioned local terminal device or the aforementioned client device in the cloud interaction system.
Please refer to fig. 3, which shows a behavior information detection method according to an embodiment of the present application. The method may be executed by the electronic device 1 shown in fig. 1 and applied to the video surveillance scenario shown in fig. 2, to automatically recognize the actions of human objects in the video frame to be processed and issue a timely prompt when a preset dangerous behavior is detected, thereby facilitating timely intervention and prevention of the dangerous behavior and improving the data processing efficiency and intelligence of the surveillance system. In this embodiment, taking the terminal 220 as the executing device as an example, the method includes the following steps:
Step 301: acquire a video frame to be processed.
In this step, the video frame to be processed may be a surveillance video frame of a specific area and may include activity information of persons in that area; for example, a surveillance video frame in a campus may include moving pictures of students, teachers, and other relevant personnel. The video frames to be processed may be a continuous video stream or a plurality of video frames with a certain time sequence sampled from the surveillance video, and may be obtained in real time through the campus surveillance system.
Step 302: identify a plurality of target human objects having a preset interaction in the video frame to be processed, and determine positioning information of the plurality of target human objects.
In this step, the preset interaction refers to interaction between different human objects. It may include direct bodily contact, such as limb-contact actions, or non-contact actions, such as one person chasing another. The image of the video frame to be processed can be processed by image recognition technology to recognize each human object in the frame, determine the plurality of target human objects having the preset interaction, and output their positioning information.
In an embodiment, the preset interaction includes mutual limb-contact actions among the plurality of human objects. Step 302 may specifically include: inputting the video frame to be processed into a preset posture detector, so that the posture detector recognizes the actions of the human objects in the video frame and outputs the positioning information of the plurality of target human objects having mutual limb-contact actions in the video frame to be processed.
In this embodiment, the actions of human objects in the video frame to be processed can be detected automatically by the pre-trained posture detector, reducing the amount of online computation. The posture detector recognizes the actions of human objects in the input video frame and outputs the positioning information of the plurality of target human objects having mutual limb-contact actions; the positioning information may be represented by bounding boxes.
In one embodiment, before the video frame to be processed is input into the preset posture detector, the method further includes a training process for the posture detector: acquiring a first sample image, and annotating different human objects having mutual limb-contact actions in the first sample image as a whole; and training an image detector with the first sample image to obtain the posture detector, which locates and outputs, as a whole, different target human objects whose limbs touch each other.
In this embodiment, training the posture detector involves two aspects: sample annotation and detector training. The training samples come from real fighting footage; to express a fighting action, the perpetrator and the victim are annotated as a whole with the coordinates of a single bounding box, i.e., both are marked inside the same bounding box. Taking direct bodily contact as an example, a campus scene requires recognizing whether violent behavior occurs between students. Violent behavior has characteristic actions; for example, limb contact between two students may indicate a fight. The posture detector in this embodiment therefore detects typical fighting postures in the video frame to recognize fights and gives the position coordinates where the fight occurs, i.e., the fighting bounding box. Typical fighting postures include, but are not limited to, punching, kicking, grappling and throwing, pinning to the ground, and choking the neck. In an actual scene, a typical fighting posture can be the instantaneous action posture in a video frame, for example the body posture at the moment two persons make contact, which differs from everyday behavior. The first sample images with fighting postures can be obtained in advance; they may be a series of continuous video frame images or a plurality of images sampled from a specific video stream. Different human objects with mutual limb-contact actions are annotated as a whole in the first sample images to form the annotated first sample images. Taking the annotation of a fighting posture as an example, human objects in a typical fighting posture can be annotated with a fighting bounding box, which may contain two or more persons, including the perpetrator and the victim.
This embodiment may employ a YOLOv5 detector, or another detector. The annotated first sample images (including the sample video frame images and the annotated fighting bounding box coordinates) are input into the detector for training to obtain the required fighting-posture detector. For an input video frame to be processed, the posture detector gives the position coordinates of target human objects exhibiting fighting actions.
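As an illustration of the annotation scheme just described, the following minimal sketch (not from the patent) builds YOLO-format training labels in which the two persons of a fighting pair are merged into one bounding box; the box coordinates, image size, and class index are hypothetical.

```python
# Hypothetical label-preparation sketch: merge the perpetrator and victim
# boxes into ONE "fighting" box and emit a YOLO-format label line.

def union_box(box_a, box_b):
    """Merge two (x1, y1, x2, y2) person boxes into a single fighting box."""
    return (min(box_a[0], box_b[0]), min(box_a[1], box_b[1]),
            max(box_a[2], box_b[2]), max(box_a[3], box_b[3]))

def to_yolo_label(box, img_w, img_h, class_id=0):
    """Convert (x1, y1, x2, y2) pixels to the YOLO 'cls cx cy w h' line."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2 / img_w, (y1 + y2) / 2 / img_h
    w, h = (x2 - x1) / img_w, (y2 - y1) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

# Two persons in a 1920x1080 frame annotated as one fighting instance.
fight_box = union_box((400, 200, 560, 700), (520, 220, 700, 720))
print(to_yolo_label(fight_box, 1920, 1080))
```

A detector trained on such labels learns to localize the interacting pair as a single object, which is what lets the posture detector output a whole bounding box at inference time.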
Fig. 4 is a schematic diagram of posture detection provided by an embodiment of the present application. The output of the posture detector may be an output video frame, as shown in fig. 4, containing "fighting bounding boxes" around multiple human objects in typical fighting postures.
In one embodiment, step 302 may specifically include: identifying a plurality of first candidate objects having a running posture in the video frame to be processed; determining the motion speed and motion direction of the first candidate objects according to the video frame to be processed; determining, from the plurality of first candidate objects, a plurality of second candidate objects exhibiting running behavior according to the running posture, the motion speed, and the motion direction; and determining, from the plurality of second candidate objects, a plurality of target human objects exhibiting mutual chasing behavior according to the motion direction and motion speed of the second candidate objects.
In this step, the preset interaction may include mutual chasing actions among a plurality of human objects. In a practical scenario, non-contact actions may occur between people involved in violence, such as one person chasing another, and chasing is often accompanied by violence, so mutual chasing actions need to be recognized. First, a plurality of first candidate objects having a running posture are identified in the video frame to be processed by image recognition, since people chasing each other generally run. The motion speed and motion direction of each first candidate object are then determined from its motion state in the video frames. Based on the running posture, motion speed, and motion direction, the first candidate objects are screened to obtain a plurality of second candidate objects exhibiting running behavior; these are further screened according to their motion direction and motion speed to obtain a plurality of target human objects exhibiting mutual chasing behavior. This multi-round screening improves the accuracy of the chase recognition result.
In one embodiment, when identifying the motion speed, considering that the apparent size of a running human object in a video frame depends on its distance from the camera, the motion speed can be quantified by normalizing the motion distance by the height of the human object. Specifically, the motion speed of a human object in the video frame may be determined by the following formula (1):
r = d / h    (1)
where r is the motion speed of the human object, d is the pixel distance the human object moves in 1 second in the video frame to be processed, and h is the pixel height of the human object within the motion range.
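A minimal sketch of formula (1), assuming only what the formula states: the per-second pixel displacement is normalized by the person's pixel height, so the speed is expressed in body heights per second and is comparable regardless of distance from the camera.

```python
def normalized_speed(d_pixels_per_sec: float, height_pixels: float) -> float:
    """Formula (1): r = d / h, in body heights per second."""
    if height_pixels <= 0:
        raise ValueError("pixel height must be positive")
    return d_pixels_per_sec / height_pixels

# A person 180 px tall who moves 270 px in one second has r = 1.5.
print(normalized_speed(270.0, 180.0))  # 1.5
```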
In one embodiment, the identifying a plurality of first candidate objects having a running posture in the video frame to be processed in step 302 may specifically include: acquiring a human-body image in the video frame to be processed; and inputting the human-body image into a preset posture classifier, so that the posture classifier outputs a plurality of first candidate objects having a running posture in the human-body image. The posture classifier is trained on a preset second sample image in which human objects having a running posture are annotated, and classifies the posture information of human objects in the human-body image into running postures and non-running postures.
In this embodiment, the human-body image in the video frame to be processed may be an image of a human body whose features have been determined from the video frame; for example, human-body information recognition is performed on the video frame in advance to determine the human-body images it contains. Running behavior can be detected by a running-posture classifier combined with motion speed analysis. The running-posture classifier divides input human-body images into running-posture images and non-running-posture images, and thus outputs the first candidate objects having a running posture. The posture classifier is trained on second sample images containing human bodies; for example, sample images containing human objects can be collected from videos and labeled as running or non-running. A human object labeled as running in the second sample image satisfies the following conditions: body leaning forward, arms swinging, and legs and feet off the ground. Using the running appearance learned from the second sample images, the posture classifier takes human-body images from the video frame to be processed as input, classifies human bodies satisfying the above conditions as first candidate objects with a running posture, and classifies the other human objects as non-running-posture objects.
Alternatively, the posture classifier can be a ResNet-18; other suitable classifiers can also be used.
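A hedged sketch of such a classifier using torchvision's ResNet-18 with a two-way head; the class index order and the checkpoint path are assumptions, not taken from the patent.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

model = models.resnet18(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, 2)  # running / non-running
# model.load_state_dict(torch.load("running_classifier.pt"))  # hypothetical checkpoint
model.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def is_running(person_crop: Image.Image) -> bool:
    """Classify one cropped human-body image as running (True) or non-running."""
    x = preprocess(person_crop).unsqueeze(0)
    with torch.no_grad():
        logits = model(x)
    return logits.argmax(dim=1).item() == 0  # assumes index 0 = running
```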
In one embodiment, determining, from the plurality of first candidate objects, a plurality of second candidate objects exhibiting running behavior according to the running posture, the motion speed, and the motion direction in step 302 includes: determining the number of running postures of the plurality of first candidate objects within a first preset duration; and determining a first candidate object whose number of running postures is greater than or equal to a preset number and whose motion speed and motion direction satisfy preset conditions as a second candidate object exhibiting running behavior.
In this embodiment, to accurately identify human objects with running behavior, the running-posture frequency of the first candidate objects can be counted; for example, the number of running postures of the plurality of first candidate objects within a first preset duration is determined. If a first candidate object exhibits the running posture more than the preset number of times within the first preset duration, and its motion speed and motion direction satisfy the corresponding preset conditions, it can be determined that the human object exhibits running behavior. The first preset duration may be set according to actual requirements, for example 1 second. The preset number may be determined from the running-posture frequency expected during actual running; for example, the number of running postures per second of a running person can be counted and the preset number set accordingly, for example 2. The preset conditions on the motion speed and motion direction may be determined from the typical speed and direction of a person exhibiting running behavior. For example, when r satisfies the following preset condition (formula (2)), the corresponding human object is determined to exhibit running behavior:
(2)
where o is the motion direction of the human object in the video frame to be processed within 1 second. When a human object exhibits the running posture twice within 1 second and its motion speed and motion direction simultaneously satisfy formula (2), the human object is considered to exhibit running behavior and is screened out as a second candidate object.
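A sketch of this screening step follows. Since formula (2) is not reproduced in this text, the speed/direction condition is stood in for by a simple threshold on the normalized speed r, which is an assumption; the posture count of 2 per 1-second window is the example value from the description.

```python
def is_running_behavior(pose_flags, speed_r,
                        min_pose_count=2,       # preset number, per 1 s window
                        speed_threshold=0.1):   # assumed stand-in for formula (2)
    """First candidate -> second candidate if posture count and motion agree.

    pose_flags: per-frame running-posture decisions within the 1 s window.
    speed_r: normalized motion speed from formula (1).
    """
    return sum(pose_flags) >= min_pose_count and speed_r > speed_threshold

# Two running postures in the window and r = 0.5 body heights/s -> running.
print(is_running_behavior([True, False, True, False], 0.5))  # True
```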
In one embodiment, determining, from the plurality of second candidate objects, a plurality of target human objects exhibiting mutual chasing behavior according to the motion direction and motion speed of the second candidate objects in step 302 includes: determining the motion direction deviation and motion speed deviation between different second candidate objects; and determining different second candidate objects whose motion direction deviation is smaller than a preset direction threshold and whose motion speed deviation is smaller than a preset speed threshold as the plurality of target human objects.
In this embodiment, the second candidate objects are the human objects exhibiting running behavior in the video frame to be processed. When recognizing chasing behavior based on motion direction and motion speed, two or more second candidate objects with similar running directions and similar running speeds may be determined as target human objects chasing each other. Similarity of running speed and direction may be predefined: running directions are considered similar when the motion direction deviation of two or more human objects is smaller than a preset direction threshold, and running speeds are considered similar when their motion speed deviation is smaller than a preset speed threshold. The thresholds may be determined from the directions and speeds typical of people chasing each other in an actual scene; for example, the preset direction threshold may be 20°, and the preset speed threshold, quantified in the same way as formula (1), may be 0.1.
Specifically, the motion speed deviation and trajectory direction deviation between each pair of second candidate objects are first determined. Any two running human objects satisfying the following conditions (formulas (3) and (4)) within 1 second are determined to be target human objects exhibiting chasing behavior:
|o1 - o2| < T_o    (3)
|r1 - r2| < T_r    (4)
where o1 and o2 are the trajectory motion directions of the two running human objects within 1 second, r1 and r2 are their normalized motion distances within 1 second (representing motion speeds), and T_o and T_r are the preset direction threshold and preset speed threshold, respectively.
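A minimal sketch of formulas (3) and (4): two running persons are paired as mutually chasing when their track directions and normalized speeds agree within the preset thresholds (20 degrees and 0.1 in the example above).

```python
from itertools import combinations

def chasing_pairs(tracks, dir_threshold_deg=20.0, speed_threshold=0.1):
    """tracks: list of (person_id, direction_deg, normalized_speed) tuples."""
    pairs = []
    for (id1, o1, r1), (id2, o2, r2) in combinations(tracks, 2):
        dir_dev = abs(o1 - o2) % 360.0
        dir_dev = min(dir_dev, 360.0 - dir_dev)  # smallest angle between headings
        if dir_dev < dir_threshold_deg and abs(r1 - r2) < speed_threshold:
            pairs.append((id1, id2))
    return pairs

# Two runners heading roughly east at similar speeds form a chase pair.
print(chasing_pairs([("A", 90.0, 1.40), ("B", 95.0, 1.35), ("C", 270.0, 1.40)]))
# [('A', 'B')]
```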
The fighting-posture and running-posture detection adopted by the embodiments of the present application is based on typical postures; other detection schemes, such as methods based on multi-frame video or on keypoint sequences, could be chosen instead. Video-based methods can achieve higher accuracy, but they take multi-frame images as input, place high demands on the computing equipment, and are slower; moreover, because video samples of fights are scarce, the recall rate of such algorithms in real scenes is hard to guarantee. Keypoint-based methods generalize well: using only human keypoints as input, the data dimensionality is low and inference is fast, but for the same reason (low-dimensional, sparse data) their precision is lower. The typical-posture method adopted by the embodiments of the present application ensures both precision and recall while retaining an advantage in computation speed.
Step 303: acquire trajectory information of the plurality of target human objects in the video frame to be processed.
In this step, the trajectory information characterizes the motion trajectory of a target human object in the video frames to be processed, such as its movement from one location to another. Taking campus surveillance as an example, the trajectory information can represent the movement of students, teachers, and other relevant personnel among classrooms, the playground, the library, and other places. Trajectory information accurately represents the movement behavior of a human object and provides precise input for the subsequent judgment.
In one embodiment, step 303 may specifically include: acquiring the human-body features and tracking trajectories of a plurality of human objects in the plurality of sub-video frames; merging, according to the human-body features, the tracking trajectories of the same human object from different sub-video frames to generate complete trajectory information for each human object; and selecting the trajectory information of the plurality of target human objects from the complete trajectory information of the plurality of human objects.
In this embodiment, the video frames to be processed may include a plurality of sub-video frames originating from different acquisition devices. For example, in a campus surveillance scene, different cameras may be installed in classrooms, on the playground, in the library, and so on, each collecting surveillance video of its own area, and the video frames to be processed may include the videos collected by these different cameras. The sub-video frames are preprocessed by information extraction: the position, time, and human-body features of each person in the picture are extracted to form structured human-body information. Using the human-body features as clues, the tracking trajectories of the same person under different cameras are merged, i.e., the position of each human object is restored for every time unit (second), for example a student's trajectory from entering to leaving the campus each day, forming the student's trajectory information on campus.
Optionally, a detection algorithm may be used to detect human bodies in the video frames and extract human-body features, including but not limited to characteristics of the face, appearance, body shape, and posture, such as height, build, and skin tone. In an actual scene, human-body features can be extracted by a dedicated model, and the various characteristics of a human body can be represented by the feature vectors the model outputs.
Optionally, for sub-video frames from the same camera, a preset tracking algorithm may be used to track the human objects and restore the trajectory coordinates of each human body within the sub-video frame picture.
Optionally, the identity of a human object in the video frames can be confirmed; for example, the individual's identity, such as name and class, can be obtained through facial features.
Optionally, for a human object whose identity cannot be recognized directly, for example because the face is not facing the camera and face recognition fails, the places the human object enters and leaves, such as a classroom, office, or room, can be identified through trajectory analysis, and the identity can be predicted from other clues in the trajectory.
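A hedged sketch of the cross-camera merging step described above: track segments whose human-feature vectors are sufficiently similar are treated as the same person, and their per-camera trajectory points are concatenated in time order. The cosine-similarity metric and the 0.8 threshold are illustrative assumptions, not values from the patent.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def merge_tracks(segments, sim_threshold=0.8):
    """segments: dicts {"feature": [...], "points": [(t, x, y), ...]} per camera."""
    merged = []
    for seg in segments:
        for person in merged:
            if cosine_similarity(seg["feature"], person["feature"]) > sim_threshold:
                person["points"].extend(seg["points"])
                person["points"].sort(key=lambda p: p[0])  # keep time order
                break
        else:  # no existing person matched: start a new complete trajectory
            merged.append({"feature": seg["feature"], "points": list(seg["points"])})
    return merged
```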
Step 304: judge, according to the trajectory information and the positioning information, whether a preset behavior exists among the plurality of target human objects. If yes, proceed to step 305; otherwise, return to step 301 to continue detection.
In this step, the preset behavior may include violent behavior, bullying, and other behavior endangering personal safety. After the target human objects having the preset interaction are determined, whether a preset dangerous behavior actually occurs can be further determined in combination with the trajectory information of the human objects, improving the accuracy of behavior detection.
In one embodiment, the trajectory information includes a single-person bounding box and identity information of the target human object, and the positioning information includes a whole bounding box of the plurality of target human objects. Step 304 may specifically include: if the single-person bounding box lies within the whole bounding box, determining that the target human object marked by the single-person bounding box exhibits the preset behavior, and determining the identity information bound to the single-person bounding box as the identity information of the target human object exhibiting the preset behavior.
In this embodiment, the trajectory information includes, but is not limited to, the single-person bounding box and identity information of the target human object, and the positioning information of the plurality of target human objects having the preset interaction includes, but is not limited to, their whole bounding box. Taking fighting as the preset behavior as an example, the posture detector outputs the perpetrator and the victim as a whole, i.e., the whole bounding box contains both. To establish the identities of the participants, the output whole bounding box can be matched against the single-person bounding boxes in the trajectory information: if fighting bounding box A output by the posture detector contains human object 1 and human object 2, and the single-person bounding box of human object 1 is entirely contained in fighting bounding box A, target human object 1 is determined to be fighting; likewise, if the single-person bounding box of human object 2 is entirely contained in fighting bounding box A, target human object 2 is determined to be fighting, and a fight between human objects 1 and 2 can further be determined.
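A sketch of the matching step described above: a single-person bounding box is attributed to a fighting event when it lies entirely inside the whole fighting box output by the posture detector. Boxes are (x1, y1, x2, y2) in pixels; the identities are hypothetical.

```python
def box_inside(inner, outer) -> bool:
    """True when the inner box lies entirely within the outer box."""
    return (inner[0] >= outer[0] and inner[1] >= outer[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])

def identify_participants(fight_box, person_tracks):
    """person_tracks: list of (identity, single_person_box) tuples."""
    return [identity for identity, box in person_tracks
            if box_inside(box, fight_box)]

# Persons 1 and 2 fall inside the fighting box; person 3 does not.
print(identify_participants((100, 100, 500, 600),
                            [("student_1", (120, 150, 260, 580)),
                             ("student_2", (300, 140, 480, 590)),
                             ("student_3", (600, 100, 700, 400))]))
```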
Step 305: if the preset behavior exists among the plurality of target human objects, issue a prompt.
In this step, if it is determined that the target human objects exhibit a preset dangerous behavior, for example that a fight has occurred between human object 1 and human object 2, a prompt can be issued in time. The prompt includes, but is not limited to, the video clip and position information of the fight, so that relevant personnel can handle the situation promptly, improving the intelligence of the surveillance system.
Optionally, an alert may be sent to human objects 1 and 2 themselves, warning them to stop fighting.
Optionally, voice or text prompts can be sent to the parents, teachers, and other persons related to human objects 1 and 2, or an emergency call can be placed, so that relevant personnel can stop the dangerous behavior in time.
In one embodiment, step 305 may specifically include: issuing a prompt if at least two target human objects in the video frame to be processed exhibit the preset behavior and the duration of the preset behavior is longer than the second preset duration.
In this embodiment, the second preset duration may be set based on actual requirements, for example 5 seconds. Taking "fight risk recognition" in a campus surveillance scene as an example, the human trajectory information and the fighting-posture detection results are taken as input; if two or more human objects simultaneously exhibit fighting actions for more than 5 seconds, a real fight is considered to be occurring. The corresponding video clips and positions may be reported immediately to security guards or teachers for timely handling.
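A minimal sketch of this alert gate, using the example values above (at least two participants, more than 5 seconds of sustained behavior); timestamps are assumed to be in seconds.

```python
def should_alert(participant_count: int,
                 first_seen_ts: float,
                 last_seen_ts: float,
                 min_participants: int = 2,
                 min_duration_s: float = 5.0) -> bool:
    """Report only sustained multi-person events, suppressing brief contacts."""
    return (participant_count >= min_participants
            and (last_seen_ts - first_seen_ts) > min_duration_s)

print(should_alert(2, 100.0, 106.5))  # True: 6.5 s of sustained behavior
print(should_alert(2, 100.0, 103.0))  # False: too brief to report
```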
In one embodiment, the method further comprises: recording identity information of a target human body object exhibiting the preset behavior, counting the occurrence frequency of the preset behavior for the human body object bound to that identity information, and sending out prompt information when the occurrence frequency is greater than a preset frequency.
In this embodiment, a human body object exhibiting dangerous behavior can be tracked and recognized over the long term to determine whether a risk of sustained dangerous behavior exists. Taking bullying risk identification in a campus monitoring scenario as an example, in real scenes the violence in a single bullying incident is often slight in degree and short in duration but occurs frequently, so bullying can be identified and discovered through the long-term recurrence of the behavior and the fixed identities of the participants. For participants identified in fighting postures, identity recognition and track tracking are performed, and whether bullying is occurring is judged from information such as the frequency of the behavior and the identity of the victim. Graded notification of bullying behavior and intervention by staff such as teachers and security guards can further improve the data processing capability and intelligence of the monitoring system.
For example, "the identification of the risk of the spoofing" takes the track information of the human body object, the cradling gesture and the chasing behavior as input, if the track of two or more human body objects has cradling behavior at the same time, the cradling event is recorded, and the track and the identity information of the cradling agent are recorded.
The system analyzes the recorded fighting behavior and identity information at a certain frequency (for example, weekly or monthly); personnel meeting the following conditions are considered to be at risk of bullying or of being bullied:
Condition 1: a person exhibits fighting actions a certain number of times; the number may be, for example, 2 times per week.
Condition 2: two specific persons simultaneously exhibit fighting or chasing actions a certain number of times; the number may be, for example, 5 times per week.
The video corresponding to each recorded fighting event can also be kept and provided to a teacher for further screening.
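The two conditions above amount to counting per-person and per-pair fight events within an analysis window. A sketch under that reading, with event layout and threshold defaults as illustrative assumptions:

```python
from collections import Counter
from itertools import combinations

def bullying_candidates(events, solo_threshold=2, pair_threshold=5):
    """events: fight events recorded in one analysis window (e.g. one week),
    each a dict with a 'participants' list of identity IDs.
    Condition 1: one person fights at least `solo_threshold` times.
    Condition 2: the same pair fights together at least `pair_threshold` times."""
    solo, pairs = Counter(), Counter()
    for ev in events:
        ids = sorted(set(ev["participants"]))
        solo.update(ids)                     # per-person event counts
        pairs.update(combinations(ids, 2))   # per-pair event counts
    flagged_persons = [p for p, n in solo.items() if n >= solo_threshold]
    flagged_pairs = [pr for pr, n in pairs.items() if n >= pair_threshold]
    return flagged_persons, flagged_pairs
```

Running this weekly over the recorded events yields the persons and fixed pairs to report, together with the associated video clips, for teacher review.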
According to the behavior information detection method, the video frame to be processed is analyzed to identify a plurality of target human body objects with preset interactions, such as human body objects touching limbs or chasing each other, and their positioning information is determined. The movement tracks of the target human body objects are then obtained, and whether a dangerous preset behavior, such as violence, is occurring among the target human body objects is judged from the tracks and the positioning information. If the preset behavior is determined to have occurred, a prompt can be issued in time, so that the dangerous behavior can be intervened in and stopped promptly, improving the data processing efficiency and the intelligence of the monitoring system.
The embodiment of the application implements a fighting recognition method based on typical fighting gestures and a bullying risk identification method based on identity and track tracking. The typical gesture detector detects the moment of fighting actions such as punching, kicking, and grappling-and-throwing by an image detection method, and gives the position of the action in the video frame. Fighting behavior and its severity are confirmed through time accumulation, and relevant personnel are notified in time for handling. On the other hand, the violence involved in bullying is slight in degree and short in duration but frequent, so the long-term nature of the behavior and the fixed identities of the participants need to be identified and discovered. For participants identified in fighting gestures, identity recognition and track tracking are performed, and whether bullying occurs is judged from information such as the behavior frequency and the identity of the victim. Teachers, security personnel and other staff are notified to intervene through graded notification of the bullying behavior.
Referring to fig. 5, which shows a behavior information detection system according to an embodiment of the present application, taking a campus monitoring scenario as an example, the system mainly includes a personnel information extraction module, an action recognition module, a risk analysis module and a track analysis module, wherein:
The personnel information extraction module extracts information from the input monitoring video, obtaining the position, time and human body features of each person in the monitoring picture to form structured human body information, which includes but is not limited to: human body position, human body features and personnel identity information. The structured human body information is output to the action recognition module, the risk analysis module and the track analysis module.
The track analysis module takes the human body features provided by the personnel information extraction module as clues and merges the tracks of the same person across different cameras; that is, it reconstructs the position of each individual in the campus for every time unit (second), forming daily track information from the moment a student enters the campus to the moment the student leaves. For cases where the personnel information extraction module fails to recognize an identity feature (for example, a face not directly facing the camera), track analysis confirms the places a person enters and exits, such as a classroom, an office or a dormitory, to obtain additional clues about the person's track. The track analysis module outputs the position track of each human body object in the school scene at every time instant to the risk analysis module and provides the identity information of fighting participants.
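As a rough illustration of merging per-camera tracks by appearance features, the Python sketch below greedily groups tracklets whose feature vectors are similar. The tracklet layout, the cosine-similarity threshold and the greedy strategy are assumptions for illustration; a production system would more likely use person re-identification models and clustering.

```python
import numpy as np

def merge_tracks(camera_tracks, sim_threshold=0.8):
    """camera_tracks: per-camera tracklets, each a dict with an L2-normalised
    appearance 'feature' vector (np.ndarray) and a 'points' list of
    (timestamp, x, y). Tracklets with similar features are assumed to belong
    to the same person and are merged into one global track."""
    merged = []
    for tr in camera_tracks:
        for g in merged:
            # cosine similarity of normalised vectors is a plain dot product
            if float(np.dot(g["feature"], tr["feature"])) >= sim_threshold:
                g["points"].extend(tr["points"])
                break
        else:
            merged.append({"feature": tr["feature"], "points": list(tr["points"])})
    for g in merged:
        g["points"].sort(key=lambda p: p[0])  # order each global track by time
    return merged
```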
The action recognition module performs typical fighting gesture detection and/or chasing behavior recognition; see the descriptions of the foregoing embodiments for details.
The risk analysis module is used for fighting risk identification and bullying risk identification. For fighting risk identification, the human body tracks and the fighting gesture detection result are taken as input; if two or more people simultaneously exhibit fighting behavior for more than 5 seconds, a real fight is considered to be in progress. The corresponding video clips and positions are immediately reported to security guards or teachers for timely handling.
For bullying risk identification, the human body tracks and the recognition results for fighting gestures and chasing behavior are taken as input; if two or more tracks simultaneously show fighting behavior, the fighting event is recorded together with the participant tracks and identity information provided by the track analysis module.
According to the embodiment of the application, fighting behavior and bullying risk are identified through monitoring cameras and an artificial intelligence algorithm. The algorithm detects fighting gestures and running/chasing behavior in video and determines that a real fight is occurring from the duration of the fighting gesture. Typical fighting gestures such as punching, kicking, throwing down, pinning to the ground, shoving and neck-grabbing are first detected by a typical gesture detection method. Fighting behavior analysis accumulates the detection results and reports personnel meeting the fighting criteria. The risk identification module records the fighting gesture and chasing detection results and performs identity recognition, including track tracking and face recognition, on all participants identified in fighting gestures or chases. Bullying is discovered by counting students repeatedly involved in fighting; for example, when fighting gestures reach a certain count and involve fixed personnel, the system alerts teachers and security guards and provides the video clips for further analysis and discrimination of the bullying behavior. Fighting detection is realized by a typical gesture detector, fighting is confirmed by time accumulation, and bullying is distinguished by the frequency of the fighting and the persons involved. The data processing capability and intelligence of the monitoring system are thereby improved.
Please refer to fig. 6, which shows a behavior information detection method according to an embodiment of the present application. The method may be executed by the electronic device 1 shown in fig. 1 and may be applied to the video monitoring scenario shown in fig. 2, so as to automatically identify the actions of human body objects from the video frame to be processed and send a prompt in time when a preset dangerous behavior is detected, facilitating timely intervention and prevention of the dangerous behavior and improving the data processing efficiency and intelligence of the monitoring system. In this embodiment, taking the terminal 220 as the executing terminal as an example, the method includes the following steps:
Step 601: Acquiring a video frame to be processed. The video frame to be processed comprises a plurality of sub-video frames originating from different acquisition devices.
Step 602: inputting the video frame to be processed into a preset gesture detector, so that the gesture detector can identify the actions of the human body objects in the video frame to be processed, and outputting the integral positioning frames of a plurality of target human body objects with mutual limb touch actions in the video frame to be processed.
Step 603: a plurality of first candidate objects for which a running gesture exists in a video frame to be processed is identified.
Step 604: and determining the motion speed and the motion direction of the first candidate object according to the video frames to be processed.
Step 605: a plurality of second candidates for which running behavior exists is determined from the plurality of first candidates according to the running gesture, the movement speed, and the movement direction.
Step 606: and determining a plurality of target human body objects with mutual pursuit behaviors from the plurality of second candidate objects according to the movement direction and the movement speed of the second candidate objects.
Step 607: and respectively acquiring human body characteristics and tracking tracks of a plurality of human body objects in a plurality of sub-video frames.
Step 608: and combining tracking tracks of the same human body object from different sub-video frames according to human body characteristics to respectively generate complete track information of a plurality of human body objects.
Step 609: track information of a plurality of target human body objects is selected from the complete track information of the plurality of human body objects, and the track information comprises a single person positioning frame and identity information of the target human body objects.
Step 610: matching the single positioning frame with the integral positioning frames of the plurality of target human body objects with limbs touching each other, if the single positioning frame is in the integral positioning frame, determining that the target human body objects marked by the single positioning frame have preset behaviors, and determining the identity information bound by the single positioning frame as the identity information of the target human body objects with the preset behaviors.
Step 611: if at least two target human body objects exist in the video frame to be processed, preset behaviors occur, the duration time of the preset behaviors is longer than the second preset duration time, and prompt information is sent out.
Step 612: recording identity information of a target human body object with preset behaviors, counting the occurrence frequency of the human body object bound with the identity information about the preset behaviors, and sending prompt information when the occurrence frequency is greater than the preset frequency.
The details of each step of the behavior information detection method can be referred to the related descriptions of the above embodiments, which are not repeated here.
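Steps 601 to 612 can be tied together as a small per-frame state machine. The following sketch is illustrative glue only: its input, a set of identity pairs whose single-person frames fell inside an overall fighting frame in step 610, and its class interface are assumptions for the example, not the patent's actual APIs.

```python
from collections import defaultdict

class BehaviorDetector:
    """Per-frame bookkeeping for steps 610-612: duration check and
    long-term frequency counting over identity pairs."""

    def __init__(self, min_duration_s: float = 5.0, fps: float = 25.0):
        self.needed = int(min_duration_s * fps)   # step 611 duration threshold
        self.run = defaultdict(int)               # consecutive frames per pair
        self.event_count = defaultdict(int)       # step 612 frequency log

    def step(self, fighting_pairs):
        """fighting_pairs: set of frozensets of identity IDs matched this
        frame (step 610). Returns pairs confirmed as fighting this frame."""
        alerts = []
        for pair in list(self.run):               # reset pairs no longer fighting
            if pair not in fighting_pairs:
                self.run[pair] = 0
        for pair in fighting_pairs:
            self.run[pair] += 1
            if self.run[pair] == self.needed:     # duration just exceeded
                self.event_count[pair] += 1       # record event for frequency analysis
                alerts.append(pair)
        return alerts
```

Feeding the event_count table into a periodic analysis such as the bullying_candidates sketch earlier closes the loop from per-frame detection to long-term risk reporting.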
Please refer to fig. 7, which illustrates a behavior information detection apparatus 700 according to an embodiment of the present application. The apparatus is applicable to the electronic device 1 illustrated in fig. 1 and to the video monitoring scenario illustrated in fig. 2, so as to automatically identify the actions of human body objects from the video frame to be processed and send a prompt in time when a preset dangerous behavior is detected, facilitating timely intervention and prevention of the dangerous behavior and improving the data processing efficiency and intelligence of the monitoring system. The apparatus comprises a first acquisition module 701, an identification module 702, a second acquisition module 703, a judgment module 704 and a prompt module 705, whose functional principles are as follows:
A first obtaining module 701, configured to obtain a video frame to be processed.
The identifying module 702 is configured to identify a plurality of target human body objects with preset interactions in a video frame to be processed, and determine positioning information of the plurality of target human body objects.
The second acquiring module 703 is configured to acquire track information of a plurality of target human body objects in a video frame to be processed.
The judging module 704 is configured to judge whether a preset behavior exists between the plurality of target human body objects according to the track information and the positioning information.
The prompt module 705 is configured to send out prompt information if a preset behavior exists among the plurality of target human body objects.
In an embodiment, the preset interaction includes a mutual touching action of limbs among the plurality of human objects. The recognition module 702 is configured to input the video frame to be processed into a preset gesture detector, so that the gesture detector recognizes the motion of the human body object in the video frame to be processed, and output positioning information of a plurality of target human body objects with limbs touching each other in the video frame to be processed.
In one embodiment, the apparatus further comprises: a training module configured to acquire a first sample image before the video frame to be processed is input into the preset gesture detector, and to mark different human body objects whose limbs touch each other in the first sample image as a whole. An image detector is trained with the first sample image to obtain the gesture detector, which positions and outputs the different target human body objects whose limbs touch each other as a whole.
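For illustration, a training sample for such a detector might label the two interacting persons with one shared box. The JSON-like layout and field names below are assumptions for the example, not a prescribed annotation format.

```python
# A minimal sketch of a training annotation where two fighting persons are
# labelled as a single whole (one box spanning both), as described above.
sample_annotation = {
    "image": "frame_000123.jpg",
    "boxes": [
        {"label": "fighting_group",        # both persons annotated as one unit
         "xyxy": [412, 180, 660, 540]},    # illustrative pixel coordinates
    ],
}
```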
In an embodiment, the preset interaction includes a mutual chase action between the plurality of human objects. The identifying module 702 is configured to identify a plurality of first candidate objects with running gestures in a video frame to be processed. And determining the motion speed and the motion direction of the first candidate object according to the video frames to be processed. A plurality of second candidates for which running behavior exists is determined from the plurality of first candidates according to the running gesture, the movement speed, and the movement direction. And determining a plurality of target human body objects with mutual pursuit behaviors from the plurality of second candidate objects according to the movement direction and the movement speed of the second candidate objects.
In one embodiment, the identification module 702 is specifically configured to determine the number of running gestures of the plurality of first candidate objects within the first preset duration, and to determine a first candidate object whose number of running gestures is greater than or equal to the preset number and whose movement speed and movement direction meet the preset conditions as a second candidate object with running behavior.
In one embodiment, the identification module 702 is specifically configured to determine a movement direction deviation and a movement speed deviation between different second candidate objects, respectively. And determining different second candidate objects with the movement direction deviation smaller than the preset direction threshold and the movement speed deviation smaller than the preset speed threshold as a plurality of target human body objects.
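A minimal sketch of this direction/speed-deviation test follows; the two-endpoint motion estimate and the threshold values (30 degrees, 1.0 speed unit) are illustrative assumptions rather than values specified by the patent.

```python
import math

def estimate_motion(points):
    """points: [(t, x, y), ...] for one candidate; returns (speed, direction)
    from the first and last track points - a deliberately crude estimate."""
    (t0, x0, y0), (t1, x1, y1) = points[0], points[-1]
    dt = max(t1 - t0, 1e-6)
    return math.hypot(x1 - x0, y1 - y0) / dt, math.atan2(y1 - y0, x1 - x0)

def is_chasing(a, b, dir_thresh=math.radians(30), speed_thresh=1.0):
    """Two second candidates are judged to be chasing each other when their
    movement direction deviation and movement speed deviation are both below
    the preset thresholds."""
    sa, da = estimate_motion(a)
    sb, db = estimate_motion(b)
    ddir = abs((da - db + math.pi) % (2 * math.pi) - math.pi)  # wrap to [0, pi]
    return ddir < dir_thresh and abs(sa - sb) < speed_thresh
```

Similar direction and a similar, sustained speed together distinguish a chase from two people who merely happen to be running in the same scene.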
In one embodiment, the video frames to be processed include a plurality of sub-video frames originating from different acquisition devices. The second acquiring module 703 is configured to acquire human body features and tracking tracks of a plurality of human body objects in a plurality of sub-video frames respectively. And combining tracking tracks of the same human body object from different sub-video frames according to human body characteristics to respectively generate complete track information of a plurality of human body objects. Track information of a plurality of target human body objects is selected from the complete track information of the plurality of human body objects.
In one embodiment, the track information includes a single person positioning frame and identity information of the target human subject. The positioning information includes an overall positioning frame of the plurality of target human objects. And the judging module 704 is configured to determine that the target human body object marked by the single positioning frame has a preset behavior if the single positioning frame is in the whole positioning frame, and determine the identity information bound by the single positioning frame as the identity information of the target human body object having the preset behavior.
In one embodiment, the apparatus further comprises: an analysis module configured to record identity information of a target human body object exhibiting the preset behavior, count the occurrence frequency of the preset behavior for the human body object bound to the identity information, and send out prompt information when the occurrence frequency is greater than the preset frequency.
In an embodiment, the prompt module 705 is configured to send out prompt information if at least two target human body objects in the video frame to be processed exhibit the preset behavior and the duration of the preset behavior is longer than the second preset duration.
In an embodiment, the identification module 702 is further specifically configured to acquire a human body image in the video frame to be processed. The human body image is input into a preset gesture classifier, so that the gesture classifier outputs a plurality of first candidate objects with running gestures in the human body image. The gesture classifier is trained based on a preset second sample image, and a human body object with a running gesture is marked in the second sample image. The posture classifier is used for classifying posture information of a human body object in a human body image into a running posture and a non-running posture.
For a detailed description of the behavior information detection apparatus 700, please refer to the description of the related method steps in the above embodiment, the implementation principle and technical effects are similar, and the detailed description of this embodiment is omitted here.
Fig. 8 is a schematic structural diagram of a cloud device 80 according to an exemplary embodiment of the present application. The cloud device 80 may be used to run the method provided in any of the embodiments described above. As shown in fig. 8, the cloud device 80 may include: a memory 804 and at least one processor 805 (one processor is taken as an example in fig. 8).
Memory 804 is used to store computer programs and may be configured to store various other data to support operations on cloud device 80. The memory 804 may be an object store (Object Storage Service, OSS).
The memory 804 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The processor 805 is coupled to the memory 804, and is configured to execute the computer program in the memory 804, so as to implement the solutions provided by any of the method embodiments, and specific functions and technical effects that can be implemented are not described herein.
Further, as shown in fig. 8, the cloud device further includes: firewall 801, load balancer 802, communication component 806, power component 803, and other components. Only some components are schematically shown in fig. 8, which does not mean that the cloud device only includes the components shown in fig. 8.
In one embodiment, the communication component 806 of fig. 8 is configured to facilitate wired or wireless communication between the device in which the communication component 806 is located and other devices. The device in which the communication component 806 is located can access a wireless network based on a communication standard, such as WiFi, 2G, 3G, 4G, LTE (Long Term Evolution), 5G, or a combination thereof. In one exemplary embodiment, the communication component 806 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 806 further includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wide Band (UWB) technology, Bluetooth (BT) technology, and other technologies.
In one embodiment, the power component 803 of fig. 8 provides power to various components of the device in which the power component 803 is located. The power components 803 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the devices in which the power components reside.
The embodiment of the application also provides a computer readable storage medium, wherein computer executable instructions are stored in the computer readable storage medium, and when the processor executes the computer executable instructions, the method of any of the previous embodiments is realized.
Embodiments of the present application also provide a computer program product comprising a computer program which, when executed by a processor, implements the method of any of the preceding embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, e.g., the division of modules is merely a logical function division, and there may be additional divisions of actual implementation, e.g., multiple modules may be combined or integrated into another system, or some features may be omitted or not performed.
The integrated modules, which are implemented in the form of software functional modules, may be stored in a computer readable storage medium. The software functional modules described above are stored in a storage medium and include instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or processor to perform some of the steps of the methods of the various embodiments of the application.
It should be appreciated that the processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of a method disclosed in connection with the present application may be embodied directly in a hardware processor for execution, or executed by a combination of hardware and software modules in a processor. The memory may include high-speed Random Access Memory (RAM) and may further include Non-Volatile Memory (NVM), such as at least one magnetic disk memory; it may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, or an optical disk.
The storage medium may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an Application Specific Integrated Circuit (ASIC). The processor and the storage medium may also reside as discrete components in an electronic device or a master device.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, though in many cases the former is the preferred implementation. Based on such understanding, the technical solution of the present application, or the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and including several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods of the embodiments of the present application.
In the technical solution of the present application, the collection, storage, use, processing, transmission, provision and disclosure of related information such as user data all comply with the provisions of relevant laws and regulations and do not violate public order and good morals.
The foregoing description covers only the preferred embodiments of the present application and is not intended to limit its scope of protection; any equivalent structural or process transformation made using the contents of the present specification and drawings, whether applied directly or indirectly in other related technical fields, is likewise included in the scope of patent protection of the present application.

Claims (14)

1. A behavior information detection method, characterized by comprising:
acquiring a video frame to be processed;
identifying a plurality of target human body objects with preset mutual actions in the video frame to be processed, and determining positioning information of the plurality of target human body objects;
acquiring track information of the plurality of target human body objects in the video frame to be processed;
judging whether preset behaviors exist among the plurality of target human body objects according to the track information and the positioning information;
and if the preset behaviors exist among the plurality of target human body objects, sending out prompt information.
2. The method of claim 1, wherein the preset interaction comprises a limb-to-limb interaction between a plurality of human subjects; the identifying that a plurality of target human body objects with preset mutual actions exist in the video frame to be processed, and determining positioning information of the plurality of target human body objects includes:
inputting the video frame to be processed into a preset gesture detector, so that the gesture detector can identify the actions of human body objects in the video frame to be processed, and outputting the positioning information of the target human body objects with the mutual touch actions of limbs in the video frame to be processed.
3. The method of claim 2, further comprising, prior to said inputting the video frame to be processed into a preset gesture detector:
acquiring a first sample image, and marking different human body objects with mutual limb touch actions as a whole in the first sample image;
and training an image detector by using the first sample image to obtain the gesture detector, wherein the gesture detector is used for positioning and outputting, as a whole, different target human body objects with mutual limb touch actions.
4. The method of claim 1, wherein the preset interaction comprises a mutual chase action between a plurality of human subjects; the identifying a plurality of target human body objects with preset interaction in the video frame to be processed comprises the following steps:
identifying a plurality of first candidate objects with running gestures in the video frame to be processed;
determining the motion speed and the motion direction of the first candidate object according to the video frame to be processed;
determining a plurality of second candidate objects with running behaviors from the plurality of first candidate objects according to the running gesture, the movement speed and the movement direction;
and determining the plurality of target human body objects with mutual pursuit behaviors from the plurality of second candidate objects according to the movement direction and the movement speed of the second candidate objects.
5. The method of claim 4, wherein the determining that there is a second plurality of candidates for running behavior from the first plurality of candidates according to the running gesture, the movement speed, and the movement direction comprises:
determining the number of running gestures of the plurality of first candidate objects within a first preset duration;
and determining a first candidate object whose number of running gestures is greater than or equal to a preset number and whose movement speed and movement direction meet preset conditions as the second candidate object with running behavior.
6. The method of claim 4, wherein determining the plurality of target human subjects from the plurality of second candidate subjects for which a mutual chase action exists based on the direction of motion and the speed of motion of the second candidate subject comprises:
respectively determining a movement direction deviation and a movement speed deviation between different second candidate objects;
and determining different second candidate objects, of which the movement direction deviation is smaller than a preset direction threshold value and the movement speed deviation is smaller than a preset speed threshold value, as the target human body objects.
7. The method of claim 1, wherein the video frame to be processed comprises a plurality of sub-video frames originating from different acquisition devices; the obtaining track information of the plurality of target human body objects in the video frame to be processed includes:
respectively acquiring human body characteristics and tracking tracks of a plurality of human body objects in the plurality of sub-video frames;
combining tracking tracks of the same human body object from different sub-video frames according to the human body features to respectively generate complete track information of the plurality of human body objects;
and selecting the track information of the plurality of target human body objects from the complete track information of the plurality of human body objects.
8. The method of claim 1, wherein the trajectory information includes a single person location box and identity information of the target human subject; the positioning information comprises an integral positioning frame of the plurality of target human body objects; the determining whether preset behaviors exist among the plurality of target human body objects according to the track information and the positioning information includes:
If the single positioning frame is in the integral positioning frame, determining that the target human body object marked by the single positioning frame has the preset behavior, and determining the identity information bound by the single positioning frame as the identity information of the target human body object with the preset behavior.
9. The method as recited in claim 8, further comprising:
recording identity information of a target human body object exhibiting the preset behavior, counting the occurrence frequency of the preset behavior for the human body object bound to the identity information, and sending out prompt information when the occurrence frequency is greater than a preset frequency.
10. The method of claim 1, wherein the sending a prompt message if the preset behavior exists among the plurality of target human objects comprises:
and if at least two target human body objects in the video frame to be processed exhibit the preset behavior and the duration of the preset behavior is longer than a second preset duration, sending out prompt information.
11. The method of claim 4, wherein the identifying a plurality of first candidates for a running gesture in the video frame to be processed comprises:
acquiring a human body image in the video frame to be processed;
inputting the human body image into a preset gesture classifier, so that the gesture classifier outputs a plurality of first candidate objects with running gestures in the human body image;
the gesture classifier is trained based on a preset second sample image, and a human body object with a running gesture is marked in the second sample image; the gesture classifier is used for classifying gesture information of human objects in the human body image into running gestures and non-running gestures.
12. An electronic device, comprising:
At least one processor; and
A memory communicatively coupled to the at least one processor;
Wherein the memory stores instructions executable by the at least one processor to cause the electronic device to perform the method of any one of claims 1-11.
13. A computer readable storage medium having stored therein computer executable instructions which, when executed by a processor, implement the method of any of claims 1-11.
14. A computer program product comprising a computer program which, when executed by a processor, implements the method of any of claims 1-11.
CN202410481066.0A 2024-04-22 2024-04-22 Behavior information detection method, apparatus and storage medium Active CN118116084B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410481066.0A CN118116084B (en) 2024-04-22 2024-04-22 Behavior information detection method, apparatus and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410481066.0A CN118116084B (en) 2024-04-22 2024-04-22 Behavior information detection method, apparatus and storage medium

Publications (2)

Publication Number Publication Date
CN118116084A true CN118116084A (en) 2024-05-31
CN118116084B CN118116084B (en) 2024-07-23

Family

ID=91213997

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410481066.0A Active CN118116084B (en) 2024-04-22 2024-04-22 Behavior information detection method, apparatus and storage medium

Country Status (1)

Country Link
CN (1) CN118116084B (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104866841A (en) * 2015-06-05 2015-08-26 中国人民解放军国防科学技术大学 Human body object running behavior detection method
CN112183153A (en) * 2019-07-01 2021-01-05 ***通信集团浙江有限公司 Object behavior detection method and device based on video analysis
WO2021042547A1 (en) * 2019-09-04 2021-03-11 平安科技(深圳)有限公司 Behavior identification method, device and computer-readable storage medium
KR20210043960A (en) * 2019-10-14 2021-04-22 한국전자기술연구원 Behavior Recognition Based Safety Monitoring System and Method using Artificial Intelligence Technology and IoT
CN111860430A (en) * 2020-07-30 2020-10-30 浙江大华技术股份有限公司 Identification method and device of fighting behavior, storage medium and electronic device
CN116071818A (en) * 2022-12-16 2023-05-05 杭州天阙科技有限公司 Personal injury behavior detection method, device, equipment and storage medium
CN117877110A (en) * 2023-12-15 2024-04-12 天翼云科技有限公司 Method and system for real-time identification of pursuit behavior based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HUAYI ZHOU et al.: "Student Dangerous Behavior Detection in School", ARXIV:2202.09550, 19 February 2022 (2022-02-19) *
GUO Jun; HAN Xinle; LI Hexi: "Human Motion Posture Recognition Based on Multi-Neural-Network Fusion", Digital Technology and Application, no. 10, 15 October 2016 (2016-10-15) *

Also Published As

Publication number Publication date
CN118116084B (en) 2024-07-23

Similar Documents

Publication Publication Date Title
US20220092881A1 (en) Method and apparatus for behavior analysis, electronic apparatus, storage medium, and computer program
US20190065895A1 (en) Prioritizing objects for object recognition
US20190034734A1 (en) Object classification using machine learning and object tracking
US20190130580A1 (en) Methods and systems for applying complex object detection in a video analytics system
US7683929B2 (en) System and method for video content analysis-based detection, surveillance and alarm management
US6975346B2 (en) Method for suspect identification using scanning of surveillance media
CN108073577A (en) A kind of alarm method and system based on recognition of face
CN110852147B (en) Security alarm method, security alarm device, server and computer readable storage medium
EP2831842A2 (en) An event triggered location based participatory surveillance
CN110245630A (en) Monitoring data processing method, device and readable storage medium storing program for executing
CN108540752B (en) Method, device and system for identifying target object in video monitoring
CN109657626B (en) Analysis method for recognizing human body behaviors
Ning et al. An attention mechanism inspired selective sensing framework for physical-cyber mapping in internet of things
CN111985428A (en) Security detection method and device, electronic equipment and storage medium
CN114666546B (en) Monitoring method and device for communication iron tower and communication iron tower
CN110765874B (en) Monitoring method based on unmanned aerial vehicle and related product
El Barachi et al. An artificial intelligence based crowdsensing solution for on-demand accident scene monitoring
CN118116084B (en) Behavior information detection method, apparatus and storage medium
CN113645439B (en) Event detection method and system, storage medium and electronic device
CN108694388B (en) Campus monitoring method and device based on intelligent camera
Amrutha et al. A robust system for video classification: identification and tracking of suspicious individuals from surveillance videos
CN111523428B (en) Self-rescue prompting method in disasters, electronic equipment and storage medium
Nair et al. i-Surveillance crime monitoring and prevention using neural networks
Shamnath et al. Human Suspicious Activity Detection Using Ensemble Machine Learning Techniques
Hernandez et al. Community-Based Multi-layer Analytics Architecture for Civic Violations

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant