CN111860140B - Target event detection method, device, computer equipment and storage medium


Info

Publication number
CN111860140B
Authority
CN
China
Prior art keywords
target
video frame
action
frame
preset
Prior art date
Legal status
Active
Application number
CN202010521931.1A
Other languages
Chinese (zh)
Other versions
CN111860140A (en)
Inventor
王远江
郭子豪
郑凯
袁野
Current Assignee
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Megvii Technology Co Ltd filed Critical Beijing Megvii Technology Co Ltd
Priority to CN202010521931.1A
Publication of CN111860140A
Application granted
Publication of CN111860140B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G06V 20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20: Movements or behaviour, e.g. gesture recognition
    • G06V 2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07: Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Closed-Circuit Television Systems (AREA)

Abstract

A target event detection method, apparatus, computer device and storage medium. The method comprises: acquiring a current video frame and identifying a preset alert region in the current video frame; performing target detection in the current video frame, and obtaining a target detection box when a target is detected; if the target detection box and the preset alert region satisfy a preset positional-relationship condition, sequentially recognizing continued interactions between the target and the preset alert region in the subsequent consecutive video frames; and determining whether a target event has occurred based on an interaction sequence formed from an initial action and the continued interactions, wherein the initial action is the action corresponding to the target in the target detection box. By detecting the positional relationship between the target and the alert region and combining it with the change in the target's action state over a period of time, the method improves the accuracy of target event detection and reduces false alarms.

Description

Target event detection method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and apparatus for detecting a target event, a computer device, and a storage medium.
Background
With the development of computer vision technology, more and more security services use computer vision to identify and evaluate targets or events in a monitored scene, and raise an alarm to prompt the user when certain conditions are met.
In the related art, a detection box for a human body is typically obtained first, the intersection between this detection box and a region marked in advance as an alert region is computed, and whether a target event involving interaction between the human body and the preset alert region has occurred is judged from that intersection; if it has, an alarm is raised immediately. While this approach can alert on conventional target events, it also produces a large number of false alarms for people who are merely present near the preset alert region.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a target event detection method, apparatus, computer device, and storage medium that can reduce false alarms.
A method of target event detection, the method comprising:
acquiring a current video frame, and identifying a preset alert region in the current video frame;
performing target detection in the current video frame, and obtaining a target detection box when a target is detected in the current video frame;
if the target detection box and the preset alert region satisfy a preset positional-relationship condition, sequentially recognizing continued interactions between the target and the preset alert region in the subsequent consecutive video frames; and
determining whether a target event occurs based on an interaction sequence formed from an initial action and the continued interactions, wherein the initial action is the action corresponding to the target in the target detection box.
A target event detection apparatus, the apparatus comprising:
a region identification module, configured to acquire a current video frame and identify a preset alert region in the current video frame;
a target detection module, configured to perform target detection in the current video frame and obtain a target detection box when a target is detected in the current video frame;
an action recognition module, configured to sequentially recognize continued interactions between the target and the preset alert region in the subsequent consecutive video frames if the target detection box and the preset alert region satisfy a preset positional-relationship condition; and
a target event detection module, configured to determine whether a target event occurs based on an interaction sequence formed from an initial action and the continued interactions, wherein the initial action is the action corresponding to the target in the target detection box.
A computer device comprising a memory storing a computer program and a processor which, when executing the computer program, performs the steps of:
acquiring a current video frame, and identifying a preset alert region in the current video frame;
performing target detection in the current video frame, and obtaining a target detection box when a target is detected in the current video frame;
if the target detection box and the preset alert region satisfy a preset positional-relationship condition, sequentially recognizing continued interactions between the target and the preset alert region in the subsequent consecutive video frames; and
determining whether a target event occurs based on an interaction sequence formed from an initial action and the continued interactions, wherein the initial action is the action corresponding to the target in the target detection box.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:
acquiring a current video frame, and identifying a preset alert region in the current video frame;
performing target detection in the current video frame, and obtaining a target detection box when a target is detected in the current video frame;
if the target detection box and the preset alert region satisfy a preset positional-relationship condition, sequentially recognizing continued interactions between the target and the preset alert region in the subsequent consecutive video frames; and
determining whether a target event occurs based on an interaction sequence formed from an initial action and the continued interactions, wherein the initial action is the action corresponding to the target in the target detection box.
With the above target event detection method, apparatus, computer device and storage medium, after the current video frame is acquired, the preset alert region is identified in it and target detection is performed; when a target is detected, a target detection box is obtained, and if the target detection box and the preset alert region satisfy the preset positional-relationship condition, the continued interactions between the target and the preset alert region are recognized sequentially in the subsequent consecutive video frames; whether a target event occurs is then determined from the initial action corresponding to the target in the target detection box and the interaction sequence formed by the continued interactions. By detecting the positional relationship between the target and the alert region and combining it with the change in the target's action state over a period of time, the method improves the accuracy of target event detection and reduces false alarms.
Drawings
FIG. 1 is a diagram of an application environment for a target event detection method in one embodiment;
FIG. 2 is a flow chart of a method for detecting a target event according to one embodiment;
FIG. 3 is a schematic diagram of three representative wall areas determined from target points in one embodiment;
FIG. 4 is a flow diagram of classifying a current video frame by ResNet in one embodiment;
FIG. 5 is a flowchart of a method for detecting a target event according to another embodiment;
FIG. 6 is a flowchart of a method for detecting a target event according to another embodiment;
FIG. 7 is a flow diagram of determining a target's predicted detection box in the next video frame in one embodiment;
FIG. 8 is a block diagram of a target event detection device in one embodiment;
FIG. 9 is an internal structure diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The target event detection method provided by the application can be applied in the environment shown in FIG. 1, in which terminal 102 communicates with terminal 104 over a network. After terminal 104 acquires the current video frame from terminal 102, it identifies the preset alert region in the current video frame and performs target detection; when a target is detected, a target detection box is obtained, and if the target detection box and the preset alert region satisfy the preset positional-relationship condition, the continued interactions between the target and the preset alert region are recognized sequentially in the subsequent consecutive video frames; whether a target event occurs is then determined from the initial action corresponding to the target in the target detection box and the interaction sequence formed by the continued interactions. Terminal 102 may be, but is not limited to, any of various video capture or video imaging devices; in other embodiments, terminal 102 may be a device connected to the video capture equipment that stores the captured video data. Terminal 104 may be, but is not limited to, any of various personal computers, notebook computers, smartphones, tablet computers, and portable wearable devices.
In other embodiments, the above target event detection method may also be used in a terminal-and-server scenario, where the terminal may be, but is not limited to, any of various video capture devices, video imaging devices, or devices connected to them that store the captured video data, and the server may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
In one embodiment, as shown in FIG. 2, a target event detection method is provided. The method is described as applied to terminal 104 in FIG. 1 for illustration, and includes steps S210 to S240.
Step S210, a current video frame is acquired, and a preset alert region is identified in the current video frame.
The target event detection method in this embodiment can be applied to pictures captured by video capture equipment such as surveillance cameras, to judge whether a target event occurs in the monitored picture. In security services, certain special locations may need to be monitored for specific target events, and when such an event occurs, warning information can be generated to remind the relevant staff. In one embodiment, terminal 104 may obtain video frames directly from a surveillance camera or similar device for analysis; in another embodiment, terminal 104 may first acquire and store the video frame data from the camera, after which the video frame acquisition module in the target event detection device of terminal 104 reads the frames from the storage module for analysis. It will be appreciated that, in yet another embodiment, the frames captured by the surveillance camera may be stored by a separate device connected to the camera, in which case terminal 104 acquires the video frames from that device.
In one embodiment, the current video frame is the video frame at the current time of the video. The picture in the current video frame is the picture of the monitored location, and the preset alert region in the frame must be detected in order to judge whether the target event occurs. In a specific embodiment, the target event detection method is used for detecting whether a human body climbs over a wall, and the preset alert region is a wall area.
Further, in one embodiment, determining the preset alert region from the current video frame includes: acquiring target point position information, and determining the preset alert region based on it. The target point positions may be input by the user; for example, the user may click the selected target points on the display screen with a mouse, or may enter the position information directly. At least four target points are required. In this embodiment, the user inputs the boundary points of the preset alert region as the target points, and after acquiring their positions the server connects the target points in sequence to obtain the preset alert region.
In a specific embodiment, the preset alert region is a wall area. Taking four target points as an example, the server connects the first two target points and marks the segment as the wall-top line, connects the last two target points and marks the segment as the wall-foot line, and visualizes both on the interface; the first two and the last two target points are distinguished by the order in which the user inputs them. The terminal records three regions: the double-line region in which the first two target points form the wall-top line and the last two form the wall-foot line; the double-line region in which the last two target points form the wall-top line and the first two form the wall-foot line; and the preset alert region enclosed by the four target points together. FIG. 3 shows schematic diagrams of the three wall regions determined from the target points in this embodiment; a sketch of deriving these regions from the clicked points follows.
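By way of illustration only, the following minimal Python sketch derives the three recorded wall regions from four clicked target points; the function name, dictionary keys, and vertex ordering are assumptions for readability, not taken from the patent.

    Point = tuple[float, float]

    def wall_regions(p1: Point, p2: Point, p3: Point, p4: Point) -> dict[str, list[Point]]:
        """Return the three regions recorded for a wall alert region."""
        return {
            # Double-line region: first two points form the wall-top line,
            # last two points form the wall-foot line.
            "top_line_first": [p1, p2, p4, p3],
            # Double-line region with the roles of the two lines swapped.
            "top_line_last": [p3, p4, p2, p1],
            # The full preset alert region enclosed by the four points,
            # connected in input order.
            "full_region": [p1, p2, p3, p4],
        }

    regions = wall_regions((100, 80), (300, 80), (300, 120), (100, 120))
    print(regions["full_region"])  # [(100, 80), (300, 80), (300, 120), (100, 120)]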
In another embodiment, determining the preset alert region from the current video frame includes: recognizing preset boundary position points in the current video frame, and determining the preset alert region from the recognized points. In this embodiment, identifiable markers are set at the preset alert region; the preset boundary position points are obtained by recognizing these preset markers in the current video frame, and the position of the preset alert region is then determined from those points.
Step S220, target detection is performed in the current video frame, and a target detection box is obtained when a target is detected in the current video frame.
Target detection, also called target extraction, is image segmentation based on the geometric and statistical features of the target; it combines segmentation with recognition, and its accuracy and real-time performance are important capabilities of the whole system. Further, targets can be recognized in the current video frame by a classification network within a neural network. When a target is detected in the current video frame, a box enclosing the target, namely the target detection box, is obtained. In one embodiment, obtaining the target detection box includes obtaining its coordinate position, such as the coordinates corresponding to its four vertices.
In this embodiment, the target is the acting subject of the target event to be detected; in a specific embodiment, the target event is a human body climbing over a wall, and the target is a human body. In one embodiment, one or more targets may be detected in the current video frame.
Further, in one embodiment, performing target detection in the current video frame and generating a detection box when a target is detected may be implemented by a neural network for object detection; any type of object detection network may be used for the current video frame, as in the sketch below.
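As a hedged illustration of this step, the sketch below obtains person detection boxes with an off-the-shelf detector. The patent does not name a specific network; torchvision's Faster R-CNN and the 0.5 score threshold are assumptions (and a recent torchvision is assumed for the weights argument).

    import torch
    import torchvision

    # Load a generic off-the-shelf detector; the patent does not prescribe one.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    model.eval()

    frame = torch.rand(3, 720, 1280)  # stand-in for a decoded frame, (C, H, W) in [0, 1]
    with torch.no_grad():
        detections = model([frame])[0]

    # Keep confident person detections (COCO label 1 is "person"); the 0.5
    # score threshold is an illustrative choice.
    keep = (detections["labels"] == 1) & (detections["scores"] > 0.5)
    boxes = detections["boxes"][keep]  # each row is (x1, y1, x2, y2)
    print(boxes.shape)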
Step S230, if the target detection box and the preset alert region satisfy the preset positional-relationship condition, continued interactions between the target and the preset alert region are recognized sequentially in the subsequent consecutive video frames.
The preset positional-relationship condition to be satisfied by the target detection box and the preset alert region can be set according to the actual situation. In one embodiment, when the effective area of the target detection box intersects the preset alert region, the target detection box and the preset alert region are judged to satisfy the preset positional-relationship condition; the effective area of the target detection box is a preset portion of the box.
Intersection of the effective area with the preset alert region means that the effective area of the target detection box and the region enclosed by the preset alert region overlap. In a specific embodiment, the effective area is the bottom 20% of the target detection box, nearest its lower edge; when the target is a human body, this portion corresponds to the feet of a standing person. A sketch of this test follows.
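A minimal sketch of this positional-relationship test, assuming image coordinates with y increasing downward and using the shapely geometry library (an assumption; the patent prescribes no library):

    from shapely.geometry import Polygon, box

    def effective_area_intersects(det_box, alert_region, bottom_fraction=0.2):
        """det_box is (x1, y1, x2, y2); the effective area is its bottom part."""
        x1, y1, x2, y2 = det_box
        effective = box(x1, y2 - (y2 - y1) * bottom_fraction, x2, y2)
        return effective.intersects(alert_region)

    alert_region = Polygon([(100, 80), (300, 80), (300, 120), (100, 120)])
    print(effective_area_intersects((150, 20, 200, 110), alert_region))  # True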
The subsequent consecutive video frames are the consecutive video frames after the current video frame. The continued interactions between the target and the preset alert region are the interactions between them whose actions, taken across the subsequent consecutive frames, form a continuous sequence; these are recorded as the target's continued interactions with the preset alert region. In one embodiment, for each subsequent consecutive video frame it must also be detected whether the target detection box and the preset alert region satisfy the preset positional-relationship condition, and only if they do is the continued interaction recognized for that frame. In one embodiment, sequentially recognizing the continued interactions includes: if, in each subsequent consecutive video frame, the target's detection box and the preset alert region satisfy the preset positional-relationship condition, sequentially recognizing the action corresponding to the target in each of those frames and taking it as a continued interaction between the target and the preset alert region.
In one embodiment, recognition of the continued interactions within each video frame may be accomplished with a classification neural network; in a specific embodiment, a ResNet (Residual Network) may be used to extract and classify image features from the detection box and to judge whether a target appears in it. The residual network is determined in advance by training; it is easy to optimize and gains accuracy from considerably increased depth, so a classification result of higher accuracy can be obtained.
In one embodiment, sequentially recognizing the continued interactions further includes: sequentially extracting image features from the targets in the target detection boxes of the subsequent consecutive video frames, classifying the actions based on the extracted features to obtain the action classification result corresponding to the target in each frame, and then obtaining the continued interactions between the target and the preset alert region from the action classification results of those frames.
Further, in one embodiment, the actions of the targets in the target detection box are classified by a preset classifier that provides a number of preset action categories. For example, when the target is a human body, the action classification result is the human body's action in the current video frame, and the preset action categories may specifically include standing, bending over, squatting, raising hands, raising legs, and others. The target detection box is input to the preset classifier, which outputs the action category corresponding to the target. In a specific embodiment, the preset classifier may be implemented with a ResNet (residual network).
In this embodiment, an effective area is set for the target detection box; when the effective area intersects the preset alert region, a valid action is considered detected, tracking of the target starts, and the continued interactions between the target and the preset alert region are recognized in the subsequent consecutive video frames.
In one specific embodiment, target tracking may be accomplished with an IoU Tracker. IoU (Intersection over Union) is the ratio of the intersection to the union of a "predicted box" and a "ground-truth box". In this embodiment, whether the same target appears in two adjacent video frames is determined by computing the IoU of the target detection boxes detected in them; the "predicted box" is the detection box in the later of the two adjacent frames, and the "ground-truth box" is the detection box in the earlier frame. The computation is sketched below.
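A minimal sketch of the IoU computation itself; the 0.5 threshold is illustrative, as the patent leaves the threshold value open.

    def iou(a, b):
        """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    IOU_THRESHOLD = 0.5  # illustrative; the patent leaves the value open
    print(iou((150, 20, 200, 110), (155, 25, 205, 112)) > IOU_THRESHOLD)  # True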
In this embodiment, the IoU tracker determines whether the same target as in the current video frame exists in each subsequent consecutive video frame; once this is established, the continued interaction between the target and the preset alert region is recognized for each subsequent frame.
The interactions between the same target and the preset alert region in the subsequent consecutive video frames are the continued interactions of this embodiment. In one embodiment, the positional relationship between the target and the preset alert region is detected in each subsequent consecutive frame, and when they satisfy the interaction condition, the target's corresponding action is determined as a continued interaction; that is, while the target's position keeps it in an interacting state with the preset alert region, its recognized actions are the continued interactions. In one embodiment, the target event is a human body climbing over a wall, i.e., the interaction is between the human body and the wall in the subsequent consecutive video frames.
In one embodiment, sequentially recognizing the continued interactions includes: acquiring the next video frame, and, when the same target as in the previous video frame is detected in it, recognizing the continued interaction between the target and the preset alert region in the next video frame; then returning to the step of acquiring the next video frame. Tracking is thus achieved by cyclically acquiring the next frame, detecting whether the same target exists in it, and, if so, recognizing that frame's continued interaction, so that the target's continued interactions with the preset alert region in every frame following the current video frame are obtained.
In one embodiment, similarly to detecting whether a target is present in the current video frame, pending target detection boxes are determined in the next video frame after it is acquired; when the image features extracted from a pending detection box show that it contains a pending target, and the pending target is determined to be the same target, it is determined that the target is detected in the next video frame. There may be one or more pending target detection boxes in the next frame.
In this embodiment, a detection box determined in the next video frame is denoted a pending target detection box. When image features extracted from it show that it contains a pending target, whether the pending target is the same as the target in the current video frame is judged by matching the two. In a specific embodiment, the IoU between the pending target and the target in the current frame is computed, and when it is greater than the IoU threshold the two are determined to be the same target.
In a specific embodiment, when multiple targets (target detection boxes) are detected in the current video frame and multiple detection boxes exist in the next video frame, the Hungarian algorithm is used to determine which box in the next frame corresponds to each box in the current frame, that is, to associate each target in the next frame with a target in the current frame. The Hungarian algorithm is a combinatorial optimization algorithm that solves the assignment problem in polynomial time; a sketch of the matching follows.
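A minimal sketch of the assignment step, assuming scipy's linear_sum_assignment as the Hungarian solver (the patent names only the algorithm); the IoU helper is repeated here so the sketch is self-contained.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    def match_boxes(prev_boxes, next_boxes, iou_threshold=0.5):
        """Return (i, j) pairs: prev_boxes[i] matched to next_boxes[j]."""
        cost = np.array([[1.0 - iou(p, n) for n in next_boxes] for p in prev_boxes])
        rows, cols = linear_sum_assignment(cost)  # minimizes total (1 - IoU)
        # Discard assignments whose IoU does not clear the threshold.
        return [(i, j) for i, j in zip(rows, cols) if 1.0 - cost[i, j] > iou_threshold]

    print(match_boxes([(150, 20, 200, 110)],
                      [(155, 25, 205, 112), (400, 40, 450, 130)]))  # [(0, 0)]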
In another embodiment, the target event is another event, and the corresponding preset action sequence is set to another action sequence according to the actual situation.
Step S240, whether the target event occurs is determined based on the interaction sequence composed of the initial action and the continued interactions.
The initial action is the action corresponding to the target in the target detection box in the current video frame, and the continued interactions are the target's actions in the subsequent consecutive video frames; in one embodiment, arranging the initial action and the continued interactions in chronological order yields the interaction sequence corresponding to the target.
In one embodiment, determining whether a target event has occurred based on the interaction sequence comprises: forming the interaction sequence from the initial action and the continued interactions in chronological order; and judging that the target event occurs when the interaction sequence matches the preset action sequence of the target event.
In this embodiment, whether the interaction sequence is consistent with the preset action sequence of the target event is judged; if so, the target event of interaction between the target and the preset alert region is determined to have occurred. The preset action sequence can be set according to the actual situation.
In one embodiment, the target event is a human body climbing over a wall, and the interaction sequence is checked against the preset action sequences corresponding to that event; if it matches, the wall-crossing event is determined to have occurred. In another embodiment, the event can be divided into three phases, mounting the wall (upper wall), being on the wall (in wall), and dismounting the wall (lower wall), and each phase can be divided into finer actions. Specifically, for example, a change from standing to climbing, a change from sitting to climbing, or a climbing action can be determined as mounting the wall; a bending action or a sitting action can be determined as being on the wall; and dismounting the wall can correspondingly comprise the transitions bending to climbing, bending to sitting, bending to standing, or sitting to standing. Table 1 lists the action sequences for the three phases; when any combination of sequences in the interaction sequence satisfies, in order, the upper-wall, in-wall, and lower-wall sequences, the wall-crossing target event can be judged to have occurred. It will be appreciated that in other embodiments the preset action sequence of the target event may be set to other sequences.
Upper wall (mounting)      In wall           Lower wall (dismounting)
standing to climbing       bending over      bending to climbing
sitting to climbing        sitting           bending to sitting
climbing                                     bending to standing
                                             sitting to standing
TABLE 1
In this embodiment, the target event is divided into a number of actions, a preset action sequence is determined for each phase in order, and the interaction sequence is compared against these preset sequences to determine whether the preset actions of the target event occur; if the interaction sequence is consistent with the preset action sequence of the target event, the target event is judged to have occurred (a sketch of this matching follows). Thus it is not enough that the target interacts with the preset alert region at some instant; the change in their interaction over a period of time must also be considered, which improves the accuracy of target event detection and reduces false alarms.
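A minimal sketch of this comparison, encoding the Table 1 phase sequences as ordered-subsequence patterns; the action labels and data layout are illustrative assumptions, not the patent's data format.

    # Table 1 encoded as ordered action patterns per phase.
    UPPER = [["standing", "climbing"], ["sitting", "climbing"], ["climbing"]]
    IN_WALL = [["bending"], ["sitting"]]
    LOWER = [["bending", "climbing"], ["bending", "sitting"],
             ["bending", "standing"], ["sitting", "standing"]]

    def find_after(actions, pattern, start):
        """Index just past an in-order occurrence of pattern, or -1 if none."""
        pos = start
        for want in pattern:
            while pos < len(actions) and actions[pos] != want:
                pos += 1
            if pos == len(actions):
                return -1
            pos += 1
        return pos

    def is_wall_crossing(actions):
        """True if some upper, in-wall, and lower pattern occur in order."""
        for up in UPPER:
            after_up = find_after(actions, up, 0)
            if after_up < 0:
                continue
            for mid in IN_WALL:
                after_mid = find_after(actions, mid, after_up)
                if after_mid < 0:
                    continue
                if any(find_after(actions, low, after_mid) >= 0 for low in LOWER):
                    return True
        return False

    print(is_wall_crossing(["standing", "climbing", "bending",
                            "bending", "standing"]))  # True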
With the above target event detection method, after the current video frame is acquired, the preset alert region is identified in it and the target detection box is determined; when the event-start condition is met, namely the target detection box and the preset alert region satisfy the preset positional relationship, the continued interactions between the target and the preset alert region are recognized sequentially in the subsequent consecutive video frames, and whether a target event occurs is determined from the initial action corresponding to the target in the detection box and the interaction sequence formed by the continued interactions. By detecting the positional relationship between the target and the alert region and combining it with the change in the target's action state over a period of time, the accuracy of target event detection is improved and false alarms are reduced.
In one embodiment, after the target detection box is obtained, the method further comprises: inputting the target detection box into a preset neural network for classification to obtain a classification result; when the classification result is a specific target, proceeding to the step of sequentially recognizing the continued interactions if the target detection box and the preset alert region satisfy the preset positional-relationship condition. In this embodiment, the initial action is the initial action of the specific target in the target detection box determined from the classification result, and the continued interactions are the specific target's continued interactions with the preset alert region in the subsequent consecutive video frames, likewise determined from classification results.
The preset neural network may be a classification network determined in advance by training; the classification result indicates whether the target is a specific target, i.e., whether the detection box contains a human body or background, or whether it contains a particular specific target (such as a specific person). Further, in one embodiment, both the specific-target judgment and the action classification can be completed by the same residual network. FIG. 4 shows the flow of detecting and classifying the current video frame with a ResNet: the ResNet serves as the backbone network, and the head network uses two fully connected layers to predict two attributes, the probability that the detection box contains the target and the probability of each action category for the target. The current video frame is input to the ResNet, which outputs the two classification results, so that whether the target detection box contains a specific target, and the target's action, are determined by a single preset classification network.
In this embodiment, classifying the target detection box with a preset neural network yields both whether the corresponding target is a specific target and the action of the target in the box, and using a neural network for the classification yields relatively accurate results. In the specific embodiment where the target event is a person climbing over a wall, the preset neural network determines whether the detection box contains a person or background and, when it contains a person, the person's action. A sketch of such a two-headed classifier follows.
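A minimal sketch of such a two-headed network in PyTorch, assuming a ResNet-18 backbone and the six action categories mentioned later in the description; layer sizes and names are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torchvision

    class TwoHeadClassifier(nn.Module):
        """ResNet backbone with one fully connected head per attribute, as in FIG. 4."""
        def __init__(self, num_actions: int = 6):
            super().__init__()
            backbone = torchvision.models.resnet18(weights=None)
            feat_dim = backbone.fc.in_features  # 512 for ResNet-18
            backbone.fc = nn.Identity()         # keep the feature extractor only
            self.backbone = backbone
            self.target_head = nn.Linear(feat_dim, 2)            # target vs background
            self.action_head = nn.Linear(feat_dim, num_actions)  # standing, bending, ...

        def forward(self, crop: torch.Tensor):
            feats = self.backbone(crop)
            return self.target_head(feats), self.action_head(feats)

    model = TwoHeadClassifier()
    target_logits, action_logits = model(torch.rand(1, 3, 224, 224))
    print(target_logits.shape, action_logits.shape)  # (1, 2) and (1, 6)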
Further, as shown in FIG. 5, in one embodiment, if the target detection box and the preset alert region satisfy the preset positional-relationship condition, the method further comprises step S510: constructing a target sequence and storing in it the initial action corresponding to the target in the target detection box. In this embodiment, sequentially recognizing the continued interactions comprises step S520: acquiring each subsequent consecutive video frame, determining the continued interaction between the target and the preset alert region in each of them, and storing it in the target sequence corresponding to the target.
Mathematically, a sequence is a collection of objects (or events) arranged in order, so that each element is either before or after every other element. In this embodiment, the actions recognized in each video frame are stored as the elements of a target sequence, which thus contains the initial action followed by the continued interactions in order. After the target sequence is constructed, the action of the target in the detection box of the current video frame is recognized, recorded as the initial action, and stored in the target sequence.
In one embodiment, when multiple targets are detected in the current video frame, a corresponding target sequence is constructed for each of them; the continued interaction of each target with the preset alert region is recognized separately in each subsequent consecutive frame, and each detected continued interaction is stored in the target sequence of the same target.
In this embodiment, by constructing a target sequence for the target, the target's initial action in the current video frame and its continued interactions in the subsequent consecutive frames are stored; when the interaction sequence is needed, the continued interactions can be read directly from the target sequence in chronological order. A sketch of such a per-target sequence follows.
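A minimal sketch of such a per-target sequence, assuming wall-clock timestamps and an illustrative timeout; the class and field names are not from the patent.

    import time
    from dataclasses import dataclass, field

    @dataclass
    class TargetSequence:
        box: tuple                                   # latest (x1, y1, x2, y2) detection box
        actions: list = field(default_factory=list)  # initial action, then continued interactions
        last_update: float = field(default_factory=time.time)

        def add_action(self, action: str, box: tuple) -> None:
            """Append a continued interaction and refresh the update time."""
            self.actions.append(action)
            self.box = box
            self.last_update = time.time()

        def expired(self, timeout_s: float = 3.0) -> bool:
            """True if nothing was added within the timeout (stop tracking)."""
            return time.time() - self.last_update > timeout_s

    seq = TargetSequence(box=(150, 20, 200, 110), actions=["standing"])
    seq.add_action("climbing", (152, 25, 202, 112))
    print(seq.actions, seq.expired())  # ['standing', 'climbing'] False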
In another embodiment, as shown in FIG. 6, after the target sequence is constructed the method further includes step S610: storing the target detection box and the corresponding initial action in the target sequence. In this embodiment, after sequentially recognizing the continued interactions, the method further includes step S620: when the target is detected in the next video frame, determining the target's predicted detection box in the next frame and storing it in the target sequence corresponding to the target.
In one embodiment, storing the target detection box in the target sequence includes storing the box's position information. When multiple targets exist in the current video frame, they can be distinguished by the position information of their detection boxes; when continued interactions are detected in the subsequent consecutive frames, each target's continued interaction can be recognized based on its position information and stored in the corresponding target sequence.
Further, in one embodiment, as shown in FIG. 7, determining the target's predicted detection box in the next video frame includes step S710: acquiring, from the target sequence, the target's detection box in the previous video frame and taking it as the reference detection box; and step S720: computing the IoU between the reference detection box and each pending detection box in the next video frame, and determining the pending box whose IoU exceeds the IoU threshold as the target's predicted detection box in the next frame. A pending detection box is a detection box found in the next video frame.
The target's previous video frame in the target sequence is the last frame for which an initial action or continued interaction was stored for the target. For clarity, in this embodiment the target's detection box in that previous frame is denoted the reference detection box, and each detection box in the next frame is denoted a pending detection box; there may be several pending boxes. The IoU between each pending box and the reference box is computed, and the pending box whose IoU is greater than the IoU threshold is determined to be the predicted detection box, in the next frame, of the target corresponding to the reference box. In a specific embodiment, suppose the target in the previous frame is denoted target A, with reference detection box X, and pending boxes 1, 2, 3, ... n are detected in the next frame; the IoU of each pending box with reference box X is computed, and if the IoU of pending box 3 with the reference box is greater than the IoU threshold, pending box 3 is determined to be target A's predicted detection box in the next frame.
In this embodiment, a target sequence is constructed for each target, storing the detection box (position information) and the actions (the initial action and continued interactions) recognized for that target in each video frame. The scheme therefore also applies when multiple targets are detected in one video frame: the targets are tracked separately, and whether each is performing the actions of a target event is determined independently.
In one embodiment, when no continued interaction between the target and the preset alert region is recognized in the subsequent consecutive video frames, the last update time of the continued interactions in the target sequence is acquired; if the interval between the last update time and the current time exceeds a preset time threshold, updating of the target sequence is ended.
When the target is not detected in the subsequent consecutive video frames, or is detected but has no interaction with the preset alert region, it is judged that no continued interaction is recognized in those frames.
The last update time of the continued interactions is the time at which the most recent continued interaction was added to the target sequence. In this embodiment, when no continued interaction is recognized in the subsequent consecutive frames, whether to keep tracking the target is decided from the target sequence, specifically from the interval between the last update time and the current time. When that interval exceeds the preset time threshold, the target may have left the preset alert region or stopped performing the actions of the target event, so updating of the target can be ended and its continued interactions are no longer recognized. In another embodiment, if the interval does not exceed the preset time threshold, recognition of the continued interactions continues for the subsequent consecutive frames.
In one embodiment, the number of actions stored in the target sequence is checked every preset time period, and when it is greater than a preset threshold, the step of determining whether a target event occurs from the interaction sequence is entered. In another embodiment, the existing interaction sequence in the target sequence may instead be examined directly after every preset time period to determine whether the target event occurs; the preset period can be set according to the actual situation. In yet another embodiment, whenever a target sequence exists, the interaction sequence formed by its initial action and each newly added continued interaction is examined, i.e., the target event is checked for continuously as long as the target sequence exists, which improves the timeliness of detection and avoids problems caused by late detection. Further, in one embodiment, when the target event is determined to have occurred, alarm information is generated and sent to the corresponding preset supervisor, prompting the relevant personnel to take appropriate measures.
In a specific embodiment, the target event detection method is described in detail by taking the detection of a human body climbing over a wall as an example.
A current video frame is acquired and the preset alert region is identified in it. When a human body is detected in the current frame, a human detection box (the target detection box above) is obtained; a preset classifier extracts image features from the target in the human detection box and outputs classification results for two attributes: whether the target in the box is a human body, and the action corresponding to the target. On the one hand, non-human targets can be screened out by the first result (the judgment of whether the target is a person); on the other hand, when the target in the box is a human body, the human features in the box are classified and the action category of the human body is output. In one embodiment, the preset classifier provides six action-state categories: standing, bending over, squatting, raising hands, raising legs, and others. In a specific embodiment, the preset classifier uses a ResNet (residual network) as the backbone, and the head network uses two fully connected layers to predict the probabilities of the two attributes (the probability that the box contains a human body, and the probability of each action category).
Meanwhile, the positional relationship between the human detection box of the current frame and the preset alert region is judged. When the effective area of the human detection box (the bottom 20% of the box) is detected to intersect the preset alert region, the event-start condition is judged to be met; a target sequence corresponding to the human detection box is then constructed, and the box's position information and the corresponding initial action are stored in it. If several human detection boxes in the current frame intersect the preset alert region, a target sequence is constructed for each.
An IoU Tracker is used to track each human detection box that has a target sequence. The subsequent next video frame is acquired; the human detection boxes in the current frame are recorded as candidate boxes, and the human detection boxes determined in the next frame are recorded as predicted boxes. Each predicted box in the next frame is matched against the candidate boxes by IoU computation, and the Hungarian algorithm then selects, for each predicted box, the target sequence of its matching candidate box. If the IoU between a predicted box and the candidate box of its currently matched target sequence exceeds the set threshold, the match is considered successful and the predicted box is added to that target sequence; otherwise the match fails. If a current target sequence matches no predicted box, whether to end it is decided from its last update time: when the interval between the last update time and the current time exceeds the preset time threshold, tracking of that target sequence ends. Further, in one embodiment, if a predicted box matches no target sequence, a new target sequence is created for it so that the new target is recorded and tracked.
From each target sequence, the actions corresponding to the human detection box are read to obtain the interaction sequence between the human body and the alert region, and whether a wall-crossing target event occurs is then determined from the change in the target human body's action state within the interaction sequence.
With the above target event detection method, the positional relationship between the target and the alert region is detected, and whether a wall-crossing target event occurs is determined in combination with the change in the target's action state over a period of time, so detection accuracy is improved and false alarms are reduced. Determining the event from the change in action state over time also makes the method efficient, accurate, and robust.
It should be understood that, although the steps in the flowcharts of FIGS. 2-7 are shown in the order indicated by the arrows, they are not necessarily performed in that order; unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in FIGS. 2-7 may include multiple sub-steps or stages that are not necessarily performed at the same time or in sequence; they may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 8, a target event detection apparatus is provided, including: a region identification module 810, a target detection module 820, an action recognition module 830, and a target event detection module 840, wherein:
the region identification module 810 is configured to acquire a current video frame and identify a preset alert region in it;
the target detection module 820 is configured to perform target detection in the current video frame and obtain a target detection box when a target is detected;
the action recognition module 830 is configured to sequentially recognize continued interactions between the target and the preset alert region in the subsequent consecutive video frames if the target detection box and the preset alert region satisfy the preset positional-relationship condition; and
the target event detection module 840 is configured to determine whether a target event occurs based on an interaction sequence formed from an initial action and the continued interactions, where the initial action is the action corresponding to the target in the target detection box.
After acquiring the current video frame, the target event detection apparatus identifies the preset alert region in it and performs target detection; when a target is detected, a target detection box is obtained, and if the box and the preset alert region satisfy the preset positional-relationship condition, the continued interactions between the target and the preset alert region are recognized sequentially in the subsequent consecutive frames; whether a target event occurs is then determined from the initial action corresponding to the target in the detection box and the interaction sequence formed by the continued interactions. By detecting the positional relationship between the target and the alert region and combining it with the change in the target's action state over a period of time, the apparatus improves the accuracy of target event detection and reduces false alarms.
In one embodiment, the apparatus further comprises a target sequence construction module, configured to construct a target sequence and store, in the target sequence, the initial action corresponding to the target in the target detection frame. In this embodiment, the action recognition module 830 is configured to acquire each subsequent continuous video frame, determine the continuous interaction action between the target and the preset warning area in each subsequent continuous video frame, and store the continuous interaction action in the target sequence corresponding to the target.
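To make the target sequence concrete, the following Python sketch shows one plausible shape for such a record; the class and member names (TargetSequence, initial_action, and so on) are illustrative assumptions of ours, not identifiers from the original disclosure.

```python
from dataclasses import dataclass, field
import time


@dataclass
class TargetSequence:
    """Per-target record: the initial action plus continued interaction actions."""
    target_id: int
    initial_action: str                          # action recognized when the target first met the condition
    boxes: list = field(default_factory=list)    # detection frames, one per subsequent video frame
    actions: list = field(default_factory=list)  # continued interaction actions, in time order
    last_update: float = field(default_factory=time.time)

    def append(self, box, action):
        """Store the next frame's detection box and continued interaction action."""
        self.boxes.append(box)
        self.actions.append(action)
        self.last_update = time.time()

    def full_sequence(self):
        """The interaction action sequence: initial action, then continued actions."""
        return [self.initial_action, *self.actions]
```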
In one embodiment, the target sequence construction module of the above apparatus is further configured to store the target detection frame and the corresponding initial action in the target sequence. In this embodiment, the target detection module 820 is further configured to, when the target is detected in the next video frame, determine a predicted target detection frame of the target in the next video frame and store the predicted target detection frame in the target sequence corresponding to the target.
In one embodiment, the target detection module 820 of the above apparatus includes: an acquisition unit, configured to acquire, from the target sequence, the target detection frame of the target in the previous video frame and use it as a reference target detection frame; and an intersection-over-union (IoU) calculation unit, configured to calculate the IoU between the reference target detection frame and each to-be-determined target detection frame in the next video frame, and to determine a to-be-determined target detection frame whose IoU is greater than the IoU threshold as the predicted target detection frame of the target in the next video frame. Here, a to-be-determined target detection frame is simply a target detection frame detected in the next video frame.
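A minimal sketch of this IoU matching step follows, assuming axis-aligned boxes given as (x1, y1, x2, y2) tuples; the function names and the 0.5 threshold are illustrative assumptions, and where several candidates exceed the threshold we pick the best-scoring one, a detail the disclosure does not specify.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_next_frame(reference_box, candidate_boxes, iou_threshold=0.5):
    """Return the to-be-determined box in the next frame whose IoU with the
    reference box exceeds the threshold (the predicted target detection
    frame), or None when no candidate matches."""
    best_box, best_score = None, iou_threshold
    for box in candidate_boxes:
        score = iou(reference_box, box)
        if score > best_score:
            best_box, best_score = box, score
    return best_box
```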
In one embodiment, the apparatus further comprises: an update time acquisition module, configured to acquire the last update time corresponding to the continuous interaction action in the target sequence when no continuous interaction action between the target and the preset warning area is recognized in the subsequent continuous video frames; and a judging module, configured to end the updating of the target sequence if the duration from the last update time to the current time exceeds a preset time threshold.
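The timeout logic can be sketched as follows, reusing the hypothetical TargetSequence from the earlier sketch; the 5-second value stands in for the preset time threshold and is not taken from the disclosure.

```python
import time

STALE_THRESHOLD_S = 5.0  # illustrative preset time threshold


def should_end_updating(seq, now=None):
    """End updating the target sequence once no continued interaction action
    has been recognized for longer than the preset time threshold."""
    now = time.time() if now is None else now
    return (now - seq.last_update) > STALE_THRESHOLD_S
```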
In one embodiment, the target event detection module 840 includes: an interaction action sequence determination unit, configured to arrange the initial action and each continuous interaction action in time order to obtain the interaction action sequence; and a judging unit, configured to judge that the target event occurs when the interaction action sequence conforms to the preset action sequence of the target event.
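One simple reading of "conforms to the preset action sequence" is order-preserving subsequence matching, sketched below; the action labels and the subsequence semantics are our assumptions, since the disclosure does not pin down the matching rule.

```python
# Illustrative preset action sequence for a wall-crossing target event.
WALL_CROSSING_TEMPLATE = ["approach", "climb", "straddle", "descend"]


def conforms_to(observed, template=WALL_CROSSING_TEMPLATE):
    """True when the template occurs as an order-preserving subsequence
    of the observed interaction action sequence."""
    it = iter(observed)
    return all(step in it for step in template)


# A target event is judged to occur for this observed sequence:
assert conforms_to(["approach", "approach", "climb", "climb", "straddle", "descend"])
```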
In one embodiment, the action recognition module 830 of the above apparatus is specifically configured to, if the continuous target detection frames corresponding to the target in each subsequent continuous video frame and the preset warning area satisfy the preset positional relationship condition, sequentially identify the action corresponding to the target in each subsequent continuous video frame and determine it as the continuous interaction action between the target and the preset warning area.
In one embodiment, the apparatus further comprises a classification module, configured to input the target detection frame into a preset neural network for classification and obtain a classification result. In this embodiment, when the classification result indicates a specific target, the process proceeds to the action recognition module 830, which, if the target detection frame and the preset warning area satisfy the preset positional relationship condition, sequentially identifies the continuous interaction actions between the target and the preset warning area in each subsequent continuous video frame. In this embodiment, the initial action is the initial action corresponding to the specific target in the target detection frame determined based on the classification result, and the continuous interaction action is the continuous interaction action between the specific target and the preset warning area in each subsequent continuous video frame determined based on the classification result.
For specific limitations of the target event detection apparatus, reference may be made to the above limitations of the target event detection method, which are not repeated here. Each module in the above target event detection apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can invoke them to perform the operations corresponding to the above modules.
In one embodiment, a computer device is provided. The computer device may be a terminal, and its internal structure may be as shown in fig. 9. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless communication may be implemented through WIFI, an operator network, NFC (Near Field Communication), or other technologies. The computer program, when executed by the processor, implements a target event detection method. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device may be a touch layer covering the display screen, a key, a trackball, or a touchpad provided on the housing of the computer device, or an external keyboard, touchpad, mouse, or the like.
It will be appreciated by persons skilled in the art that the structure shown in fig. 9 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution of the present application is applied; a particular computer device may include more or fewer components than shown, combine some components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:
Acquiring a current video frame, and identifying a preset warning area from the current video frame;
Performing target detection in the current video frame, and obtaining a target detection frame when a target is detected in the current video frame;
If the target detection frame and the preset warning area meet the preset position relation condition, sequentially identifying continuous interaction actions of the target and the preset warning area in each subsequent continuous video frame;
And determining whether a target event occurs or not based on an initial action and an interaction action sequence formed by each continuous interaction action, wherein the initial action is an action corresponding to a target in a target detection frame.
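Putting these four steps together, a top-level loop might look like the sketch below. TargetSequence and conforms_to come from the earlier sketches; detect_targets, recognize_action, and satisfies_position_condition (itself sketched after the positional-condition embodiment further below) are assumed helpers, and for brevity the detector is assumed to associate target IDs across frames, whereas the disclosure associates them by IoU matching as described above.

```python
# detect_targets(frame) and recognize_action(frame, box) are hypothetical
# helpers standing in for a detector and an action recognizer; TargetSequence,
# conforms_to, and satisfies_position_condition come from neighboring sketches.
def detect_target_events(video_frames, warning_area):
    """End-to-end sketch: detect targets, gate on the positional condition,
    accumulate interaction actions, and judge whether a target event occurs."""
    sequences = {}   # target_id -> TargetSequence
    events = []
    for frame_idx, frame in enumerate(video_frames):
        for target_id, box in detect_targets(frame):
            if target_id not in sequences:
                # First contact: store the initial action once the
                # positional relationship condition is satisfied.
                if satisfies_position_condition(box, warning_area):
                    initial = recognize_action(frame, box)
                    sequences[target_id] = TargetSequence(target_id, initial)
            elif satisfies_position_condition(box, warning_area):
                # Continued contact: store the continued interaction action
                # and test the accumulated sequence against the template.
                seq = sequences[target_id]
                seq.append(box, recognize_action(frame, box))
                if conforms_to(seq.full_sequence()):
                    events.append((frame_idx, target_id))
    return events
```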
In one embodiment, the processor when executing the computer program further performs the steps of: constructing a target sequence, and storing initial actions corresponding to the targets in the target detection frame in the target sequence; and acquiring each subsequent continuous video frame, determining continuous interaction action of the target and a preset warning area in each subsequent continuous video frame, and storing the continuous interaction action into a target sequence corresponding to the target.
In one embodiment, the processor when executing the computer program further performs the steps of: storing the target detection frame and the corresponding initial action in a target sequence; and when the target is detected in the next video frame, determining a predicted target detection frame of the target in the next video frame, and storing the predicted target detection frame into a target sequence corresponding to the target.
In one embodiment, the processor when executing the computer program further performs the steps of: acquiring, from the target sequence, the target detection frame of the target in the previous video frame and using it as a reference target detection frame; calculating the intersection-over-union (IoU) between the reference target detection frame and each to-be-determined target detection frame in the next video frame, and determining a to-be-determined target detection frame whose IoU is greater than the IoU threshold as the predicted target detection frame of the target in the next video frame.
In one embodiment, the processor when executing the computer program further performs the steps of: when no continuous interaction action between the target and the preset warning area is recognized in the subsequent continuous video frames, acquiring the last update time corresponding to the continuous interaction action in the target sequence; and ending the updating of the target sequence if the duration from the last update time to the current time exceeds a preset time threshold.
In one embodiment, the processor when executing the computer program further performs the steps of: arranging the initial action and each continuous interaction action in time order to obtain an interaction action sequence; and judging that the target event occurs when the interaction action sequence conforms to the preset action sequence of the target event.
In one embodiment, the processor when executing the computer program further performs the steps of: when the preset position area within the target detection frame intersects the preset warning area, judging that the target detection frame and the preset warning area satisfy the preset positional relationship condition.
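As an illustration of this positional check, the sketch below tests whether the bottom portion of a detection frame overlaps a rectangular warning area; treating the preset position area as the bottom quarter of the box, and modeling the warning area as an axis-aligned rectangle rather than an arbitrary polygon, are both simplifying assumptions of ours.

```python
def bottom_strip(box, fraction=0.25):
    """The bottom `fraction` of a box (x1, y1, x2, y2): a stand-in for the
    preset position area within the target detection frame."""
    x1, y1, x2, y2 = box
    return (x1, y2 - (y2 - y1) * fraction, x2, y2)


def rects_intersect(a, b):
    """Axis-aligned rectangle overlap test."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def satisfies_position_condition(detection_box, warning_area):
    """Preset positional relationship condition: the preset position area
    of the detection frame intersects the warning area."""
    return rects_intersect(bottom_strip(detection_box), warning_area)
```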
In one embodiment, the processor when executing the computer program further performs the steps of: if the continuous target detection frames corresponding to the target in each subsequent continuous video frame and the preset warning area satisfy the preset positional relationship condition, sequentially identifying the action corresponding to the target in each subsequent continuous video frame and determining it as the continuous interaction action between the target and the preset warning area.
In one embodiment, the processor when executing the computer program further performs the steps of: inputting the target detection frame into a preset neural network for classification to obtain a classification result; and when the classification result indicates the specific target, entering the step of sequentially identifying the continuous interaction actions between the target and the preset warning area in each subsequent continuous video frame if the target detection frame and the preset warning area satisfy the preset positional relationship condition.
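The classification gate itself is simple to express; in the sketch below, classify_crop stands in for the preset neural network, and its name, the "person" label, and the numpy-style frame layout are illustrative assumptions. satisfies_position_condition is the helper from the sketch above.

```python
def classify_crop(frame, box):
    """Placeholder for the preset neural network: crop the detection frame
    from the image and return a class label. A real system would run a
    trained classifier on the crop."""
    x1, y1, x2, y2 = map(int, box)
    crop = frame[y1:y2, x1:x2]   # assumes an HxWxC array-like frame
    return "person"              # stub result for illustration


def is_gated_specific_target(frame, box, warning_area):
    """Only detections classified as the specific target proceed to the
    positional check and, from there, to action recognition."""
    if classify_crop(frame, box) != "person":
        return False
    return satisfies_position_condition(box, warning_area)
```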
In one embodiment, a computer readable storage medium is provided, having a computer program stored thereon which, when executed by a processor, performs the steps of:
Acquiring a current video frame, and identifying a preset warning area from the current video frame;
Performing target detection in the current video frame, and obtaining a target detection frame when a target is detected in the current video frame;
If the target detection frame and the preset warning area meet the preset position relation condition, sequentially identifying continuous interaction actions of the target and the preset warning area in each subsequent continuous video frame;
And determining whether a target event occurs or not based on an initial action and an interaction action sequence formed by each continuous interaction action, wherein the initial action is an action corresponding to a target in a target detection frame.
In one embodiment, the computer program when executed by the processor further performs the steps of: constructing a target sequence, and storing initial actions corresponding to the targets in the target detection frame in the target sequence; and acquiring each subsequent continuous video frame, determining continuous interaction action of the target and a preset warning area in each subsequent continuous video frame, and storing the continuous interaction action into a target sequence corresponding to the target.
In one embodiment, the computer program when executed by the processor further performs the steps of: storing the target detection frame and the corresponding initial action in a target sequence; and when the target is detected in the next video frame, determining a predicted target detection frame of the target in the next video frame, and storing the predicted target detection frame into a target sequence corresponding to the target.
In one embodiment, the computer program when executed by the processor further performs the steps of: acquiring, from the target sequence, the target detection frame of the target in the previous video frame and using it as a reference target detection frame; calculating the intersection-over-union (IoU) between the reference target detection frame and each to-be-determined target detection frame in the next video frame, and determining a to-be-determined target detection frame whose IoU is greater than the IoU threshold as the predicted target detection frame of the target in the next video frame.
In one embodiment, the computer program when executed by the processor further performs the steps of: when no continuous interaction action between the target and the preset warning area is recognized in the subsequent continuous video frames, acquiring the last update time corresponding to the continuous interaction action in the target sequence; and ending the updating of the target sequence if the duration from the last update time to the current time exceeds a preset time threshold.
In one embodiment, the computer program when executed by the processor further performs the steps of: arranging the initial action and each continuous interaction action in time order to obtain an interaction action sequence; and judging that the target event occurs when the interaction action sequence conforms to the preset action sequence of the target event.
In one embodiment, the computer program when executed by the processor further performs the steps of: when the preset position area within the target detection frame intersects the preset warning area, judging that the target detection frame and the preset warning area satisfy the preset positional relationship condition.
In one embodiment, the computer program when executed by the processor further performs the steps of: if the continuous target detection frames corresponding to the target in each subsequent continuous video frame and the preset warning area satisfy the preset positional relationship condition, sequentially identifying the action corresponding to the target in each subsequent continuous video frame and determining it as the continuous interaction action between the target and the preset warning area.
In one embodiment, the computer program when executed by the processor further performs the steps of: inputting the target detection frame into a preset neural network for classification to obtain a classification result; and when the classification result indicates the specific target, entering the step of sequentially identifying the continuous interaction actions between the target and the preset warning area in each subsequent continuous video frame if the target detection frame and the preset warning area satisfy the preset positional relationship condition.
Those skilled in the art will appreciate that all or part of the flows of the above-described method embodiments may be implemented by instructing the relevant hardware through a computer program, which may be stored in a non-volatile computer readable storage medium and which, when executed, may include the flows of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, or the like. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM may take various forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination involves no contradiction, it should be considered to be within the scope of this specification.
The above embodiments merely express several implementations of the present application, and their descriptions are relatively specific and detailed, but they should not be construed as limiting the scope of the application. It should be noted that several variations and modifications may be made by those of ordinary skill in the art without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (12)

1. A method of detecting a target event, the method comprising:
acquiring a current video frame, and identifying a preset warning area from the current video frame;
performing target detection in the current video frame, and obtaining a target detection frame when a target is detected in the current video frame;
if the target detection frame and the preset warning area satisfy a preset positional relationship condition, and the continuous target detection frames corresponding to the target in each subsequent continuous video frame and the preset warning area satisfy the preset positional relationship condition, sequentially identifying actions corresponding to the target in each subsequent continuous video frame, and determining the actions as continuous interaction actions between the target and the preset warning area;
determining an interaction action sequence from the initial action and each continuous interaction action in time order; and when the interaction action sequence conforms to a preset action sequence of a target event, judging that the target event occurs, wherein the initial action is the action corresponding to the target in the target detection frame.
2. The method of claim 1, wherein, if the target detection frame and the preset warning area satisfy the preset positional relationship condition, the method further comprises: constructing a target sequence, and storing, in the target sequence, the initial action corresponding to the target in the target detection frame;
and the sequentially identifying the continuous interaction actions between the target and the preset warning area in each subsequent continuous video frame comprises:
acquiring each subsequent continuous video frame, determining the continuous interaction action between the target and the preset warning area in each subsequent continuous video frame, and storing the continuous interaction action in the target sequence corresponding to the target.
3. The method of claim 2, wherein after the constructing the target sequence, further comprising: storing the target detection frame and the corresponding initial action in a target sequence;
After sequentially identifying the continuous interaction action of the target and the preset warning area in each subsequent continuous video frame, the method further comprises the following steps:
And when the target is detected in the next video frame, determining a predicted target detection frame of the target in the next video frame, and storing the predicted target detection frame into a target sequence corresponding to the target.
4. The method of claim 3, wherein said determining a predicted target detection box for the target in the next video frame comprises:
acquiring, from the target sequence, the target detection frame of the target in the previous video frame, and using it as a reference target detection frame;
calculating the intersection-over-union (IoU) between the reference target detection frame and each to-be-determined target detection frame in the next video frame, and determining a to-be-determined target detection frame whose IoU is greater than an IoU threshold as the predicted target detection frame of the target in the next video frame; wherein a to-be-determined target detection frame is a target detection frame in the next video frame.
5. The method according to claim 2, further comprising:
when no continuous interaction action between the target and the preset warning area is recognized in each subsequent continuous video frame, acquiring the last update time corresponding to the continuous interaction action in the target sequence;
and ending the updating of the target sequence if the duration from the last update time to the current time exceeds a preset time threshold.
6. The method according to any one of claims 1 to 5, characterized in that:
when the preset position area within the target detection frame intersects the preset warning area, judging that the target detection frame and the preset warning area satisfy the preset positional relationship condition.
7. The method of claim 1, further comprising, after obtaining a target detection frame when a target is detected in the current video frame:
Inputting the target detection frame into a preset neural network for classification to obtain a classification result;
when the classification result indicates the specific target, entering the step of sequentially identifying the continuous interaction actions between the target and the preset warning area in each subsequent continuous video frame if the target detection frame and the preset warning area satisfy the preset positional relationship condition;
the initial action is the initial action corresponding to the specific target in the target detection frame determined based on the classification result, and the continuous interaction action is the continuous interaction action between the specific target and the preset warning area in each subsequent continuous video frame determined based on the classification result.
8. A target event detection apparatus, the apparatus comprising:
the area identification module is used for acquiring a current video frame and identifying a preset warning area from the current video frame;
The target detection module is used for performing target detection in the current video frame and obtaining a target detection frame when a target is detected in the current video frame;
The action recognition module is used for, if the target detection frame and the preset warning area satisfy a preset positional relationship condition and the continuous target detection frames corresponding to the target in each subsequent continuous video frame and the preset warning area satisfy the preset positional relationship condition, sequentially identifying the actions corresponding to the target in each subsequent continuous video frame and determining the actions as continuous interaction actions between the target and the preset warning area;
The target event detection module is used for determining an interaction action sequence from the initial action and each continuous interaction action in time order, and for judging that a target event occurs when the interaction action sequence conforms to a preset action sequence of the target event, wherein the initial action is the action corresponding to the target in the target detection frame.
9. The apparatus of claim 8, wherein the apparatus further comprises:
The target sequence construction module is used for constructing a target sequence and storing the initial action corresponding to the target in the target detection frame in the target sequence;
The action recognition module is also used for acquiring each subsequent continuous video frame, determining the continuous interaction action between the target and the preset warning area in each subsequent continuous video frame, and storing the continuous interaction action in the target sequence corresponding to the target.
10. The apparatus according to claim 9, wherein:
the target sequence construction module is further used for storing the target detection frame and the corresponding initial actions in a target sequence;
The target detection module is further configured to determine a predicted target detection frame of the target in a next video frame when the target is detected in the next video frame, and store the predicted target detection frame into a target sequence corresponding to the target.
11. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
12. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 7.
CN202010521931.1A 2020-06-10 2020-06-10 Target event detection method, device, computer equipment and storage medium Active CN111860140B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010521931.1A CN111860140B (en) 2020-06-10 2020-06-10 Target event detection method, device, computer equipment and storage medium


Publications (2)

Publication Number Publication Date
CN111860140A CN111860140A (en) 2020-10-30
CN111860140B true CN111860140B (en) 2024-05-17

Family

ID=72987457

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010521931.1A Active CN111860140B (en) 2020-06-10 2020-06-10 Target event detection method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111860140B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112380971B (en) * 2020-11-12 2023-08-25 杭州海康威视数字技术股份有限公司 Behavior detection method, device and equipment
CN112419367B (en) * 2020-12-02 2023-04-07 中国人民解放军军事科学院国防科技创新研究院 Method and device for identifying specific target object
CN112711994A (en) * 2020-12-21 2021-04-27 航天信息股份有限公司 Method and system for detecting illegal operation behaviors based on scene recognition
CN113449180B (en) * 2021-05-07 2022-10-28 浙江大华技术股份有限公司 Method and device for analyzing peer relationship and computer readable storage medium
CN113591589B (en) * 2021-07-02 2022-09-27 北京百度网讯科技有限公司 Product missing detection identification method and device, electronic equipment and storage medium
CN114090802A (en) * 2022-01-13 2022-02-25 深圳市猿人创新科技有限公司 Data storage and search method, device and equipment based on embedded equipment
CN115063741B (en) * 2022-06-10 2023-08-18 嘉洋智慧安全科技(北京)股份有限公司 Target detection method, device, equipment, medium and product
CN114973165B (en) * 2022-07-14 2022-10-25 浙江大华技术股份有限公司 Event recognition algorithm testing method and device and electronic equipment


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070058040A1 (en) * 2005-09-09 2007-03-15 Objectvideo, Inc. Video surveillance using spatial-temporal motion analysis

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104200466A (en) * 2014-08-20 2014-12-10 深圳市中控生物识别技术有限公司 Early warning method and camera
CN110400332A (en) * 2018-04-25 2019-11-01 杭州海康威视数字技术股份有限公司 A kind of target detection tracking method, device and computer equipment
WO2020001216A1 (en) * 2018-06-26 2020-01-02 杭州海康威视数字技术股份有限公司 Abnormal event detection
CN111046797A (en) * 2019-12-12 2020-04-21 天地伟业技术有限公司 Oil pipeline warning method based on personnel and vehicle behavior analysis

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
An Improved CAMShift Tracking Algorithm and Face Detection Framework; 杨超; 蔡晓东; 王丽娟; 朱利伟; Computer Engineering and Science (Issue 09); full text *
Research on Video Foreground Detection Methods against Complex Backgrounds; 陈震; 张紫涵; 曾希萌; Mathematics in Practice and Theory (Issue 15); full text *


Similar Documents

Publication Publication Date Title
CN111860140B (en) Target event detection method, device, computer equipment and storage medium
US10614316B2 (en) Anomalous event retriever
US9684835B2 (en) Image processing system, image processing method, and program
JPWO2018025831A1 (en) People flow estimation device, people flow estimation method and program
CN112001932B (en) Face recognition method, device, computer equipment and storage medium
US10586115B2 (en) Information processing device, information processing method, and computer program product
US11429985B2 (en) Information processing device calculating statistical information
US20230353711A1 (en) Image processing system, image processing method, and program
CN109886223B (en) Face recognition method, bottom library input method and device and electronic equipment
CN110660078B (en) Object tracking method, device, computer equipment and storage medium
CN113139403A (en) Violation behavior identification method and device, computer equipment and storage medium
CN111553234A (en) Pedestrian tracking method and device integrating human face features and Re-ID feature sorting
CN111832561B (en) Character sequence recognition method, device, equipment and medium based on computer vision
CN114067431A (en) Image processing method, image processing device, computer equipment and storage medium
CN111563492B (en) Fall detection method, fall detection device and storage device
US20230326041A1 (en) Learning device, learning method, tracking device, and storage medium
CN109241942B (en) Image processing method and device, face recognition equipment and storage medium
US20220108445A1 (en) Systems and methods for acne counting, localization and visualization
CN116071784A (en) Personnel illegal behavior recognition method, device, equipment and storage medium
CN111179343B (en) Target detection method, device, computer equipment and storage medium
JP2018142137A (en) Information processing device, information processing method and program
US20220405894A1 (en) Machine learning device, machine learning method, andrecording medium storing machine learning program
CN110956644B (en) Motion trail determination method and system
CN112784691A (en) Target detection model training method, target detection method and device
US11587318B1 (en) Video target tracking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant