CN113553952A - Abnormal behavior recognition method and device, equipment, storage medium and program product - Google Patents


Publication number
CN113553952A
Authority
CN
China
Prior art keywords: sequence, image, video, image sequence, sampling
Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number
CN202110837521.2A
Other languages
Chinese (zh)
Inventor
苏海昇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Sensetime Intelligent Technology Co Ltd
Original Assignee
Shanghai Sensetime Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Sensetime Intelligent Technology Co Ltd
Priority to CN202110837521.2A
Publication of CN113553952A
Current legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

An embodiment of the present application discloses an abnormal behavior identification method, comprising: selecting a first image sequence from an acquired first video; acquiring a stored second image sequence based on the first image sequence, wherein the second image sequence comprises at least historical image frames whose acquisition time is before that of the first image sequence; determining an image sequence to be identified based on the first image sequence and the second image sequence; and performing abnormal behavior recognition on the image sequence to be identified using a trained abnormal behavior recognition model. Embodiments of the present application also provide an abnormal behavior recognition apparatus, a device, a storage medium, and a program product.

Description

Abnormal behavior recognition method and device, equipment, storage medium and program product
Technical Field
The present application relates to the field of computer vision, and in particular, but not exclusively, to an abnormal behavior identification method and apparatus, a device, a storage medium, and a program product.
Background
The detection of anomalies in video is an important problem in the field of computer vision and has wide application in smart cities, such as the detection of behaviors endangering personal safety, traffic accidents, and other unusual events.
Identifying anomalous events in an online video stream is extremely difficult. One challenge is how to make full and efficient use of past video at the current moment. A common practice is to feed the most recently acquired frames before the current moment into the abnormal behavior recognition network as a window to determine the result. However, this approach has two drawbacks: first, long-range semantics are lost; second, it introduces a longer delay.
Disclosure of Invention
The embodiment of the application provides an abnormal behavior identification method, an abnormal behavior identification device, equipment, a storage medium and a program product.
The technical scheme of the embodiment of the application is realized as follows:
in a first aspect, an embodiment of the present application provides an abnormal behavior identification method, including:
selecting a first image sequence from the acquired first video;
acquiring a stored second image sequence based on the first image sequence; wherein the second image sequence comprises at least historical image frames whose acquisition time is before that of the first image sequence;
determining an image sequence to be identified based on the first image sequence and the second image sequence;
and carrying out abnormal behavior recognition on the image sequence to be recognized by utilizing the trained abnormal behavior recognition model.
In some possible embodiments, the acquiring a stored second image sequence based on the first image sequence comprises: selecting at least two frames of the historical image frames from a second video as the second image sequence; wherein the first video and the second video have the same content attribute.
In this way, by identifying images with the same content attribute, the accuracy of identification is improved.
In some possible embodiments, the first video and the second video are videos acquired from the same target object in different time periods; or the first video and the second video are videos collected from the same target area in different time periods.
In this way, the first video and the second video are videos acquired in succession from the same target object or the same target area, so that the first image sequence and the second image sequence, selected from the first video and the second video respectively, are the newest unprocessed images and earlier images sharing the same content attribute. Historical information from past moments can therefore be fully exploited for perception, avoiding both the loss of long-range semantics and long delays in model identification.
In some possible embodiments, said selecting a first sequence of images from the captured first video comprises: determining a first sampling range based on the type of the abnormal behavior to be identified; and selecting the image frames in the first sampling range from the acquired first video as the first image sequence.
In this way, the maximum time range for frame sampling is limited according to the abnormal behavior type, ensuring both the accuracy of abnormal behavior identification and sensitivity to short action segments.
In some possible embodiments, the first image sequence includes an image frame with a current time as a termination time, and the acquiring a stored second image sequence based on the first image sequence includes: determining a second sampling range in the second video based on a historical time before the current time; the second sampling range takes the historical time as a termination time; and selecting the image frame in the second sampling range as the second image sequence.
In this way, by defining the first image sequence as an image frame sampled before the current time and determining the second sampling range of the second image sequence based on the historical time, it is ensured that the second image sequence sampled from the second video includes an image frame with an earlier acquisition time relative to the first image sequence, thereby realizing the acquisition of information with a longer time course.
In some possible embodiments, the determining a sequence of images to be identified based on the first sequence of images and the second sequence of images includes: sampling a first preset number of image frames from the first image sequence as a first candidate set; sampling a second preset number of image frames from the second image sequence as a second candidate set; and taking the first candidate set and the second candidate set as the image sequence to be identified.
In this way, the image sequence to be identified is formed by sampling a part of image frames from the first image sequence and the second image sequence which are acquired at two different time periods, so that the online abnormal behavior identification can be realized by using a longer time course.
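As a concrete illustration, the construction of the image sequence to be identified from the two candidate sets can be sketched in Python; the evenly spaced picking rule and the frame counts are assumptions, since the embodiments only require "preset numbers" of frames from each sequence:

```python
def build_sequence_to_identify(first_seq, second_seq, n_first, n_second):
    """Sample n_first frames from the first (newest) image sequence and
    n_second frames from the second (stored historical) image sequence,
    then concatenate the two candidate sets in temporal order."""
    def uniform_pick(frames, n):
        # evenly spaced indices across the sequence (assumed sampling rule)
        step = len(frames) / n
        return [frames[int(i * step)] for i in range(n)]

    first_candidates = uniform_pick(first_seq, n_first)     # first candidate set
    second_candidates = uniform_pick(second_seq, n_second)  # second candidate set
    return second_candidates + first_candidates             # history first

# e.g. 8 new frames and 8 stored historical frames, 4 sampled from each
seq = build_sequence_to_identify(list(range(100, 108)), list(range(8)), 4, 4)
```

Here the frames are stand-in integers; in practice they would be decoded image tensors, with the historical half preceding the new half in time.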
In some possible embodiments, the number of frames of image frames in the first image sequence is the same as the number of frames of image frames in the second image sequence; the first preset quantity and the second preset quantity have the same value.
In this way, the first image sequence and the second image sequence are constrained to contain the same number of image frames, and half of the frames are sampled from each to form the image sequence to be identified, so that the image sequence to be identified contains symmetric amounts of historical information and unprocessed information, which better improves the perception of abnormal behaviors.
In some possible embodiments, the method further comprises: and updating the second image sequence by using the image sequence to be identified.
In this way, the second image sequence dynamically stores the processed image frames with each iteration process, so that the image frames participating in abnormal behavior recognition in the next iteration are the image frames which are closest in time or semantically related to the unprocessed image frames, and thus, the absence of long-range semantics or long delay in model recognition is avoided.
In some possible embodiments, the image sequence to be identified includes N image frames, and the number of the image frames in the second image sequence is N, where N is an integer greater than or equal to 2; the updating the second image sequence by using the image sequence to be identified comprises: and taking the image sequence to be identified as a new second image sequence.
In this way, the image sequence to be recognized and the second image sequence are limited to have the same number of image frames, the image sequence to be recognized processed in the current iteration is used as the second image sequence in the next iteration to replace the old second image sequence, and the historical information of the past time can be efficiently and fully utilized.
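The iteration-by-iteration update rule above can be sketched as a small memory object; thinning the stored sequence by even-index slicing is an assumption, since the embodiments leave the exact frame selection open:

```python
class SequenceMemory:
    """Holds the N frames of the second image sequence. After each iteration,
    the sequence just identified replaces the old second image sequence."""

    def __init__(self, initial_frames):
        self.second_sequence = list(initial_frames)  # N stored historical frames

    def step(self, first_sequence):
        n = len(self.second_sequence) // 2
        kept = self.second_sequence[::2][:n]           # thin history to N/2 frames
        to_identify = kept + list(first_sequence)[:n]  # fuse with N/2 new frames
        self.second_sequence = to_identify             # memory <- processed sequence
        return to_identify

mem = SequenceMemory(range(8))
out = mem.step(list(range(100, 108)))  # first iteration
```

Because the memory is overwritten with the processed sequence each round, the next iteration always sees frames that are temporally or semantically closest to the newest unprocessed frames.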
In some possible embodiments, the abnormal behavior recognition model is trained by the following process: determining a first sequence of samples based on a video to be processed; wherein the first sequence of samples comprises at least a first image frame for starting abnormal behavior action and a second image frame for ending the abnormal behavior action; determining a second sequence of samples based on other image frames in the video to be processed except the first sequence of samples; and taking the first sequence of samples and the second sequence of samples as a training sample set, and training the abnormal behavior recognition model.
In this way, a first sequence sample annotated with the start and end positions of the abnormal behavior action is determined from the video to be processed as an action sample, and a second sequence sample, drawn from frames of the video to be processed that are not annotated with the abnormal behavior action, is selected as a background sample. By adding constructed background samples to the training process, the generalization performance of the model in online testing on unclipped video streams can be improved.
In some possible embodiments, the determining a second sequence of samples based on other image frames in the video to be processed than the first sequence of samples includes: and selecting an image frame with the acquisition time before the first image frame and/or an image frame with the acquisition time after the second image frame from the video to be processed as the second sequence sample.
Therefore, the image frame before the first image frame and/or the image frame after the second image frame are/is selected from the video to be processed, and the difficult sample of the abnormal behavior action boundary is constructed, so that the perception capability and the generalization performance of the model to the action boundary are improved.
In some possible embodiments, the determining the first sequence of samples based on the video to be processed includes: acquiring an expansion coefficient from a preset sampling interval as a target expansion coefficient; wherein the sampling interval comprises expansion coefficients of at least two scales; determining a target sampling range according to a preset reference sampling duration and the target expansion coefficient; and performing frame sampling on the video to be processed based on the target sampling range to obtain the first sequence sample.
Thus, by dynamically acquiring an expansion coefficient in the training process, the corresponding time sampling range limit, namely the target sampling range, is obtained, and sampling is performed in the starting interval and the ending interval corresponding to the range, so that the maximum time range of frame sampling is limited. Meanwhile, a series of different sampling durations are set for dynamic selection, so that the robustness of the model to inconsistent sampling durations is improved.
In some possible embodiments, the determining a target sampling range according to a preset reference sampling duration and the target expansion coefficient includes: and on the basis of the reference sampling duration, expanding the reference sampling duration by using the target expansion coefficient to obtain the target sampling range.
In this way, the reference sampling duration is expanded by the target expansion coefficient, that is, the target expansion coefficient obtained during training is multiplied by the reference sampling duration to obtain a corresponding sampling range, which is the target sampling range. Therefore, the sampling range of the video with indefinite length is limited when the abnormal behavior recognition model is trained offline.
In some possible embodiments, the frame sampling of the video to be processed based on the target sampling range to obtain the first sequence sample includes: acquiring an initial sampling time in the video to be processed; determining a termination sampling time according to the target sampling range and the initial sampling time; sampling the video to be processed between the initial sampling time and the termination sampling time to obtain at least two frames of images; and taking the at least two frames of images as the first sequence sample.
In this way, an initial sampling time is randomly assigned during training, a termination sampling time is determined according to the target sampling range, and the interval between the two in the video to be processed is sampled to obtain the first sequence sample. The offline training process is thus kept consistent with online testing, test results can be aligned with the online setting, and the robustness of the model to frame sampling is improved.
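The training-time dynamic sampling described above (draw an expansion coefficient, expand the reference duration into the target sampling range, pick a random initial sampling time, sample up to the termination time) might look like the following sketch; the coefficient set (1, 2, 4), the 3-second reference duration, and the 8-frame count are illustrative assumptions:

```python
import random

def sample_training_timestamps(video_duration, ref_duration=3.0,
                               expansion_coeffs=(1, 2, 4), num_frames=8,
                               rng=random):
    """Return num_frames timestamps (seconds) from a randomly placed window
    of length ref_duration * coefficient inside the video to be processed."""
    coeff = rng.choice(expansion_coeffs)                      # target expansion coefficient
    target_range = min(ref_duration * coeff, video_duration)  # target sampling range
    start = rng.uniform(0.0, video_duration - target_range)   # initial sampling time
    end = start + target_range                                # termination sampling time
    step = (end - start) / num_frames
    # one timestamp from the middle of each of num_frames equal segments
    return [start + (k + 0.5) * step for k in range(num_frames)]

timestamps = sample_training_timestamps(20.0)  # e.g. a 20 s video to be processed
```

Drawing the coefficient anew on every call is what gives the model exposure to a series of different sampling durations during training.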
In a second aspect, an embodiment of the present application provides an abnormal behavior identification apparatus, including:
the first acquisition module is used for selecting a first image sequence from the acquired first video;
a second acquisition module, configured to acquire a stored second image sequence based on the first image sequence; wherein the second image sequence comprises at least historical image frames whose acquisition time is before that of the first image sequence;
a first determining module, configured to determine an image sequence to be identified based on the first image sequence and the second image sequence;
and the behavior recognition module is used for recognizing the abnormal behaviors of the image sequence to be recognized by utilizing the trained abnormal behavior recognition model.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a memory and a processor, where the memory stores a computer program that is executable on the processor, and the processor implements the steps in the above abnormal behavior identification method when executing the program.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps in the above abnormal behavior identification method.
In a fifth aspect, a computer program is provided, comprising computer-readable code which, when run on an electronic device, causes a processor in the electronic device to perform steps configured to implement the above method.
In a sixth aspect, a computer program product is provided, comprising one or more instructions adapted to be loaded by a processor and to perform the steps of the above-described method.
The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:
in the embodiments of the present application, a first image sequence is first selected from an acquired first video; a stored second image sequence is then acquired based on the first image sequence; an image sequence to be identified is determined based on the first image sequence and the second image sequence; finally, abnormal behavior recognition is performed on the image sequence to be identified using a trained abnormal behavior recognition model. In this way, new unprocessed images from the acquired first video serve as the first image sequence, stored historical images acquired earlier serve as the second image sequence, and a portion of the image frames is sampled from each as the input of the abnormal behavior recognition model. Information from past moments can thus be used effectively for full perception, avoiding the loss of long-range semantics and the long delay caused by identifying only the newly acquired image sequence when recognizing an online video stream.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art without inventive efforts, wherein:
fig. 1 is a schematic flowchart of an abnormal behavior identification method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of an abnormal behavior identification method according to an embodiment of the present application;
fig. 3 is a schematic flowchart of an abnormal behavior identification method according to an embodiment of the present application;
fig. 4A is a schematic diagram of an abnormal behavior recognition model training process provided in the embodiment of the present application;
fig. 4B is a schematic diagram of an abnormal behavior recognition model training process provided in the embodiment of the present application;
fig. 5A is a flowchart of a method for dynamic sampling training according to an embodiment of the present disclosure;
fig. 5B is a schematic diagram of a framework of an abnormal behavior identification method according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an abnormal behavior recognition apparatus according to an embodiment of the present disclosure;
fig. 7 is a hardware entity diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The following examples are intended to illustrate the present application but are not intended to limit the scope of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
It should be noted that the terms "first/second/third" referred to in the embodiments of the present application are only used to distinguish similar objects and do not imply a specific ordering of the objects. It should be understood that "first/second/third" may be interchanged in a particular order or sequence where permitted, so that the embodiments of the present application described herein can be implemented in an order other than that illustrated or described herein.
It will be understood by those within the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which embodiments of the present application belong. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The scheme provided by the embodiment of the application relates to an artificial intelligence technology, and is specifically explained by the following embodiment:
artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science, attempting to understand the essence of intelligence and producing a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, involving both hardware-level and software-level technologies. Basic artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision, speech processing, natural language processing, and machine learning/deep learning. The embodiments of the present application relate to machine learning technology.
Machine Learning (ML) is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other fields. It specializes in studying how computers can simulate or implement human learning behaviors to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to endow computers with intelligence; it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and formal education learning.
The detection of anomalies in video is an important problem in the field of computer vision, and has wide application in the field of smart cities, such as detection of dangerous personal safety behaviors, traffic accidents, some unusual events and the like. However, most devices that capture video sources simply record the dynamics at each moment and do not have the ability to make automated decisions (often requiring special personnel to be responsible for manual viewing). Due to the huge amount of video, it is obviously not realistic to filter the content in the video only by human power. There is a need for techniques that utilize computer vision and deep learning to automatically detect anomalous events that occur in video.
Identifying an abnormal event in an online video source is extremely difficult. Possible challenges include the scarcity of annotated data due to the low probability of such events, large inter-class and intra-class variance, subjective differences in how anomalous events are defined, low resolution of the video source, and so on. Humans, by contrast, recognize anomalies through common sense: for example, people gathering on a street that normally has little traffic may indicate an anomaly, and a person lying down or falling is judged abnormal. Machines have no such common sense, only visual features. Generally, the stronger the visual features, the better the anomaly detection performance that can be expected.
The embodiment of the application provides an abnormal behavior identification method which is applied to electronic equipment. The electronic device includes, but is not limited to, a mobile phone, a laptop, a tablet and a web-enabled device, a multimedia device, a streaming media device, a mobile internet device, a wearable device, or other types of devices. The functions implemented by the method can be implemented by calling program code by a processor in an electronic device, and the program code can be stored in a computer storage medium. The processor may be configured to perform the abnormal behavior recognition process, and the memory may be configured to store data required and data generated during the abnormal behavior recognition process.
Fig. 1 is a schematic flow chart of an abnormal behavior identification method provided in an embodiment of the present application, and as shown in fig. 1, the method at least includes the following steps:
step S110, a first image sequence is selected from the acquired first video.
Here, the acquired first video is an online video stream obtained by shooting a target area or a target object by a camera module, for example, a video acquired in a smart city scene.
Some possible implementations are to sample at least two frames of images before the current time from the first video as the first sequence of images. Another possible implementation is to sample from a specific sampling range in the first video for different video recognition tasks, resulting in a first image sequence.
It should be noted that, to process input video segments of arbitrary length for segment-level behavior identification, a frame sampling strategy is generally used; the common schemes are sparse sampling and dense sampling. For example, for a video of 20 seconds (s), a reference sampling duration of 3 seconds may be set, and identification is typically performed on a sample sequence of 8 frames. In implementation, any 3-second span of the video is divided into 8 segments, and one frame is sampled within each segment, yielding the required 8 frames.
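The sparse sampling scheme in the example above (split a 3-second window into 8 segments, one frame per segment) can be sketched as follows; the frame-index arithmetic and the 25 fps figure are assumptions for illustration:

```python
import random

def sparse_sample_indices(start_frame, window_frames, num_segments=8, rng=random):
    """Divide a window of window_frames frames into num_segments equal
    segments and randomly pick one frame index inside each segment."""
    seg_len = window_frames / num_segments
    indices = []
    for k in range(num_segments):
        seg_start = start_frame + int(k * seg_len)
        seg_end = start_frame + int((k + 1) * seg_len) - 1
        indices.append(rng.randint(seg_start, max(seg_start, seg_end)))
    return indices

# a 3-second window at an assumed 25 fps = 75 frames, sparsely sampled to 8
idx = sparse_sample_indices(start_frame=0, window_frames=75)
```

Because the segments are disjoint and in temporal order, the returned indices are always increasing, preserving the temporal structure of the clip.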
Step S120, acquiring a stored second image sequence based on the first image sequence;
here, the second image sequence is selected from a stored second video, that is, the second video is captured in an earlier period of time relative to the first video, and at least two frames of historical images are selected from the second video as the second image sequence, so that the second image sequence selected from the second video includes at least image frames in a past period of time, which can provide historical information for a longer period of time.
In some embodiments, the image frames in the first video and the second video are completely different, that is, the timestamp of the last frame of the second video is earlier than the timestamp of the first frame of the first video; for example, the second video is captured during the minute from 09:10:00 to 09:10:59, and the first video is captured during the following minute, from 09:11:00 to 09:11:59. In other embodiments, the first video and the second video share at least one image frame captured at the same time; such an image frame may be updated into the second video after the previous round of selection and prediction.
It should be noted that the first video and the second video may be image sequences acquired by the same camera, or may be image sequences acquired by different cameras.
The first video and the second video have the same content attribute. For example, the first video and the second video are captured of the same scene at different time periods. One possible implementation is that the first video and the second video are videos acquired from the same target object in different time periods. In the implementation, a first video for collecting a target object in a first time period before the current moment is obtained; and then acquiring a second video acquired from the same target object before the starting time based on the starting time of the first time period, thereby realizing tracking of the same target object and acquiring a historical image frame acquired from the target object at an earlier time as a second image sequence.
Another possible implementation is that the first video and the second video are videos captured for the same target area in different time periods. In the implementation, a first video acquired from a target area in a first time period before the current moment is acquired; and acquiring a second video acquired from the same target area before the starting time based on the starting time of the first time period, so as to detect the same target area, and acquiring a historical image frame acquired from the target area at an earlier time as a second image sequence.
Step S130, determining an image sequence to be identified based on the first image sequence and the second image sequence;
here, a part of the image frames are sampled from the first image sequence and the second image sequence, respectively, to constitute an image sequence to be recognized. Therefore, the historical information of the past time can be fully utilized to carry out full perception, and the loss of long-range semantics and long delay in model identification are avoided.
And step S140, performing abnormal behavior recognition on the image sequence to be recognized by using the trained abnormal behavior recognition model.
Here, the abnormal behavior recognition model is used to recognize the image sequence to be recognized, and the abnormal behavior in the image sequence to be recognized is detected.
In the method, a first image sequence is first selected from an acquired first video; a stored second image sequence is then acquired based on the first image sequence; an image sequence to be identified is determined based on the first image sequence and the second image sequence; finally, abnormal behavior recognition is performed on the image sequence to be identified using a trained abnormal behavior recognition model. In this way, new unprocessed images from the acquired first video serve as the first image sequence, earlier historical images from a stored second video with the same attribute serve as the second image sequence, and a portion of the image frames is sampled from each as the input of the abnormal behavior recognition model, so that information from past moments can be used effectively for full perception, avoiding the loss of long-range semantics and the long delay caused by identifying only the newly acquired image sequence when recognizing an online video stream.
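Putting steps S110 to S140 together, one online iteration might be sketched as below; `model` is any callable standing in for the trained abnormal behavior recognition model, and the slice-based frame selection is an illustrative assumption:

```python
def recognize_online(first_video_frames, memory, model, n_each=4):
    """One iteration of online recognition: select the first image sequence,
    fetch the stored second image sequence, fuse them into the sequence to
    be identified, update the memory, and run the recognition model."""
    first_sequence = first_video_frames[-2 * n_each:]   # S110: newest frames
    second_sequence = memory["second_sequence"]         # S120: stored history
    # S130: n_each frames from each side form the sequence to be identified
    to_identify = second_sequence[:n_each] + first_sequence[:n_each]
    memory["second_sequence"] = to_identify             # update for next iteration
    return model(to_identify)                           # S140: run the model

memory = {"second_sequence": list(range(8))}
result = recognize_online(list(range(100, 116)), memory, len)
```

Each call both produces a prediction and refreshes the stored second image sequence, so the next call sees the most recently processed frames as its history.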
In some possible embodiments, the first image sequence includes an image frame whose end time is the current time; that is, the first image sequence contains the most recently collected, yet-to-be-processed image frames. Fig. 2 is a schematic flow chart of an abnormal behavior identification method provided in an embodiment of the present application; as shown in fig. 2, the method includes at least the following steps:
step S210, determining a first sampling range based on the type of the abnormal behavior to be identified;
here, the types of the abnormal behavior include types of climbing, falling, leaflet, and the like.
It should be noted that, because different types of abnormal behavior have different durations, the maximum time range for frame sampling in the first video, i.e., the first sampling range, may be determined so that the model can accurately perceive and understand the content of the collected first video. For example, falling behavior typically lasts several minutes, so a longer sampling range can be determined; leaflet distribution is typically completed within a few seconds, so a shorter sampling range is determined.
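As a minimal sketch of step S210, the behavior-type-dependent choice of the first sampling range can be expressed as a lookup. The type names and durations below are hypothetical illustrations, not values from this application.

```python
# Hypothetical mapping from abnormal-behavior type to a frame-sampling
# range in seconds; longer-lasting behaviors get a longer range.
SAMPLING_RANGE_S = {
    "fall": 120.0,     # behaviors lasting minutes: longer range
    "climb": 60.0,
    "leaflet": 5.0,    # behaviors over in seconds: shorter range
}

def first_sampling_range(behavior_type: str, default_s: float = 30.0) -> float:
    """Return the maximum time range (seconds) for frame sampling."""
    return SAMPLING_RANGE_S.get(behavior_type, default_s)
```

A default range covers behavior types without a configured duration.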
Step S220, selecting image frames in the first sampling range from the collected first video as the first image sequence;
here, the image frame in the first sampling range may be a video obtained by capturing a duration corresponding to the first sampling range from the first video as the first image sequence, or may be a multi-frame image obtained by sampling in the first sampling range as the first image sequence.
It will be appreciated that, unlike offline model training, the length of the input video during online testing is often unpredictable. Steps S210 to S220 implement "selecting a first image sequence from the collected first video". By limiting the temporal sampling range and selecting the images within the first sampling range as input to the abnormal behavior model, both the accuracy of abnormal behavior recognition and the model's sensitivity to short action segments can be ensured.
Step S230, determining a second sampling range in the second video based on the historical time before the current time;
here, the second sampling range takes the historical time as its end time, while the first image sequence includes an image frame whose end time is the current time; that is, the collection time of the image frames in the second sampling range is earlier than that of the image frames in the first sampling range.
In some embodiments, the first video and the second video are videos captured of the same target object during different time periods. For example, the first video may be a motion sequence of pedestrian A collected in real time, typically a newly collected image sequence awaiting processing, and the second video may be a motion sequence of pedestrian A collected during a historical period far from the current time, typically an already-recognized image sequence. Pedestrian A may be tracked and detected by the same camera or by different cameras to obtain the first video and the second video.
In other embodiments, the first video and the second video are videos captured of the same target area during different time periods. For example, the first video may be video of intersection B collected in real time, typically the newly collected to-be-processed image sequence from the last minute, while the second video is the abnormal behavior sequence of various pedestrians or vehicles at intersection B collected during the preceding two minutes, typically an already-identified image sequence. Intersection B may be monitored by the same camera or by different cameras to obtain the first video and the second video.
The first video and the second video are set as videos collected, in succession, from the same target object or the same target area, so that the first and second image sequences selected from them are, respectively, new unprocessed images and earlier images with the same content attribute. Historical information from past times can therefore be fully used for perception, avoiding the loss of long-range semantics and long delays in model identification.
Step S240, selecting the image frame in the second sampling range as the second image sequence.
Steps S230 to S240 implement the process of "acquiring the stored second image sequence based on the first image sequence". By defining the first image sequence as the frames sampled up to the current time and determining the second sampling range of the second image sequence based on a historical time, the second image sequence sampled from the second video is guaranteed to contain image frames collected earlier than the first image sequence, so that information over a longer time span is obtained.
Step S250, determining an image sequence to be identified based on the first image sequence and the second image sequence;
here, a portion of the image frames is sampled from each of the first image sequence and the second image sequence to form the image sequence to be identified. In this way, historical information from earlier times can be fully exploited, avoiding both the loss of long-range semantics and long delays during model identification.
And step S260, performing abnormal behavior recognition on the image sequence to be recognized by using the trained abnormal behavior recognition model.
Here, the abnormal behavior recognition model is used to recognize the image sequence to be recognized, and the abnormal behavior in the image sequence to be recognized is detected.
In the embodiment of the present application, the first video and the second video are videos collected in succession from the same target object or the same target area, so that the first and second image sequences selected from them are, respectively, new unprocessed images and earlier images with the same attribute. Historical information from past times is thus fully exploited for perception, avoiding the loss of long-range semantics and long delays in model identification.
In some possible embodiments, the image sequence to be identified includes N frames of images, where N is an integer greater than or equal to 2. Fig. 3 is a schematic flow chart of the abnormal behavior identification method provided in the embodiment of the present application, and as shown in fig. 3, the step S120 or the step S250 "determining the image sequence to be identified based on the first image sequence and the second image sequence" may be implemented by:
step S310, sampling a first preset number of image frames from the first image sequence as a first candidate set;
here, L frames are sampled from the first image sequence as the first candidate set to be input to the abnormal behavior recognition model, where L is an integer greater than or equal to 1. In this way, the most recently collected frames of an online video stream can be processed efficiently in real time to obtain the current behavior prediction result.
Step S320, sampling a second preset number of image frames from the second image sequence as a second candidate set;
here, M frames are sampled from the second image sequence as the second candidate set to be input to the abnormal behavior recognition model, where M is an integer greater than or equal to 1. This ensures that images collected earlier than the first image sequence are processed at the same time during online video-stream testing, so that information over a longer time span is obtained.
Step S330, using the first candidate set and the second candidate set as the image sequence to be identified.
Here, the image sequence to be identified includes N frames, where the sum of the first preset number L and the second preset number M is N; L and M are integers greater than or equal to 1, and N is an integer greater than or equal to 2. For example, if the image sequence to be identified is set to include 8 frames, 5 frames can be sampled from the first image sequence and 3 from the second. The image sequence to be identified is thus formed by sampling parts of two image sequences collected during two different time periods, enabling online abnormal behavior recognition over a longer time span.
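The L + M = N composition above can be sketched as follows. The even-spacing strategy and function names are illustrative assumptions; the application does not fix a particular within-sequence sampling rule.

```python
def evenly_sample(seq, k):
    """Pick k frames from seq at evenly spaced positions."""
    step = len(seq) / k
    return [seq[int(i * step)] for i in range(k)]

def build_input_sequence(first_seq, second_seq, l, m):
    """Form the N = l + m frame model input: m history frames followed
    by l newly collected frames, keeping chronological order."""
    return evenly_sample(second_seq, m) + evenly_sample(first_seq, l)
```

With the example from the text, l = 5 and m = 3 yield an 8-frame input in which history frames precede the new frames.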
In some possible embodiments, the number of image frames in the first image sequence is the same as that in the second image sequence, and the first preset number equals the second preset number. For example, if the image sequence to be identified includes 8 frames, 4 frames can be sampled from each of the first and second image sequences. By requiring the two sequences to contain the same number of frames and sampling half of the input from each, the image sequence to be identified contains a balanced mix of historical and unprocessed information, which better improves the model's perception when recognizing abnormal behavior.
In some possible embodiments, after the image sequence to be identified is obtained, the second image sequence may be updated with it. In one embodiment, in each iteration, the newly received L frames of the first image sequence are added to the stored second image sequence, and in the next iteration a second preset number of image frames is sampled from the updated second image sequence; these sampled frames include at least the L frames from the original first image sequence. In another embodiment, the image sequence to be identified and the second image sequence both contain N image frames, where N is an integer greater than or equal to 2, and the image sequence to be identified becomes the new second image sequence. By constraining the two sequences to the same number of frames and using the sequence processed in this iteration to replace the old second image sequence in the next, historical information from past times can be used efficiently and fully.
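Both update variants described above can be sketched with a fixed-size buffer. The class and method names are illustrative assumptions for this sketch.

```python
from collections import deque

class WorkingMemory:
    """Fixed-size store of the n most recently processed frames
    (the 'second image sequence'); names are illustrative."""
    def __init__(self, n):
        self.frames = deque(maxlen=n)

    def append_new(self, new_frames):
        # variant 1: append the newly received L frames; the deque's
        # maxlen drops the oldest frames automatically
        self.frames.extend(new_frames)

    def replace_with(self, identified_sequence):
        # variant 2: replace the memory with the N-frame sequence
        # processed in this iteration
        self.frames = deque(identified_sequence, maxlen=self.frames.maxlen)
```

The `maxlen` bound makes the memory slide forward in time with each iteration, as the text describes.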
In the embodiment of the present application, the image sequence to be identified is formed by sampling parts of the first and second image sequences, which are collected during two different time periods, so that online abnormal behavior recognition can use a longer time span. At the same time, updating the second image sequence with the image sequence to be identified means that, across iterations, the second image sequence dynamically stores processed frames, and the frames participating in recognition in the next iteration are those closest in time, or most relevant, to the unprocessed frames; this avoids the loss of long-range semantics and long delays in model identification.
In some possible embodiments, fig. 4A is a schematic diagram of a training process of an abnormal behavior recognition model provided in an embodiment of the present application, and as shown in fig. 4A, the training process of the abnormal behavior recognition model may be implemented by the following steps:
step S410, determining a first sequence sample based on a video to be processed;
here, the first sequence of samples includes at least a first image frame, in which the abnormal behavior action starts, and a second image frame, in which it ends. In implementation, the first sequence of samples may further include the image frames collected between the collection times of the first and second image frames; that is, every sample image in the first sequence contains part of the abnormal behavior action. The first sequence of samples can be cut from the video to be processed by labeling the action frames that contain the abnormal behavior. The labeling may be done by an annotator or by a device with labeling capability; the embodiments of the present application do not limit the labeling process.
It should be noted that the length of the first sequence of samples, i.e. the number of image frames included in the first sequence of samples, may be determined according to the actual sampling interval and duration. The embodiments of the present application do not limit this.
Step S420, determining a second sequence of samples based on other image frames except the first sequence of samples in the video to be processed;
here, the second sequence of samples is a hard sample set, and each sample image in the second sequence is a background image corresponding to the sample images that contain the abnormal behavior action.
After the first sequence of samples is determined, image frames marking abnormal behavior actions are further determined, and the image frames comprise a first image frame acquired at the moment when the abnormal behavior actions start to occur and a second image frame acquired at the moment when the abnormal behavior actions end.
In implementation, image frames collected before the first image frame may be selected from the video to be processed as the second sequence sample; image frames collected after the second image frame may be selected; or both may be selected together. Further, to achieve a better training effect, frames adjacent to the first image frame with timestamps before it, and frames adjacent to the second image frame with timestamps after it, may each be selected as the second sequence sample. Selecting frames before the first image frame and/or after the second image frame constructs hard samples at the abnormal behavior action boundary, improving the model's perception of, and generalization at, action boundaries.
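A minimal sketch of this boundary hard-sample construction, assuming frames are indexed by position in the video; the half-and-half split follows the description later in this application, and the function name is hypothetical.

```python
def boundary_hard_sample(frames, action_start, half):
    """Build a hard sample straddling the action-start boundary:
    `half` background frames immediately before index `action_start`
    plus `half` action frames from `action_start` onward."""
    lo = max(0, action_start - half)
    return frames[lo:action_start] + frames[action_start:action_start + half]
```

The same construction applied at the action-end index yields hard samples for the other boundary.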
Step S430, using the first sequence sample and the second sequence sample as training sample sets, and training the abnormal behavior recognition model.
By determining, from the video to be processed, a first sequence sample labeled with the start and end positions of the abnormal behavior action as the action sample, and selecting from the remaining frames a second sequence sample that does not contain the abnormal behavior action as the background sample, the constructed background samples are added to the training process, which improves the model's generalization when testing unclipped video streams online.
Fig. 4B is a schematic diagram of an abnormal behavior recognition model training process provided in the embodiment of the present application, and as shown in fig. 4B, the step S410 "determining a first sequence sample based on a video to be processed" may be implemented by:
step S4101, acquiring an expansion coefficient from a preset sampling interval as a target expansion coefficient;
here, the sampling interval includes expansion coefficients of at least two scales. For example, expansion coefficients of several scales are preset to form a sampling interval R = {r1, r2, …, rn}, where r1, r2, and rn are all real numbers greater than or equal to 1 and their values increase in turn; e.g., r1, r2, and rn take the values 1, 1.5, and 2 respectively. During training, any value in R is selected as the target expansion coefficient.
Step S4103, determining a target sampling range according to a preset reference sampling duration and the target expansion coefficient;
here, the preset reference sampling duration is the minimum sampling duration; it is generally set to 3 seconds, consistent with the online testing frequency.
In implementation, the reference sampling duration is expanded by the target expansion coefficient to obtain the target sampling range; that is, the target expansion coefficient selected during training is multiplied by the reference sampling duration. For example, with a target expansion coefficient of 1.5, the resulting sampling range is 4.5 seconds, which is the target sampling range. In this way, the sampling range over videos of indefinite length is bounded when the abnormal behavior recognition model is trained offline.
Step S4105, based on the target sampling range, performing frame sampling on the video to be processed to obtain the first sequence sample.
Here, the actual position of the target sampling range within the video to be processed is determined first. In one embodiment, a start sampling time in the original video is chosen, and the end sampling time is determined from the target sampling range and the start time; in another embodiment, an end sampling time is chosen, and the start sampling time is determined from the target sampling range and the end time. That is, one of the start or end sampling times is randomly designated according to the test conditions, and the other is then determined from the duration of the target sampling range.
The video between the start and end sampling times is then sampled to obtain at least two frames, which are taken as the first sequence of samples. This keeps the offline training process consistent with the online testing frequency, so that test results align with online behavior, improving the model's robustness to frame sampling.
For example, for a video to be identified with a duration of 20 seconds, with the start sampling time set to 3 seconds and the target sampling range to 4.5 seconds, sampling is performed over the interval from the 3rd to the 7.5th second of the video. If 8 frames are to be sampled, the 4.5-second target sampling range is divided into 8 segments, 1 frame is sampled in each segment, and the resulting 8 frames are used as input to the abnormal behavior recognition model.
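The segment-wise sampling in this example can be sketched as below. Taking the middle frame of each segment (rather than a random frame) and the 25 fps frame rate are assumptions made for illustration.

```python
def segment_sample_indices(start_s, range_s, n_frames, fps=25.0):
    """Divide [start_s, start_s + range_s) into n_frames equal segments
    and take one frame index from the middle of each segment."""
    seg = range_s / n_frames
    return [int((start_s + (i + 0.5) * seg) * fps) for i in range(n_frames)]
```

For the 3-to-7.5-second example, this yields 8 monotonically increasing frame indices spread across the target sampling range.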
According to the embodiment of the present application, an expansion coefficient is dynamically selected during training to obtain the corresponding temporal sampling limit, i.e., the target sampling range, and sampling is performed within the start and end interval of that range, limiting the maximum time range of frame sampling. At the same time, a series of different sampling durations is provided for dynamic selection, improving the model's robustness to inconsistent sampling durations.
The foregoing abnormal behavior recognition method is described below with reference to a specific embodiment, but it should be noted that the specific embodiment is only for better describing the present application and is not to be construed as limiting the present application.
It is extremely difficult to identify abnormal events in an online video stream. One key challenge is how to utilize the past video fully and efficiently at the current time. A common practice is to take the N frames before the current time as a window and input them into the network to produce a result. However, this approach has two drawbacks: first, long-range semantics are lost; second, it introduces a long delay.
In addition, extensive experiments show that, to handle input video segments of arbitrary length for segment-level behavior recognition, a frame sampling strategy is generally used; the common sampling modes are sparse sampling and dense sampling. However, regardless of the sampling mode, existing classification methods are highly non-robust to the outcome of frame sampling: even when only the sampled frames differ, the model's output may change abruptly. Improving the model's robustness to frame sampling during training is therefore very important, especially for online security anomaly detection, where the detection period often fluctuates and the requirement on the model's detection stability over a video stream is very high. For a machine to accurately perceive and understand the content of a video, robustness to many such factors must be considered.
As humans, we generally identify anomalies through common sense. For example, a crowd gathered on a street where crowds do not normally gather may be an anomaly; a fall event may indicate an anomaly. For a machine there is no such common sense, only visual features; generally, the stronger the visual features, the better the expected anomaly detection performance.
In summary, for detecting abnormal behavior in an online video stream, it is important to improve the model's robustness to the frame sampling duration and to utilize historical information fully and efficiently. Unlike offline model training, the length of the input video is often unpredictable, meaning the frame sampling interval is uncertain, which creates inconsistencies between online deployment and offline training (where the interval is usually much larger than online) beyond the model itself. Likewise, unlike offline training videos, which are usually clipped, an online video stream is usually unclipped (i.e., it contains a large number of background frames), which severely tests the model's generalization to unclipped video.
Deploying an offline-trained abnormal behavior recognition model online raises the following problems: when testing an online video stream, the covered time range is usually short while offline video lengths vary, so how can the model's robustness to the sampling duration be improved? How can the model's sensitivity to action boundaries and its generalization to unclipped video tests be improved? And how can long-term historical information be fully and efficiently utilized for behavior recognition on online video streams?
To address the problem that the unpredictable length of the input video makes the recognition result of an abnormal behavior recognition model highly non-robust when the frame sampling interval is unstable, an embodiment of the present application provides a dynamic sampling training method. As shown in fig. 5A, the training process includes the following steps:
step S510, inputting an off-line video with any length;
here, the offline video may be a video captured by a camera assembly deployed at a specific location, or any piece of video in a sample set.
Step S520, setting a reference sampling duration;
here, the reference sampling duration, i.e., the shortest sampling interval, is usually denoted T, with T set to 3 seconds.
Step S530, setting a plurality of proportionally scaled sampling intervals;
for example, a set of sampling interval scales R = {r1, r2, …, rn} is set, where r1, r2, and rn are all real numbers greater than or equal to 1, with values increasing in turn. Thus, given a reference sampling duration T of 3 seconds, expansion coefficients of different scales produce multiple sampling ranges, the minimum being T·r1 and the maximum T·rn. For example, taking r1, r2, and rn as 1, 1.5, and 2, the corresponding sampling ranges are 3 seconds, 4.5 seconds, and 6 seconds.
And step S540, determining the maximum sampling range of the offline video according to the reference sampling duration and the maximum expansion coefficient.
In some embodiments, for inputting an offline video with an arbitrary length, a starting time may be randomly specified first, then a maximum sampling range is generated according to a maximum expansion coefficient and a reference sampling duration, and then an ending time is determined according to the starting time. In other embodiments, for inputting an offline video with an arbitrary length, an end time may be randomly specified first, then a maximum sampling range is generated according to the maximum expansion coefficient and the reference sampling duration, and then the start time is determined according to the end time. Thus, the interval between the start time and the end time constitutes the maximum sampling range of the offline video. Therefore, the sampling range of the video with indefinite length is limited when the abnormal behavior recognition model is trained off line.
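The two anchoring variants above (randomly fixing either the start or the end of the maximum window) can be sketched as follows. The function name is hypothetical, and for simplicity the sketch assumes the video is at least as long as the maximum window.

```python
import random

def max_sampling_window(video_len_s, base_t=3.0, r_max=2.0, rng=None):
    """Pick the maximum sampling window (base_t * r_max seconds) by
    randomly anchoring either its start or its end inside the video;
    assumes video_len_s >= base_t * r_max."""
    rng = rng or random.Random()
    span = base_t * r_max
    if rng.random() < 0.5:
        start = rng.uniform(0.0, video_len_s - span)   # anchor the start
        return start, start + span
    end = rng.uniform(span, video_len_s)               # anchor the end
    return end - span, end
```

With T = 3 and a maximum expansion coefficient of 2, every returned window spans 6 seconds and lies inside the video.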
And step S550, acquiring an expansion coefficient, sampling N segments in a corresponding sampling range, and inputting the N segments into the abnormal behavior recognition model for training.
Here, an expansion coefficient is dynamically selected during training, yielding different sampling intervals and the corresponding sampling range limits; sparse sampling is then performed in the interval between the start and end times of that range. During testing, the expansion coefficient is fixed to r1, limiting the sampling range to T = 3 seconds, consistent with the online testing frequency. The test procedure and results can then be aligned with the online setting, improving the model's recognition robustness when the frame sampling interval is unstable.
In the process, the maximum sampling range of frame sampling is limited in the model training process, and a series of expansion coefficients with different scales are set for dynamic selection, so that the sensitivity of the model to short-time abnormal behavior segments is ensured, and the robustness of the model to inconsistent sampling duration is also ensured.
To address the poor generalization of a model when testing unclipped video streams online, the embodiment of the present application proposes that, in addition to training on action samples, hard samples be constructed at action boundary positions, i.e., samples in which half the frames are background frames and half are action frames, so that not every sampled frame is an action frame (consistent with the online situation). Conventional methods lack exactly this kind of data during training, hence their poor online generalization. Adding the constructed hard sample set to the training process significantly improves the model's sensitivity to action boundaries and its generalization performance.
Regarding the inability to use historical information efficiently and fully in online testing: when testing a video stream online, the usual method takes the N frames before the current time as a window and feeds them to the abnormal behavior recognition model to produce a result. However, this approach has two drawbacks: first, long-range semantics are lost; second, it introduces a long delay. Fig. 5B is a schematic diagram of a framework of the abnormal behavior recognition method according to the embodiment of the present application. As shown in fig. 5B, the embodiment maintains two image groups, each containing N frames: one called the working memory group 51 (corresponding to the second image sequence) and one called the newly received image group 52 (corresponding to the first image sequence). Each time an online real-time prediction is made, half of the input is sampled from each of the two image groups (N1 and N2 frames respectively), forming N sampling segments 53; the working memory group 51 is then updated, and the N sampling segments 53 are used as input to the abnormal behavior recognition model 54 to produce the current behavior prediction result. In this way, information over a longer time span is effectively utilized.
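One online prediction step with the two image groups can be sketched as below. The strided within-group sampling and the function name are illustrative assumptions; `model` stands for any callable that scores a list of frames.

```python
from collections import deque

def online_predict(model, working_memory, new_group, n1, n2):
    """One online step: sample n2 frames from the working memory group
    and n1 from the newly received group, run the model on the n1 + n2
    segments, then fold the new frames into the memory."""
    def pick(group, k):
        group = list(group)
        step = max(1, len(group) // k)
        return group[::step][:k]
    segments = pick(working_memory, n2) + pick(new_group, n1)
    result = model(segments)
    working_memory.extend(new_group)   # deque(maxlen=N) drops the oldest
    return result, segments
```

Because the memory is a bounded deque, each call slides it forward so the next prediction sees the most recent history.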
According to the embodiment of the present application, on one hand, the sampling range over videos of indefinite length is limited during offline training of the action recognition model: several expansion coefficients of different scales generate different sampling durations, and during training the sampling duration corresponding to one coefficient is selected for sampling. This dynamically strengthens the model's robustness to different sampling durations and improves its recognition robustness on online video streams when the frame sampling interval is unstable.
On the other hand, hard action-boundary samples are constructed to add background frames, which improves the model's sensitivity and generalization to action boundaries in unclipped video and solves the problem of poor generalization when testing unclipped video streams online.
On another hand, the embodiment of the present application stores new and old images in two designed image work groups and samples from each to form the input data for the current test, thereby acquiring information over a long time range and using it efficiently and fully during online testing.
The embodiment of the present application can be applied to abnormal behavior detection in intelligent video analysis and other scenarios. For example, after an abnormal event occurs under a video collection device in an outdoor urban street scene, an indoor rail transit scene, or another scene, the deployed behavior detection system automatically runs a robust online video-stream test and raises an alarm, providing efficient and convenient detection capability for personnel with related needs.
Based on the foregoing embodiments, an abnormal behavior recognition apparatus is further provided in an embodiment of the present application, where the apparatus includes modules, sub-modules included in the modules, and units, and may be implemented by a processor in an electronic device; of course, the implementation can also be realized through a specific logic circuit; in the implementation process, the Processor may be a Central Processing Unit (CPU), a microprocessor Unit (MPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or the like.
Fig. 6 is a schematic structural diagram of an abnormal behavior recognition apparatus provided in an embodiment of the present application, and as shown in fig. 6, the apparatus 600 includes a first obtaining module 610, a second obtaining module 620, a first determining module 630, and a behavior recognition module 640, where:
the first obtaining module 610 is configured to select a first image sequence from a collected first video;
the second obtaining module 620 is configured to obtain a stored second image sequence based on the first image sequence; the second image sequence at least comprises a historical image frame whose acquisition time is before the first image sequence;
the first determining module 630 is configured to determine an image sequence to be identified based on the first image sequence and the second image sequence;
the behavior recognition module 640 is configured to perform abnormal behavior recognition on the image sequence to be recognized by using the trained abnormal behavior recognition model.
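As an illustrative sketch (not the patent's actual implementation), the interaction of these modules during one online test can be written as follows; the concatenation order and the `model` callable are assumptions:

```python
def recognize(first_sequence, second_sequence, model):
    """One online test step: the stored historical frames (second sequence)
    and the newly selected frames (first sequence) together form the image
    sequence to be identified, which the trained recognition model scores."""
    to_identify = list(second_sequence) + list(first_sequence)
    return model(to_identify)
```

Here `model` may be any callable that maps a frame sequence to a recognition result.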
In some possible embodiments, the second obtaining module 620 is configured to select at least two historical image frames from a second video as the second image sequence; the first video and the second video have the same content attribute.
In some possible embodiments, the first video and the second video are videos captured of the same target object during different time periods; or the first video and the second video are videos captured of the same target area during different time periods.
In some possible embodiments, the first obtaining module 610 includes a first determining sub-module and a second determining sub-module, wherein: the first determining sub-module is used for determining a first sampling range based on the type of the abnormal behavior to be identified; and the second determining sub-module is used for selecting the image frames in the first sampling range from the acquired first video as the first image sequence.
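A minimal sketch of these sub-modules, assuming a hypothetical mapping from behavior type to sampling duration (the behavior types, durations, and helper names below are illustrative, not from the patent):

```python
# Hypothetical mapping from abnormal-behavior type to the first
# sampling range in seconds; values are illustrative only.
SAMPLING_RANGE_S = {"fall": 2.0, "fight": 4.0, "wander": 8.0}

def select_first_sequence(frames, fps, behavior_type):
    """Determine the first sampling range from the behavior type, then
    take the most recent frames of the first video within that range."""
    num = int(SAMPLING_RANGE_S[behavior_type] * fps)
    return frames[-num:]
```

Longer-lasting behavior types are given a wider sampling range, so the selected first image sequence covers the whole action.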
In some possible embodiments, the second obtaining module 620 includes a third determining sub-module and a fourth determining sub-module, wherein: the third determining sub-module is used for determining a second sampling range in the second video based on a historical time before the current time, the second sampling range taking the historical time as its termination time; and the fourth determining sub-module is used for selecting the image frames in the second sampling range as the second image sequence.
In some possible embodiments, the first determining module 630 includes a first sampling sub-module, a second sampling sub-module, and a fifth determining sub-module, wherein: the first sampling sub-module is used for sampling a first preset number of image frames from the first image sequence as a first candidate set; the second sampling submodule is used for sampling a second preset number of image frames from the second image sequence to serve as a second candidate set; the fifth determining sub-module is configured to use the first candidate set and the second candidate set as the image sequence to be identified.
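The candidate-set construction described above can be sketched as follows; uniform sampling and the preset counts of four frames each are assumptions for illustration:

```python
def uniform_sample(frames, count):
    """Evenly sample `count` frames from a sequence of frames."""
    if count >= len(frames):
        return list(frames)
    step = len(frames) / count
    return [frames[int(i * step)] for i in range(count)]

def build_to_identify(first_seq, second_seq, n_first=4, n_second=4):
    """First candidate set from the new frames, second candidate set from
    the stored historical frames; together they form the sequence fed to
    the recognition model (historical frames first)."""
    return uniform_sample(second_seq, n_second) + uniform_sample(first_seq, n_first)
```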
In some possible embodiments, the number of image frames in the first image sequence is the same as the number of image frames in the second image sequence, and the first preset number and the second preset number have the same value.
In some possible embodiments, the apparatus 600 further includes an updating module configured to update the second image sequence with the image sequence to be identified.
In some possible embodiments, the image sequence to be identified includes N image frames, and the number of the image frames in the second image sequence is N, where N is an integer greater than or equal to 2; the updating module is further configured to use the image sequence to be identified as the new second image sequence.
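A sketch of this update rule, assuming N = 8 frames and the simple replace-on-update policy described above (the class and its interface are illustrative):

```python
class SecondSequenceStore:
    """Keeps the stored second image sequence between online tests.

    After each test, the N-frame sequence that was just identified
    replaces the old second sequence, so the next test can reuse it as
    history. N = 8 is an assumed value.
    """
    def __init__(self, n=8):
        self.n = n
        self.frames = []  # empty until the first test has run

    def update(self, to_identify):
        if len(to_identify) != self.n:
            raise ValueError("expected exactly n frames")
        self.frames = list(to_identify)
```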
In some possible embodiments, the behavior recognition module 640 includes a sample determination sub-module, a sample construction sub-module, and a training sub-module, wherein: the sample determination sub-module is used for determining a first sequence of samples based on a video to be processed, wherein the first sequence of samples comprises at least a first image frame at which an abnormal behavior action starts and a second image frame at which the abnormal behavior action ends; the sample construction sub-module is used for determining a second sequence of samples based on image frames in the video to be processed other than the first sequence of samples; and the training sub-module is used for training the abnormal behavior recognition model with the first sequence of samples and the second sequence of samples as a training sample set.
In some possible embodiments, the sample construction sub-module is further configured to select, from the video to be processed, an image frame whose acquisition time is before the first image frame and/or an image frame whose acquisition time is after the second image frame as the second sequence of samples.
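The division into an action span (first sequence of samples) and hard background boundary samples (second sequence of samples) can be sketched as follows; the index-based interface is an assumption:

```python
def split_training_samples(frames, start_idx, end_idx):
    """Split an unclipped video into the first sequence sample (the span
    from the frame where the action starts to the frame where it ends)
    and second sequence samples drawn from the background frames before
    and after that span."""
    first_sample = frames[start_idx:end_idx + 1]        # action span
    second_samples = frames[:start_idx] + frames[end_idx + 1:]  # background
    return first_sample, second_samples
```

Training on background frames adjacent to the action boundary is what makes them "hard" samples: they look similar to the action but must be classified as background.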
In some possible embodiments, the sample determination submodule includes a selecting unit, a determining unit, and a sampling unit, wherein: the selecting unit is used for acquiring an expansion coefficient from a preset sampling interval as a target expansion coefficient; wherein the sampling interval comprises expansion coefficients of at least two scales; the determining unit is used for determining a target sampling range according to a preset reference sampling duration and the target expansion coefficient; and the sampling unit is used for carrying out frame sampling on the video to be processed based on the target sampling range to obtain the first sequence sample.
In some possible embodiments, the determining unit is further configured to expand the reference sampling duration by the target expansion coefficient on the basis of the reference sampling duration to obtain the target sampling range.
In some possible embodiments, the sampling unit is further configured to obtain a starting sampling time of the video to be processed; determine a sampling termination time according to the target sampling range and the starting sampling time; sample the video between the starting sampling time and the sampling termination time to obtain at least two image frames; and take the at least two image frames as the first sequence of samples.
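The expansion-based sampling of the first sequence sample can be sketched as below; the coefficient set, reference duration, and frame-rate handling are illustrative assumptions:

```python
import random

def sample_first_sequence_sample(frames, fps, base_duration_s,
                                 coefficients, start_time_s=0.0, rng=None):
    """Pick a target expansion coefficient from the preset interval,
    expand the reference sampling duration into the target sampling
    range, derive the termination time from the starting time, and take
    the frames in between."""
    rng = rng or random.Random(0)
    k = rng.choice(coefficients)                 # target expansion coefficient
    target_range_s = base_duration_s * k         # expanded sampling duration
    end_time_s = start_time_s + target_range_s   # sampling termination time
    lo, hi = int(start_time_s * fps), int(end_time_s * fps)
    return frames[lo:hi]
```

Using coefficients of at least two scales varies the temporal extent of the sampled clips, which helps the model tolerate actions of different durations.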
Here, it should be noted that the above description of the apparatus embodiments is similar to the description of the method embodiments and yields similar beneficial effects. For technical details not disclosed in the apparatus embodiments of the present application, refer to the description of the method embodiments of the present application.
It should be noted that, in the embodiments of the present application, if the abnormal behavior recognition method is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for enabling an electronic device (which may be a smartphone with a camera, a tablet computer, or the like) to execute all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a Read Only Memory (ROM), a magnetic disk, or an optical disk. Thus, the embodiments of the present application are not limited to any specific combination of hardware and software.
Correspondingly, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps in any of the above-mentioned abnormal behavior identification methods.
Correspondingly, in an embodiment of the present application, a chip is further provided, where the chip includes a programmable logic circuit and/or a program instruction, and when the chip runs, the chip is configured to implement the steps in any one of the abnormal behavior identification methods in the foregoing embodiments.
Correspondingly, an embodiment of the present application further provides a computer program product which, when executed by a processor of an electronic device, implements the steps in any one of the abnormal behavior identification methods in the foregoing embodiments.
Based on the same technical concept, the embodiment of the present application provides an electronic device, which is used for implementing the abnormal behavior identification method described in the above method embodiment. Fig. 7 is a hardware entity diagram of an electronic device according to an embodiment of the present application, as shown in fig. 7, the electronic device 700 includes a memory 710 and a processor 720, the memory 710 stores a computer program that can run on the processor 720, and the processor 720 executes the computer program to implement steps in any abnormal behavior identification method according to the embodiment of the present application.
The Memory 710 is configured to store instructions and applications executable by the processor 720, and may also buffer data (e.g., image data, audio data, voice communication data, and video communication data) to be processed or already processed by the processor 720 and modules in the electronic device, and may be implemented by a FLASH Memory (FLASH) or a Random Access Memory (RAM).
The processor 720, when executing the program, implements the steps of any of the above-described abnormal behavior recognition methods. The processor 720 generally controls the overall operation of the electronic device 700.
The Processor may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller, and a microprocessor. It is understood that the electronic device implementing the above-mentioned processor function may be other electronic devices, and the embodiments of the present application are not particularly limited.
The computer storage medium/memory may be a Read Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Ferroelectric Random Access Memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); it may also be provided by various electronic devices including one or any combination of the above memories, such as mobile phones, computers, tablet devices, and personal digital assistants.
Here, it should be noted that the above description of the storage medium and device embodiments is similar to the description of the method embodiments and yields similar beneficial effects. For technical details not disclosed in the storage medium and device embodiments of the present application, refer to the description of the method embodiments of the present application.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application. The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division of units is only a logical functional division, and there may be other divisions in actual implementation, such as: multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms. The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed across multiple network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments of the present application.
In addition, all functional units in the embodiments of the present application may be integrated into one processing unit, each unit may stand alone as one unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional unit. Alternatively, if the above integrated units of the present application are implemented in the form of software functional modules and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a device to perform all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a removable storage device, a ROM, a magnetic disk, an optical disk, or other various media that can store program code. The methods disclosed in the several method embodiments provided in the present application may be combined arbitrarily, without conflict, to obtain new method embodiments. The features disclosed in the several method or apparatus embodiments provided in the present application may be combined arbitrarily, without conflict, to arrive at new method or apparatus embodiments. The above description covers only the embodiments of the present application, but the protection scope of the present application is not limited thereto; any change or substitution that a person skilled in the art can easily conceive of within the technical scope disclosed by the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (19)

1. An abnormal behavior recognition method, characterized in that the method comprises:
selecting a first image sequence from the acquired first video;
acquiring a stored second image sequence based on the first image sequence; wherein the second image sequence at least comprises a historical image frame whose acquisition time is before the first image sequence;
determining an image sequence to be identified based on the first image sequence and the second image sequence;
and carrying out abnormal behavior recognition on the image sequence to be recognized by utilizing the trained abnormal behavior recognition model.
2. The method of claim 1, wherein said retrieving a stored second sequence of images based on said first sequence of images comprises:
selecting at least two historical image frames from a second video as the second image sequence; the first video and the second video have the same content attribute.
3. The method of claim 2, wherein the first video and the second video are videos captured of a same target object during different time periods; or the first video and the second video are videos collected from the same target area in different time periods.
4. A method as claimed in any one of claims 1 to 3, wherein said selecting a first sequence of images from the captured first video comprises:
determining a first sampling range based on the type of the abnormal behavior to be identified;
and selecting the image frames in the first sampling range from the acquired first video as the first image sequence.
5. The method of any of claims 1 to 4, wherein the first image sequence includes an image frame having a current time as a termination time, and wherein the retrieving a stored second image sequence based on the first image sequence includes:
determining a second sampling range in the second video based on a historical time before the current time; the second sampling range takes the historical time as a termination time;
and selecting the image frame in the second sampling range as the second image sequence.
6. The method of any of claims 1 to 5, wherein determining the sequence of images to be identified based on the first sequence of images and the second sequence of images comprises:
sampling a first preset number of image frames from the first image sequence as a first candidate set;
sampling a second preset number of image frames from the second image sequence as a second candidate set;
and taking the first candidate set and the second candidate set as the image sequence to be identified.
7. The method of claim 6,
the number of image frames in the first image sequence is the same as the number of image frames in the second image sequence; and the first preset number and the second preset number have the same value.
8. The method of any of claims 1 to 7, further comprising:
and updating the second image sequence by using the image sequence to be identified.
9. The method according to claim 8, wherein the image sequence to be identified comprises N image frames, and the number of the image frames in the second image sequence is N, N being an integer greater than or equal to 2;
the updating the second image sequence by using the image sequence to be identified comprises:
and taking the image sequence to be identified as a new second image sequence.
10. The method of any one of claims 1 to 9, wherein the abnormal behavior recognition model is trained by:
determining a first sequence of samples based on a video to be processed; wherein the first sequence of samples comprises at least a first image frame at which an abnormal behavior action starts and a second image frame at which the abnormal behavior action ends;
determining a second sequence of samples based on other image frames in the video to be processed except the first sequence of samples;
and taking the first sequence of samples and the second sequence of samples as a training sample set, and training the abnormal behavior recognition model.
11. The method of claim 10, wherein determining a second sequence of samples based on other image frames in the video to be processed than the first sequence of samples comprises:
and selecting, from the video to be processed, an image frame whose acquisition time is before the first image frame and/or an image frame whose acquisition time is after the second image frame as the second sequence of samples.
12. The method of claim 10 or 11, wherein determining the first sequence of samples based on the video to be processed comprises:
acquiring an expansion coefficient from a preset sampling interval as a target expansion coefficient; wherein the sampling interval comprises expansion coefficients of at least two scales;
determining a target sampling range according to a preset reference sampling duration and the target expansion coefficient;
and performing frame sampling on the video to be processed based on the target sampling range to obtain the first sequence sample.
13. The method of claim 12, wherein determining a target sampling range based on a preset reference sampling duration and the target expansion coefficient comprises:
and on the basis of the reference sampling duration, expanding the reference sampling duration by using the target expansion coefficient to obtain the target sampling range.
14. The method according to claim 12 or 13, wherein said frame-sampling said video to be processed based on said target sampling range to obtain said first sequence of samples comprises:
acquiring a starting sampling time of the video to be processed;
determining a sampling termination time according to the target sampling range and the starting sampling time;
sampling the video between the starting sampling time and the sampling termination time to obtain at least two image frames;
and taking the at least two image frames as the first sequence of samples.
15. An abnormal behavior recognition apparatus, characterized in that the apparatus comprises:
the first acquisition module is used for selecting a first image sequence from the acquired first video;
a second obtaining module, configured to obtain a stored second image sequence; wherein the second image sequence at least comprises a history image frame with the acquisition time before the first image sequence;
a first determining module, configured to determine an image sequence to be identified based on the first image sequence and the second image sequence;
and the behavior recognition module is used for recognizing the abnormal behaviors of the image sequence to be recognized by utilizing the trained abnormal behavior recognition model.
16. An electronic device comprising a memory and a processor, the memory storing a computer program operable on the processor, wherein the processor implements the steps of the method of any one of claims 1 to 14 when executing the program.
17. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 14.
18. A computer program comprising computer readable code which, when run in an electronic device, is executed by a processor in the electronic device and is configured to implement the method of any of claims 1 to 14.
19. A computer program product comprising one or more instructions adapted to be loaded by a processor and to perform the steps of the method according to any of claims 1 to 14.
CN202110837521.2A 2021-07-23 2021-07-23 Abnormal behavior recognition method and device, equipment, storage medium and program product Pending CN113553952A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110837521.2A CN113553952A (en) 2021-07-23 2021-07-23 Abnormal behavior recognition method and device, equipment, storage medium and program product


Publications (1)

Publication Number Publication Date
CN113553952A true CN113553952A (en) 2021-10-26

Family

ID=78104266

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110837521.2A Pending CN113553952A (en) 2021-07-23 2021-07-23 Abnormal behavior recognition method and device, equipment, storage medium and program product

Country Status (1)

Country Link
CN (1) CN113553952A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115393798A (en) * 2022-09-01 2022-11-25 深圳市冠标科技发展有限公司 Early warning method and device, electronic equipment and storage medium
CN115512499A (en) * 2022-08-12 2022-12-23 四川弘和通讯集团有限公司 Automatic intervention method and system for gas station, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200042776A1 (en) * 2018-08-03 2020-02-06 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for recognizing body movement
CN111914661A (en) * 2020-07-06 2020-11-10 广东技术师范大学 Abnormal behavior recognition method, target abnormal recognition method, device, and medium
CN112381071A (en) * 2021-01-11 2021-02-19 深圳市一心视觉科技有限公司 Behavior analysis method of target in video stream, terminal device and medium
CN112633100A (en) * 2020-12-14 2021-04-09 深兰科技(上海)有限公司 Behavior recognition method and device, electronic equipment and storage medium
CN113052029A (en) * 2021-03-12 2021-06-29 天天惠民(北京)智能物流科技有限公司 Abnormal behavior supervision method and device based on action recognition and storage medium



Similar Documents

Publication Publication Date Title
CN110472531B (en) Video processing method, device, electronic equipment and storage medium
CN108073933B (en) Target detection method and device
CN111985385B (en) Behavior detection method, device and equipment
CN110070029B (en) Gait recognition method and device
CN111898581B (en) Animal detection method, apparatus, electronic device, and readable storage medium
CN111008337B (en) Deep attention rumor identification method and device based on ternary characteristics
KR101720781B1 (en) Apparatus and method for prediction of abnormal behavior of object
CN113553952A (en) Abnormal behavior recognition method and device, equipment, storage medium and program product
CN110263733B (en) Image processing method, nomination evaluation method and related device
CN113743273B (en) Real-time rope skipping counting method, device and equipment based on video image target detection
CN115713715B (en) Human behavior recognition method and recognition system based on deep learning
CN111783712A (en) Video processing method, device, equipment and medium
CN113111838A (en) Behavior recognition method and device, equipment and storage medium
CN112417970A (en) Target object identification method, device and electronic system
CN115082752A (en) Target detection model training method, device, equipment and medium based on weak supervision
CN110414544B (en) Target state classification method, device and system
CN111428589B (en) Gradual transition identification method and system
JP2021007055A (en) Discriminator learning device, discriminator learning method, and computer program
CN111539390A (en) Small target image identification method, equipment and system based on Yolov3
Baba et al. Stray dogs behavior detection in urban area video surveillance streams
CN114821978B (en) Method, device and medium for eliminating false alarm
CN114782883A (en) Abnormal behavior detection method, device and equipment based on group intelligence
Peng et al. Pedestrian motion recognition via Conv‐VLAD integrated spatial‐temporal‐relational network
CN110659384B (en) Video structured analysis method and device
CN112668364A (en) Behavior prediction method and device based on video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination