
CN114882414A - Abnormal video detection method, abnormal video detection device, electronic equipment, abnormal video detection medium and program product

Info

Publication number: CN114882414A
Application number: CN202210535186.5A
Authority: CN
Inventors: 肖慧慧; 李家宏; 李思则
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2022-05-17
Filing date: 2022-05-17
Publication date: 2022-08-09

Abstract

The present disclosure relates to an abnormal video detection method, apparatus, electronic device, medium, and program product. The method comprises: obtaining video frame pictures of a target video; detecting each object in each video frame picture to obtain a plurality of image slices, wherein each image slice contains one object; performing identification of multiple abnormal categories on the object in each image slice to obtain an identification result of each image slice corresponding to each abnormal category; and determining whether the target video is an abnormal video according to the identification result of each image slice corresponding to each abnormal category. The technical solution provided by the disclosure saves computing resources and computing time and improves the efficiency of abnormal video detection. In addition, only the classification approach for each abnormal category needs to be maintained separately, while the detection approach they are based on is maintained in a unified way, which reduces the maintenance cost.

Description

Abnormal video detection method, abnormal video detection device, electronic equipment, abnormal video detection medium and program product

Technical Field

The present disclosure relates to the field of computer application technologies, and in particular, to a method, an apparatus, an electronic device, a medium, and a program product for detecting an abnormal video.

Background

With the rapid development of computer and terminal technology, the number of application clients that can be installed on a terminal keeps growing, and these application clients interact with application servers to provide corresponding functions to users. For example, a video application client interacts with a video application server to provide functions such as video uploading, video watching, and video commenting. A user can upload a video to the video application server through the video application client, and after the video application server publishes the video to the network, other users can watch it. To ensure that video content provides correct guidance, the video application server needs to perform anomaly detection on videos uploaded by users so that abnormal videos are not published to the network.

Currently, a plurality of abnormal categories are preset, and a video to be detected is checked against each abnormal category to determine whether it is an abnormal video. Different detection and identification approaches are set for different abnormal categories. For example, abnormal category A carries strong semantic information, so setting its approach directly to classification already achieves good results. For abnormal category B, most targets are small, so classification alone is not enough, and the approach can be set to detection followed by classification to improve accuracy. For abnormal category C, the approach can be set to detection. Detection consumes more computing resources and time than classification. When a video is checked against different abnormal categories, detection may have to be run multiple times, which consumes considerable computing resources and computing time and makes abnormal video detection inefficient. Moreover, the detection and identification approach for each abnormal category must be maintained separately, so the maintenance cost is high.

Disclosure of Invention

An object of the present disclosure is to provide a method, an apparatus, an electronic device, a medium, and a program product for detecting an abnormal video, so as to save computation resources and computation time, improve detection efficiency of the abnormal video, and reduce maintenance cost.

In order to solve the technical problem, the present disclosure provides the following technical solutions:

According to a first aspect of the embodiments of the present disclosure, there is provided an abnormal video detection method, including:

obtaining a video frame picture of a target video;

detecting each object in each video frame picture to obtain a plurality of image slices, wherein each image slice comprises one object;

respectively identifying multiple abnormal categories of objects in each image slice to obtain an identification result of each image slice corresponding to each abnormal category;

and determining whether the target video is an abnormal video or not according to the identification result of each image slice corresponding to each abnormal category.

In a specific embodiment of the present disclosure, the detecting each object in each video frame picture to obtain a plurality of image slices includes:

detecting each object in each video frame picture by using a pre-trained open-world-based detection model;

and obtaining the image slice of each detection frame in each video frame picture according to the detection result.

In one embodiment of the present disclosure, the open world based detection model is obtained by:

obtaining a training picture set, wherein each training picture in the training picture set is marked with an object;

training a pre-established open-world-based detection initial model based on the training pictures in the training picture set and the object labels in each training picture;

and after the training is finished, obtaining the open world-based detection model.

In an embodiment of the present disclosure, the performing identification of multiple abnormal categories on the object in each image slice respectively to obtain an identification result of each image slice corresponding to each abnormal category includes:

for each image slice, inputting the current image slice into each classification model in a classification model library, wherein the classification model library comprises a plurality of different classification models, and different classification models are used for identifying different abnormal categories;

and obtaining the identification result of the current image slice corresponding to each abnormal category according to the abnormal confidence coefficient output by each classification model.

In a specific embodiment of the present disclosure, the obtaining, according to the confidence level of the abnormality output by each classification model, a recognition result of the current image slice corresponding to each abnormality category includes:

for each classification model, if the output abnormal confidence coefficient of the current classification model is greater than a preset confidence coefficient threshold value, the obtained identification result is that the current image slice belongs to the abnormal category corresponding to the current classification model.

In a specific embodiment of the present disclosure, the method further includes:

obtaining a new classification model, wherein the new classification model is used for identifying a new abnormal category;

adding the new classification model to the classification model library to add identification of the new abnormal category.

In a specific embodiment of the present disclosure, the video frame picture is a key frame picture in the target video.

In a specific embodiment of the present disclosure, the determining whether the target video is an abnormal video according to the recognition result of each image slice corresponding to each abnormal category includes:

for each image slice, determining whether the current image slice belongs to at least one abnormal category in multiple abnormal categories according to the identification result of the current image slice corresponding to each abnormal category;

determining the target video to be an abnormal video if there is at least one image slice belonging to an abnormal category.

According to a second aspect of the embodiments of the present disclosure, there is provided an abnormal video detection apparatus including:

a video frame picture obtaining module configured to perform obtaining a video frame picture of a target video;

the image slice obtaining module is configured to detect each object in each video frame picture to obtain a plurality of image slices, and each image slice contains one object;

the identification result obtaining module is configured to perform identification of multiple abnormal categories of the object in each image slice respectively to obtain an identification result of each image slice corresponding to each abnormal category;

and the abnormal video determining module is configured to determine whether the target video is an abnormal video according to the identification result of each image slice corresponding to each abnormal category.

In a specific embodiment of the present disclosure, the image slice obtaining module is configured to perform:

In a specific embodiment of the present disclosure, the method further includes a detection model obtaining module configured to obtain the open world-based detection model by:

In a specific embodiment of the present disclosure, the identification result obtaining module is configured to perform:

In a specific embodiment of the present disclosure, the method further includes a classification model library updating module configured to perform:

In a specific embodiment of the present disclosure, the abnormal video determination module is configured to perform:

According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the abnormal video detection method as described in the first aspect.

According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein instructions, when executed by a processor of an electronic device, enable the electronic device to perform the abnormal video detection method according to the first aspect.

According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising computer instructions stored in a computer-readable storage medium and adapted to be read and executed by a processor to cause an electronic device having the processor to perform the abnormal video detection method according to the first aspect.

By applying the technical solution provided by the embodiments of the present disclosure, object detection needs to be performed only once when anomaly detection is carried out on the target video: each object in each video frame picture is detected to obtain image slices, the image slices are then classified for the different abnormal categories, and the abnormal video is determined according to the identification result of each image slice corresponding to each abnormal category. This reduces the number of times the detection approach is used, saves computing resources and computing time, and improves the efficiency of abnormal video detection. Moreover, only the classification approach for each abnormal category needs to be maintained separately, while the detection approach they are based on is maintained in a unified way, which reduces the maintenance cost.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present disclosure, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of an abnormal video detection scenario in an embodiment of the present disclosure;

FIG. 2 is a flowchart illustrating an implementation of a method for detecting abnormal videos according to an embodiment of the present disclosure;

FIG. 3 is a diagram illustrating an exemplary process of detecting abnormal videos according to an embodiment of the present disclosure;

FIG. 4 is a schematic structural diagram of an abnormal video detection apparatus according to an embodiment of the present disclosure;

FIG. 5 is a schematic structural diagram of an electronic device in an embodiment of the disclosure.

Detailed Description

In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

The core of the present disclosure is to provide an abnormal video detection method that can be applied to any scenario in which anomaly detection needs to be performed on videos, for example, a video application server deployed in the cloud performing anomaly detection on videos uploaded by users. Specifically, as shown in FIG. 1, after a user records a video and wants to publish it to the network for other users to watch, the user can upload the video to the video application server through a video application client installed on a terminal, such as video application client 1. The video application server can perform anomaly detection on the video by using the technical solution provided by the embodiments of the present disclosure, either by itself or by calling another detection service. After determining that the video is an abnormal video, the video application server may return the determination result to video application client 1. Video application client 1 can output abnormal video prompt information to the user and can also output abnormal category prompt information at the same time. The user can then adjust the video according to the prompt information and upload it again. If the video application server determines that the video is not an abnormal video, it can carry out the other preparation work for publishing, so that the video is published to the network once the preparation is complete and other users can watch it through their video application clients, such as video application client 2 and video application client 3.

By executing the scheme on the video application server side, the abnormal video can be quickly detected by means of strong processing capacity of the server.

Of course, when the user uploads a video to the video application server through the video application client, the video application client itself may perform anomaly detection on the video by using the technical solution provided by the embodiments of the present disclosure. After determining that the video is an abnormal video, the video application client can output abnormal video prompt information to the user and can also output abnormal category prompt information at the same time. The user can then adjust the video according to the prompt information and upload it again. If the video application client determines that the video is not an abnormal video, it can upload the video to the video application server, which carries out the other preparation work for publishing so that the video is published to the network for other users to watch once the preparation is complete.

By executing the solution on the video application client side, anomaly detection can be completed without uploading the video to the video application server and without relying on the network, and abnormal video prompt information can be returned to the user promptly when the video is determined to be an abnormal video.

Whether the solution is executed on the video application server side or on the video application client side, after obtaining the video frame pictures of the target video, the executing party first detects each object in each video frame picture to obtain a plurality of image slices, each containing one object. It then performs identification of multiple abnormal categories on the object in each image slice to obtain the identification result of each image slice corresponding to each abnormal category, and finally determines whether the target video is an abnormal video according to those identification results. Object detection therefore needs to be performed only once when anomaly detection is carried out on the target video; after the image slices are obtained, they are classified for the different abnormal categories. This reduces the number of times the detection approach is used, saves computing resources and computing time, and improves the efficiency of abnormal video detection. In addition, only the classification approach for each abnormal category needs to be maintained separately, while the detection approach they are based on is maintained in a unified way, which reduces the maintenance cost.

Referring to FIG. 2, a flowchart of an implementation of an abnormal video detection method provided in an embodiment of the present disclosure is shown, where the method may include the following steps:

S210: Obtain a video frame picture of the target video.

In the embodiments of the present disclosure, the target video may be any video on which anomaly detection is currently to be performed. Any video uploaded by any user can be used as a target video for anomaly detection. Alternatively, the credibility of a user can be determined according to the user's historical uploading behavior or user attributes: if the credibility is less than a preset credibility threshold, any video uploaded by that user is used as a target video for anomaly detection; otherwise, anomaly detection is not performed on the videos uploaded by that user.
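As a non-limiting illustration, the credibility-based selection of target videos described above might be sketched in Python as follows. The scoring formula (the share of a user's past uploads that were not flagged), the names `UserHistory`, `credibility`, and `needs_anomaly_detection`, and the threshold value 0.8 are assumptions made purely for this example; the disclosure does not specify how credibility is computed.

```python
from dataclasses import dataclass


@dataclass
class UserHistory:
    """Hypothetical summary of a user's historical uploading behavior."""
    total_uploads: int
    flagged_uploads: int


def credibility(history: UserHistory) -> float:
    """Assumed score: fraction of past uploads that were not flagged.

    A user with no history gets 0.0 so that their videos are always checked.
    """
    if history.total_uploads == 0:
        return 0.0
    return 1.0 - history.flagged_uploads / history.total_uploads


def needs_anomaly_detection(history: UserHistory, threshold: float = 0.8) -> bool:
    """Treat an upload as a target video when credibility is below the threshold."""
    return credibility(history) < threshold
```

Under these assumptions, a new user or a user with many flagged uploads always has uploads checked, while uploads from a user with a clean history are skipped.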

After the target video on which anomaly detection is currently to be performed is determined, the video frame pictures of the target video can be obtained. The video frame pictures may be every frame picture of the target video, in which case abnormal video detection is performed on every frame picture to improve detection accuracy. The video frame pictures may also be the key frame pictures of the target video, that is, the key frame pictures obtained by performing frame extraction on the target video. A key frame picture is a frame in which a key action in the movement or change of a character or object occurs. Because there are fewer key frame pictures than total frame pictures in the target video, performing anomaly detection on the key frame pictures improves detection efficiency while maintaining detection accuracy.

S220: Detect each object in each video frame picture to obtain a plurality of image slices.

Each image slice contains one object.

In the embodiments of the present disclosure, after the video frame pictures of the target video are obtained, every object in each video frame picture is detected, and a corresponding image slice is then obtained for each detected object.

For example, if a video frame picture contains three objects, namely a human body, a cat, and a cup, detecting each object in that video frame picture finds all three objects and yields three image slices: image slice 1 contains only the human body, image slice 2 contains only the cat, and image slice 3 contains only the cup.

S230: Perform identification of multiple abnormal categories on the object in each image slice to obtain an identification result of each image slice corresponding to each abnormal category.

In the embodiments of the present disclosure, multiple abnormal categories may be set according to actual conditions, such as a first abnormal category, a second abnormal category, a third abnormal category, and so on.

After each object in each video frame picture of the target video is detected to obtain a plurality of image slices, identification of multiple abnormal categories is performed on the object in each image slice. Specifically, for each image slice and each abnormal category, it is determined whether the object in the image slice belongs to that abnormal category. After identification, an identification result corresponding to each abnormal category is obtained for each image slice.

For any one of the abnormal categories, the identification result of an image slice corresponding to the abnormal category may be two types, one type is that the image slice belongs to the abnormal category, and the other type is that the image slice does not belong to the abnormal category.

S240: Determine whether the target video is an abnormal video according to the identification result of each image slice corresponding to each abnormal category.

After identification of multiple abnormal categories has been performed on the object in each image slice and the identification result of each image slice corresponding to each abnormal category has been obtained, whether the target video is an abnormal video can be determined according to those identification results. When the target video is determined to be an abnormal video, the abnormal categories present in it can be determined at the same time.

An abnormality determination rule may be set in advance, and the target video is determined to be an abnormal video when the identification results of the image slices corresponding to the abnormal categories satisfy the rule. For example, the rule may be that the target video is determined to be an abnormal video if the object in any one image slice belongs to any abnormal category. As another example, the rule may be that, for abnormal category 1, the target video is determined to be an abnormal video if the object in any one image slice belongs to abnormal category 1, whereas for abnormal category 2 and abnormal category 3, the target video is determined to be an abnormal video when the number of image slices whose objects belong to abnormal category 2 and abnormal category 3 is greater than a set number threshold.
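As a non-limiting illustration, a preset abnormality determination rule of the kind described above might be sketched in Python as follows. The function name `is_abnormal_video`, the representation of identification results as one set of category names per image slice, and the reading of the number threshold as a separate count per category are all assumptions made for this example; the disclosure leaves the exact rule to be set according to actual conditions.

```python
from collections import Counter
from typing import Dict, List, Set

# Each element is the set of abnormal categories that one image slice was
# identified as belonging to (an empty set means the slice is normal).
SliceResults = List[Set[str]]


def is_abnormal_video(
    slice_results: SliceResults,
    strict_categories: Set[str],
    count_thresholds: Dict[str, int],
) -> bool:
    """Apply a preset abnormality determination rule (hypothetical form).

    strict_categories: a single matching slice is enough to flag the video.
    count_thresholds: per-category slice counts that must be exceeded.
    """
    counts = Counter(cat for cats in slice_results for cat in cats)
    if any(counts[cat] > 0 for cat in strict_categories):
        return True
    return any(counts[cat] > limit for cat, limit in count_thresholds.items())
```

With `strict_categories={"type_1"}` and `count_thresholds={"type_2": 2, "type_3": 2}`, one slice of type 1 flags the video immediately, whereas type 2 or type 3 flags it only once more than two slices belong to that category.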

After the target video is determined to be the abnormal video, abnormal prompt information can be further sent to a user or other warning devices so as to process the abnormal video in time.

If the target video is determined not to be the abnormal video, subsequent operations such as watermarking, encryption and the like can be continuously carried out on the target video.

By applying the method provided by the embodiments of the present disclosure, object detection needs to be performed only once when anomaly detection is carried out on the target video: each object in each video frame picture is detected to obtain image slices, the image slices are then classified for the different abnormal categories, and the abnormal video is determined according to the identification result of each image slice corresponding to each abnormal category. This reduces the number of times the detection approach is used, saves computing resources and computing time, and improves the efficiency of abnormal video detection. Moreover, only the classification approach for each abnormal category needs to be maintained separately, while the detection approach they are based on is maintained in a unified way, which reduces the maintenance cost.
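As a non-limiting illustration, the flow of steps S220 to S240 might be sketched in Python as follows. Pictures are represented as plain nested lists of pixel values, and the detector and classifiers are passed in as ordinary callables; these representations, the names `detect_abnormal_categories`, `Detector`, and `Classifier`, the box format, and the default threshold of 0.5 are assumptions made for this example and do not correspond to any particular model or library.

```python
from typing import Callable, Dict, List, Tuple

Frame = List[List[int]]            # toy grayscale picture: rows of pixel values
Box = Tuple[int, int, int, int]    # (left, top, right, bottom), right/bottom exclusive
Detector = Callable[[Frame], List[Box]]
Classifier = Callable[[Frame], float]  # returns an abnormal confidence


def detect_abnormal_categories(
    frames: List[Frame],
    detector: Detector,
    classifiers: Dict[str, Classifier],
    threshold: float = 0.5,
) -> List[str]:
    """Run S220 to S240 and return the abnormal categories that were found."""
    found = set()
    for frame in frames:
        # S220: a single detection pass per frame, shared by all categories.
        for left, top, right, bottom in detector(frame):
            image_slice = [row[left:right] for row in frame[top:bottom]]
            # S230: every category-specific classifier scores the same slice.
            for category, classify in classifiers.items():
                if classify(image_slice) > threshold:
                    found.add(category)
    # S240 (simplest rule): the video is abnormal if this list is non-empty.
    return sorted(found)
```

In this sketch the detector is called once per video frame picture regardless of how many abnormal categories are configured, and adding a category only adds one more lightweight classifier call per image slice.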

In one embodiment of the present disclosure, step S220 may include the steps of:

Step one: detecting each object in each video frame picture by using a pre-trained open-world-based detection model;

Step two: obtaining the image slice of each detection frame in each video frame picture according to the detection result.

For convenience of description, the two steps above are described together.

In the embodiment of the disclosure, an open-world-based detection model may be obtained by training in advance, and all objects appearing in an image may be detected by the open-world-based detection model.

After the video frame pictures of the target video are obtained, each object in each video frame picture can be detected by using an open world-based detection model obtained through pre-training. For each video frame picture, the video frame picture may be input into an open world-based detection model, the detection model detects an object in the video frame picture, and may output a detection result, where the detection result may include position information of each detection frame in the video frame picture, and each detection frame includes one object.

According to the detection result, the image slice of each detection frame in each video frame picture is obtained. Each detection frame contains one object, so the image slice of each detection frame contains only one object.

Using the open-world-based detection model, all objects in each video frame picture can be accurately detected, so that the complete set of image slices is obtained, which ensures that anomaly detection based on the image slices can proceed smoothly.
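As a non-limiting illustration, obtaining image slices from the position information of the detection frames might be sketched in Python as follows. The picture is a plain nested list of pixel values, and the box format `(left, top, right, bottom)` with exclusive right and bottom edges, the clamping to the picture bounds, and the name `crop_slices` are assumptions made for this example; the disclosure does not specify how position information is encoded.

```python
from typing import List, Tuple

Picture = List[List[int]]          # toy picture: rows of pixel values
Box = Tuple[int, int, int, int]    # assumed format: (left, top, right, bottom)


def crop_slices(picture: Picture, boxes: List[Box]) -> List[Picture]:
    """Return one image slice per detection frame, clamped to the picture bounds."""
    height = len(picture)
    width = len(picture[0]) if height else 0
    slices = []
    for left, top, right, bottom in boxes:
        left, top = max(0, left), max(0, top)
        right, bottom = min(width, right), min(height, bottom)
        if left >= right or top >= bottom:
            continue  # skip empty or fully out-of-bounds boxes
        slices.append([row[left:right] for row in picture[top:bottom]])
    return slices
```

Each returned slice corresponds to exactly one detection frame, so a picture in which three objects are detected yields three slices.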

In one embodiment of the present disclosure, the open world based detection model may be obtained by:

Step one: obtaining a training picture set, wherein each training picture in the training picture set carries object labels;

Step two: training a pre-established open-world-based detection initial model based on the training pictures in the training picture set and the object labels in each training picture;

Step three: obtaining the open-world-based detection model after the training is completed.

For convenience of description, the three steps above are described together.

In the embodiment of the present disclosure, an open-world-based detection initial model may be established in advance, and each parameter in the detection initial model may be initialized.

The training picture set can be obtained by collecting pictures, producing pictures, or using an existing picture library. The training picture set can include a plurality of training pictures; the more training pictures there are, the more accurately the model is trained. Each object in each training picture can be labeled in advance, for example with a labeling frame, where each labeling frame contains only one object. That is, the specific category of an object in a training picture is not distinguished: any object that is present is labeled, and every object in the training picture can be regarded as belonging to a single "object" category.

After the training picture set is obtained, a pre-established detection initial model based on the open world can be trained based on the training pictures in the training picture set and the object labels in each training picture.

Specifically, iterative training may be performed on a pre-established open-world-based detection initial model based on a training picture set, after each iterative training, a prediction loss of the detection initial model is determined, parameters of the detection initial model are adjusted according to the prediction loss, and training is determined to be completed when the iteration number reaches a set iteration number threshold or the prediction loss stops decreasing.

An iteration number threshold may be preset. While the open-world-based detection initial model is iteratively trained on the training picture set, after each training iteration the prediction loss of the detection initial model can be determined from its prediction result and the object labels in the training pictures, for example through a loss function. The parameters of the detection initial model are then adjusted by gradient back-propagation with the objective of minimizing the loss. The detection initial model can be trained in a class-agnostic manner, in which the specific category of an object is not distinguished. Over successive iterations, the prediction loss keeps decreasing and gradually converges. When the iteration number reaches the set iteration number threshold or the prediction loss stops decreasing, training can be determined to be complete.
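As a non-limiting illustration, the two stopping conditions described above (reaching the iteration number threshold, or the prediction loss no longer decreasing) might be sketched in Python as follows. The callback `step`, which is assumed to run one training iteration and return the prediction loss, and the parameters `patience` and `min_delta`, which define when the loss counts as having stopped decreasing, are assumptions made for this example; the disclosure does not specify how a plateau is recognized.

```python
from typing import Callable, List, Tuple


def train_until_converged(
    step: Callable[[], float],
    max_iterations: int,
    patience: int = 3,
    min_delta: float = 1e-6,
) -> Tuple[int, List[float]]:
    """Call `step` until the iteration threshold is reached or the loss plateaus.

    The loss is treated as having stopped decreasing when it has not improved
    by more than `min_delta` for `patience` consecutive iterations.
    Returns the number of iterations run and the recorded losses.
    """
    best = float("inf")
    stale = 0
    losses: List[float] = []
    for iteration in range(1, max_iterations + 1):
        loss = step()  # one iteration: forward pass, loss, parameter update
        losses.append(loss)
        if best - loss > min_delta:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return iteration, losses
    return len(losses), losses
```

In practice `step` would wrap the forward pass, the loss computation against the object labels, and the gradient-based parameter update of the detection initial model; here it is left abstract so that the stopping logic can be read on its own.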

After the training is completed, the currently trained detection initial model can be determined as an open-world-based detection model for an actual anomaly detection scenario.

Because each training picture in the training picture set carries object labels, the detection initial model can be trained effectively, which improves the detection accuracy of the resulting detection model so that all objects in each video frame picture can be accurately detected.

In one embodiment of the present disclosure, step S230 may include the steps of:

Step one: for each image slice, inputting the current image slice into each classification model in a classification model library, wherein the classification model library comprises a plurality of different classification models, and different classification models are used for identifying different abnormal categories;

Step two: obtaining the identification result of the current image slice corresponding to each abnormal category according to the abnormal confidence output by each classification model.

In the embodiment of the disclosure, multiple classification models can be obtained by training in advance, and different classification models are used for identifying different abnormal classes.

For each abnormal class, a classification model corresponding to the abnormal class can be obtained through pre-training by the following steps:

obtaining a slice sample set, the slice sample set including positive samples belonging to the anomaly class and negative samples not belonging to the anomaly class;

and training a pre-established classification initial model corresponding to the abnormal category based on the slice sample set, and obtaining the classification model corresponding to the abnormal category after the training is finished.
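As a hedged illustration of training one classifier per abnormal class from positive and negative slice samples, the sketch below fits a small logistic-regression model with stochastic gradient descent. The disclosure does not specify the classifier architecture or the form of the input, so the use of logistic regression, fixed-length feature vectors in place of image slices, and the names `train_binary_classifier` and `anomaly_confidence` are all assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias) of a logistic-regression stand-in


def train_binary_classifier(
    positives: Sequence[Sequence[float]],
    negatives: Sequence[Sequence[float]],
    epochs: int = 500,
    lr: float = 0.5,
) -> Model:
    """Train a stand-in classifier for one abnormal class on a slice sample set."""
    # Positive samples belong to the abnormal class (label 1), negatives do not (label 0).
    samples = [(x, 1.0) for x in positives] + [(x, 0.0) for x in negatives]
    dim = len(samples[0][0])
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        for features, label in samples:
            z = bias + sum(w * f for w, f in zip(weights, features))
            pred = 1.0 / (1.0 + math.exp(-z))
            error = pred - label  # gradient of the log loss with respect to z
            weights = [w - lr * error * f for w, f in zip(weights, features)]
            bias -= lr * error
    return weights, bias


def anomaly_confidence(model: Model, features: Sequence[float]) -> float:
    """Return a score in (0, 1); higher means more likely to belong to the class."""
    weights, bias = model
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))
```

One such model would be trained per abnormal class, each on its own slice sample set, so that classes can be added or retrained independently.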

After the classification model corresponding to each abnormal category is obtained, each classification model can be added to a classification model library. After the video frame pictures of the target video are obtained and each object in each video frame picture is detected to obtain a plurality of image slices, the current image slice can, for each image slice, be input into each classification model in the classification model library. The current image slice is the image slice currently being processed.

For each classification model, after the current image slice is input to the classification model, the classification model may output an anomaly confidence, which may be in the form of a score.

Each classification model outputs a corresponding anomaly confidence for each image slice, and the identification result of each image slice corresponding to each abnormal category can be obtained according to the anomaly confidence output by each classification model.
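The step of passing every image slice through every classification model can be sketched as below. Modelling the classification model library as a dictionary from category name to a callable that returns a score is an assumption for illustration; `score_slices` is a hypothetical name.

```python
from typing import Callable, Dict, List, Sequence

# Assumption: a classification model is any callable mapping an image slice
# to an anomaly confidence score.
ClassificationModel = Callable[[object], float]


def score_slices(
    image_slices: Sequence[object],
    model_library: Dict[str, ClassificationModel],
) -> List[Dict[str, float]]:
    """Return, for each image slice, the anomaly confidence per abnormal category."""
    return [
        {category: model(image_slice) for category, model in model_library.items()}
        for image_slice in image_slices
    ]
```

The result has one entry per image slice, each holding one score per abnormal category, which is the input needed for the threshold comparison.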

The classification model corresponding to each abnormal category is used for classifying the objects in the image slices, so that the identification result of each image slice corresponding to each abnormal category can be accurately obtained, and a reliable basis can be provided for further judging the abnormal video.

In an embodiment of the present disclosure, for each classification model, if the anomaly confidence output by the current classification model is greater than a preset confidence threshold, the obtained identification result is that the current image slice belongs to the abnormal category corresponding to the current classification model.

In the embodiment of the present disclosure, the confidence threshold may be set according to an actual situation, and the confidence thresholds corresponding to different classification models may be the same or different.

For each image slice, the current image slice is input into each classification model in the classification model library. For each classification model, an anomaly confidence may be output for the current image slice, and the anomaly confidence output for the current image slice is compared with a confidence threshold corresponding to the current classification model.

If the anomaly confidence output by the current classification model is greater than the preset confidence threshold, the probability that the object in the current image slice belongs to the abnormal category corresponding to the current classification model is considered to be high, and the obtained identification result is: the current image slice belongs to the abnormal category corresponding to the current classification model. The current classification model is the classification model currently being applied.

If the anomaly confidence output by the current classification model is less than or equal to the confidence threshold, the probability that the object in the current image slice belongs to the abnormal category corresponding to the current classification model is considered to be low, and the obtained identification result is: the current image slice does not belong to the abnormal category corresponding to the current classification model.

According to the anomaly confidence output by each classification model, the recognition result of each classification model for each image slice can be accurately determined.
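The threshold comparison described above can be sketched as follows. The function name `recognize` and the fallback `default_threshold` of 0.5 are assumptions for illustration; the disclosure only states that thresholds are preset and may differ between classification models.

```python
from typing import Dict


def recognize(
    confidences: Dict[str, float],
    thresholds: Dict[str, float],
    default_threshold: float = 0.5,  # assumed fallback, not taken from the disclosure
) -> Dict[str, bool]:
    """Map each abnormal category to True if the slice belongs to it.

    A slice belongs to a category only when the confidence is strictly
    greater than that category's threshold; equality counts as not belonging.
    """
    return {
        category: confidence > thresholds.get(category, default_threshold)
        for category, confidence in confidences.items()
    }
```

For instance, `recognize({"a": 0.8, "b": 0.5}, {"a": 0.7})` marks the slice as belonging to `a` but not to `b`, since a confidence equal to the threshold does not exceed it.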

After the object in each image slice has been identified against the plurality of abnormal categories and the identification result of each image slice corresponding to each abnormal category has been obtained, whether the target video is an abnormal video can be determined according to those identification results.

In one embodiment of the present disclosure, determining whether the target video is an abnormal video according to the recognition result of each image slice corresponding to each abnormal category may include the following steps:

the first step is as follows: for each image slice, determining whether the current image slice belongs to at least one abnormal category in multiple abnormal categories according to the identification result of the current image slice corresponding to each abnormal category;

the second step is that: and if at least one image slice belonging to the abnormal category exists, determining that the target video is an abnormal video.

In the embodiment of the present disclosure, the object in each image slice is identified against multiple abnormal categories, and for each image slice, the identification result of the image slice corresponding to each abnormal category may be obtained. For example, for image slice A, the identification result corresponding to abnormal category 1 is that image slice A does not belong to abnormal category 1, the identification result corresponding to abnormal category 2 is that image slice A belongs to abnormal category 2, and the identification result corresponding to abnormal category 3 is that image slice A does not belong to abnormal category 3; for image slice B, the identification results corresponding to abnormal categories 1, 2, and 3 are that image slice B does not belong to abnormal category 1, abnormal category 2, or abnormal category 3, respectively.

For each image slice, whether the current image slice belongs to at least one of the plurality of abnormal categories may be determined based on the identification result of the current image slice corresponding to each abnormal category. The current image slice is the image slice currently being processed. In the above example, image slice A does not belong to abnormal category 1 or abnormal category 3 but belongs to abnormal category 2, so it can be determined that image slice A belongs to at least one of the plurality of abnormal categories. Image slice B does not belong to abnormal category 1, 2, or 3, so it can be determined that image slice B does not belong to any of the abnormal categories.

If there is at least one image slice belonging to an abnormal category, the target video may be determined to be an abnormal video. For example, image slice A and image slice B are both image slices of the target video. Because image slice A belongs to abnormal category 2 (although it does not belong to abnormal category 1 or 3), at least one image slice belonging to an abnormal category exists in the target video even though image slice B belongs to none of the three categories, and the target video is therefore an abnormal video.

As long as at least one image slice belonging to the abnormal category exists, the target video is determined as the abnormal video, and the probability of false negative can be effectively reduced.
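The two-step decision above amounts to a logical OR over categories within a slice, followed by a logical OR over slices. A minimal sketch, with hypothetical function names and the per-slice results represented as dictionaries by assumption:

```python
from typing import Dict, Iterable


def slice_is_abnormal(result: Dict[str, bool]) -> bool:
    """True if the slice belongs to at least one abnormal category."""
    return any(result.values())


def video_is_abnormal(slice_results: Iterable[Dict[str, bool]]) -> bool:
    """True if at least one image slice of the video is abnormal."""
    return any(slice_is_abnormal(result) for result in slice_results)


# The example of image slices A and B from the description.
slice_a = {"category 1": False, "category 2": True, "category 3": False}
slice_b = {"category 1": False, "category 2": False, "category 3": False}
```

With these values, `video_is_abnormal([slice_a, slice_b])` is `True` because of slice A alone, while a video containing only slice B would not be flagged.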

In one embodiment of the present disclosure, the method may further comprise the steps of:

the first step is as follows: acquiring a new classification model, wherein the new classification model is used for identifying a new abnormal class;

the second step is that: adding the new classification model to the classification model library so as to add identification of the new abnormal class.

In an actual application scenario, as services continue to develop, abnormal categories are diverse and change dynamically in number, type, configuration, and the like, so identification of new abnormal categories may need to be added. In this case, a new classification model corresponding to the new abnormal category may be trained. The new classification model may be used to identify the new abnormal category.

After the new classification model is obtained, it may be added to the classification model library, so that the classification model library includes the new classification model in addition to the existing classification models. When there is a target video to be detected, each object in each video frame picture of the target video is detected to obtain a plurality of image slices; for each image slice, the current image slice is input into each classification model in the classification model library, which now includes the new classification model; and the identification result of each image slice corresponding to each abnormal category, including the new abnormal category, is obtained according to the anomaly confidence output by each classification model. Whether the target video is an abnormal video can then be determined according to the identification result of each image slice corresponding to each abnormal category.

The embodiments of the present disclosure adapt well to previously unknown abnormal categories: when a new abnormal category appears, the classification model library can be updated rapidly to add identification of the new abnormal category, which improves the accuracy of abnormal video detection.
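Extending the classification model library with a new abnormal category can be sketched as a simple registry. The class name `ClassificationModelLibrary` and its methods are hypothetical, and classification models are again assumed to be callables that return a score.

```python
from typing import Callable, Dict, List


class ClassificationModelLibrary:
    """A registry of per-category classification models (illustrative only)."""

    def __init__(self) -> None:
        self._models: Dict[str, Callable[[object], float]] = {}

    def add(self, category: str, model: Callable[[object], float]) -> None:
        """Register a model for a category; existing models are left untouched."""
        self._models[category] = model

    def categories(self) -> List[str]:
        return list(self._models)

    def score(self, image_slice: object) -> Dict[str, float]:
        """Run the slice through every registered model, including new ones."""
        return {c: m(image_slice) for c, m in self._models.items()}
```

Because scoring iterates over whatever is registered, a newly added model takes part in the next detection run without any change to the detection stage or to the existing classification models.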

For the sake of understanding, the embodiment of the present disclosure will be described by taking the specific process of detecting abnormal video shown in fig. 3 as an example.

Performing frame extraction processing on a target video to be detected to obtain a video frame picture, wherein the video frame picture can be a key frame picture of the target video;

detecting each object in each video frame picture by using a detection model based on the open world, and obtaining an image slice of each detection frame in each video frame picture according to a detection result;

according to the abnormal confidence coefficient output by each classification model, the identification result of each image slice corresponding to each abnormal category can be obtained, for example, a certain image slice belongs to the abnormal category 1, does not belong to the abnormal category 2, belongs to the abnormal category 3 and the like;

according to the identification result of each image slice corresponding to each abnormal category, whether the target video is an abnormal video can be determined;

when a new classification model is obtained, the new classification model may be added to the classification model library to increase the identification of new anomaly classes.
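The overall flow of detecting first and classifying second can be sketched end to end as follows. Everything concrete here is an assumption for illustration: frames are plain nested lists of pixel values, detection frames are `(x1, y1, x2, y2)` tuples with exclusive upper bounds, `detect_objects` stands in for the open-world-based detection model, and frame extraction is assumed to have been done already.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Frame = List[List[float]]        # rows of pixel values (assumed representation)
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), upper bounds exclusive


def crop(frame: Frame, box: Box) -> Frame:
    """Cut the image slice of one detection frame out of a video frame picture."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in frame[y1:y2]]


def detect_abnormal_video(
    frames: Iterable[Frame],
    detect_objects: Callable[[Frame], List[Box]],
    model_library: Dict[str, Callable[[Frame], float]],
    thresholds: Dict[str, float],
) -> bool:
    """Return True if any image slice belongs to any abnormal category."""
    for frame in frames:
        for box in detect_objects(frame):  # a single detection pass per frame
            image_slice = crop(frame, box)
            for category, model in model_library.items():
                if model(image_slice) > thresholds[category]:
                    # Stopping early is an optimization choice of this sketch;
                    # the outcome equals evaluating every slice and category.
                    return True
    return False
```

The detector runs once per frame regardless of how many abnormal categories exist; only the inner loop over classification models grows when a category is added.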

It can be understood that, compared with a detection model, a classification model has advantages in implementation, resource utilization, and the like. For example, when data is labeled, only the whole image needs to be labeled, which is more efficient, and fewer computing resources and less computing time are required in the inference stage. The embodiment of the disclosure adopts a detect-then-classify approach: only one detection model is used at the bottom layer, and classification models corresponding to different abnormal classes are used at the top layer to perform anomaly detection on the target video, so that computing resources and computing time can be effectively saved and a better effect can be obtained when target scales differ greatly. Moreover, when a new abnormal class appears, only a new classification model corresponding to the new abnormal class needs to be trained, and the bottom-layer detection model does not need to be retrained, which saves the training cost of the bottom-layer detection model. In addition, because the bottom layer uses a single unified detection model and the top layer uses different classification models, the set of models is simplified and model management is facilitated.

Corresponding to the above method embodiments, the embodiments of the present disclosure further provide an abnormal video detection apparatus, and the abnormal video detection apparatus described below and the abnormal video detection method described above may be referred to in correspondence with each other.

Referring to fig. 4, the apparatus may include the following modules:

a video frame picture obtaining module 410 configured to perform obtaining a video frame picture of a target video;

an image slice obtaining module 420 configured to perform detection on each object in each video frame picture, resulting in a plurality of image slices, each image slice including one object;

an identification result obtaining module 430 configured to perform identification of a plurality of abnormal categories for the object in each image slice, respectively, to obtain an identification result corresponding to each abnormal category for each image slice;

and an abnormal video determination module 440 configured to determine whether the target video is an abnormal video according to the identification result of each image slice corresponding to each abnormal category.

By applying the device provided by the embodiment of the disclosure, only one pass of object detection is needed when the target video is checked for abnormality: each object in each video frame picture is detected to obtain image slices, the image slices are then classified with respect to the different abnormal categories, and the abnormal video is determined according to the identification result of each image slice corresponding to each abnormal category. This reduces the number of times the detection model is used, saves computing resources and computing time, and improves the detection efficiency of abnormal videos. Moreover, only the corresponding classification models need to be maintained separately for the different abnormal categories, while the underlying detection model is maintained in a unified way, which reduces the maintenance cost.

In one embodiment of the present disclosure, the image slice obtaining module 420 is configured to perform:

training a pre-established detection initial model based on the open world based on training pictures in the training picture set and object labels in each training picture;

after training is completed, an open-world based detection model is obtained.

In a specific embodiment of the present disclosure, the recognition result obtaining module 430 is configured to perform:

and obtaining the recognition result of the current image slice corresponding to each abnormal category according to the abnormal confidence coefficient output by each classification model.

acquiring a new classification model, wherein the new classification model is used for identifying a new abnormal class;

the new classification model is added to the classification model library to increase the identification of new anomaly classes.

In a specific embodiment of the present disclosure, the abnormal video determination module 440 is configured to perform:

and if at least one image slice belonging to the abnormal category exists, determining that the target video is an abnormal video.

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Corresponding to the above method embodiments, the embodiments of the present disclosure also provide an electronic device, including:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the above-described abnormal video detection method.

As shown in fig. 5, which is a schematic view of a composition structure of an electronic device, the electronic device may include: a processor 10, a memory 11, a communication interface 12 and a communication bus 13. The processor 10, the memory 11 and the communication interface 12 all communicate with each other through a communication bus 13.

In the disclosed embodiment, the processor 10 may be a Central Processing Unit (CPU), an application specific integrated circuit, a digital signal processor, a field programmable gate array or other programmable logic device, and the like.

The processor 10 may call a program stored in the memory 11, and in particular, the processor 10 may perform operations in an embodiment of the abnormal video detection method.

The memory 11 is used for storing one or more programs, the program may include program codes, the program codes include computer operation instructions, and in the embodiment of the present disclosure, at least the program for implementing the following functions is stored in the memory 11:

obtaining a video frame picture of a target video;

In one possible implementation, the memory 11 may include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function (such as an object detection function or a category identification function), and the like; the data storage area may store data created during use, such as image slice data, recognition result data, and the like.

Further, the memory 11 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device or other non-volatile solid-state storage device.

The communication interface 12 may be an interface of a communication module for connecting with other devices or systems.

Of course, it should be noted that the structure shown in fig. 5 does not constitute a limitation to the electronic device in the embodiment of the present disclosure, and in practical applications, the electronic device may include more or less components than those shown in fig. 5, or some components may be combined.

Corresponding to the above method embodiments, the present disclosure also provides a computer-readable storage medium, where instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the above abnormal video detection method.

Further, it should be noted that embodiments of the present disclosure also provide a computer program product or computer program that may include computer instructions, which may be stored in a computer-readable storage medium. The processor of the electronic device reads the computer instructions from the computer-readable storage medium and executes them, so that the electronic device performs the abnormal video detection method described in the foregoing embodiments; the description is therefore not repeated here. Likewise, the beneficial effects, which are the same as those of the method, are not described again. For technical details not disclosed in the embodiments of the computer program product or the computer program referred to in the present disclosure, refer to the description of the method embodiments of the present disclosure.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. An abnormal video detection method, comprising:

obtaining a video frame picture of a target video;

2. The abnormal video detection method according to claim 1, wherein the detecting each object in each video frame picture to obtain a plurality of image slices comprises:

3. The abnormal video detection method according to claim 2, wherein the open world based detection model is obtained by:

4. The abnormal video detection method according to claim 1, wherein the identifying the plurality of abnormal categories of the object in each image slice respectively to obtain the identification result of each image slice corresponding to each abnormal category comprises:

5. The abnormal video detection method according to claim 4, wherein the obtaining of the recognition result of the current image slice corresponding to each abnormal category according to the abnormal confidence level output by each classification model comprises:

6. The abnormal video detection method according to claim 4, further comprising:

7. An abnormal video detection apparatus, comprising:

8. An electronic device, comprising:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the abnormal video detection method as claimed in any one of claims 1 to 6.

9. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the abnormal video detection method as claimed in any one of claims 1 to 6.

10. A computer program product comprising computer instructions stored in a computer-readable storage medium and adapted to be read and executed by a processor to cause an electronic device having the processor to perform the abnormal video detection method as claimed in any one of claims 1 to 6.