CN114882414A - Abnormal video detection method, abnormal video detection device, electronic equipment, abnormal video detection medium and program product

Info

Publication number
CN114882414A
Authority
CN
China
Prior art keywords
abnormal
video
image slice
detection
classification model
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210535186.5A
Other languages
Chinese (zh)
Inventor
肖慧慧
李家宏
李思则
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202210535186.5A
Publication of CN114882414A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to an abnormal video detection method, apparatus, electronic device, medium, and program product. The method comprises: obtaining video frame pictures of a target video; detecting each object in each video frame picture to obtain a plurality of image slices, wherein each image slice contains one object; identifying the object in each image slice against multiple abnormal categories to obtain an identification result of each image slice corresponding to each abnormal category; and determining whether the target video is an abnormal video according to the identification result of each image slice corresponding to each abnormal category. By applying the technical solution provided by the disclosure, computing resources and computing time are saved and the detection efficiency of abnormal videos is improved. In addition, only the classification mode corresponding to each abnormal category needs to be maintained separately, while the detection mode on which the classification modes are based is maintained in a unified way, which reduces maintenance cost.

Description

Abnormal video detection method, abnormal video detection device, electronic equipment, abnormal video detection medium and program product
Technical Field
The present disclosure relates to the field of computer application technologies, and in particular, to a method, an apparatus, an electronic device, a medium, and a program product for detecting an abnormal video.
Background
With the rapid development of computer technology and terminal technology, the number of application clients that can be installed on a terminal keeps increasing, and these application clients interact with application servers to provide corresponding functions for users. For example, a video application client interacts with a video application server to provide functions such as video uploading, video watching, and video commenting. A user can upload a video to the video application server through the video application client, and after the video application server publishes the video to the network, other users can watch it. To ensure that video content gives correct guidance, the video application server needs to perform anomaly detection on the videos uploaded by users, so that abnormal videos are not released on the network.
Currently, a plurality of abnormal categories are preset, and a video to be detected is checked against each abnormal category to determine whether it is an abnormal video. Different detection and identification modes are set for different abnormal categories. For example, for abnormal category A, the semantic information is strong, and a good effect can be achieved by directly setting the detection and identification mode to a classification mode; for abnormal category B, most of the targets are small targets, so the detection and identification mode can only be set to a detection mode, or to a detection mode followed by classification to improve accuracy; and for abnormal category C, the detection and identification mode can be set to a detection mode. The detection mode consumes more computing resources and time than the classification mode. When a video to be detected is checked against different abnormal categories, the detection mode may need to be used multiple times, which consumes considerable computing resources and computing time and makes abnormal video detection inefficient. Moreover, the detection and identification modes corresponding to the different abnormal categories must each be maintained independently, so the maintenance cost is high.
Disclosure of Invention
An object of the present disclosure is to provide a method, an apparatus, an electronic device, a medium, and a program product for detecting an abnormal video, so as to save computation resources and computation time, improve detection efficiency of the abnormal video, and reduce maintenance cost.
In order to solve the technical problem, the present disclosure provides the following technical solutions:
according to a first aspect of the embodiments of the present disclosure, there is provided an abnormal video detection method, including:
obtaining a video frame picture of a target video;
detecting each object in each video frame picture to obtain a plurality of image slices, wherein each image slice comprises one object;
respectively identifying multiple abnormal categories of objects in each image slice to obtain an identification result of each image slice corresponding to each abnormal category;
and determining whether the target video is an abnormal video or not according to the identification result of each image slice corresponding to each abnormal category.
In a specific embodiment of the present disclosure, the detecting each object in each video frame picture to obtain a plurality of image slices includes:
detecting each object in each video frame picture by using a pre-trained open-world-based detection model;
and obtaining the image slice of each detection frame in each video frame picture according to the detection result.
In one embodiment of the present disclosure, the open world based detection model is obtained by:
obtaining a training picture set, wherein each training picture in the training picture set carries object annotations;
training a pre-established open-world-based initial detection model using the training pictures in the training picture set and the object annotations in each training picture;
and after the training is finished, obtaining the open world-based detection model.
In an embodiment of the present disclosure, the performing identification of multiple abnormal categories on the object in each image slice respectively to obtain an identification result of each image slice corresponding to each abnormal category includes:
for each image slice, inputting the current image slice into each classification model in a classification model library, wherein the classification model library comprises a plurality of different classification models, and the different classification models are used for identifying different abnormal categories;
and obtaining the identification result of the current image slice corresponding to each abnormal category according to the abnormal confidence coefficient output by each classification model.
In a specific embodiment of the present disclosure, the obtaining, according to the abnormal confidence coefficient output by each classification model, an identification result of the current image slice corresponding to each abnormal category includes:
for each classification model, if the output abnormal confidence coefficient of the current classification model is greater than a preset confidence coefficient threshold value, the obtained identification result is that the current image slice belongs to the abnormal category corresponding to the current classification model.
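The classification-library step and the threshold comparison above can be sketched as follows. This is a minimal illustration, assuming each classification model can be treated as a callable that returns an abnormal confidence in [0, 1]. The names `recognize_slice` and `model_library`, the stub classifiers, and the threshold value 0.5 are hypothetical; the disclosure only states that the threshold is preset.

```python
from typing import Callable, Dict

# Hypothetical type: a classification model maps an image slice to an
# abnormal confidence in [0, 1]. The slice type is left opaque here.
Classifier = Callable[[object], float]


def recognize_slice(
    image_slice: object,
    model_library: Dict[str, Classifier],
    threshold: float = 0.5,  # assumed value; the text only says "preset"
) -> Dict[str, bool]:
    """Return, per abnormal category, whether the slice belongs to it."""
    results = {}
    for category, model in model_library.items():
        confidence = model(image_slice)
        # Strictly greater than the threshold, as stated in the text.
        results[category] = confidence > threshold
    return results


# Stub classifiers standing in for trained models (illustrative only).
library = {
    "category_a": lambda s: 0.9,
    "category_b": lambda s: 0.5,
}
result = recognize_slice("slice-1", library)
```

In this sketch a confidence exactly equal to the threshold does not count as abnormal, because the text requires the confidence to be greater than the threshold.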
In a specific embodiment of the present disclosure, the method further includes:
obtaining a new classification model, wherein the new classification model is used for identifying a new abnormal category;
adding the new classification model to the classification model library to add identification of the new abnormal category.
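One way to picture the library update is a small registry keyed by abnormal category. The sketch below is an assumption about how such a library could be organised; the class name `ClassificationModelLibrary` and its methods are hypothetical and do not come from the disclosure.

```python
from typing import Callable, Dict, List

Classifier = Callable[[object], float]


class ClassificationModelLibrary:
    """Hypothetical registry mapping abnormal category names to models."""

    def __init__(self) -> None:
        self._models: Dict[str, Classifier] = {}

    def add(self, category: str, model: Classifier) -> None:
        # Adding a model extends recognition to a new category; the
        # detection stage and the existing classifiers are untouched.
        if category in self._models:
            raise ValueError(f"category already registered: {category}")
        self._models[category] = model

    def categories(self) -> List[str]:
        return list(self._models)

    def score(self, image_slice: object) -> Dict[str, float]:
        # Run every registered model on the same image slice.
        return {name: m(image_slice) for name, m in self._models.items()}


library = ClassificationModelLibrary()
library.add("category_a", lambda s: 0.2)
library.add("category_new", lambda s: 0.8)  # newly obtained model
scores = library.score("slice-1")
```

Under this design, supporting a new abnormal category is a single `add` call, which matches the stated goal of maintaining only the classification models separately.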
In a specific embodiment of the present disclosure, the video frame picture is a key frame picture in the target video.
In a specific embodiment of the present disclosure, the determining whether the target video is an abnormal video according to the recognition result of each image slice corresponding to each abnormal category includes:
for each image slice, determining whether the current image slice belongs to at least one abnormal category in multiple abnormal categories according to the identification result of the current image slice corresponding to each abnormal category;
determining the target video to be an abnormal video if there is at least one image slice belonging to an abnormal category.
According to a second aspect of the embodiments of the present disclosure, there is provided an abnormal video detection apparatus including:
a video frame picture obtaining module configured to perform obtaining a video frame picture of a target video;
the image slice obtaining module is configured to detect each object in each video frame picture to obtain a plurality of image slices, and each image slice contains one object;
the identification result obtaining module is configured to perform identification of multiple abnormal categories of the object in each image slice respectively to obtain an identification result of each image slice corresponding to each abnormal category;
and the abnormal video determining module is configured to determine whether the target video is an abnormal video according to the identification result of each image slice corresponding to each abnormal category.
In a specific embodiment of the present disclosure, the image slice obtaining module is configured to perform:
detecting each object in each video frame picture by using a pre-trained open-world-based detection model;
and obtaining the image slice of each detection frame in each video frame picture according to the detection result.
In a specific embodiment of the present disclosure, the apparatus further includes a detection model obtaining module configured to obtain the open-world-based detection model by:
obtaining a training picture set, wherein each training picture in the training picture set carries object annotations;
training a pre-established open-world-based initial detection model using the training pictures in the training picture set and the object annotations in each training picture;
and after the training is finished, obtaining the open world-based detection model.
In a specific embodiment of the present disclosure, the identification result obtaining module is configured to perform:
for each image slice, inputting the current image slice into each classification model in a classification model library, wherein the classification model library comprises a plurality of different classification models, and the different classification models are used for identifying different abnormal categories;
and obtaining the identification result of the current image slice corresponding to each abnormal category according to the abnormal confidence coefficient output by each classification model.
In a specific embodiment of the present disclosure, the identification result obtaining module is configured to perform:
for each classification model, if the output abnormal confidence coefficient of the current classification model is greater than a preset confidence coefficient threshold value, the obtained identification result is that the current image slice belongs to the abnormal category corresponding to the current classification model.
In a specific embodiment of the present disclosure, the apparatus further includes a classification model library updating module configured to perform:
obtaining a new classification model, wherein the new classification model is used for identifying a new abnormal category;
adding the new classification model to the classification model library to add identification of the new abnormal category.
In a specific embodiment of the present disclosure, the video frame picture is a key frame picture in the target video.
In a specific embodiment of the present disclosure, the abnormal video determination module is configured to perform:
for each image slice, determining whether the current image slice belongs to at least one abnormal category in multiple abnormal categories according to the identification result of the current image slice corresponding to each abnormal category;
determining the target video to be an abnormal video if there is at least one image slice belonging to an abnormal category.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the anomalous video detection method as described in the first aspect.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the abnormal video detection method according to the first aspect.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising computer instructions stored in a computer-readable storage medium and adapted to be read and executed by a processor to cause an electronic device having the processor to perform the abnormal video detection method according to the first aspect.
By applying the technical solution provided by the embodiments of the present disclosure, object detection needs to be performed only once when a target video is checked for abnormality: each object in each video frame picture is detected to obtain image slices, the image slices are then classified for the different abnormal categories, and whether the video is abnormal is determined according to the identification result of each image slice corresponding to each abnormal category. This reduces the number of times the detection mode is used, saves computing resources and computing time, and improves the detection efficiency of abnormal videos. Moreover, only the corresponding classification mode needs to be maintained separately for each abnormal category, while the detection mode on which the classification modes are based is maintained uniformly, which reduces maintenance cost.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present disclosure, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an abnormal video detection scenario in an embodiment of the present disclosure;
FIG. 2 is a flowchart illustrating an implementation of a method for detecting abnormal videos according to an embodiment of the present disclosure;
FIG. 3 is a diagram illustrating an exemplary process of detecting abnormal videos according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of an abnormal video detection apparatus according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of an electronic device in an embodiment of the disclosure.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The core of the disclosure is to provide an abnormal video detection method, which can be applied to any scene needing abnormal detection of videos. For example, a video application server deployed in the cloud performs anomaly detection on a video uploaded by a user. Specifically, as shown in fig. 1, after a user records a video, if the user wants to distribute the video to a network for other users to watch, the video can be uploaded to a video application server through a video application client installed on a terminal, such as the video application client 1. The video application server can perform anomaly detection on the video by itself or by calling other detection services by adopting the technical scheme provided by the embodiment of the disclosure. After determining that the video is an abnormal video, an abnormal determination result may be returned to the video application client 1. The video application client 1 can output abnormal video prompt information to the user, and can also output abnormal category prompt information at the same time. And the user can adjust and upload the video again according to the prompt message. If the video application server determines that the video is not an abnormal video, other video distribution preparation works can be performed, so that the video is distributed to the network after the preparation works are completed, and other users watch the video through the video application clients, such as the video application client 2 and the video application client 3.
By executing the scheme on the video application server side, the abnormal video can be quickly detected by means of strong processing capacity of the server.
Of course, when the user uploads the video to the video application server through the video application client, the video application client may perform anomaly detection on the video by using the technical scheme provided by the embodiment of the present disclosure. After the video is determined to be the abnormal video, the abnormal video prompt information can be output to the user, and meanwhile, the abnormal category prompt information can also be output. And the user can adjust and upload the video again according to the prompt message. If the video application client determines that the video is not the abnormal video, the video can be uploaded to the video application server, and the video application server performs other video publishing preparation work so as to publish the video to the network for other users to watch after the preparation work is completed.
By executing the scheme on the video application client side, the abnormal detection can be finished without uploading the video to the video application server, the network is not required, and abnormal video prompt information can be timely returned to the user when the video is determined to be the abnormal video.
Whether the solution is executed on the video application server side or on the video application client side, the executing party first obtains the video frame pictures of the target video and detects each object in each video frame picture to obtain a plurality of image slices, each of which contains one object. It then identifies the object in each image slice against multiple abnormal categories to obtain the identification result of each image slice corresponding to each abnormal category, and finally determines whether the target video is an abnormal video according to those identification results. When a target video is checked for abnormality, object detection needs to be performed only once; after the image slices are obtained, they are classified for the different abnormal categories. This reduces the number of times the detection mode is used, saves computing resources and computing time, and improves the detection efficiency of abnormal videos. In addition, only the corresponding classification mode needs to be maintained separately for each abnormal category, while the detection mode on which the classification modes are based is maintained uniformly, which reduces maintenance cost.
Referring to fig. 2, a flowchart of an implementation of an abnormal video detection method provided in an embodiment of the present disclosure is shown, where the method may include the following steps:
S210: Obtain a video frame picture of the target video.
In the embodiment of the present disclosure, the target video may be any video currently awaiting abnormality detection. Any video uploaded by any user can be used as a target video for abnormality detection. Alternatively, the credibility of a user can be determined according to the user's historical uploading behavior or the user's attributes: if the credibility is less than a preset credibility threshold, every video uploaded by that user is used as a target video for abnormality detection; otherwise, the videos uploaded by that user are not subjected to abnormality detection.
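The credibility gate can be illustrated with a short sketch. The disclosure does not say how credibility is computed, so the scoring function below (the share of a user's past uploads that were found normal) and the threshold value 0.9 are purely assumptions made for illustration, as are the function names.

```python
def credibility_from_history(normal_uploads: int, total_uploads: int) -> float:
    """Assumed scoring: share of past uploads that were found normal."""
    # A user with no history gets 0.0 so that their videos are checked.
    if total_uploads == 0:
        return 0.0
    return normal_uploads / total_uploads


def needs_anomaly_detection(credibility: float, threshold: float = 0.9) -> bool:
    """Videos from users below the credibility threshold are checked."""
    return credibility < threshold


new_user = needs_anomaly_detection(credibility_from_history(0, 0))
trusted_user = needs_anomaly_detection(credibility_from_history(99, 100))
```

Here `new_user` is true and `trusted_user` is false, so only the first user's uploads would be passed on as target videos.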
After the target video currently awaiting abnormality detection is determined, the video frame pictures of the target video can be obtained. The video frame pictures may be every frame picture of the target video, in which case abnormal video detection is performed on every frame to improve detection accuracy. Alternatively, the video frame pictures may be key frame pictures of the target video, that is, key frame pictures obtained by performing frame extraction on the target video. A key frame picture is a frame in which a key action in the movement or change of a character or object occurs. Because there are fewer key frame pictures than frame pictures in the whole target video, performing anomaly detection on the key frame pictures improves detection efficiency while preserving detection accuracy.
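The disclosure does not specify how key frames are extracted. As one simple stand-in, the sketch below keeps a frame whenever it differs enough from the most recently kept frame. Frames are assumed to be flat sequences of grayscale pixel values of equal length, and the function names and the `min_change` value are hypothetical.

```python
from typing import List, Sequence


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_key_frames(
    frames: List[Sequence[float]], min_change: float = 10.0
) -> List[int]:
    """Return indices of frames that differ enough from the last key frame."""
    if not frames:
        return []
    key_indices = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        last_key = frames[key_indices[-1]]
        if mean_abs_diff(frames[i], last_key) > min_change:
            key_indices.append(i)
    return key_indices


frames = [
    [0, 0, 0, 0],
    [1, 0, 2, 1],      # near-duplicate of frame 0
    [50, 60, 40, 50],  # large change from frame 0
    [52, 61, 41, 49],  # near-duplicate of frame 2
]
key_indices = select_key_frames(frames)
```

With these toy frames only indices 0 and 2 are kept, so later stages process two pictures instead of four.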
S220: Detect each object in each video frame picture to obtain a plurality of image slices.
Each image slice contains an object.
In the embodiment of the present disclosure, after the video frame pictures of the target video are obtained, every object in each video frame picture is detected, and a corresponding image slice is then obtained for each detected object. Each image slice contains one object.
For example, for any one video frame picture, if the video frame picture includes three objects, namely a human body, a cat and a cup, each object in the video frame picture is detected, and it can be detected that the video frame picture includes three objects, so as to obtain three image slices, where the image slice 1 only includes the human body, the image slice 2 only includes the cat, and the image slice 3 only includes the cup.
S230: Identify the object in each image slice against multiple abnormal categories, respectively, to obtain an identification result of each image slice corresponding to each abnormal category.
In the embodiment of the present disclosure, a plurality of types of abnormality may be set according to actual conditions. Such as a first exception category, a second exception category, a third exception category, and so forth.
After detecting each object in each video frame picture of the target video to obtain a plurality of image slices, identifying a plurality of abnormal categories of the objects in each image slice. Specifically, for each image slice and each abnormality category, it can be determined whether an object in the image slice belongs to the abnormality category. After identification, the identification result corresponding to each abnormal category can be obtained for each image slice.
For any one of the abnormal categories, the identification result of an image slice corresponding to the abnormal category may be two types, one type is that the image slice belongs to the abnormal category, and the other type is that the image slice does not belong to the abnormal category.
S240: Determine whether the target video is an abnormal video according to the identification result of each image slice corresponding to each abnormal category.
After the object in each image slice is identified by the multiple abnormal categories to obtain the identification result of each image slice corresponding to each abnormal category, whether the target video is an abnormal video can be determined according to the identification result of each image slice corresponding to each abnormal category, and meanwhile, when the target video is determined to be an abnormal video, the existing abnormal category can be determined.
An abnormality determination rule may be set in advance, and when the identification results of the image slices corresponding to the abnormal categories conform to the abnormality determination rule, the target video is determined to be an abnormal video. For example, the rule may be that the target video is determined to be an abnormal video as soon as the object in any one image slice belongs to any abnormal category. As another example, the rule may be that, for abnormal category 1, the target video is determined to be an abnormal video if the object in any one image slice belongs to abnormal category 1, while for abnormal category 2 and abnormal category 3, the target video is determined to be an abnormal video when the number of image slices whose objects belong to abnormal category 2 and abnormal category 3 is greater than a set number threshold.
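Both example rules can be expressed as per-category count thresholds. The sketch below is one possible reading, assuming each image slice's identification result is a mapping from category name to a boolean and that each category is counted separately (the text leaves open whether categories 2 and 3 are counted together). The function name and the threshold values are hypothetical.

```python
from typing import Dict, List


def triggered_categories(
    slice_results: List[Dict[str, bool]],
    count_thresholds: Dict[str, int],
) -> List[str]:
    """Return the categories whose slice count exceeds their threshold.

    A threshold of 0 reproduces the "a single slice is enough" rule.
    """
    counts = {category: 0 for category in count_thresholds}
    for result in slice_results:
        for category, belongs in result.items():
            if belongs and category in counts:
                counts[category] += 1
    return [c for c, t in count_thresholds.items() if counts[c] > t]


rules = {"category_1": 0, "category_2": 2, "category_3": 2}
slices = [
    {"category_1": False, "category_2": True, "category_3": False},
    {"category_1": False, "category_2": True, "category_3": False},
]
below_threshold = triggered_categories(slices, rules)
one_hit = triggered_categories(
    slices + [{"category_1": True, "category_2": False, "category_3": False}],
    rules,
)
```

The video is treated as abnormal when the returned list is non-empty, and the list itself gives the abnormal categories that can be reported back to the user.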
After the target video is determined to be the abnormal video, abnormal prompt information can be further sent to a user or other warning devices so as to process the abnormal video in time.
If the target video is determined not to be the abnormal video, subsequent operations such as watermarking, encryption and the like can be continuously carried out on the target video.
By applying the method provided by the embodiment of the present disclosure, object detection needs to be performed only once when a target video is checked for abnormality: each object in each video frame picture is detected to obtain image slices, the image slices are then classified for the different abnormal categories, and whether the video is abnormal is determined according to the identification result of each image slice corresponding to each abnormal category. This reduces the number of times the detection mode is used, saves computing resources and computing time, and improves the detection efficiency of abnormal videos. Moreover, only the corresponding classification mode needs to be maintained separately for each abnormal category, while the detection mode on which the classification modes are based is maintained uniformly, which reduces maintenance cost.
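The overall flow of S210 to S240 can be sketched end to end with stub components. This is an illustration under stated assumptions, not the disclosed implementation: frames are represented as lists of object names, the detector stub simply returns them as "slices", and the classifier stubs, the names, and the 0.5 threshold are all hypothetical. The call counter shows that the detector runs once per frame regardless of how many abnormal categories are checked.

```python
from typing import Callable, Dict, List


def detect_abnormal_video(
    frames: List[object],
    detect_slices: Callable[[object], List[object]],
    classifiers: Dict[str, Callable[[object], float]],
    threshold: float = 0.5,  # assumed preset confidence threshold
) -> List[str]:
    """Return the abnormal categories found in the video (empty if none)."""
    found: List[str] = []
    for frame in frames:
        # One detection pass per frame, shared by every abnormal category.
        for image_slice in detect_slices(frame):
            for category, model in classifiers.items():
                if model(image_slice) > threshold and category not in found:
                    found.append(category)
    return found


calls = {"detector": 0}


def stub_detector(frame):
    # Stand-in for the detection model: counts calls, returns the objects.
    calls["detector"] += 1
    return list(frame)


classifiers = {
    "category_a": lambda s: 0.9 if s == "flagged_object" else 0.1,
    "category_b": lambda s: 0.1,
    "category_c": lambda s: 0.1,
}
frames = [["person", "cat"], ["person", "flagged_object"]]
found = detect_abnormal_video(frames, stub_detector, classifiers)
```

With two frames and three categories, the stub detector is called twice and the three lightweight classifiers are applied to each slice.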
In one embodiment of the present disclosure, step S220 may include the steps of:
Step one: detecting each object in each video frame picture by using a pre-trained open-world-based detection model;
Step two: obtaining the image slice of each detection frame in each video frame picture according to the detection result.
For convenience of description, the above two steps are combined for illustration.
In the embodiment of the disclosure, an open-world-based detection model may be obtained by training in advance, and all objects appearing in an image may be detected by the open-world-based detection model.
After the video frame pictures of the target video are obtained, each object in each video frame picture can be detected by using an open world-based detection model obtained through pre-training. For each video frame picture, the video frame picture may be input into an open world-based detection model, the detection model detects an object in the video frame picture, and may output a detection result, where the detection result may include position information of each detection frame in the video frame picture, and each detection frame includes one object.
And according to the detection result, obtaining the image slice of each detection frame in each video frame picture. Each detection frame contains an object, and the image slice of each detection frame only contains one object.
By using the open-world-based detection model, all objects in each video frame picture can be accurately detected, so that a complete set of image slices is obtained, which ensures that the subsequent abnormality detection based on the image slices can proceed smoothly.
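As a non-limiting illustration, obtaining one image slice per detection frame might be sketched as follows. The disclosure only states that the detection result includes position information of each detection frame; the `(x_min, y_min, x_max, y_max)` pixel layout, the representation of a picture as a list of pixel rows, and the function name are assumptions made for this sketch.

```python
from typing import List, Sequence, Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max), max values exclusive.
Box = Tuple[int, int, int, int]
Picture = List[List[int]]  # rows of pixel values


def crop_slices(frame: Picture, boxes: Sequence[Box]) -> List[Picture]:
    """Return one image slice per detection frame, clipped to the picture."""
    height = len(frame)
    width = len(frame[0]) if height else 0
    slices = []
    for x_min, y_min, x_max, y_max in boxes:
        # Clip the box to the picture bounds.
        x0, y0 = max(0, x_min), max(0, y_min)
        x1, y1 = min(width, x_max), min(height, y_max)
        if x1 <= x0 or y1 <= y0:
            continue  # box lies entirely outside the picture
        slices.append([row[x0:x1] for row in frame[y0:y1]])
    return slices
```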
In one embodiment of the present disclosure, the open world based detection model may be obtained by:
the first step is as follows: obtaining a training picture set, wherein each training picture in the training picture set is labeled with objects;
the second step is that: training a pre-established detection initial model based on the open world based on training pictures in the training picture set and object labels in each training picture;
the third step: after training is completed, an open-world based detection model is obtained.
For convenience of description, the above three steps are combined for illustration.
In the embodiment of the present disclosure, an open-world-based detection initial model may be established in advance, and each parameter in the detection initial model may be initialized.
The training picture set may be obtained by collecting pictures, by producing pictures, or from an existing picture library. The training picture set may include a plurality of training pictures; in general, the more training pictures there are, the more accurately the model can be trained. Each object in each training picture may be labeled in advance, for example with a labeling frame, where each labeling frame contains only one object. That is, the specific class of an object is not distinguished in the training pictures: any object that is present is labeled, and every labeled object can be regarded as belonging to a single generic "object" class.
After the training picture set is obtained, a pre-established detection initial model based on the open world can be trained based on the training pictures in the training picture set and the object labels in each training picture.
Specifically, iterative training may be performed on a pre-established open-world-based detection initial model based on a training picture set, after each iterative training, a prediction loss of the detection initial model is determined, parameters of the detection initial model are adjusted according to the prediction loss, and training is determined to be completed when the iteration number reaches a set iteration number threshold or the prediction loss stops decreasing.
An iteration number threshold may be preset. While the open-world-based detection initial model is iteratively trained on the training picture set, after each training iteration the prediction loss of the detection initial model may be determined from its prediction result and the object labels in the training pictures, for example through a loss function. With minimizing the loss as the objective, the parameters of the detection initial model are adjusted by gradient back-propagation. When the detection initial model is trained, a class-agnostic mode may be adopted, in which the specific class of an object is not distinguished. As the iterations continue, the prediction loss keeps decreasing and gradually converges. When the iteration number reaches the set iteration number threshold or the prediction loss stops decreasing, training may be determined to be complete.
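As a non-limiting illustration, the stopping condition described above might be sketched as follows. Here `train_step` is a hypothetical callable that performs one training iteration (prediction, loss computation and parameter adjustment) and returns the prediction loss. The `patience` and `min_delta` parameters are assumptions introduced to make "the prediction loss stops decreasing" concrete; the disclosure does not specify how this is judged.

```python
from typing import Callable, Tuple


def train_until_converged(
    train_step: Callable[[], float],
    max_iterations: int,
    patience: int = 3,
    min_delta: float = 1e-4,
) -> Tuple[int, float]:
    """Iterate until the iteration threshold is reached or the loss plateaus.

    Returns the number of iterations run and the lowest loss observed.
    """
    best_loss = float("inf")
    stale = 0        # consecutive iterations without a meaningful decrease
    iterations = 0
    for iterations in range(1, max_iterations + 1):
        loss = train_step()
        if best_loss - loss > min_delta:
            best_loss = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break  # the prediction loss has stopped decreasing
    return iterations, best_loss
```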
After the training is completed, the currently trained detection initial model can be determined as an open-world-based detection model for an actual anomaly detection scenario.
Each training picture of the training picture set is marked with an object, so that the detection initial model can be effectively trained, the detection accuracy of the finally obtained detection model is improved, and all objects in each video frame picture can be accurately detected.
In one embodiment of the present disclosure, step S230 may include the steps of:
step one: for each image slice, respectively inputting the current image slice into each classification model in a classification model library, wherein the classification model library comprises a plurality of different classification models, and the different classification models are used for identifying different abnormal classes;
step two: and obtaining the recognition result of the current image slice corresponding to each abnormal category according to the abnormal confidence coefficient output by each classification model.
For convenience of description, the above two steps are combined for illustration.
In the embodiment of the disclosure, multiple classification models can be obtained by training in advance, and different classification models are used for identifying different abnormal classes.
For each abnormal class, a classification model corresponding to the abnormal class can be obtained through pre-training by the following steps:
obtaining a slice sample set, the slice sample set including positive samples belonging to the anomaly class and negative samples not belonging to the anomaly class;
and training a pre-established classification initial model corresponding to the abnormal category based on the slice sample set, and obtaining the classification model corresponding to the abnormal category after the training is finished.
After the classification model corresponding to each abnormal category is obtained, each classification model can be added into a classification model library. After the video frame pictures of the target video are obtained, each object in each video frame picture is detected to obtain a plurality of image slices, and then the current image slice can be respectively input into each classification model in the classification model library aiming at each image slice. The current image slice is the image slice for which the current operation is directed.
For each classification model, after the current image slice is input to the classification model, the classification model may output an anomaly confidence, which may be in the form of a score.
Each classification model outputs corresponding abnormal confidence coefficient based on each image slice, and the identification result of each image slice corresponding to each abnormal category can be obtained according to the abnormal confidence coefficient output by each classification model.
The classification model corresponding to each abnormal category is used for classifying the objects in the image slices, so that the identification result of each image slice corresponding to each abnormal category can be accurately obtained, and a reliable basis can be provided for further judging the abnormal video.
In an embodiment of the present disclosure, for each classification model, if the anomaly confidence output by the current classification model is greater than a preset confidence threshold, the obtained identification result is that the current image slice belongs to the abnormal category corresponding to the current classification model.
In the embodiment of the present disclosure, the confidence threshold may be set according to the actual situation, and the confidence thresholds corresponding to different classification models may be the same or different.
For each image slice, the current image slice is input into each classification model in the classification model library. Each classification model may output an anomaly confidence for the current image slice, and this anomaly confidence is compared with the confidence threshold corresponding to that classification model.
If the anomaly confidence output by the current classification model is greater than the preset confidence threshold, the probability that the object in the current image slice belongs to the abnormal category corresponding to the current classification model is considered to be high, and the obtained identification result is: the current image slice belongs to the abnormal category corresponding to the current classification model. The current classification model is the classification model for which the current operation is directed.
If the anomaly confidence output by the current classification model is less than or equal to the confidence threshold, the probability that the object in the current image slice belongs to the abnormal category corresponding to the current classification model is considered to be low, and the obtained identification result is: the current image slice does not belong to the abnormal category corresponding to the current classification model.
According to the abnormal confidence coefficient output by the classification model, the recognition result of each classification model for each image slice can be accurately determined.
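As a non-limiting illustration, the comparison of each anomaly confidence with its confidence threshold might be sketched as follows. The classification models are represented by hypothetical callables that return a score, and the `default_threshold` fallback is an assumption of this sketch.

```python
from typing import Callable, Dict

Classifier = Callable[[object], float]  # hypothetical: image slice -> score


def recognize_slice(
    image_slice: object,
    model_library: Dict[str, Classifier],
    thresholds: Dict[str, float],
    default_threshold: float = 0.5,
) -> Dict[str, bool]:
    """Return, per abnormal category, whether the slice belongs to it."""
    results = {}
    for category, model in model_library.items():
        confidence = model(image_slice)
        # Strictly greater than the threshold means "belongs";
        # less than or equal means "does not belong".
        limit = thresholds.get(category, default_threshold)
        results[category] = confidence > limit
    return results
```

Keeping the thresholds in a separate mapping reflects that the thresholds of different classification models may be the same or different.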
And respectively identifying a plurality of abnormal categories of the object in each image slice to obtain an identification result of each image slice corresponding to each abnormal category, and further determining whether the target video is an abnormal video according to the identification result of each image slice corresponding to each abnormal category.
In one embodiment of the present disclosure, determining whether the target video is an abnormal video according to the recognition result of each image slice corresponding to each abnormal category may include the following steps:
the first step is as follows: for each image slice, determining whether the current image slice belongs to at least one abnormal category in multiple abnormal categories according to the identification result of the current image slice corresponding to each abnormal category;
the second step is that: and if at least one image slice belonging to the abnormal category exists, determining that the target video is an abnormal video.
For convenience of description, the above two steps are combined for illustration.
In the embodiment of the present disclosure, the object in each image slice is identified against the multiple abnormal categories, so that, for each image slice, an identification result corresponding to each abnormal category is obtained. For example, for image slice A, the identification result corresponding to abnormal category 1 is that image slice A does not belong to abnormal category 1, the identification result corresponding to abnormal category 2 is that image slice A belongs to abnormal category 2, and the identification result corresponding to abnormal category 3 is that image slice A does not belong to abnormal category 3; for image slice B, the identification result corresponding to abnormal category 1 is that image slice B does not belong to abnormal category 1, the identification result corresponding to abnormal category 2 is that image slice B does not belong to abnormal category 2, and the identification result corresponding to abnormal category 3 is that image slice B does not belong to abnormal category 3.
For each image slice, whether the current image slice belongs to at least one of the multiple abnormal categories may be determined based on the identification result of the current image slice corresponding to each abnormal category. The current image slice is the image slice for which the current operation is directed. In the above example, image slice A does not belong to abnormal category 1 or abnormal category 3 but belongs to abnormal category 2, so it can be determined that image slice A belongs to at least one of the multiple abnormal categories. Image slice B does not belong to abnormal category 1, abnormal category 2 or abnormal category 3, so it can be determined that image slice B does not belong to any abnormal category.
If there is at least one image slice belonging to an abnormal category, the target video may be determined to be an abnormal video. For example, image slice A and image slice B are both image slices of the target video. Because image slice A belongs to abnormal category 2, even though image slice B belongs to none of abnormal categories 1, 2 and 3, it can be determined that at least one image slice belonging to an abnormal category exists in the target video, and the target video is therefore an abnormal video.
As long as at least one image slice belonging to the abnormal category exists, the target video is determined as the abnormal video, and the probability of false negative can be effectively reduced.
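As a non-limiting illustration, the "at least one image slice" rule might be sketched with the image slice A and image slice B example as follows. The dictionary representation of the identification results and the function name are assumptions of this sketch.

```python
from typing import Dict, Iterable


def video_is_abnormal(slice_results: Iterable[Dict[int, bool]]) -> bool:
    """True if any image slice belongs to at least one abnormal category."""
    return any(any(per_category.values()) for per_category in slice_results)


# Identification results keyed by abnormal category number.
slice_a = {1: False, 2: True, 3: False}   # belongs to abnormal category 2
slice_b = {1: False, 2: False, 3: False}  # belongs to no abnormal category
```

Calling `video_is_abnormal([slice_a, slice_b])` reports an abnormal video because of image slice A alone, whereas `video_is_abnormal([slice_b])` does not.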
In one embodiment of the present disclosure, the method may further comprise the steps of:
the first step is as follows: acquiring a new classification model, wherein the new classification model is used for identifying a new abnormal class;
the second step is that: the new classification model is added to the classification model library to increase the identification of new anomaly classes.
For convenience of description, the above two steps are combined for illustration.
In an actual application scenario, as services continue to develop, abnormal categories are diverse and change dynamically in number, type, configuration, and the like, so identification of new abnormal categories may need to be added. In this case, a new classification model corresponding to the new abnormal category may be trained. The new classification model may be used to identify the new abnormal category.
After the new classification model is obtained, it may be added to the classification model library, so that the classification model library includes the new classification model in addition to the existing classification models. When there is a target video to be detected, each object in each video frame picture of the target video is detected to obtain a plurality of image slices, and, for each image slice, the current image slice is input into each classification model in the classification model library, which now includes the new classification model. The identification result of each image slice corresponding to each abnormal category, including the new abnormal category, is obtained according to the anomaly confidence output by each classification model. Whether the target video is an abnormal video can then be determined according to the identification result of each image slice corresponding to each abnormal category.
The embodiments of the present disclosure adapt well to previously unknown abnormal categories: when a new abnormal category appears, the classification model library can be updated quickly to add identification of the new abnormal category, which improves the accuracy of abnormal video detection.
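As a non-limiting illustration, a classification model library that can be extended with a new classification model might be sketched as follows. The class name, its methods, and the use of plain callables in place of trained models are assumptions of this sketch.

```python
from typing import Callable, Dict, List

Classifier = Callable[[object], float]  # hypothetical: image slice -> score


class ClassificationModelLibrary:
    """Maps each abnormal category to the model that recognizes it."""

    def __init__(self) -> None:
        self._models: Dict[str, Classifier] = {}

    def add(self, category: str, model: Classifier) -> None:
        """Register a model for a category that is not yet covered."""
        if category in self._models:
            raise ValueError(f"category already registered: {category}")
        self._models[category] = model

    def categories(self) -> List[str]:
        return list(self._models)

    def confidences(self, image_slice: object) -> Dict[str, float]:
        """Run every registered model on the slice, including new ones."""
        return {c: m(image_slice) for c, m in self._models.items()}
```

Because the detection stage never consults the library, registering a model for a new category leaves the detector untouched.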
For the sake of understanding, the embodiment of the present disclosure will be described by taking the specific process of detecting abnormal video shown in fig. 3 as an example.
performing frame extraction processing on a target video to be detected to obtain video frame pictures, wherein the video frame pictures can be key frame pictures of the target video;
detecting each object in each video frame picture by using a detection model based on the open world, and obtaining an image slice of each detection frame in each video frame picture according to a detection result;
respectively inputting the current image slice into each classification model in a classification model library aiming at each image slice, wherein the classification model library comprises a plurality of different classification models, and the different classification models are used for identifying different abnormal classes;
according to the abnormal confidence coefficient output by each classification model, the identification result of each image slice corresponding to each abnormal category can be obtained, for example, a certain image slice belongs to the abnormal category 1, does not belong to the abnormal category 2, belongs to the abnormal category 3 and the like;
according to the identification result of each image slice corresponding to each abnormal category, whether the target video is an abnormal video can be determined;
when a new classification model is obtained, the new classification model may be added to the classification model library to increase the identification of new anomaly classes.
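As a non-limiting illustration, the overall flow listed above might be sketched end to end as follows, using the rule that a single image slice belonging to any abnormal category is sufficient. The `detect_objects` callable stands in for the open-world-based detection model and the entries of `classifiers` stand in for the classification models; both are hypothetical placeholders, and frame extraction is assumed to have been performed already.

```python
from typing import Callable, Dict, Iterable, List

Detector = Callable[[object], List[object]]  # frame -> image slices
Classifier = Callable[[object], float]       # image slice -> score


def detect_abnormal_video(
    frames: Iterable[object],
    detect_objects: Detector,
    classifiers: Dict[str, Classifier],
    thresholds: Dict[str, float],
) -> bool:
    """Detect once per frame, then classify every slice per category."""
    for frame in frames:
        # A single detection pass yields all image slices of the frame.
        for image_slice in detect_objects(frame):
            for category, classify in classifiers.items():
                if classify(image_slice) > thresholds[category]:
                    return True  # one abnormal slice is sufficient
    return False
```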
It can be understood that, compared with a detection model, a classification model has advantages in implementation, resource utilization and the like. For example, when data is labeled, only the whole image needs to be labeled, which is more efficient, and less computing resources and computing time are required in the inference stage. The embodiments of the present disclosure adopt a detect-first, classify-second approach: only one detection model is used at the bottom layer, and the classification models corresponding to different abnormal categories are used at the top layer to perform abnormality detection on the target video. This can effectively save computing resources and computing time, and can achieve a better effect when targets differ greatly in scale. Moreover, when a new abnormal category appears, only a new classification model corresponding to the new abnormal category needs to be trained, and the bottom-layer detection model does not need to be retrained, which saves the training cost of the bottom-layer detection model. In addition, the bottom layer uses a single detection model uniformly and the top layer uses different classification models, which reduces the number of models and facilitates model management.
Corresponding to the above method embodiments, the embodiments of the present disclosure further provide an abnormal video detection apparatus, and the abnormal video detection apparatus described below and the abnormal video detection method described above may be referred to in correspondence with each other.
Referring to fig. 4, the apparatus may include the following modules:
a video frame picture obtaining module 410 configured to perform obtaining a video frame picture of a target video;
an image slice obtaining module 420 configured to perform detection on each object in each video frame picture, resulting in a plurality of image slices, each image slice including one object;
an identification result obtaining module 430 configured to perform identification of a plurality of abnormal categories for the object in each image slice, respectively, to obtain an identification result corresponding to each abnormal category for each image slice;
and an abnormal video determination module 440 configured to determine whether the target video is an abnormal video according to the identification result of each image slice corresponding to each abnormal category.
By applying the apparatus provided by the embodiments of the present disclosure, abnormality detection on the target video requires only a single object detection pass: each object in each video frame picture is detected to obtain image slices, the image slices are then classified against the different abnormal categories, and the abnormal video is determined according to the recognition result of each image slice corresponding to each abnormal category. This reduces the number of times a detection model is used, saves computing resources and computing time, and improves the efficiency of abnormal video detection. Moreover, only the classification model for each abnormal category needs to be maintained separately, while the underlying detection model is maintained uniformly, which reduces maintenance cost.
In one embodiment of the present disclosure, the image slice obtaining module 420 is configured to perform:
detecting each object in each video frame picture by using a detection model based on the open world obtained by pre-training;
and obtaining the image slice of each detection frame in each video frame picture according to the detection result.
In a specific embodiment of the present disclosure, the apparatus further includes a detection model obtaining module configured to obtain the open world-based detection model by:
obtaining a training picture set, wherein each training picture in the training picture set is marked with an object;
training a pre-established detection initial model based on the open world based on training pictures in the training picture set and object labels in each training picture;
after training is completed, an open-world based detection model is obtained.
In a specific embodiment of the present disclosure, the recognition result obtaining module 430 is configured to perform:
respectively inputting the current image slice into each classification model in a classification model library aiming at each image slice, wherein the classification model library comprises a plurality of different classification models, and the different classification models are used for identifying different abnormal classes;
and obtaining the recognition result of the current image slice corresponding to each abnormal category according to the abnormal confidence coefficient output by each classification model.
In a specific embodiment of the present disclosure, the recognition result obtaining module 430 is configured to perform:
for each classification model, if the output abnormal confidence coefficient of the current classification model is greater than a preset confidence coefficient threshold value, the obtained identification result is that the current image slice belongs to the abnormal category corresponding to the current classification model.
In a specific embodiment of the present disclosure, the apparatus further includes a classification model library updating module configured to perform:
acquiring a new classification model, wherein the new classification model is used for identifying a new abnormal class;
the new classification model is added to the classification model library to increase the identification of new anomaly classes.
In a specific embodiment of the present disclosure, the video frame picture is a key frame picture in the target video.
In a specific embodiment of the present disclosure, the abnormal video determination module 440 is configured to perform:
for each image slice, determining whether the current image slice belongs to at least one abnormal category in multiple abnormal categories according to the identification result of the current image slice corresponding to each abnormal category;
and if at least one image slice belonging to the abnormal category exists, determining that the target video is an abnormal video.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Corresponding to the above method embodiment, this disclosed embodiment also provides an electronic device, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the above-described abnormal video detection method.
As shown in fig. 5, which is a schematic view of a composition structure of an electronic device, the electronic device may include: a processor 10, a memory 11, a communication interface 12 and a communication bus 13. The processor 10, the memory 11 and the communication interface 12 all communicate with each other through a communication bus 13.
In the disclosed embodiment, the processor 10 may be a Central Processing Unit (CPU), an application specific integrated circuit, a digital signal processor, a field programmable gate array or other programmable logic device, and the like.
The processor 10 may call a program stored in the memory 11, and in particular, the processor 10 may perform operations in an embodiment of the abnormal video detection method.
The memory 11 is used for storing one or more programs, the program may include program codes, the program codes include computer operation instructions, and in the embodiment of the present disclosure, at least the program for implementing the following functions is stored in the memory 11:
obtaining a video frame picture of a target video;
detecting each object in each video frame picture to obtain a plurality of image slices, wherein each image slice comprises one object;
respectively identifying multiple abnormal categories of objects in each image slice to obtain an identification result of each image slice corresponding to each abnormal category;
and determining whether the target video is an abnormal video or not according to the identification result of each image slice corresponding to each abnormal category.
In one possible implementation, the memory 11 may include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function (such as an object detection function, a category identification function), and the like; the storage data area may store data created during use, such as image slice data, recognition result data, and the like.
Further, the memory 11 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device or other volatile solid state storage device.
The communication interface 12 may be an interface of a communication module for connecting with other devices or systems.
Of course, it should be noted that the structure shown in fig. 5 does not constitute a limitation to the electronic device in the embodiment of the present disclosure, and in practical applications, the electronic device may include more or less components than those shown in fig. 5, or some components may be combined.
Corresponding to the above method embodiments, the present disclosure also provides a computer-readable storage medium, where instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the above abnormal video detection method.
Further, it should be noted that embodiments of the present disclosure also provide a computer program product or computer program that may include computer instructions, which may be stored in a computer-readable storage medium. The processor of the electronic device reads the computer instructions from the computer-readable storage medium and executes them, so that the electronic device performs the abnormal video detection method described in the foregoing embodiments; the description is therefore not repeated here. Likewise, the beneficial effects, which are the same as those of the method, are not described again. For technical details not disclosed in the embodiments of the computer program product or the computer program of the present disclosure, refer to the description of the method embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. An abnormal video detection method, comprising:
obtaining a video frame picture of a target video;
detecting each object in each video frame picture to obtain a plurality of image slices, wherein each image slice comprises one object;
respectively identifying multiple abnormal categories of objects in each image slice to obtain an identification result of each image slice corresponding to each abnormal category;
and determining whether the target video is an abnormal video or not according to the identification result of each image slice corresponding to each abnormal category.
2. The abnormal video detection method according to claim 1, wherein the detecting each object in each video frame picture to obtain a plurality of image slices comprises:
detecting each object in each video frame picture by using a detection model based on the open world obtained by pre-training;
and obtaining the image slice of each detection frame in each video frame picture according to the detection result.
3. The abnormal video detection method according to claim 2, wherein the open world based detection model is obtained by:
obtaining a training picture set, wherein each training picture in the training picture set is marked with an object;
training a pre-established detection initial model based on the open world based on the training pictures in the training picture set and the object labels in each training picture;
and after the training is finished, obtaining the open world-based detection model.
4. The abnormal video detection method according to claim 1, wherein performing identification for the plurality of abnormal categories on the object in each image slice to obtain the identification result of each image slice corresponding to each abnormal category comprises:
for each image slice, inputting the current image slice into each classification model in a classification model library, wherein the classification model library comprises a plurality of different classification models, and the different classification models are used for identifying different abnormal categories;
and obtaining the identification result of the current image slice corresponding to each abnormal category according to the abnormal confidence output by each classification model.
5. The abnormal video detection method according to claim 4, wherein obtaining the identification result of the current image slice corresponding to each abnormal category according to the abnormal confidence output by each classification model comprises:
for each classification model, if the abnormal confidence output by the current classification model is greater than a preset confidence threshold, the obtained identification result is that the current image slice belongs to the abnormal category corresponding to the current classification model.
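The thresholding rule of claims 4 and 5 can be sketched as follows. The classification models are represented here by plain callables that return a confidence in [0, 1]; the toy models, the category names, and the default threshold of 0.5 are assumptions made for illustration only.

```python
from typing import Callable, Dict, List

ImageSlice = List[int]                      # toy slice: flat list of pixel values
Classifier = Callable[[ImageSlice], float]  # returns an abnormal confidence


def identify(
    image_slice: ImageSlice,
    model_library: Dict[str, Classifier],
    threshold: float = 0.5,
) -> Dict[str, bool]:
    """Run every model on the slice and threshold each abnormal confidence."""
    return {
        # Strictly greater than the preset threshold, as stated in claim 5.
        category: model(image_slice) > threshold
        for category, model in model_library.items()
    }


# Hypothetical toy models standing in for trained binary classifiers.
def brightness_model(image_slice: ImageSlice) -> float:
    return max(image_slice) / 255


def constant_model(image_slice: ImageSlice) -> float:
    return 0.5


library = {"category_a": brightness_model, "category_b": constant_model}
results = identify([10, 200, 30], library)
```

Because the comparison is strict, a confidence exactly equal to the threshold does not count as belonging to the abnormal category.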
6. The abnormal video detection method according to claim 4, further comprising:
obtaining a new classification model, wherein the new classification model is used for identifying a new abnormal category;
and adding the new classification model to the classification model library, so as to add identification of the new abnormal category.
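Extending the classification model library with a new abnormal category, as in claim 6, could look like the sketch below. The class name `ClassificationModelLibrary`, its `register` method, the rejection of duplicate categories, and the toy models are all hypothetical; the point shown is that a new category is added without retraining or modifying the existing models.

```python
from typing import Callable, Dict, List

ImageSlice = List[int]
Classifier = Callable[[ImageSlice], float]


class ClassificationModelLibrary:
    """Holds one independent classifier per abnormal category (assumed design)."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold
        self._models: Dict[str, Classifier] = {}

    def register(self, category: str, model: Classifier) -> None:
        # Existing models are left untouched; only a new entry is added.
        if category in self._models:
            raise ValueError(f"category already registered: {category}")
        self._models[category] = model

    def identify(self, image_slice: ImageSlice) -> Dict[str, bool]:
        return {
            category: model(image_slice) > self.threshold
            for category, model in self._models.items()
        }


library = ClassificationModelLibrary()
library.register("category_a", lambda s: max(s) / 255)
before = library.identify([10, 200, 30])

# A new abnormal category becomes available by registering one more model.
library.register("category_new", lambda s: min(s) / 255)
after = library.identify([10, 200, 30])
```

Keeping one model per category means the results for existing categories are the same before and after the new model is registered.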
7. An abnormal video detection apparatus, comprising:
a video frame picture obtaining module configured to obtain video frame pictures of a target video;
an image slice obtaining module configured to detect each object in each video frame picture to obtain a plurality of image slices, wherein each image slice contains one object;
an identification result obtaining module configured to perform identification for a plurality of abnormal categories on the object in each image slice, to obtain an identification result of each image slice corresponding to each abnormal category;
and an abnormal video determining module configured to determine, according to the identification result of each image slice corresponding to each abnormal category, whether the target video is an abnormal video.
8. An electronic device, comprising:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to execute the instructions to implement the abnormal video detection method as claimed in any one of claims 1 to 6.
9. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the abnormal video detection method as claimed in any one of claims 1 to 6.
10. A computer program product comprising computer instructions stored in a computer-readable storage medium and adapted to be read and executed by a processor, to cause an electronic device having the processor to perform the abnormal video detection method as claimed in any one of claims 1 to 6.
CN202210535186.5A 2022-05-17 2022-05-17 Abnormal video detection method, abnormal video detection device, electronic equipment, abnormal video detection medium and program product Pending CN114882414A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210535186.5A CN114882414A (en) 2022-05-17 2022-05-17 Abnormal video detection method, abnormal video detection device, electronic equipment, abnormal video detection medium and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210535186.5A CN114882414A (en) 2022-05-17 2022-05-17 Abnormal video detection method, abnormal video detection device, electronic equipment, abnormal video detection medium and program product

Publications (1)

Publication Number Publication Date
CN114882414A true CN114882414A (en) 2022-08-09

Family

ID=82676326

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210535186.5A Pending CN114882414A (en) 2022-05-17 2022-05-17 Abnormal video detection method, abnormal video detection device, electronic equipment, abnormal video detection medium and program product

Country Status (1)

Country Link
CN (1) CN114882414A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117636079A (en) * 2024-01-25 2024-03-01 宁德时代新能源科技股份有限公司 Image classification method and device and electronic equipment

Similar Documents

Publication Publication Date Title
CN109344908B (en) Method and apparatus for generating a model
CN109086873B (en) Training method, recognition method and device of recurrent neural network and processing equipment
CN109308490B (en) Method and apparatus for generating information
CN109145828B (en) Method and apparatus for generating video category detection model
CN111144937A (en) Advertisement material determination method, device, equipment and storage medium
CN112114986B (en) Data anomaly identification method, device, server and storage medium
CN112000822B (en) Method and device for ordering multimedia resources, electronic equipment and storage medium
CN113095563B (en) Review method and device for artificial intelligent model prediction result
CN110928889A (en) Training model updating method, device and computer storage medium
CN113902944A (en) Model training and scene recognition method, device, equipment and medium
CN114882414A (en) Abnormal video detection method, abnormal video detection device, electronic equipment, abnormal video detection medium and program product
CN113468432A (en) Mobile internet-based user behavior big data analysis method and system
CN117351271A (en) Fault monitoring method and system for high-voltage distribution line monitoring equipment and storage medium thereof
CN115905450A (en) Unmanned aerial vehicle monitoring-based water quality abnormity tracing method and system
US20240127406A1 (en) Image quality adjustment method and apparatus, device, and medium
CN111737371B (en) Data flow detection classification method and device capable of dynamically predicting
CN117688955A (en) Method, apparatus, electronic device, and computer-readable medium for humidity temperature adjustment
CN112749327A (en) Content pushing method and device
CN112784691B (en) Target detection model training method, target detection method and device
CN112668365A (en) Material warehousing identification method, device, equipment and storage medium
CN114330542A (en) Sample mining method and device based on target detection and storage medium
CN115272682A (en) Target object detection method, target detection model training method and electronic equipment
CN111428886B (en) Method and device for adaptively updating deep learning model of fault diagnosis
CN112016513A (en) Video semantic segmentation method, model training method, related device and electronic equipment
CN111309706A (en) Model training method and device, readable storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination