CN114387557A - Deep learning-based method and system for detecting smoking and calling of gas station

Info

Publication number: CN114387557A
Application number: CN202210070796.2A
Authority: CN
Inventors: 李平
Original assignee: Yi Tai Fei Liu Information Technology LLC
Current assignee: Yi Tai Fei Liu Information Technology LLC
Priority date: 2022-01-21
Filing date: 2022-01-21
Publication date: 2022-04-22

Abstract

The invention provides a deep learning-based method and system for detecting smoking and phone calls at a gas station. The method first obtains a surveillance video stream of the gas station and frames it to obtain a plurality of video frame images. A target region of interest is then divided, each video frame image is input into a deep learning target detection model to detect human head regions within the target region of interest, and the corresponding human head images are cut out of the video frame image according to the detected head regions. The cut-out human head images are sent to an image classification model, which identifies whether smoking or phone-call behavior is present. By combining human head detection, a behavior classification model and a finite state machine, the invention lets the models complement one another, and can therefore maintain high robustness and high accuracy in complex open environments and under varying occlusion and illumination. Because the recognition method combines human head target detection with image classification, images can be judged quickly and effectively.

Description

Deep learning-based method and system for detecting smoking and calling of gas station

Technical Field

The invention relates to the technical field of machine learning, in particular to a method and a system for detecting smoking and calling of a gas station based on deep learning.

Background

With the rapid development of China's transportation network, more and more gas stations are being built. Dangerous behaviors such as smoking and making phone calls frequently occur at gas stations, so considerable manpower and material resources must be invested to regulate behavior on site and to protect people and property. Manual detection of dangerous behaviors, however, is prone to oversights caused by fatigue and other human factors, and manual monitoring of abnormal behavior can no longer meet the safety requirements of gas stations. Intelligent analysis, as a more accurate and effective technology, is proving its worth in more and more fields, and real-time intelligent analysis of gas station surveillance video by artificial intelligence algorithms is one of its important application scenarios.

Existing methods for detecting smoking and phone calls are mainly based on human pose estimation: the human pose is estimated in each video frame image of the video stream, and the poses are analyzed to judge whether they correspond to smoking, making a phone call and the like. Alternatively, dangerous behaviors such as smoking and making phone calls at the gas station are monitored manually, but this approach is inefficient and prone to oversights. In addition, detecting and computing human body key points for pose estimation is complex, so the algorithm consumes substantial computing resources and has difficulty meeting real-time requirements. Furthermore, when too many people appear in the video stream, some human poses are missed, and when people are far from the camera, the detected key point information becomes cluttered.

Disclosure of Invention

In view of the above drawbacks of the prior art, an object of the present invention is to provide a deep learning-based method and system for detecting smoking and phone calls at a gas station, which are used to solve the problems of poor robustness, tedious processing and poor intuitiveness in existing methods for detecting smoking and phone calls.

In order to achieve the above objects and other related objects, the present invention provides a method for detecting smoking and phone call of a gas station based on deep learning, comprising the following steps:

acquiring a monitoring video stream of a gas station, framing the monitoring video stream, and acquiring a plurality of video frame images;

dividing an interested target area, inputting the video frame image into a deep learning target detection model to detect a human head area in the interested target area, and cutting out a corresponding human head image from the video frame image according to the human head area;

and sending the cut human head image into an image classification model for classification, and identifying whether the human head image has smoking and calling behaviors.

Optionally, after recognizing that smoking and calling behaviors exist in the human head image, the method further includes:

taking the smoking and calling behaviors as dangerous behaviors, and triggering a Finite State Machine (FSM) to carry out verification analysis based on the dangerous behaviors;

and obtaining a verification analysis result, and judging whether the monitoring video stream of the gas station has smoking and calling behaviors or not by combining the classification result of the image classification model.

Optionally, the process of taking the smoking and calling behaviors as dangerous behaviors and triggering a finite state machine FSM to perform verification analysis based on the dangerous behaviors includes:

if the image classification model identifies smoking or phone-call behavior in a human head image, the behavior is treated as a dangerous behavior, the finite state machines (FSMs) are triggered based on the dangerous behavior, and it is judged whether a state machine similar to that human head image already exists among all current state machines;

if so, the count of the existing state machine is incremented by 1;

if not, a new state machine is created with an initial count of 1;

after all human head images in the current frame have been judged, the state machine information for heads that are not present in the current frame is deleted;

judging whether the count of the state machine for the current head has reached a preset threshold T; if the count has reached the threshold T, the judgment is true, and whether smoking or phone-call behavior exists in the surveillance video stream of the gas station is determined in combination with the classification result of the image classification model; if the count has not reached the threshold, the judgment is false.
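The counting procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the text does not say how a state machine is judged "similar" to a head image, so the sketch assumes bounding-box overlap (IoU) between frames as the similarity measure. It also assumes that only heads classified as dangerous are passed in, so a head that stops being flagged loses its state machine. The names `DangerCounter`, `HeadState` and `min_iou` are hypothetical.

```python
from dataclasses import dataclass

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass(eq=False)  # compare states by identity, not by field values
class HeadState:
    box: Box
    count: int = 1  # a newly created state machine starts at 1


class DangerCounter:
    """One state machine per head; IoU matching is an assumed similarity test."""

    def __init__(self, threshold: int, min_iou: float = 0.3):
        self.threshold = threshold  # preset threshold T
        self.min_iou = min_iou      # hypothetical matching cutoff
        self.states: list[HeadState] = []

    def update(self, danger_boxes: list[Box]) -> list[bool]:
        """Process the dangerous heads of one frame; return one verdict per head."""
        seen: list[HeadState] = []
        verdicts: list[bool] = []
        for box in danger_boxes:
            best = max(self.states, key=lambda s: iou(s.box, box), default=None)
            if best is not None and best not in seen and iou(best.box, box) >= self.min_iou:
                best.box = box       # similar state machine exists: count + 1
                best.count += 1
            else:
                best = HeadState(box)  # no similar state machine: create one
            seen.append(best)
            verdicts.append(best.count >= self.threshold)
        self.states = seen  # delete state machines for heads absent from this frame
        return verdicts
```

With `threshold=3`, a head flagged in three successive frames at roughly the same position yields `True` on the third frame, while a single isolated misclassification never reaches the threshold.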

Optionally, the process of inputting the video frame image into a deep learning target detection model to detect a human head region in the target region of interest, and cutting out a corresponding human head image from the video frame image according to the human head region includes:

using a human head detection model built on the YOLOv3 model in Darknet as the deep learning target detection model, and performing human head detection on the acquired video frame images;

and enlarging the detected head bounding box, and cutting out the corresponding head box from within the preset target region of interest to serve as the human head image.
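The enlarge-then-crop step can be illustrated with a short sketch. The text does not give the enlargement factor or the shape of the region of interest, so the sketch assumes a factor of 1.5 in the usage note and a rectangular region that lies inside the image. The function names are hypothetical, and the image is modeled as a plain list of pixel rows so that the example stays self-contained.

```python
def enlarge_and_clip(box, scale, roi):
    """Scale a head box about its center, then clip it to the ROI rectangle.

    box and roi are (x1, y1, x2, y2) in pixels; scale is an assumed factor.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w = (x2 - x1) * scale / 2
    half_h = (y2 - y1) * scale / 2
    rx1, ry1, rx2, ry2 = roi
    return (
        max(rx1, int(cx - half_w)),
        max(ry1, int(cy - half_h)),
        min(rx2, int(cx + half_w)),
        min(ry2, int(cy + half_h)),
    )


def crop(image, box):
    """Cut box (x1, y1, x2, y2) out of an image stored as rows of pixels."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]
```

For example, `enlarge_and_clip((40, 40, 60, 60), 1.5, (0, 0, 100, 100))` widens a 20-pixel head box to 30 pixels around the same center; enlarging the box gives the classifier context around the face, such as a hand holding a phone or a cigarette.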

Optionally, before sending the cut-out human head image into an image classification model for classification, the method further includes:

carrying out sample classification on the human head image, and dividing the human head image into a normal sample, a smoking sample and a calling sample;

taking the classified samples as training data of an image classification model;

inputting the training data into a ResNet18 network for training to obtain the corresponding image classification model; during training, a triplet loss function is used as a regularization term, with samples of the same class serving as positive samples and samples of different classes serving as negative samples.
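How a triplet loss can act as a regularization term is sketched below. The text names only the triplet loss and ResNet18; the use of softmax cross-entropy as the main classification loss, the Euclidean distance, the margin of 0.2 and the weight of 0.5 are all assumptions made for illustration. Embeddings are plain lists of floats standing in for features taken from the network.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample (numerically stabilized)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, d(a, p) - d(a, n) + margin) with Euclidean distance.

    positive shares the anchor's class; negative comes from another class.
    """
    d_ap = math.dist(anchor, positive)
    d_an = math.dist(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


def total_loss(logits, label, anchor, positive, negative, weight=0.5):
    """Assumed combination: classification loss plus weighted triplet term."""
    return cross_entropy(logits, label) + weight * triplet_loss(anchor, positive, negative)
```

The triplet term is zero once same-class embeddings are closer to the anchor than different-class embeddings by at least the margin, so it only penalizes triplets in which, for instance, a smoking head still lies too close to a normal head in feature space.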

Optionally, the method further comprises alerting when smoking and calling activities are present.

The invention also provides a deep learning-based gas station smoking and phone call detection system, which comprises:

a video image acquisition module, used for acquiring a surveillance video stream of a gas station, framing the surveillance video stream and acquiring a plurality of video frame images;

a region-of-interest setting module, used for dividing a target region of interest;

the human head target detection module is used for inputting the video frame image into a deep learning target detection model to detect a human head area in the interested target area and cutting out a corresponding human head image from the video frame image according to the human head area;

the image classification module is used for sending the cut human head images into an image classification model for classification and identifying whether smoking and calling behaviors exist in the human head images;

the finite state machine module is used for triggering a finite state machine FSM to carry out verification analysis according to smoking and calling behaviors;

and the event alarm module is used for acquiring a verification analysis result and early warning by combining a classification result of the image classification model.

Optionally, the process that the finite-state machine module triggers the finite-state machine FSM to perform verification analysis according to smoking and calling behaviors includes:

if yes, adding 1 to the counting times of the existing state machine;

if not, a new state machine is created and the initial count is 1;

Optionally, the process of inputting the video frame image into a deep learning target detection model by the human head target detection module to detect a human head region in the target region of interest, and cutting out a corresponding human head image from the video frame image according to the human head region includes:

Optionally, before the image classification module sends the cut-out human head image to an image classification model for classification, the method further includes:

As described above, the deep learning-based method and system for detecting smoking and phone calls at a gas station provided by the present invention have the following advantages. A surveillance video stream of the gas station is first acquired and framed to obtain a plurality of video frame images; a target region of interest is divided, the video frame image is input into a deep learning target detection model to detect human head regions within the target region of interest, and the corresponding human head images are cut out of the video frame image according to the head regions; the cut-out human head images are then sent to an image classification model, which identifies whether smoking or phone-call behavior is present. By combining human head detection, a behavior classification model and a finite state machine, the invention lets the models complement one another, and can therefore maintain high robustness and high accuracy in complex open environments and under varying occlusion and illumination. Because the recognition method combines human head target detection with image classification, images can be judged quickly and effectively. The finite state machine-based judgment module further improves the accuracy and precision of behavior recognition.

Drawings

FIG. 1 is a schematic flow chart of a deep learning-based method for detecting smoking and phone calls at a gas station according to an embodiment;

FIG. 2 is a schematic flow chart of a deep learning-based method for detecting smoking and phone calls at a gas station according to another embodiment;

FIG. 3 is a schematic hardware configuration diagram of a deep learning-based gas station smoking and phone call detection system according to an embodiment.

Detailed Description

The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.

It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.

Referring to FIG. 1, the present invention provides a deep learning-based method for detecting smoking and phone calls at a gas station, comprising the following steps:

S100, acquiring a surveillance video stream of a gas station, framing the surveillance video stream, and acquiring a plurality of video frame images. As an example, the surveillance video stream of the gas station is acquired by analog cameras and digital cameras.

S200, dividing a target region of interest, inputting the video frame image into a deep learning target detection model to detect a human head region within the target region of interest, and cutting out the corresponding human head image from the video frame image according to the human head region. As an example, a human head detection model built on the YOLOv3 model in Darknet is used as the deep learning target detection model to perform human head detection on the acquired video frame images; the detected head bounding box is then enlarged, and the corresponding head box is cut out from within the preset target region of interest to serve as the human head image.

S300, sending the cut-out human head images into an image classification model for classification, and identifying whether smoking or phone-call behavior exists in the human head images. When smoking or phone-call behavior exists, this embodiment can also issue an alarm.

According to the above description, in an exemplary embodiment, after smoking or phone-call behavior is recognized in a human head image, the method further includes: treating the smoking or phone-call behavior as a dangerous behavior, and triggering a finite state machine (FSM) to perform verification analysis based on the dangerous behavior; and obtaining the verification analysis result and, in combination with the classification result of the image classification model, judging whether smoking or phone-call behavior exists in the surveillance video stream of the gas station. Specifically, if the image classification model identifies smoking or phone-call behavior in a human head image, the behavior is treated as a dangerous behavior, the state machines are triggered based on the dangerous behavior, and it is judged whether a state machine similar to that human head image already exists among all current state machines; if so, the count of the existing state machine is incremented by 1; if not, a new state machine is created with an initial count of 1. After all human head images in the current frame have been judged, the state machine information for heads that are not present in the current frame is deleted. It is then judged whether the count of the state machine for the current head has reached a preset threshold T: if the count has reached the threshold T, the judgment is true, and whether smoking or phone-call behavior exists in the surveillance video stream of the gas station is determined in combination with the classification result of the image classification model; if the count has not reached the threshold, the judgment is false.

According to the above description, in an exemplary embodiment, before the cut-out human head images are sent to the image classification model for classification, the method further includes: classifying the human head images into normal samples, smoking samples and phone-call samples; using the classified samples as training data for the image classification model; and inputting the training data into a ResNet18 network for training to obtain the corresponding image classification model. During training, a triplet loss function is used as a regularization term, with samples of the same class serving as positive samples and samples of different classes serving as negative samples.

In a specific exemplary embodiment, as shown in fig. 2, the embodiment further provides a deep learning-based smoking and phone call detection method for a gas station, which includes the following steps:

Step 201: using a deep learning target detection method, detect whether a human head is present in the target area under the gas station camera. If human heads are detected, input each detected human head image into step 202; otherwise, end.

Step 202: receive the detection result of step 201 and judge whether the detected human head image shows the dangerous behavior of smoking or making a phone call. If so, send the judgment information to the state machine in step 203; otherwise, end.

Step 203: judge whether a state machine similar to the human head image exists among all current state machines. If so, increment the count of the existing state machine by one; if not, create a new state machine and initialize its count to 1. Once all human head images in the current frame have been judged, delete the state machine information for heads that are not present in the current frame.

Step 204: judge whether the count of the state machine for the current head has reached a preset threshold T. If the count has reached the threshold T, the judgment is true and the judgment information is output to step 205; otherwise, end.

Step 205: receive the judgment result of step 204 and, after comprehensive analysis, output the dangerous behavior judgment result.
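The control flow of steps 201 to 205 for a single frame can be sketched as below. The detector and classifier are passed in as callables because the sketch does not reproduce the YOLOv3 or ResNet18 models. As a simplifying assumption, the detector is taken to return a dictionary that maps a stable head key to the cropped head image, which stands in for the "similar state machine" matching of step 203. All names are hypothetical.

```python
def process_frame(frame, detect_heads, classify, counts, threshold):
    """Run steps 201-205 on one frame; return keys of heads confirmed dangerous.

    detect_heads(frame) -> {head_key: head_image}   (assumed interface)
    classify(head_image) -> "normal" | "smoking" | "calling"
    counts is the per-head counter state carried across frames.
    """
    heads = detect_heads(frame)                # step 201: detect heads
    if not heads:
        counts.clear()                         # no heads: nothing to track
        return []
    confirmed = []
    for key, head_image in heads.items():
        if classify(head_image) == "normal":   # step 202: not dangerous, end
            continue
        counts[key] = counts.get(key, 0) + 1   # step 203: count + 1 or start at 1
        if counts[key] >= threshold:           # step 204: compare with T
            confirmed.append(key)              # step 205: report the result
    for key in list(counts):                   # step 203: drop absent heads
        if key not in heads:
            del counts[key]
    return confirmed
```

Keeping `counts` outside the function lets the caller hold the state across frames and reset it when the camera or the region of interest changes.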

In an exemplary embodiment, the embodiment further provides a method for detecting a smoking and calling behavior of a gas station based on deep learning, which includes the specific steps of:

(1) Acquiring images from the surveillance video stream of the gas station.

(2) Dividing a target region of interest, and sending the image into a deep learning target detection model to detect human heads within the target region of interest; the human head images are then cut out.

(3) Sending the cut-out head images into an image classification module for classification, and identifying smoking, phone-call and normal behaviors.

(4) If the classification model identifies a dangerous behavior in step (3), triggering the finite state machine (FSM) to perform further verification analysis.

(5) Comprehensively analyzing the results of steps (3) and (4), judging whether smoking or phone calls are occurring in the surveillance video of the gas station, and issuing an alarm.
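Step (2) restricts head detection to a target region of interest. One way to apply such a restriction is sketched below, assuming the region is drawn as a polygon in image coordinates and that a head counts as inside when the center of its bounding box falls within the polygon. Neither detail is specified in the text, and the function names are hypothetical. The test uses the standard ray-casting rule.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count polygon edges crossed by a ray going right."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the ray's y level
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_heads(boxes, polygon):
    """Keep only head boxes (x1, y1, x2, y2) whose center lies in the ROI."""
    kept = []
    for x1, y1, x2, y2 in boxes:
        if point_in_polygon((x1 + x2) / 2, (y1 + y2) / 2, polygon):
            kept.append((x1, y1, x2, y2))
    return kept
```

Filtering before classification means that heads outside the delineated area never reach the classifier, which is consistent with the stated goal of reducing computation.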

In summary, the present invention provides a deep learning-based method for detecting smoking and phone calls at a gas station. The method first acquires a surveillance video stream of the gas station and frames it to obtain a plurality of video frame images; a target region of interest is divided, the video frame image is input into a deep learning target detection model to detect human head regions within the target region of interest, and the corresponding human head images are cut out of the video frame image according to the head regions; the cut-out human head images are then sent to an image classification model, which identifies whether smoking or phone-call behavior is present. By combining human head detection, a behavior classification model and a finite state machine, the method lets the models complement one another, and can therefore maintain high robustness and high accuracy in complex open environments and under varying occlusion and illumination. Because the recognition method combines human head target detection with image classification, images can be judged quickly and effectively. The finite state machine-based judgment module further improves the accuracy and precision of behavior recognition.

As shown in fig. 3, the present invention further provides a deep learning-based smoking and phone call detection system for a gas station, comprising:

a video image acquisition module 101, configured to acquire a surveillance video stream of a gas station, frame the surveillance video stream and acquire a plurality of video frame images;

a region-of-interest setting module 102, configured to divide a target region of interest;

a human head target detection module 103, configured to input the video frame image into a deep learning target detection model to detect a human head region within the target region of interest, and to cut out the corresponding human head image from the video frame image according to the human head region. As an example, the human head target detection module does so as follows: a human head detection model built on the YOLOv3 model in Darknet is used as the deep learning target detection model to perform human head detection on the acquired video frame images; the detected head bounding box is then enlarged, and the corresponding head box is cut out from within the preset target region of interest to serve as the human head image.

an image classification module 104, configured to send the cut-out human head images into an image classification model for classification, and to identify whether smoking or phone-call behavior exists in the human head images. As an example, before the image classification module sends the cut-out human head images to the image classification model for classification, the following is also performed: the human head images are classified into normal samples, smoking samples and phone-call samples; the classified samples are used as training data for the image classification model; and the training data are input into a ResNet18 network for training to obtain the corresponding image classification model. During training, a triplet loss function is used as a regularization term, with samples of the same class serving as positive samples and samples of different classes serving as negative samples.

a finite state machine module 105, configured to trigger a finite state machine (FSM) to perform verification analysis according to smoking and phone-call behaviors. As an example, the finite state machine module does so as follows: if the image classification model identifies smoking or phone-call behavior in a human head image, the behavior is treated as a dangerous behavior, the state machines are triggered based on the dangerous behavior, and it is judged whether a state machine similar to that human head image already exists among all current state machines; if so, the count of the existing state machine is incremented by 1; if not, a new state machine is created with an initial count of 1. After all human head images in the current frame have been judged, the state machine information for heads that are not present in the current frame is deleted. It is then judged whether the count of the state machine for the current head has reached a preset threshold T: if the count has reached the threshold T, the judgment is true, and whether smoking or phone-call behavior exists in the surveillance video stream of the gas station is determined in combination with the classification result of the image classification model; if the count has not reached the threshold, the judgment is false.

an event alarm module 106, configured to obtain the verification analysis result and to issue an early warning in combination with the classification result of the image classification model.

As shown in fig. 3, the present invention further provides a deep learning-based smoking and phone call detection system for a gas station, comprising: the system comprises a video image acquisition module 101, a region of interest setting module 102, a target detection module 103, an image classification module 104, a finite state machine module 105 and a result alarm module 106.

The video image acquisition module 101 is used for acquiring a gas station real-time scene image from a monitoring camera (including an analog camera, a digital camera, and the like).

The region-of-interest setting module 102: to reduce computational complexity and to account for the quality of head images in the detection area, the invention first divides a clearly defined detection region in the image.

The target detection module 103 uses a human head detection model built on the YOLOv3 model in Darknet to perform human head detection on the video images acquired by the video image acquisition module 101, enlarges the detected head boxes appropriately, and finally cuts out the heads within the region of interest set by the region-of-interest setting module 102 according to the target detection boxes.

The image classification module 104 uses ResNet18 as the image classification model, and divides the head images into normal samples, smoking samples and phone-call samples as training data for the classification model. Because behaviors such as smoking and making phone calls differ only slightly within the face area of the head image, this design uses a triplet loss function as a regularization term during training, with samples of the same class serving as positive samples and samples of different classes serving as negative samples, which helps improve the model's ability to discriminate dangerous actions.

The finite state machine module 105 is triggered by the classification output of the image classification module 104. When the image classification module 104 outputs dangerous-action information, the state machine starts and stores the head image, and then records, for each head, the number of consecutive frames in which the dangerous action occurs. If the dangerous action persists for a period of time, it is judged that a dangerous action has occurred in the surveillance video.

The result alarm module 106 receives the dangerous-action signal output by the finite state machine module 105 and outputs alarm information.

In summary, the present invention provides a deep learning-based system for detecting smoking and phone calls at a gas station. The system first acquires a surveillance video stream of the gas station and frames it to obtain a plurality of video frame images; a target region of interest is divided, the video frame image is input into a deep learning target detection model to detect human head regions within the target region of interest, and the corresponding human head images are cut out of the video frame image according to the head regions; the cut-out human head images are then sent to an image classification model, which identifies whether smoking or phone-call behavior is present. By combining human head detection, a behavior classification model and a finite state machine, the system lets the models complement one another, and can therefore maintain high robustness and high accuracy in complex open environments and under varying occlusion and illumination. Because the recognition method combines human head target detection with image classification, images can be judged quickly and effectively. The finite state machine-based judgment module further improves the accuracy and precision of behavior recognition.

The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.

Claims

1. A gas station smoking and calling detection method based on deep learning is characterized by comprising the following steps:

2. The deep learning based gas station smoking and phone call detection method according to claim 1, further comprising, after recognizing that there is smoking and phone call behavior in the human head image:

3. The deep learning based gas station smoking and phone call detection method according to claim 2, wherein the process of taking the smoking and phone call behavior as dangerous behavior and triggering a Finite State Machine (FSM) to perform verification analysis based on the dangerous behavior comprises:

if yes, adding 1 to the counting times of the existing state machine;

if not, a new state machine is created and the initial count is 1;

4. The deep learning based gas station smoking and phone call detection method according to claim 1, wherein the process of inputting the video frame images into a deep learning target detection model to detect a human head region in the target region of interest, and cutting out a corresponding human head image from the video frame images according to the human head region comprises:

5. The deep learning based gas station smoking and phone call detection method according to claim 4, wherein before sending the cut-out human head image into the image classification model for classification, the method further comprises:

6. The deep learning based gas station smoking and phone call detection method according to any one of claims 1 to 5, characterized in that the method further comprises issuing an alarm when smoking and calling behaviors are present.

7. A deep learning based gas station smoking and phone call detection system, characterized by comprising:

a region-of-interest setting module, used for dividing a target region of interest;

8. The deep learning based gas station smoking and phone call detection system according to claim 7, wherein the finite state machine module triggers the finite state machine FSM to perform verification analysis according to smoking and phone call behavior, comprising:

if yes, adding 1 to the counting times of the existing state machine;

if not, a new state machine is created and the initial count is 1;

9. The deep learning based gas station smoking and phone call detection system according to claim 7, wherein the process of the human head target detection module inputting the video frame images into a deep learning target detection model to detect human head regions in the target regions of interest, and cutting out corresponding human head images from the video frame images according to the human head regions comprises:

10. The deep learning based gas station smoking and phone call detection system according to claim 9, wherein the image classification module sends the cut-out human head image to the image classification model for classification, and further comprises: