CN112766179A

CN112766179A - Fire smoke detection method based on motion characteristic hybrid depth network

Info

Publication number: CN112766179A
Application number: CN202110087146.4A
Authority: CN
Inventors: 郑远攀; 李广阳; 刘芳华; 张亚丽; 马贺; 吴庆岗; 王泽宇; 张秋闻; 朱付保; 甘勇; 陈燕; 钟大成; 刘新新; 姚浩伟; 王振宇; 徐博阳
Original assignee: Zhengzhou University of Light Industry
Current assignee: Zhengzhou University of Light Industry
Priority date: 2021-01-22
Filing date: 2021-01-22
Publication date: 2021-05-07

Abstract

The invention provides a fire smoke detection method based on a motion characteristic mixed depth network, which is used for solving the technical problem of low detection precision of video smoke in a complex scene. The method comprises the following steps: firstly, acquiring a data set from a video image library, and dividing the data set into a training set and a testing set; secondly, constructing a motion characteristic mixed depth network, inputting a training set into the motion characteristic mixed depth network for training to obtain a motion characteristic mixed depth network model, and testing the motion characteristic mixed depth network model by using a test set; then, acquiring a video sequence to be detected, and processing a detection video by using a motion region detection algorithm to obtain a video motion image; and finally, inputting the motion image into the motion characteristic mixed depth network model, outputting a detection result, and finishing video smoke detection. The invention can realize the continuous transmission of the smoke characteristics on the whole video stream, improve the timeliness of smoke detection and reduce the false alarm rate of smoke early warning.

Description

Fire smoke detection method based on motion characteristic hybrid depth network

Technical Field

The invention relates to the technical field of fire early warning, in particular to a fire smoke detection method based on a motion characteristic hybrid depth network.

Background

Smoke is an expression characteristic of the initial stage of fire, and rapid and efficient detection and identification of fire smoke is one of important ways for fire early warning. The traditional smoke detection method mainly adopts a smoke detector, realizes fire early warning to a certain extent, but the smoke detector has the advantages of small detection range, high detection precision and the like, is easily influenced by external factors such as smoke temperature, concentration, airflow and the like, and cannot early warn the fire in time. With the development of video image technology and computer vision technology, video image-based smoke detection algorithms are widely researched.

In the document [ Peng Y, Wang Y, real-time for smoke detection using and designing both the suspected smoke areas and deep learning [ J ]. Computers and Electronics in the Agriculture,2019,167:105029 ], a suspected smoke area is extracted by using a manual design algorithm and is input into an improved deep neural network Squeezenet model to realize smoke detection. This method can effectively distinguish between smoke and other challenging targets, but the suspected smoke area is acquired by manual design, ignoring some of the characteristics of smoke to some extent. The document Lin G, Zhang Y, Xu G, et al, Smok Detection on Video Sequences Using 3D volumetric Neural Networks [ J ]. Fire Technology,2019,55(5): 1827) uses an improved fast RCNN method with non-maximal swallowing for locating smoke target locations on the basis of static spatial information, and a joint Detection framework based on fast RCNN and 3D-CNNs has been developed. The framework significantly improves the smoke detection task. The network should not be too deep in view of the structural design, since the data set is small to prevent overfitting. In order to capture spatial information and motion information between smoke frames, a video smoke detection method based on a space-time convolutional neural network is provided in the document [ Hu Y, Lu X. The effectiveness of the method is verified by experimental results, and due to the defect of network structure design, the motion information between video frames cannot be completely captured. The method is characterized in that a cascaded convolutional neural network smoke identification framework is provided, wherein the method effectively reduces the false detection rate of a non-smoke area. However, the accuracy of smoke region detection still needs to be improved. A video smoke detection method based on a depth significance network is proposed in the documents [ Xu G, Zhang Y, Zhang Q, et al.video detection based on depth significance network [ J ]. Fire Safety Journal,2019,105: 277-. However, the highlighting of the object region still has a deficiency. A Deep convolution Integrated Long Short-Term Memory Network (DC-ILSTM) is proposed in a document [ Weixin, Wushuhong, Wangzuang power ] forest fire smoke detection model [ J ] computer application, 2019,39(10): 2883-. In the smoke detection experiment, compared with a Deep convolution Long-Current Networks (DCLRN), the optimal efficiency of the DC-ILSTM is superior to that of the 10 th frame to detect smoke, and the test accuracy is improved by 1.23%. Experimental results show that the DC-ILSTM has good applicability in forest fire smoke detection. However, the complexity of the algorithm affects the real-time nature of smoke detection.

Disclosure of Invention

Aiming at the defects in the background technology, the invention provides a fire smoke detection method based on a motion characteristic mixed deep network, and solves the technical problems of low detection precision, high false alarm rate and high omission factor of video smoke in a complex scene.

The technical scheme of the invention is realized as follows:

a fire smoke detection method based on a motion characteristic mixed depth network comprises the following steps:

the method comprises the following steps: acquiring a data set from a video image library, and dividing the data set into a training set and a testing set, wherein video images in the data set comprise smoke video images and non-smoke video images;

step two: constructing a motion feature mixed depth network, wherein the motion feature mixed depth network comprises a visual feature extraction layer, a motion feature extraction layer and a time context information learning layer, and the visual feature extraction layer and the motion feature extraction layer are connected with the time context information learning layer;

step three: inputting the training set into a motion characteristic mixed depth network for training to obtain a motion characteristic mixed depth network model, and testing the motion characteristic mixed depth network model by using the test set;

step four: acquiring a video sequence to be detected, and processing the video sequence to be detected by utilizing a motion region detection algorithm to obtain a video motion image;

step five: and inputting the video motion image into the motion characteristic mixed depth network model obtained in the third step, and outputting a detection result to finish video smoke detection.

The network structure of the visual feature extraction layer sequentially comprises an input layer I, a convolution layer I, a pooling layer I, a convolution layer II, a pooling layer II, a convolution layer III, a pooling layer III and a full-connection layer I; the network structure of the motion characteristic extraction layer sequentially comprises an input layer II, a convolution layer IV, a convolution layer V, a convolution layer VI, a convolution layer VII, a convolution layer VIII, a convolution layer IX and a full connection layer II.

The method for inputting the training set into the motion characteristic mixed depth network for training comprises the following steps:

setting parameters of a visual characteristic extraction layer and a motion characteristic extraction layer, wherein the parameters comprise weight attenuation of 0.01 and an initial value of a learning rate of 0.01;

respectively training the visual characteristic extraction layer and the motion characteristic extraction layer by adopting a BP algorithm until the training end condition is met, and respectively obtaining the visual characteristic and the motion characteristic;

and fusing the visual features and the motion features, inputting the fused visual features and the motion features into the time context information learning layer for training, and finishing the training when the value of the loss function of the time context information learning layer is unchanged to obtain the motion feature mixed depth network model.

The training end condition is that the learning rate is attenuated to le-9 or the value of a loss function corresponding to the BP algorithm is unchanged; the loss function is a cross entropy loss function.

The method for fusing the visual features and the motion features comprises the following steps:

wherein the content of the first and second substances,

in order to be a visual characteristic diagram,

i is the abscissa of the feature map x, j is the ordinate of the feature map x,

the maximum feature map after fusion is shown.

The temporal context information learning layer is:

g^(t)＝k₁y^(t)+k₂q^(t-1)

q^(t)＝relu(g^(t))

wherein, y^(t)Is the maximum feature map after fusion at time t, q^(t-1)Information of the previous time step at time t, q^(t)Information of time step at time t, k₁And k₂Are all coefficients, g^(t)Represents the output of the temporal context information learning layer at time t, relu (a) represents the nonlinear activation function.

The method for processing the detected video sequence by utilizing the motion region detection algorithm comprises the following steps:

judging the movement direction of the smoke according to the movement characteristics of the smoke to obtain the main movement direction of the smoke, and calculating the cumulant S (i, j) of all pixel points (i, j) in the detection video sequence in the main movement direction:

wherein, theta_l(i, j) represent digital codes for detecting image frames of main direction l in a video sequence, each digital code representing a direction of motion, H_t(θ_l(i, j)) represents a histogram of an image frame having a principal direction l within a time window, l being 2,3,4, Z_tRepresenting the number of image frames, t representing the time instant;

setting the threshold value T (i, j) to be 5, and calculating the smoke motion image at the time T:

wherein f (i, j) is a smoke moving image, background is a video background, and forkround is a video foreground.

Compared with the prior art, the invention has the following beneficial effects:

1) the motion region detection algorithm based on the main motion direction and the ViBe effectively removes interference factors in video smoke frames, retains more smoke characteristic information and improves the accuracy of smoke detection.

2) The motion characteristic extraction layer acquires motion characteristic information between continuous video smoke frames, and the motion characteristic extraction layer is combined with the visual characteristic extraction layer, so that the continuous transmission of smoke characteristics is realized on the whole video stream, the timeliness of smoke detection is improved, and the false alarm rate of smoke early warning is reduced.

3) Simulation experiments verify that the hybrid depth network model designed by the invention has the advantages of high accuracy, high reliability, low omission factor, low false alarm rate, low delay and the like in the aspect of video smoke detection, and has very high application and popularization values.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a flow chart of the present invention.

Figure 2 is an example smoke video set.

Fig. 3 is an example of a non-smoke video set.

Fig. 4 is a visual feature extraction diagram for a single frame video image according to the present invention.

Fig. 5 is a diagram of motion feature extraction for adjacent two-frame video images according to the present invention.

FIG. 6 is a schematic view of the loss function of the visual feature extraction layer according to the present invention.

FIG. 7 is a schematic diagram of the loss function of the motion feature extraction layer according to the present invention.

FIG. 8 is a diagram illustrating a loss function of the temporal context information learning layer according to the present invention.

Fig. 9 is a schematic view of the smoke movement direction.

Fig. 10 is a diagram of the smoke motion detection effect obtained by the motion region detection algorithm of the present invention.

Fig. 11 shows the detection results of the present invention for 8 smoke videos and 4 non-smoke videos.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.

According to the anti-interference performance, the real-time performance and the accuracy of actual smoke, the invention provides a fire smoke detection method based on a motion characteristic mixed depth network, which mainly comprises the following research contents: 1) a motion region detection algorithm based on a main motion direction and a ViBe algorithm is provided to obtain a suspicious smoke motion region and reduce interference of a non-smoke region in a video image. 2) A deep neural network model for extracting motion characteristics is designed, and the motion characteristics among continuous video frames are obtained. 3) A time context information learning layer is designed, and smoke detection is completed on time flow.

The video smoke detection process proposed by the present invention is shown in fig. 1. Firstly, a motion area detection algorithm is used for acquiring a video motion area; then, respectively acquiring the visual characteristics and the motion characteristics of the smoke image by the visual characteristic extraction layer and the motion characteristic extraction layer, and performing characteristic fusion; inputting the fusion features into a time context information learning layer, accumulating the fusion features on a time stream, acquiring context information of continuous video frames, and finishing smoke detection; and finally, carrying out experimental verification and analysis on the proposed smoke detection network model. The method comprises the following specific steps:

the method comprises the following steps: acquiring a data set from a video image library, and dividing the data set into a training set and a testing set, wherein video images in the data set comprise smoke video images and non-smoke video images; video image libraries were derived from the computer vision and pattern recognition laboratory at the university of Korea, the signal processing team at the university of Turkey, the computer vision laboratory at Reynolds' university of Nevada, publicly published datasets from professor Yuanhui, and video segments of fire smoke collected from other research laboratories. And randomly selecting 70% as a training set and 30% as a testing set. Training collection smoke images are intercepted from the smoke video, non-smoke images are intercepted from the non-smoke video, and the non-smoke images are not repeated with the test video. The data set comprises 3 ten thousand RGB images of 24 × 32 of positive and negative samples, the negative sample image comprises interference images seen in daily scenes, the data label of the smoke image in the training set and the test set is marked as 1, and the data label of the non-smoke image is marked as 0. A partial smoke video is shown in fig. 2 and a non-smoke video is shown in fig. 3. Because the image data sets used by the invention have different sources and different sizes, formats and resolutions of the images, all the images are subjected to uniform cutting, compression and normalization processing before training to finally obtain the RGB images with the sizes of 24 multiplied by 32.

the visual feature extraction layer is used for extracting visual features of a single-frame video image: the network structure of the device comprises an input layer I, a convolution layer I, a pooling layer I, a convolution layer II, a pooling layer II, a convolution layer III, a pooling layer III and a full-connection layer I in sequence; and a non-linear activation function is added after each convolution layer, the original video RGB frame is the input of the visual feature extraction layer, and the network structure is shown in fig. 4.

As can be seen from fig. 4, the input of the visual feature extraction layer is a picture with a frame size of 24 × 32 × 3. The 1 st, 3 rd and 5 th layers are convolution layers, the 2 nd, 4 th and 6 th layers are pooling layers, and the 7 th layer is a full-connection layer. The convolution layer I, the convolution layer II and the convolution layer III all adopt 5 x 5 convolution templates, the step length is 1, the image depth is one time of that of the previous layer after one convolution operation, each convolution layer adopts a relu activation function, and the last convolution layer outputs a characteristic diagram of 6 x 8 x 128. And the pooling layers I, II and III adopt maximum pooling, 2 x 2 pooling templates are used, the step length is set to be 2, the image size is 1/2 of the previous layer through one pooling operation, and the last layer of pooling layer outputs a 3 x 4 x 128 characteristic diagram. The fully connected layer I is connected after the third pooling layer, outputting a 1 × 1 × 1024 feature vector.

The visual feature extraction layer mainly extracts the physical features of the video image, such as color, texture, energy and the like. The characteristics reflect the change rules of different smoke images to a certain extent, such as the color characteristics of smoke, objects made of different materials, and the characteristics of smoke generated by combustion are different, and in addition, clouds, fog and the like in the sky are different from the physical characteristics of smoke generated by fire. Therefore, in order to enable the network model to have a better detection effect on smoke generated by a fire, it is necessary to establish a visual feature extraction layer and acquire a higher-level physical feature of video smoke.

The motion feature extraction layer is used for extracting motion features facing adjacent double-frame video images: the motion feature extraction layer processes adjacent double-frame video smoke images to capture and predict motion information of continuous frames. The network structure of the motion characteristic extraction layer comprises an input layer II, a convolution layer IV, a convolution layer V, a convolution layer VI, a convolution layer VII, a convolution layer VIII, a convolution layer IX and a full connection layer II in sequence; the motion feature extraction layer comprises 6 convolutional layers, each convolutional layer is connected with a nonlinear activation function, and the last convolutional layer is connected with a full connection layer II. The network structure of the motion feature extraction layer is shown in fig. 5.

As can be seen from fig. 5, the input of the motion feature extraction layer is an adjacent two-frame image, and the size is 24 × 32 × 6. And the convolutional layer IV and the convolutional layer V adopt 5 × 5 convolutional templates, the convolutional layer VI, the convolutional layer VII, the convolutional layer VIII and the convolutional layer IX adopt 3 × 3 convolutional templates, the step size in each convolutional layer is set to be 2, the image size is reduced to 1/2 of the image of the previous layer after each step, and after three times of convolution operation, the last layer of convolutional layer outputs a feature map with the size of 3 × 4 × 128. And finally, connecting a full connection layer II and outputting a feature vector of 1 multiplied by 1024.

The motion characteristic extraction layer well expresses motion information between two adjacent frames of the video image, the motion image of the previous frame determines the motion state of the image of the next frame to a certain extent, and the motion characteristic extraction layer solves the problem of irrelevant states among the frames in the traditional video smoke detection algorithm, so that more video smoke characteristic information is obtained.

The traditional neural network model has the leading advantages in the field of static image detection, but cannot detect an input sequence with characteristic information. In order to solve the problem, a visual feature extraction structure and a motion feature extraction structure are added on the basis of a traditional Neural Network model, an obtained output value is used as an input value of a next Network node and is further transmitted to a next layer of the Network, the output value contains context information of the current Network, and the Network is called a Recurrent Neural Network (RNN). The RNN network is mainly applied to the fields of natural language processing, speech recognition, etc., and achieves good effects, but the RNN has relatively few research applications in video sequences. The research of the invention aims at video images, and the video images have similarities with natural language and voice and have a certain sequence in a certain time period. According to the characteristic, the RNN is used for building a time context information learning layer in the invention. The RNN is used for video smoke detection which is a new field of video image detection.

Step three: inputting the training set into the motion characteristic mixed depth network for training to obtain a motion characteristic mixed depth network model, and testing the motion characteristic mixed depth network model by using the test set.

The method for inputting the training set into the motion characteristic mixed depth network for training comprises the following steps: the input image is a 24 × 32 RGB image, Adam is selected as an optimization method, and the convergence rate is faster than that of a standard random gradient descent method. In the training process of the model, a BP algorithm is adopted for training, the weight attenuation of a visual feature extraction layer and a motion feature extraction layer is 0.01, the learning rate is attenuated to le-9 from the initial 0.01, or when the value of a loss function tends to be stable and does not decrease any more, the training is stopped. Then, fusing the characteristics, inputting the fused characteristics into a time context information learning layer consisting of an RNN (radio network), training the time context information learning layer, and stopping training when the loss function value of the time context information learning layer tends to be stable; and finishing the training of the whole network model. In the model training process, the loss function is a cross entropy loss function, and the size of the batchsize is 128. The loss functions of the visual feature extraction layer, the motion feature extraction layer and the temporal context information learning layer are shown in fig. 6-8.

The specific operation method comprises the following steps:

respectively training the visual characteristic extraction layer and the motion characteristic extraction layer by adopting a BP algorithm until the training end condition is met, and respectively obtaining the visual characteristic and the motion characteristic; the training end condition is that the learning rate is attenuated to le-9 or the value of a loss function corresponding to the BP algorithm is unchanged; the loss function is a cross entropy loss function.

wherein the content of the first and second substances,

in order to be a visual characteristic diagram,

i is the abscissa of the feature map x, j is the ordinate of the feature map x,

the maximum feature map after fusion is shown.

The fused features include visual features and motion features that vary over time. The length of the sequence is arbitrary, and the visual and motion characteristics of each frame of image are also not fixed. Therefore, the invention utilizes the temporal context information learning layer constructed by RNN to process any long time sequence, and solves the problem of accumulation learning of the fusion characteristic context information on the time flow. The RNN has a feedback connection that can hold information for a period of time, including information for the current input and all previous time steps. The temporal context information learning layer is:

g^(t)＝k₁y^(t)+k₂q^(t-1)

q^(t)＝relu(g^(t))

wherein the content of the first and second substances,y^(t)is the fused maximum feature map at time t, q^(t-1)Information of the previous time step at time t, q^tInformation of time step at time t, k₁And k₂Are all coefficients, g^(t)Represents the output of the temporal context information learning layer at time t, relu (a) represents the nonlinear activation function.

And finally, classifying and identifying each frame of image by the time context information learning layer, labeling the smoke image if the network model detects that the real smoke image exists in the video frame, otherwise, detecting the next frame of image on the time stream until the whole video segment is detected.

Step four: and acquiring a video sequence to be detected, and processing the video sequence to be detected by utilizing a motion region detection algorithm to obtain a video motion image.

In a real smoke scene, a static area in a video can increase the calculation amount of an algorithm, the smoke detection rate is influenced, and a non-smoke motion area can cause interference to smoke detection. Therefore, in order to reduce the calculation amount of the algorithm and improve the accuracy of smoke detection, the invention uses the moving object detection algorithm to preprocess the video. When the target detection algorithm detects smoke, the non-smoke motion area is also detected, and the non-smoke motion area is taken as a smoke image and input into the neural network model. Therefore, before the smoke region is detected by using the neural network model, eliminating the interference of static regions and non-smoke motion regions in the smoke video is one of effective ways for improving the smoke detection efficiency. According to the method, a suspected smoke motion area in the video is obtained by adopting a motion area detection algorithm based on the combination of the main motion direction and the ViBe according to the motion trend of the smoke.

The ViBe algorithm is a pixel-level background modeling method proposed in 2011 by Olivier Barnich and Marc Van Droogenbroeck. Compared with other moving object extraction algorithms, the method has the advantages of small operand, high detection efficiency and the like, and is more suitable for moving object detection in a fixed scene. When the ViBe algorithm is used for smoke detection, a non-smoke motion area in a video is detected, so that on the basis of the ViBe algorithm, the motion direction of smoke is judged according to the motion characteristics of the smoke, the main motion direction of the smoke is obtained, and the motion area of the smoke is obtained in the main motion direction. In the invention, a time window-based smoke motion direction histogram statistical method is used for accumulating motion data blocks in a sliding window, and the accumulation amount is used for judging the main motion direction of smoke. Calculating the accumulation amount S (i, j) of all pixel points (i, j) in the detection video sequence in the main motion direction:

wherein, theta_l(i, j) represent digital codes for detecting image frames in the video sequence with a main direction/, each digital code representing one direction of motion, for a total of nine directions of motion, as shown in fig. 9. H_t(θ_l(i, j)) represents a histogram of an image frame having a principal direction l within a time window, l being 2,3,4, Z_tRepresenting the number of image frames; the initial smoke generated by fire is approximately upward or obliquely upward movement, and as can be seen from fig. 9, 2,3 and 4 are upward and obliquely upward movement directions, which are characteristic directions of main smoke movement, so far, only the data accumulation amounts in the 2,3 and 4 directions need to be counted in the invention.

And when the time is T, judging whether the accumulation amount of the pixel point (i, j) reaches a set threshold value T (i, j) or not, and setting the pixel value as f (i, j). And judging whether the pixel value is a foreground according to the following formula, wherein the foreground is the detected smoke motion image.

Wherein f (i, j) represents a smoke moving image, background is a video background, and forkround is a video foreground. T (i, j) is a threshold value set in advance. As known a priori, T (i, j) ═ 5, the experimental result of the motion region detection algorithm is shown in fig. 10.

As can be seen from fig. 10, in the video preprocessing stage, the motion region detection algorithm based on the main motion direction and the ViBe effectively reduces the interference of the non-smoke region in the video image, accurately extracts the smoke motion region in the video, and theoretically enhances the accuracy of the neural network model in extracting the video smoke image features.

Step five: and inputting the smoke motion image into the motion characteristic mixed depth network model, outputting a detection result, and finishing video smoke detection. The preprocessed video image is input into the visual characteristic extraction layer and the motion characteristic extraction layer designed by the invention to obtain the visual characteristic and the motion characteristic of the image, which plays an important role in improving the accuracy of smoke detection and shortening the early warning response time, and is verified and analyzed in the experimental stage.

The evaluation index of the motion characteristic mixed depth network model is evaluated through the accuracy rate (ACC), the false alarm rate (FPR) and the omission factor (MDR). The specific evaluation formula is shown in the following formula.

Wherein TP represents the number of video frames in the smoke sample that are identified as smoke; FP represents the number of video frames in the smoke sample that are not identified as smoke. FN represents the number of video frames in the non-smoke sample that are identified as smoke.

In order to verify the superiority of the network model, the network model is compared with a classical convolutional neural network AlexNet and VGG16 in an experiment, and meanwhile, a document [1] - [ Lepeng, Zhang Yan ] is compared, video smoke detection [ J ] based on a Gaussian mixture model and a convolutional neural network is compared, the progress of laser and optoelectronics is made, 2019,56(21): 140-. In order to verify the importance of the video motion region detection algorithm and the motion feature extraction layer to the network model, the network model with the motion region detection algorithm removed and the network model with the motion feature extraction layer removed are respectively verified in experiments. All network models were trained and tested separately on the neural network model in the experiment using the same smoke image dataset. The results of the experiment are shown in table 1.

TABLE 1 comparison of the results of different network models

As can be seen from Table 1, the accuracy and the omission factor of the network model are superior to those of other six network models, the accuracy reaches 98.82%, and is improved by 0.97% compared with the optimal network model; the omission factor is reduced to 16.32%, which is reduced by 2.1% compared with the optimal network model; the false alarm rate is reduced to 3.32%, almost the same as that of the document [2], and is far lower than that of other five network models. In the network model, the motion area detection algorithm is removed, so that the accuracy of the network model is reduced by 2.09%; the false alarm rate is improved by 1.35%; the omission ratio is improved by 13.95 percent. The motion characteristic extraction layer is removed, and the accuracy of the network model is reduced by 1.15%; the false alarm rate is improved by 0.86%; the omission ratio is improved by 3.35 percent. Therefore, the motion region detection algorithm and the motion feature extraction layer provided by the invention improve the smoke detection precision and reduce the false alarm rate. The network model provided by the invention is more suitable for a real scene and has higher application value.

In the experiment of the present invention, a plurality of video segments were tested, and the detection results of 8 smoke videos and 4 non-smoke videos are shown in the present invention, as shown in fig. 11. Comparative analysis was performed under different network models, where videos 1-8 were smoke videos and videos 9-12 were non-smoke videos.

Through experimental tests, the response time of different networks for detecting the first frame of smoke on part of the smoke video test set is shown in table 2.

Table 2 response time of first frame smoke detected by different network models

As can be seen from table 2, in videos 1 to 8, the response time of the network model of the present invention is better than that of the other six network models, and the response time of the network model of the present invention for detecting the smoke in the first frame is the shortest. The response time of the network model for removing the motion region detection algorithm is improved compared with the response time of the network model of the invention. The response time after the motion characteristic extraction layer is removed is also improved compared with the response time of the network model. Experimental results show that the network model has higher instantaneity.

The number of false positive frames for different network models in a portion of the non-smoke video data set is shown in table 3.

TABLE 3 number of false detection frames for different network models

As can be seen from table 3, the number of false detection frames in the non-smoke video of the network model of the present invention is the same as that of VGG16, and is reduced to 0, which is lower than that of the other five networks, but the response time of the network model of the present invention in the smoke video is much lower than that of VGG 16. The network model with the motion region detection algorithm removed and the motion feature extraction layer removed has false detection frame numbers. The experimental result shows that the network model can effectively reduce the number of false detection frames of non-smoke videos, and the overall detection effect of the network model is superior to that of other six networks. In a real scene, the network provided by the invention has stronger reliability.

In order to improve the detection precision of video smoke in a complex scene and reduce the problems of false alarm rate, missed detection rate and the like, a motion characteristic mixed depth network model for video fire smoke detection is provided. Acquiring a motion area image in a video by using a motion area detection algorithm; visual features of the single-frame video image and motion features of the adjacent double-frame video image are obtained through a convolutional neural network, and feature fusion is carried out by adopting a maximum fusion method. And in the time domain, accumulating the fusion information of the continuous video frames by using a time context information learning network constructed by the RNN, further analyzing and processing the fusion characteristics, and completing video smoke detection.

(1) The motion area detection algorithm effectively removes interference factors in the video smoke frame, retains more smoke characteristic information and improves the accuracy of smoke detection.

(2) The motion characteristic extraction layer acquires motion characteristic information between continuous video smoke frames, and the motion characteristic extraction layer is combined with the visual characteristic extraction layer, so that the continuous transmission of smoke characteristics is realized on the whole video stream, the timeliness of smoke detection is improved, and the false alarm rate of smoke early warning is reduced.

(3) The hybrid depth network model designed by the invention has the advantages of high accuracy, high reliability, low omission factor, low false alarm rate, low delay and the like in the aspect of video smoke detection, and has very high application and popularization values.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A fire smoke detection method based on a motion characteristic mixed depth network is characterized by comprising the following steps:

2. The fire smoke detection method based on the motion feature mixed depth network as claimed in claim 1, wherein the network structure of the visual feature extraction layer is sequentially an input layer I-a convolutional layer I-a pooling layer I-a convolutional layer II-a pooling layer II-a convolutional layer III-a pooling layer III-a full-connected layer I; the network structure of the motion characteristic extraction layer sequentially comprises an input layer II, a convolution layer IV, a convolution layer V, a convolution layer VI, a convolution layer VII, a convolution layer VIII, a convolution layer IX and a full connection layer II.

3. The fire smoke detection method based on the motion feature hybrid deep network as claimed in claim 1 or 2, wherein the method of inputting the training set into the motion feature hybrid deep network for training is as follows:

4. The fire smoke detection method based on the motion feature hybrid depth network according to claim 3, wherein the training end condition is that the learning rate is attenuated to be the same as the value of a loss function corresponding to le-9 or BP algorithm; the loss function is a cross entropy loss function.

5. The fire smoke detection method based on the motion feature hybrid depth network as claimed in claim 3, wherein the method for fusing the visual feature and the motion feature is as follows:

wherein the content of the first and second substances,

in order to be a visual characteristic diagram,

i is the abscissa of the feature map x, j is the ordinate of the feature map x,

the maximum feature map after fusion is shown.

6. The fire smoke detection method based on the motion feature hybrid deep network as claimed in claim 1, wherein the temporal context information learning layer is:

g^(t)＝k₁y^(t)+k₂q^(t-1)

q^(t)＝relu(g^(t))

7. The fire smoke detection method based on the motion feature hybrid depth network as claimed in claim 1, wherein the method for processing the detection video sequence by using the motion region detection algorithm is as follows:

wherein, theta_l(i, j) represent digital codes for detecting image frames of main direction l in a video sequence, each digital code representing a direction of motion, H_t(θ₁(i, j)) represents a histogram of an image frame having a principal direction l within a time window, l being 2,3,4, Z_tRepresenting the number of image frames, t representing the time instant;