CN112215122B - Fire detection method, system, terminal and storage medium based on video image target detection - Google Patents


Info

Publication number
CN112215122B
CN112215122B (application CN202011069784.5A)
Authority
CN
China
Prior art keywords
model
image
feature
fire
feature extraction
Prior art date
Legal status
Active
Application number
CN202011069784.5A
Other languages
Chinese (zh)
Other versions
CN112215122A (en
Inventor
胡金星
王传胜
Current Assignee
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN202011069784.5A
Publication of CN112215122A
Application granted
Publication of CN112215122B
Legal status: Active


Classifications

    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/44: Event detection
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/25: Fusion techniques
    • G06N3/045: Combinations of networks (neural network architectures)
    • G06N3/08: Learning methods (neural networks)
    • Y02T10/40: Engine management systems


Abstract

The application relates to a fire detection method, system, terminal and storage medium based on video image target detection, comprising the following steps: converting an original natural image into dust-haze and sand-dust images by adopting a data enhancement algorithm based on an atmospheric scattering model, generating a data set for training the model; constructing a convolutional neural network model LFNet and inputting the data set into the LFNet model for iterative training to obtain optimal model parameters. The convolutional neural network model LFNet comprises a skeleton feature extraction model, a main feature extraction model and a variable-scale feature fusion model: the skeleton feature extraction model extracts the main features of an input image through convolutions at three different scales; the main feature extraction model performs further feature extraction on the main features to generate three groups of feature maps; and the variable-scale feature fusion model adaptively fuses the three groups of feature maps and outputs the detection result. The method improves the robustness of the model in abnormal weather such as sand-dust and dust-haze, enabling the model to obtain better detection results.

Description

Fire detection method, system, terminal and storage medium based on video image target detection
Technical Field
The application belongs to the technical field of fire detection, and particularly relates to a fire detection method, system, terminal and storage medium based on video image target detection.
Background
Fire detection plays a vital role in safety monitoring. At present, traditional fire detection methods are based on image priors, detecting fire from the color and shape of the image. However, the robustness and error rate of color and motion features are strongly affected by preset parameters, so such methods cannot be applied in complex environments, and their localization accuracy is easily affected by the region.
Monitoring is a tedious and time-consuming task, especially in an uncertain monitoring environment with large uncertainty in time, space and even scale. Sensor-based detectors have limited performance in terms of error rate and sensing range, and therefore cannot detect remote or small fires. In recent years, with the rapid development of deep learning, convolutional neural networks (CNNs) have been applied to fire detection. However, existing deep-learning-based fire detection methods have the following disadvantages:
1. Deep-learning-based methods require a large amount of remote sensing images as training data, and training a model is very challenging given the scarcity of real remote sensing images.
2. Deep-learning-based fire detection models are too large in scale to be suitable for resource-constrained devices.
3. The complexity of existing algorithms is too high for real-time detection.
4. Anti-interference capability is weak, and the models are easily affected by severe monitoring environments such as dust-haze and sand-dust.
5. Most fire detection algorithms focus on only a single environment, and therefore exhibit high error rates in uncertain environments.
In summary, existing fire detection methods leave great room for improvement in algorithm complexity, range of application scenes, model size and other respects.
Disclosure of Invention
The application provides a fire detection method, system, terminal and storage medium based on video image target detection, aiming to solve, at least to a certain extent, one of the above technical problems in the prior art.
In order to solve the problems, the application provides the following technical scheme:
a fire detection method based on video image object detection, comprising:
converting an original natural image into a dust haze image and a sand dust image by adopting a data enhancement algorithm based on an atmospheric scattering model, and generating a data set for training the model;
constructing a convolutional neural network model LFNet, and inputting the data set into the LFNet model for iterative training to obtain optimal model parameters; the skeleton feature extraction model extracts features of the input image with convolutions at the 3×3, 5×5 and 7×7 scales to obtain feature maps of sizes 13×13, 26×26 and 52×52; the main feature extraction model performs further feature extraction on the main features to generate three groups of feature maps of sizes 52×52, 26×26 and 13×13; the variable-scale feature fusion model maps the three groups of feature maps to different convolution kernels and strides for convolution, splices all convolutions of the same size to obtain three groups of feature maps, and operates on the three groups of feature maps with a channel-based attention mechanism to obtain feature maps of sizes 13×13, 26×26 and 52×52, used respectively for detecting small, medium and large objects; inputting the data set into the LFNet model for iterative training further comprises: selecting mean square error and cross entropy respectively as loss functions for model optimization;
the convolutional neural network model LFNet comprises a skeleton feature extraction model, a main feature extraction model and a variable-scale feature fusion model; the skeleton feature extraction model extracts the main features of an input image through convolutions at three different scales; the main feature extraction model performs further feature extraction on the main features to generate three groups of feature maps; the variable-scale feature fusion model adaptively fuses the three groups of feature maps and outputs a detection result;
and inputting the fire image to be detected into the trained LFNet model, the LFNet model outputting the fire localization area and fire type of the fire image to be detected.
The technical scheme adopted by the embodiment of the application further comprises the following steps: the method for converting the original natural image into the dust haze image and the sand dust image by adopting the data enhancement algorithm based on the atmospheric scattering model comprises the following steps:
acquiring an original natural image; the original natural images include non-alarm images without fire alarm areas and real fire alarm images.
The technical scheme adopted by the embodiment of the application further comprises the following steps: the method for converting the original natural image into the dust haze image by adopting the data enhancement algorithm based on the atmospheric scattering model comprises the following steps:
the atmospheric scattering model adopts at least two transmission rates to simulate dust-haze images of different concentrations; the dust-haze imaging formula is:
I(x) = J(x)t(x) + α(1 - t(x))
in the above formula, I(x) is the simulated haze image, J(x) is the input haze-free image, α is the atmospheric light value, and t(x) is the scene transmission rate.
The technical scheme adopted by the embodiment of the application further comprises the following steps: the method for converting the original natural image into the dust image by adopting the data enhancement algorithm based on the atmospheric scattering model comprises the following steps:
the atmospheric scattering model adopts a fixed transmission rate and atmospheric light value, combined with three colors, to simulate different sand-dust images; the sand-dust image simulation formula is:
D(x) = J(x)t(x) + α(C(x)·(1 - t(x)))
in the above formula, D(x) is the simulated dust image, J(x) is the input haze-free image, and C(x) is the color value.
The technical scheme adopted by the embodiment of the application further comprises the following steps: the loss function is specifically:
counting the brightness, dark channel value and R-channel data of the fire region, regarding these statistics as a combustion histogram prior (CHP) and writing them as the CHP formula, in which R(x) denotes the R channel of the image, SCP(x) is the difference between the brightness of the image and its dark channel, w is the width of the histogram, and h is the height of the histogram;
SCP(x) = ||v(x) - DCP(x)||
in the above formula, v(x) is the brightness of the image, and DCP(x) is the value of the dark channel of the image;
L_CHP = ||CHP(I) - CHP(R)||^2
in the above formula, CHP denotes the combustion histogram prior, and CHP(I) and CHP(R) denote the CHP values of the region selected by the target detection algorithm and of the labeled region, respectively;
the final loss function is a weighted sum of three different loss functions:
L_total = β·L_CE + γ·L_MSE + δ·L_CHP
in the above formula, L_total is the final loss function, L_CE is the cross-entropy loss function, L_MSE is the mean square error loss function, and L_CHP is the combustion histogram prior loss.
The embodiment of the application adopts another technical scheme that: a fire detection system based on video image object detection, comprising:
the data set construction module: for converting an original natural image into dust-haze and sand-dust images by adopting a data enhancement algorithm based on an atmospheric scattering model, and generating a data set for training the model;
LFNet model training module: for constructing a convolutional neural network model LFNet and inputting the data set into the LFNet model for iterative training to obtain optimal model parameters; the skeleton feature extraction model extracts features of the input image with convolutions at the 3×3, 5×5 and 7×7 scales to obtain feature maps of sizes 13×13, 26×26 and 52×52; the main feature extraction model performs further feature extraction on the main features to generate three groups of feature maps of sizes 52×52, 26×26 and 13×13; the variable-scale feature fusion model maps the three groups of feature maps to different convolution kernels and strides for convolution, splices all convolutions of the same size to obtain three groups of feature maps, and operates on them with a channel-based attention mechanism to obtain feature maps of sizes 13×13, 26×26 and 52×52, used respectively for detecting small, medium and large objects; inputting the data set into the LFNet model for iterative training further comprises: selecting mean square error and cross entropy respectively as loss functions for model optimization;
the convolutional neural network model LFNet comprises a skeleton feature extraction model, a main feature extraction model and a variable-scale feature fusion model; the skeleton feature extraction model extracts the main features of an input image through convolutions at three different scales; the main feature extraction model performs further feature extraction on the main features to generate three groups of feature maps; the variable-scale feature fusion model adaptively fuses the three groups of feature maps and outputs a detection result; the detection result comprises the fire localization area and fire type of the fire image.
The embodiment of the application adopts the following technical scheme: a terminal comprising a processor and a memory coupled to the processor, wherein
the memory stores program instructions for implementing the above fire detection method based on video image target detection;
the processor is configured to execute the program instructions stored in the memory to control fire detection based on video image target detection.
The embodiment of the application adopts a further technical scheme: a storage medium storing program instructions executable by a processor, the program instructions being used to perform the above fire detection method based on video image target detection.
Compared with the prior art, the embodiments of the application have the following beneficial effects: the fire detection method, system, terminal and storage medium based on video image target detection convert the original image into dust-haze or sand-dust images of different degrees using a data enhancement algorithm based on the atmospheric scattering model, generate a data set for training the model, and construct a convolutional neural network model LFNet suited to fire and smoke detection in uncertain environments, improving the robustness of the model in abnormal weather such as sand-dust and dust-haze and yielding better detection results. Meanwhile, the LFNet model of the embodiment is small, reducing computational cost and facilitating deployment on resource-constrained devices.
Drawings
FIG. 1 is a flow chart of a fire detection method based on video image object detection in an embodiment of the present application;
FIG. 2 is a schematic diagram of the simulation effect of dust haze and sand images based on an atmospheric scattering model according to an embodiment of the present application;
FIG. 3 is a block diagram of a convolutional neural network model of an embodiment of the present application;
FIG. 4 is a block diagram of a variable scale feature fusion model of an embodiment of the present application;
FIG. 5 is a block diagram of a channel-based attention mechanism of an embodiment of the present application;
FIG. 6 is a schematic diagram of a fire detection system based on video image object detection according to an embodiment of the present application;
fig. 7 is a schematic diagram of a terminal structure according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a storage medium according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
Referring to fig. 1, a flow chart of a fire detection method based on video image object detection according to an embodiment of the application is shown. The fire detection method based on video image target detection provided by the embodiment of the application comprises the following steps of:
s10: acquiring an original natural image;
In this step, the raw natural images comprise 293 non-alarm images without fire alarm areas and 5073 real fire alarm images. Using the non-alarm images improves the robustness of the training algorithm to non-alarm targets and reduces the detector's error rate. Using the real fire alarm images improves the detection capability of the target detection model.
S20: converting an original natural image into a new synthetic image influenced by different types and different degrees of abnormal weather by adopting a data enhancement algorithm based on an atmospheric scattering model, and generating a data set for training the model;
In this step: existing intelligent monitoring algorithms usually ignore the influence of abnormal weather such as dust-haze or sand-dust on performance, so their robustness under uncertain weather conditions is poor. To address this shortcoming, the embodiment considers the influence of abnormal weather on the fire detection algorithm and uses the atmospheric-scattering-model-based data enhancement method to simulate dust-haze and sand-dust images of different degrees, converting the original natural images into new synthetic images affected by dust-haze or sand-dust weather of different degrees and constructing a large-scale reference data set for training and testing the fire detection model, thereby improving the robustness of the target detection model in abnormal weather such as sand-dust and dust-haze.
Further, please refer to fig. 2, a schematic diagram of the simulation effect of dust-haze and sand-dust images based on the atmospheric scattering model according to the embodiment of the application, where (a) is the original image, (b), (c) and (d) are dust-haze images synthesized by the atmospheric scattering model with different transmission rates, and (e), (f) and (g) are sand-dust images simulated with fixed transmission and atmospheric light values combined with three different colors. The dust-haze imaging formula is:
I(x) = J(x)t(x) + α(1 - t(x))    (1)
In equation (1), I(x) is the simulated haze image, J(x) is the input haze-free image, α is the atmospheric light value, and t(x) is the scene transmission rate, which describes the portion of the light that reaches the camera sensor without being scattered. To simulate haze weather of different concentrations, the embodiment sets the atmospheric light value α to 0.8 and the transmission rate to 0.8, 0.6 and 0.4 respectively.
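With a spatially constant transmission rate, equation (1) reduces to a per-pixel blend between the clean image and the atmospheric light. A minimal sketch (the function name and the random test frame are illustrative, not from the patent):

```python
import numpy as np

def simulate_haze(clean, t, airlight=0.8):
    # I(x) = J(x)*t + A*(1 - t): blend the clean image J toward the
    # atmospheric light A; a lower transmission t means denser haze.
    return clean * t + airlight * (1.0 - t)

# Three haze levels, matching the transmission rates 0.8, 0.6, 0.4 above.
img = np.random.rand(64, 64, 3)          # stand-in for a clean frame in [0, 1]
hazy_levels = [simulate_haze(img, t) for t in (0.8, 0.6, 0.4)]
```

With t = 0.4 a pure-black pixel is lifted to 0.8 × 0.6 = 0.48, i.e. the frame is washed out toward the atmospheric light, which is the intended degradation.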
Since depth information does not play a major role in the image dust-simulation task, the transmission is assumed not to change with the depth of the image. Through prior statistics, the embodiment selects three colors suitable for simulating sand-dust images; the sand-dust image simulation formula is:
D(x) = J(x)t(x) + α(C(x)·(1 - t(x)))    (2)
In equation (2), D(x) is the simulated dust image, J(x) is the input haze-free image, and C(x) is the selected color value.
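Equation (2) differs from equation (1) only in that the scattered light is tinted by a color C(x). A minimal sketch under the same assumptions as the haze model; the sandy tint value is a plausible choice for illustration, since the patent does not list its three colors here:

```python
import numpy as np

def simulate_dust(clean, color, t=0.6, airlight=0.8):
    # D(x) = J(x)*t + A*(C(x)*(1 - t)): as the haze model, but the
    # airlight term is tinted by the RGB color C.
    color = np.asarray(color, dtype=float).reshape(1, 1, 3)
    return clean * t + airlight * (color * (1.0 - t))

img = np.random.rand(64, 64, 3)
dusty = simulate_dust(img, color=(0.9, 0.8, 0.5))   # illustrative sandy tint
```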
S30: constructing a convolutional neural network model LFNet;
In an embodiment of the application, the framework of the convolutional neural network model is shown in fig. 3. LFNet is composed of common convolution layers, bottleneck building blocks, parametric rectified linear units, group normalization, etc., and comprises a skeleton feature extraction model, a main feature extraction model and a variable-scale feature fusion model, whose specific functions are as follows:
Skeleton feature extraction model: used to extract the main features of the input image. To extract richer image features, the features of the input image are first extracted by convolutions at the 3×3, 5×5 and 7×7 scales respectively, enlarging the receptive field and extracting more image features. After convolution at the three different scales, feature maps of 13×13, 26×26 and 52×52 respectively are obtained. Extracting feature maps with multi-scale convolution in this way captures feature information of different sizes around each pixel, which is particularly important for fire images.
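The three grid sizes follow from the usual convolution output-size arithmetic. A sketch; the 416×416 input resolution is an assumption for illustration, as the patent does not state the input size:

```python
def conv_out(size, kernel, stride, pad):
    # Spatial size after a convolution: floor((size + 2*pad - kernel)/stride) + 1
    return (size + 2 * pad - kernel) // stride + 1

# With a hypothetical 416x416 input, successive stride-2 3x3 convolutions
# pass through the 52x52, 26x26 and 13x13 grids named in the text.
sizes, size = [], 416
for _ in range(5):
    size = conv_out(size, kernel=3, stride=2, pad=1)
    sizes.append(size)
# sizes is [208, 104, 52, 26, 13]
```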
Main feature extraction model: used for further feature extraction on the main features extracted by the skeleton feature extraction model, generating three groups of feature maps of sizes 52×52, 26×26 and 13×13, where each smaller feature map is extracted from the larger feature map of the layer above, and each convolution block consists of a one-layer convolution structure and a five-layer residual structure.
Variable-scale feature fusion model: the features extracted by the main feature extraction model are concatenated using variable-scale feature fusion (VSFF), then further extracted by convolution and adaptively fused. The structure of the variable-scale feature fusion model is shown in fig. 4. To fuse the convolution-extracted feature maps of different scales, the three groups of feature maps are fused, extending the 13×13 and 26×26 maps to 52×52. The three inputs are feature maps of sizes 13×13, 26×26 and 52×52 respectively; the three feature maps of different sizes are mapped to different convolution kernels and strides for convolution, to be upsampled or downsampled into the two other sizes. Finally, all convolutions of the same size are spliced to obtain three groups of feature maps. Because the spliced feature maps contain richer image features, model localization becomes more accurate.
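The resample-then-splice step can be sketched with plain array operations standing in for the learned strided convolutions; the channel count and the nearest-neighbour resampling below are illustrative assumptions, not the patent's layers:

```python
import numpy as np

def to_size(fmap, target):
    # Bring a square (C, H, H) feature map to (C, target, target).
    # Repeat-upsampling / strided subsampling stand in for the strided
    # convolutions a real VSFF block would learn.
    h = fmap.shape[1]
    if target > h:
        r = target // h
        return fmap.repeat(r, axis=1).repeat(r, axis=2)
    return fmap[:, ::h // target, ::h // target]

maps = [np.random.rand(8, s, s) for s in (13, 26, 52)]
# Fuse at the 26x26 scale: resample every map to 26x26, splice on channels.
fused_26 = np.concatenate([to_size(m, 26) for m in maps], axis=0)
```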
Further, the embodiment operates on the three groups of feature maps extracted in VSFF using a channel-based attention mechanism, which can be seen as a process of weighting the feature maps according to their importance. For example, in a group of 24 feature maps of size 13×13, the channel-based attention mechanism determines which maps in the group have a more pronounced effect on the prediction result, and then increases the weight of that portion. Through the attention mechanism, three fusions yield feature maps of sizes 13×13, 26×26 and 52×52, used respectively for detecting small, medium and large objects. The detailed structure of the channel-based attention mechanism is shown in fig. 5.
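The channel weighting can be sketched as a squeeze-and-excitation style gate. The patent's exact attention block is not reproduced here; the sigmoid-of-mean gate below is an illustrative stand-in for the learned weighting:

```python
import numpy as np

def channel_attention(fmap):
    # Squeeze: one summary value per channel of the (C, H, W) map.
    squeeze = fmap.mean(axis=(1, 2))
    # Excite: a sigmoid gate in (0, 1); a learned MLP would normally sit here.
    weights = 1.0 / (1.0 + np.exp(-squeeze))
    # Reweight each channel by its importance score.
    return fmap * weights[:, None, None]

x = np.random.rand(24, 13, 13)     # e.g. 24 channels of a 13x13 map
y = channel_attention(x)
```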
With this structure, the LFNet model of the embodiment has a very small size (22.5 MB) yet leads in both quantitative and qualitative evaluation, reducing computational cost and facilitating the application of LFNet to resource-constrained devices.
S40: inputting the data set into the LFNet model for iterative training to obtain optimal model parameters;
In this step, during model training the LFNet model has two tasks: first, accurately localize the alarm area in the image; second, classify the disaster type of the alarm area. To enable the model to better complete these two tasks, the embodiment selects mean square error (MSE) and cross entropy (CE) respectively as loss functions to guide network optimization; the loss function is also based on extensive statistics over different fire images and videos, which helps LFNet effectively detect fire areas.
Specifically, extensive experiments on various fire images show that in smoke areas the absolute difference between the brightness and the dark channel value is higher than in other areas, and that the R channel of a fire area is higher than that of a non-fire area; that is, the brightness, dark channel value and R channel vary with the fire area, the absolute difference between brightness and dark channel grows with smoke concentration, and the visual characteristics of fire are closely related to the R-channel pixel values. Based on these features, the embodiment regards these statistics as a combustion histogram prior (CHP) and writes them as the CHP formula (3), in which R(x) denotes the R channel of the image, SCP(x) is the difference between the brightness of the image and its dark channel, w is the width of the histogram, and h is the height of the histogram; SCP can be written as:
SCP(x) = ||v(x) - DCP(x)||    (4)
in formula (4), v (x) is the brightness of the image, and DCP (x) refers to the value of the dark channel of the image.
L_CHP = ||CHP(I) - CHP(R)||^2    (5)
In equation (5), CHP denotes the combustion histogram prior, and CHP(I) and CHP(R) denote the CHP values of the region selected by the target detection algorithm and of the region labeled in the ground truth, respectively.
The final loss function is a weighted sum of three different loss functions: the cross-entropy loss, the mean square error loss and the combustion histogram prior loss. The formula is:
L_total = β·L_CE + γ·L_MSE + δ·L_CHP    (6)
In equation (6), L_total is the final loss function, L_CE is the cross-entropy loss function, L_MSE is the mean square error loss function, L_CHP is the combustion histogram prior loss, and β, γ and δ are set to 0.25, 0.25 and 0.5 respectively.
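The weighted sum in equation (6) is then a three-term blend. A minimal sketch with the reported weights; the sample loss values are illustrative:

```python
def total_loss(l_ce, l_mse, l_chp, beta=0.25, gamma=0.25, delta=0.5):
    # Equation (6): L_total = beta*L_CE + gamma*L_MSE + delta*L_CHP,
    # with beta = gamma = 0.25 and delta = 0.5 as reported above.
    return beta * l_ce + gamma * l_mse + delta * l_chp

# Illustrative values for the three component losses.
loss = total_loss(l_ce=0.7, l_mse=0.25, l_chp=0.1)
# loss == 0.2875
```

Note that the weights sum to 1, so the combined loss stays on the same scale as its components.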
S50: inputting the fire image to be detected into the trained LFNet model, the LFNet model outputting the fire localization area and fire type of the fire image to be detected.
Referring to fig. 6, a schematic diagram of a fire detection system based on video image object detection according to an embodiment of the application is shown. The fire detection system 40 based on video image object detection according to an embodiment of the present application includes:
The data set construction module 41: for converting an original natural image into dust-haze and sand-dust images by adopting a data enhancement algorithm based on an atmospheric scattering model, and generating a data set for training the model;
LFNet model training module 42: the method comprises the steps of constructing a convolutional neural network model LFNT, inputting the data set into the LFNT model for iterative training, and obtaining optimal model parameters; the skeleton feature extraction model adopts convolution of 3, 5 and 7 to 7 scales to extract the features of the input image to obtain feature images with the sizes of 13, 26 and 52; the main feature extraction model performs further feature extraction on the main features to generate three groups of feature graphs with the sizes of 52, 26 and 13; the variable scale feature fusion model maps three groups of feature images to different convolution kernels and step sizes for convolution, and splices all convolutions with the same size to obtain three groups of feature images, and the three groups of feature images are operated by using a channel-based attention mechanism to obtain feature images with the sizes of 13, 26 and 52, which are respectively used for detecting small, medium and large objects; the inputting the data set into the LFNet model for iterative training further comprises: respectively selecting a mean square error and a cross entropy as a loss function to perform model optimization;
the convolutional neural network model LFNet comprises a skeleton feature extraction model, a main feature extraction model and a variable-scale feature fusion model; the skeleton feature extraction model extracts the main features of an input image through convolutions at three different scales; the main feature extraction model performs further feature extraction on the main features to generate three groups of feature maps; the variable-scale feature fusion model adaptively fuses the three groups of feature maps and outputs a detection result;
model optimization module 43: configured to select the mean square error and the cross entropy, respectively, as loss functions for model optimization.
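The variable-scale fusion step described by the modules above — splicing same-sized feature maps along the channel axis and weighting them with a channel-based attention mechanism — can be sketched as follows. This is a minimal illustration, not the patented implementation: the attention gate here is a parameter-free sigmoid over each channel's global average (the patent only states that a channel-based attention mechanism is applied), and the function names are our own.

```python
import numpy as np

def channel_attention(feats):
    """Apply a channel-wise attention gate to a feature map of
    shape (C, H, W). The parameter-free squeeze-and-sigmoid gate
    used here is a simplifying assumption."""
    pooled = feats.mean(axis=(1, 2))         # (C,) global average pool
    weights = 1.0 / (1.0 + np.exp(-pooled))  # sigmoid gate per channel
    return feats * weights[:, None, None]    # reweight each channel

def fuse_same_size(maps):
    """Splice same-sized feature maps along the channel axis and
    gate the result, mirroring the 'splice all convolutions of the
    same size, then apply channel attention' step of the fusion
    model."""
    stacked = np.concatenate(maps, axis=0)   # channel-wise splice
    return channel_attention(stacked)
```

For example, splicing a 4-channel and a 2-channel 13×13 map yields a gated 6-channel 13×13 map, one of the three output groups used for small-object detection.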
Fig. 7 is a schematic diagram of a terminal structure according to an embodiment of the application. The terminal 50 includes a processor 51, a memory 52 coupled to the processor 51.
The memory 52 stores program instructions for implementing the fire detection method based on video image object detection described above.
The processor 51 is operative to execute the program instructions stored in the memory 52 to control fire detection based on video image object detection.
The processor 51 may also be referred to as a CPU (Central Processing Unit). The processor 51 may be an integrated circuit chip with signal processing capabilities. The processor 51 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
Fig. 8 is a schematic structural diagram of a storage medium according to an embodiment of the application. The storage medium of the embodiment of the present application stores a program file 61 capable of implementing all of the methods described above. The program file 61 may be stored in the storage medium in the form of a software product, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code, or a terminal device such as a computer, server, mobile phone, or tablet.
According to the fire detection method, system, terminal and storage medium based on video image target detection of the embodiments of the application, the original image is converted into dust-haze or sand-dust images of different densities using a data enhancement algorithm based on the atmospheric scattering model, a data set for training the model is generated, and a convolutional neural network model LFNet suited to fire smoke detection in uncertain environments is constructed, so that the robustness of the model in abnormal weather such as sand dust and haze can be improved and better detection results obtained. Meanwhile, the LFNet model of the embodiments of the application has a smaller size, which reduces computation cost and facilitates deployment on resource-limited devices.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (8)

1. A fire detection method based on video image target detection, comprising:
converting an original natural image into a dust-haze image and a sand-dust image using a data enhancement algorithm based on the atmospheric scattering model, and generating a data set for training the model;
constructing a convolutional neural network model LFNet, and inputting the data set into the LFNet model for iterative training to obtain optimal model parameters; the skeleton feature extraction model applies 3×3, 5×5 and 7×7 convolutions to extract features from the input image, obtaining feature maps of sizes 13×13, 26×26 and 52×52; the main feature extraction model performs further feature extraction on the main features, generating three groups of feature maps of sizes 52×52, 26×26 and 13×13; the variable-scale feature fusion model maps the three groups of feature maps through convolutions with different kernel sizes and strides, splices all convolutions of the same size into three groups of feature maps, and operates on them with a channel-based attention mechanism to obtain feature maps of sizes 13×13, 26×26 and 52×52, used respectively for detecting small, medium and large objects; inputting the data set into the LFNet model for iterative training further comprises: selecting the mean square error and the cross entropy, respectively, as loss functions for model optimization;
the convolutional neural network model LFNet comprises a skeleton feature extraction model, a main feature extraction model and a variable-scale feature fusion model; the skeleton feature extraction model extracts the main features of an input image through convolutions at three different scales; the main feature extraction model performs further feature extraction on the main features to generate three groups of feature maps; the variable-scale feature fusion model adaptively fuses the three groups of feature maps and outputs a detection result;
and inputting the fire image to be detected into the trained LFNet model, and outputting, through the LFNet model, the fire localization area and the fire type of the fire image to be detected.
2. The fire detection method based on video image object detection according to claim 1, wherein, before converting the original natural image into the dust-haze image and the sand-dust image using the data enhancement algorithm based on the atmospheric scattering model, the method comprises:
acquiring an original natural image; the original natural image includes a non-alarm image without a fire alarm area and a real fire alarm image.
3. The fire detection method based on video image object detection according to claim 1 or 2, wherein the converting the original natural image into the dust-haze image using the data enhancement algorithm based on the atmospheric scattering model comprises:
the atmospheric scattering model adopts at least two transmission rates to simulate and generate dust-haze images of different densities; the dust-haze image imaging formula is as follows:
I(x) = J(x)t(x) + a(1 - t(x))
in the above formula, I(x) is the simulated haze image, J(x) is the input haze-free image, a is the atmospheric light value, and t(x) is the scene transmission rate.
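The haze imaging formula above can be applied directly as a data-augmentation step. The following sketch assumes images normalized to [0, 1]; the function name and the default atmospheric light value are illustrative assumptions, not values disclosed in the patent.

```python
import numpy as np

def synthesize_haze(clear, t, a=0.8):
    """Simulate a hazy image via the atmospheric scattering model
    I(x) = J(x) * t(x) + a * (1 - t(x)).

    clear : float array in [0, 1], the haze-free image J(x)
    t     : scalar or per-pixel transmission rate t(x) in (0, 1]
    a     : atmospheric light value (scalar); 0.8 is illustrative
    """
    clear = np.asarray(clear, dtype=np.float64)
    return clear * t + a * (1.0 - t)
```

Choosing several transmission rates, as the claim describes, yields haze of different densities: the lower t is, the more the output drifts toward the atmospheric light value a.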
4. A fire detection method based on video image object detection as defined in claim 3, wherein said converting the original natural image into a dust image using a data enhancement algorithm based on an atmospheric scattering model comprises:
the atmospheric scattering model adopts a fixed transmission rate and atmospheric light value, combined with three color values, to simulate and generate sand-dust images of different densities; the sand-dust image simulation formula is as follows:
D(x) = J(x)t(x) + a(C(x)(1 - t(x)))
in the above formula, D(x) is the simulated sand-dust image, J(x) is the input haze-free image, and C(x) is the color value.
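The sand-dust variant differs from the haze formula only in that a per-channel color C(x) tints the airlight term. A sketch under the same [0, 1] assumptions; the yellowish default color triple is illustrative and not taken from the patent.

```python
import numpy as np

def synthesize_dust(clear, t, a=1.0, color=(0.8, 0.6, 0.3)):
    """Simulate a sand-dust image via the modified scattering model
    D(x) = J(x) * t(x) + a * (C(x) * (1 - t(x))).

    clear : float array (H, W, 3) in [0, 1], the haze-free image J(x)
    t     : transmission rate t(x)
    a     : atmospheric light value
    color : per-channel tint C(x); the default is an assumed
            sand-like triple, not a value from the patent
    """
    clear = np.asarray(clear, dtype=np.float64)
    c = np.asarray(color, dtype=np.float64)  # broadcasts over (H, W, 3)
    return clear * t + a * (c * (1.0 - t))
```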
5. The fire detection method based on video image object detection according to claim 1, wherein the loss function is specifically:
counting the brightness, dark channel values and R channel data of the fire region, treating these statistics as the combustion histogram prior (CHP), and writing the formula of the CHP:
in the above formula, R(x) represents the R channel of the image, SCP(x) is the difference between the brightness and the dark channel of the image, w is the width of the histogram, and h is the height of the histogram;
SCP(x) = ||v(x) - DCP(x)||
in the above formula, v (x) is the brightness of the image, and DCP (x) is the value of the dark channel of the image;
L_CHP = ||CHP(I) - CHP(R)||^2
in the above formula, CHP denotes the combustion histogram prior, and CHP(I) and CHP(R) denote the CHP values of the region selected by the target detection algorithm and of the labeled region, respectively;
the loss function is a weighted summation of three different loss functions:
L = βL_CE + γL_MSE + δL_CHP
in the above formula, L is the final loss function, L_CE is the cross entropy loss function, L_MSE is the mean square error loss function, and L_CHP is the combustion histogram prior loss.
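The SCP gap, the squared CHP discrepancy and the weighted loss combination above can be sketched as follows. The weight values β, γ and δ are illustrative assumptions (the patent does not disclose them), as are the function names.

```python
import numpy as np

def scp(v, dcp):
    """SCP(x) = ||v(x) - DCP(x)||: the gap between image brightness
    v(x) and the dark channel value DCP(x)."""
    return np.abs(v - dcp)

def chp_loss(chp_pred, chp_ref):
    """L_CHP = ||CHP(I) - CHP(R)||^2, comparing the combustion
    histogram prior of the detected region (I) against that of
    the labeled region (R)."""
    return float(np.sum((chp_pred - chp_ref) ** 2))

def total_loss(l_ce, l_mse, l_chp, beta=1.0, gamma=1.0, delta=0.1):
    """Final loss L = beta*L_CE + gamma*L_MSE + delta*L_CHP.
    The default weights are assumed values for illustration."""
    return beta * l_ce + gamma * l_mse + delta * l_chp
```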
6. A fire detection system based on video image object detection, comprising:
the data set construction module: configured to convert an original natural image into a dust-haze image and a sand-dust image using a data enhancement algorithm based on the atmospheric scattering model, generating a data set for training the model;
LFNet model training module: configured to construct the convolutional neural network model LFNet and input the data set into the LFNet model for iterative training, obtaining optimal model parameters; the skeleton feature extraction model applies 3×3, 5×5 and 7×7 convolutions to extract features from the input image, obtaining feature maps of sizes 13×13, 26×26 and 52×52; the main feature extraction model performs further feature extraction on the main features, generating three groups of feature maps of sizes 52×52, 26×26 and 13×13; the variable-scale feature fusion model maps the three groups of feature maps through convolutions with different kernel sizes and strides, splices all convolutions of the same size into three groups of feature maps, and operates on them with a channel-based attention mechanism to obtain feature maps of sizes 13×13, 26×26 and 52×52, used respectively for detecting small, medium and large objects; inputting the data set into the LFNet model for iterative training further comprises: selecting the mean square error and the cross entropy, respectively, as loss functions for model optimization;
the convolutional neural network model LFNet comprises a skeleton feature extraction model, a main feature extraction model and a variable-scale feature fusion model; the skeleton feature extraction model extracts the main features of an input image through convolutions at three different scales; the main feature extraction model performs further feature extraction on the main features to generate three groups of feature maps; the variable-scale feature fusion model adaptively fuses the three groups of feature maps and outputs a detection result; the detection result comprises a fire localization area and a fire type of the fire image.
7. A terminal, comprising a processor and a memory coupled to the processor, wherein,
the memory stores program instructions for implementing the video image object detection-based fire detection method of any one of claims 1 to 5;
the processor is configured to execute the program instructions stored by the memory to control fire detection based on video image object detection.
8. A storage medium storing program instructions executable by a processor for performing the fire detection method based on video image object detection according to any one of claims 1 to 5.
CN202011069784.5A 2020-09-30 2020-09-30 Fire detection method, system, terminal and storage medium based on video image target detection Active CN112215122B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011069784.5A CN112215122B (en) 2020-09-30 2020-09-30 Fire detection method, system, terminal and storage medium based on video image target detection


Publications (2)

Publication Number Publication Date
CN112215122A CN112215122A (en) 2021-01-12
CN112215122B true CN112215122B (en) 2023-10-24

Family

ID=74052827


Country Status (1)

Country Link
CN (1) CN112215122B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113506293B (en) * 2021-09-08 2021-12-07 成都数联云算科技有限公司 Image processing method, device, equipment and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107749067A (en) * 2017-09-13 2018-03-02 华侨大学 Fire hazard smoke detecting method based on kinetic characteristic and convolutional neural networks
KR20180050832A (en) * 2016-11-07 2018-05-16 한국과학기술원 Method and system for dehazing image using convolutional neural network
KR101869442B1 (en) * 2017-11-22 2018-06-20 공주대학교 산학협력단 Fire detecting apparatus and the method thereof
CN108256496A (en) * 2018-02-01 2018-07-06 江南大学 A kind of stockyard smog detection method based on video
CN108428324A (en) * 2018-04-28 2018-08-21 温州大学激光与光电智能制造研究院 The detection device of smog in a kind of fire scenario based on convolutional network
CN108764264A (en) * 2018-03-16 2018-11-06 深圳中兴网信科技有限公司 Smog detection method, smoke detection system and computer installation
CN109063728A (en) * 2018-06-20 2018-12-21 燕山大学 A kind of fire image deep learning mode identification method
CN109522819A (en) * 2018-10-29 2019-03-26 西安交通大学 A kind of fire image recognition methods based on deep learning
CN110490043A (en) * 2019-06-10 2019-11-22 东南大学 A kind of forest rocket detection method based on region division and feature extraction
CN111046827A (en) * 2019-12-20 2020-04-21 哈尔滨理工大学 Video smoke detection method based on convolutional neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7280696B2 (en) * 2002-05-20 2007-10-09 Simmonds Precision Products, Inc. Video detection/verification system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FireNet: A specialized lightweight fire and smoke detection model for real time IoT application; Arpit Jadon et al.; arXiv:1905.11922; 1-6 *
Implementation method of multi-perspective 3D simulated city maps; Ren Peng et al.; Geography and Geo-Information Science; Vol. 27, No. 3; 34-37 *


Similar Documents

Publication Publication Date Title
WO2022067668A1 (en) Fire detection method and system based on video image target detection, and terminal and storage medium
Hu et al. Fast forest fire smoke detection using MVMNet
Kim et al. High-speed drone detection based on yolo-v8
CN112689843B (en) Closed loop automatic data set creation system and method
CN113807276B (en) Smoking behavior identification method based on optimized YOLOv4 model
CN113076809A (en) High-altitude falling object detection method based on visual Transformer
CN110598558A (en) Crowd density estimation method, device, electronic equipment and medium
TWI667621B (en) Face recognition method
CN115457428A (en) Improved YOLOv5 fire detection method and device integrating adjustable coordinate residual attention
Jiang et al. A self-attention network for smoke detection
CN111738054A (en) Behavior anomaly detection method based on space-time self-encoder network and space-time CNN
CN111582074A (en) Monitoring video leaf occlusion detection method based on scene depth information perception
CN111611889A (en) Miniature insect pest recognition device in farmland based on improved convolutional neural network
CN113627504B (en) Multi-mode multi-scale feature fusion target detection method based on generation of countermeasure network
CN111881915A (en) Satellite video target intelligent detection method based on multiple prior information constraints
CN112215122B (en) Fire detection method, system, terminal and storage medium based on video image target detection
CN113158963B (en) Method and device for detecting high-altitude parabolic objects
CN114399734A (en) Forest fire early warning method based on visual information
Bhise et al. Plant disease detection using machine learning
CN113505643A (en) Violation target detection method and related device
CN115205793B (en) Electric power machine room smoke detection method and device based on deep learning secondary confirmation
CN116977256A (en) Training method, device, equipment and storage medium for defect detection model
CN114170269B (en) Multi-target tracking method, equipment and storage medium based on space-time correlation
CN108804981B (en) Moving object detection method based on long-time video sequence background modeling frame
Shen et al. Lfnet: Lightweight fire smoke detection for uncertain surveillance environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant