CN112949453A - Training method of smoke and fire detection model, smoke and fire detection method and smoke and fire detection equipment - Google Patents


Info

Publication number
CN112949453A
CN112949453A (application CN202110215838.2A)
Authority
CN
China
Prior art keywords
firework
smoke
motion
weak
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110215838.2A
Other languages
Chinese (zh)
Other versions
CN112949453B (en)
Inventor
曹毅超
孙飞
施燕平
李溯
陈斌锋
封晓强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NANJING ENBO TECHNOLOGY CO LTD
Original Assignee
NANJING ENBO TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NANJING ENBO TECHNOLOGY CO LTD filed Critical NANJING ENBO TECHNOLOGY CO LTD
Priority to CN202110215838.2A priority Critical patent/CN112949453B/en
Publication of CN112949453A publication Critical patent/CN112949453A/en
Application granted granted Critical
Publication of CN112949453B publication Critical patent/CN112949453B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection


Abstract

The invention discloses a training method for a smoke and fire detection model, a smoke and fire detection method, and smoke and fire detection equipment, belonging to the technical field of smoke and fire detection. The training method comprises the following steps: constructing a video firework sample data set; inputting an input image into a texture branch for feature extraction to obtain a multi-scale feature representation, and fusing the multi-scale representation into texture features through a feature pyramid; calculating a frame difference image between the input image and a reference image, and inputting the frame difference image into a motion branch to calculate a motion attention weight map; performing motion-perception enhancement on the texture features; generating a weak firework target mask with a weak guiding attention module; and obtaining a firework feature map from the motion-perception-enhanced texture features and the weak firework target mask, and detecting the firework target. A smoke and fire detection method using the trained model, and equipment for performing the detection method, are also proposed. The method effectively improves firework detection accuracy, has low computational cost, and is convenient to deploy.

Description

Training method of smoke and fire detection model, smoke and fire detection method and smoke and fire detection equipment
Technical Field
The invention belongs to the technical field of smoke and fire detection, and particularly relates to a training method of a smoke and fire detection model, a smoke and fire detection method and smoke and fire detection equipment.
Background
Fire not only causes property loss but also seriously endangers people's lives. Once a major or extremely large fire occurs, the direct economic loss is often enormous; when fire breaks out in leadership organs, communication hubs, foreign-related units, ancient buildings, scenic spots and similar areas, it often causes serious political impact, affecting the whole country or even drawing worldwide attention. With the development of deep learning, computer vision technology has advanced greatly; deep learning has succeeded in fields such as target detection, behavior recognition and super-resolution, and using computer vision to detect fire and smoke has therefore attracted extensive attention in both academia and industry.
However, firework targets differ from general rigid-body detection targets: their edges are blurred and semi-transparent, and they belong to a special class of fluid targets. In addition, their color and texture may vary greatly under different lighting conditions. Existing firework detection methods can be divided, by the dimensionality of the input data, into image-based and video-based methods. Image-based methods typically focus on static information of the firework target such as texture, edges and contours, while video-based methods focus more on dynamic characteristics of the firework target such as diffusion speed and frequency variation. Lacking dynamic information, image-based detection algorithms generally suffer higher false-negative and false-positive rates than video-based ones, so many existing firework detection methods are video-based. For a video-based smoke detection method, both the accuracy of detection and, in most application scenarios, the ease of deployment must be considered.
Disclosure of Invention
The technical problem is as follows: aiming at the low accuracy of existing video-based firework detection methods, the invention first provides a training method for a firework detection model, so that a model with higher recognition accuracy can be trained; then, based on the trained smoke and fire detection model, a smoke and fire detection method capable of accurately identifying smoke and fire is provided; further, equipment for implementing the detection method is proposed, enabling deployment and accurate detection of smoke and fire. In addition, the invention has low computational cost and is convenient to deploy.
The technical scheme is as follows: in one aspect, the invention provides a method for training a smoke and fire detection model, the smoke and fire detection model comprising a texture branch and a motion branch, the method comprising:
constructing a video firework sample data set, wherein the video firework sample data set comprises a plurality of samples, and each sample comprises an input image and a reference image;
performing feature extraction on an input image input texture branch to obtain multi-scale feature representation, and fusing the multi-scale feature representation into texture features through a feature pyramid;
calculating a frame difference image of the input image and the reference image, and inputting the frame difference image into a motion branch to calculate a motion attention weight map;
performing motion perception enhancement on the texture features;
generating a weak firework target mask with a weak attention module;
and obtaining a firework characteristic diagram according to the texture characteristics and the weak firework target mask after the motion perception is enhanced, and detecting the firework target.
Further, the method for inputting the frame difference image into the motion branch to calculate the motion attention weight map comprises the following steps:
down-sampling the frame difference image;
graying the down-sampled frame difference image;
and inputting the grayed image into a standard residual block, and calculating to obtain a motion attention weight map.
Further, the method for enhancing motion perception of the texture features comprises the following steps:
F_m = F_a + F_a * A_m
where F_m is the texture feature after motion-perception enhancement, F_a is the texture feature before enhancement, and A_m is the motion attention weight map; the multiplication is performed channel by channel, with each single element of A_m multiplying the channel elements of F_a at that position.
Further, the method of generating a weak pyrotechnic target mask with a weak attention module includes:
randomly sampling pixel points of a plurality of smoke and fire targets and non-smoke and fire targets in a data labeling frame of an input image to construct a smoke and fire pixel data set;
training a random forest model by taking RGB (red, green and blue) channels as features;
classifying the pixels in the labeling boxes of the input image by using the trained random forest model to obtain the masks inside the labeling boxes;
and representing firework-region pixels by 1 and non-firework-region pixels by 0, pasting each in-box mask onto an all-zero mask at its labeling-box position to obtain the complete weak firework target mask.
Further, the method for obtaining the firework characteristic diagram according to the texture characteristics and the weak firework target mask after the motion perception enhancement comprises the following steps:
F_w = F_m * A_w + F_r
where F_w denotes the firework feature map, A_w denotes the weak firework target mask, F_m denotes the texture feature after motion-perception enhancement, and F_r denotes the compensation residual.
Further, the samples in the video pyrotechnic sample data set comprise positive and negative samples, wherein:
the positive samples are frames with firework targets in the video, any frame is selected as an input image for each positive sample, one frame is randomly extracted in the range of 200 ms-5 s before and after the frame to serve as a corresponding reference image, and firework target boundary box labeling is carried out on the input image;
and the negative samples are frames without firework targets in the video, any frame is selected as an input image for each negative sample, and one frame is randomly extracted as a corresponding reference image in the range of 200 ms-5 s before and after the frame.
In another aspect, the invention provides a smoke and fire detection method, wherein a smoke and fire detection model is obtained by training by using the training method, and comprises a texture branch and a motion branch; the method comprises the following steps:
acquiring a firework video image, randomly extracting one frame in a preset time range before and after the frame as a corresponding reference image for any frame of video input image, and calculating a frame difference image between the input image and the reference image;
inputting an input image into a texture branch to obtain multi-scale feature representation, and fusing the multi-scale feature representation into texture features through a feature pyramid;
inputting the frame difference image into the motion branch to calculate a motion attention weight map;
performing motion perception enhancement on the texture features;
predicting a weak firework target mask according to the texture features after the motion perception is enhanced;
and obtaining a firework characteristic diagram according to the texture characteristics and the weak firework target mask after the motion perception is enhanced, and detecting a firework target.
Further, the method for calculating the motion attention weight map according to the frame difference image input motion branch comprises the following steps:
down-sampling the frame difference image;
graying the down-sampled frame difference image;
and inputting the grayed image into a standard residual block, and calculating to obtain a motion attention weight map.
Further, the method for enhancing motion perception of the texture features comprises the following steps:
F_m = F_a + F_a * A_m
where F_m is the texture feature after motion-perception enhancement, F_a is the texture feature before enhancement, and A_m is the motion attention weight map; the multiplication is performed channel by channel, with each single element of A_m multiplying the channel elements of F_a at that position.
Further, the method for predicting the weak firework target mask according to the texture features after the motion perception enhancement comprises the following steps:
performing optimization with a standard Focal Loss.
Further, the method for obtaining the smoke and fire feature map according to the texture features and the weak smoke and fire target masks after the motion perception enhancement comprises the following steps:
F_w = F_m * A_w + F_r
where F_w denotes the firework feature map, A_w denotes the weak firework target mask, F_m denotes the texture feature after motion-perception enhancement, and F_r denotes the compensation residual.
In yet another aspect, the present invention provides a smoke and fire detection apparatus comprising:
the image acquisition device is used for acquiring firework video images;
a processor; and
a memory having stored therein computer program instructions which, when executed by the processor, cause the processor to perform the pyrotechnic detection method of any of claims 7-10.
Advantageous effects: compared with the prior art, the invention has the following advantages:
(1) The model trained by this training method for a smoke and fire detection model has a two-branch network, which avoids the dynamic-information loss of traditional single-frame detection methods and highlights the dynamic characteristics of smoke and fire regions, giving the model strong recognition of inconspicuous firework targets and strong discrimination against interference factors. The trained model therefore recognizes smoke and fire with high accuracy, and can identify them accurately when applied to smoke and fire detection. Moreover, the trained model needs only two input frames to detect a firework target accurately, which keeps its computational cost low and facilitates engineering deployment.
In addition, a weak guiding attention module is introduced during model training, which generates masks for semi-transparent smoke regions and avoids the inconsistency that manual labeling of semi-transparent smoke targets may produce. Using the weak guiding attention module together with a multi-task learning strategy raises the model's pixel-level attention to firework targets, so the model uses the training data more fully and detects firework targets more accurately.
(2) The firework detection method uses the firework detection model trained by the training method of the invention. The trained model's two-branch network avoids the dynamic-information loss of traditional single-frame detection, highlights the dynamic characteristics of firework regions, recognizes inconspicuous firework targets strongly and resists interference factors, so firework targets are detected more accurately. The model's high pixel-level attention to firework targets makes fuller use of the training data, further improving detection accuracy, and its low computational cost facilitates application and deployment of the method.
(3) The smoke and fire detection equipment provided by the invention can detect smoke and fire targets accurately; when deployed in a specific application scenario, it enables fires to be discovered in time, reducing the casualties and property loss they cause.
Drawings
FIG. 1 is a network architecture framework diagram in an embodiment of the invention;
FIG. 2 is a flow chart of a pyrotechnic detection model training method in an embodiment of the invention;
FIG. 3 is a flow diagram of a weak lead attention module generating a weak pyrotechnic target mask during model training in an embodiment of the invention;
FIG. 4 is a graph of the results of the visualization of five samples at different stages during model training according to an embodiment of the present invention;
FIG. 5 is a flow chart of a method of smoke and fire detection in an embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following examples and the accompanying drawings.
First, the structure of the smoke and fire detection model of an embodiment of the invention is described. As shown in fig. 1, the model is a two-branch network, called a two-frame motion-aware backbone network, comprising a texture branch that processes the input image and a motion branch that processes the frame difference image. In one embodiment of the invention, the texture branch adopts the network structure of MobileNetV3, and the motion branch comprises a down-sampling module, a graying module and a convolutional-neural-network residual block; one embodiment adopts a residual block of MobileNetV2. It should be noted that in other embodiments, those skilled in the art can replace the networks in the texture branch and the motion branch with other existing networks.
With reference to fig. 1 and 2, an embodiment of the training method of the smoke detection model of the present invention is described, the training method comprising:
s100: and constructing a video firework sample data set. In the embodiment of the invention, the video firework sample data set comprises a plurality of positive samples and negative samples, wherein the positive samples are frames with firework targets in a video, any frame is selected as an input image for each positive sample, one frame is randomly extracted in the range of 200 ms-5 s before and after the frame as a corresponding reference image, and firework target boundary box labeling is carried out on the input image; the negative sample is a frame without a firework target in the video, and the input image and the reference image of the negative sample are extracted by the same method without marking the input image of the negative sample.
S110: Inputting the input image into the texture branch for feature extraction to obtain a multi-scale feature representation, and fusing the multi-scale representation into texture features through a feature pyramid. In one embodiment of the present invention, the texture branch adopts the network structure of MobileNetV3 and produces feature representations at four scales, formulated as:
{F_1, F_2, F_3, F_4} = f_TB(IF)
where f_TB denotes the computation of the texture branch and IF is the input image, with dimensions of 3 × H × W; the number of color image channels is 3, and H and W are the height and width of the input image. In this embodiment, after the texture branch performs feature extraction on the input image, feature representations at four scales are obtained. The four-scale representations are then fused by the feature pyramid FPN into the texture feature F_a:
F_a = FPN(F_1, F_2, F_3, F_4)
In the embodiment of the invention a standard feature pyramid is adopted: the multi-scale representation extracted by the texture branch is fused by the feature pyramid, yielding the texture feature F_a of the input image with size C × (H/4) × (W/4), where C is the number of feature channels.
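As a minimal sketch of this coarse-to-fine fusion, the following NumPy code repeatedly upsamples the coarser map and adds it to the next finer one; the lateral 1 × 1 convolutions of a standard FPN are omitted and all scales are assumed to already share one channel count, so `fuse_pyramid` and `upsample2x` are illustrative simplifications, not the patent's exact network:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x spatial upsampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fuse_pyramid(features):
    """Fuse multi-scale maps (ordered coarse -> fine, equal channel
    count assumed) into one texture feature by repeated 2x upsampling
    and elementwise addition, FPN-style."""
    fused = features[0]
    for finer in features[1:]:
        fused = upsample2x(fused) + finer
    return fused
```

The fused output inherits the spatial size of the finest input map, matching the texture feature F_a described above.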
S120: and calculating a frame difference image of the input image and the reference image, and inputting the frame difference image into the motion branch to calculate a motion attention weight map.
In one embodiment of the invention, the frame difference image is obtained by subtracting the reference image from the input image, and it is calculated in order to highlight the changed regions in the firework image. Because the information in the frame difference image is sparse, processing it at full resolution would incur a large computation load; therefore, in one embodiment of the invention, after the frame difference image is input into the motion branch, the 3 × H × W frame difference image is first down-sampled to 3 × (H/4) × (W/4) and then grayed to 1 × (H/4) × (W/4). The grayed image is input into a standard residual block, and the motion attention weight map A_m is obtained by calculation. The calculation flow of the motion branch can be formulated as follows:
A_m = f_MC{grayscale[downscale(IF - RF)]}
where A_m denotes the motion attention weight map, downscale denotes the fourfold down-sampling in length and width, grayscale denotes the graying of the color frame-difference image, f_MC denotes the calculation of the standard residual block of the motion branch, IF denotes the input image, RF the reference image, and IF - RF the frame difference image. In this embodiment, the computed motion attention weight map A_m has size 1 × (H/4) × (W/4).
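The pre-processing part of this pipeline (frame difference, fourfold down-sampling, graying) can be sketched in NumPy as below; the standard residual block f_MC is omitted, and the function name, the average-pooling downscale and the luminance weights are illustrative assumptions:

```python
import numpy as np

def motion_branch_preprocess(input_frame, reference_frame):
    """Frame difference -> fourfold spatial down-sampling -> graying,
    i.e. grayscale[downscale(IF - RF)]; the residual block f_MC that
    turns this into the motion attention weight map A_m is omitted."""
    diff = input_frame.astype(np.float32) - reference_frame.astype(np.float32)
    h, w = diff.shape[:2]
    # 4 x 4 average pooling as the length/width fourfold down-sampling
    down = diff[:h - h % 4, :w - w % 4].reshape(h // 4, 4, w // 4, 4, 3).mean(axis=(1, 3))
    # luminance-weighted graying of the colour frame-difference image
    return down @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
```

A 3-channel H × W frame pair thus becomes a single-channel (H/4) × (W/4) map, matching the 1 × (H/4) × (W/4) input expected by the residual block.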
It should be noted that, in other embodiments of the present invention, steps S110 and S120 may be performed synchronously, or step S120 may be performed first, and then step S110 may be performed.
S130: Performing motion-perception enhancement on the texture features. After the attention weight map A_m is obtained, it is used to perform motion-perception enhancement on the texture feature F_a of the input image. In one embodiment of the invention, the enhancement method is:
F_m = F_a + F_a * A_m
where F_m is the texture feature after motion-perception enhancement, F_a is the texture feature before enhancement, and A_m is the motion attention weight map; the multiplication is performed channel by channel, with each single element of A_m multiplying the channel elements of F_a at that position. F_m has the same size as F_a. Through this process, the features corresponding to the moving regions of the image are enhanced.
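Assuming F_a is stored as a (C, H, W) array and A_m as an (H, W) array, the channel-by-channel enhancement is a single broadcast operation (`motion_enhance` is an illustrative name):

```python
import numpy as np

def motion_enhance(F_a, A_m):
    """F_m = F_a + F_a * A_m: broadcast the (H, W) motion attention
    weight map across the C channels of the (C, H, W) texture feature,
    so each element of A_m multiplies the channel elements at its
    spatial position."""
    return F_a + F_a * A_m[np.newaxis, :, :]
```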
S140: a weak pyrotechnic target mask is generated with a weak attention module. In an embodiment of the invention, in order to raise the attention of the model to the pyrotechnic region, a weak guidance attention module is introduced for generating a weak pyrotechnic target mask to guide the identification of the model to the pyrotechnic region. Specifically, in one embodiment of the present invention, the method of generating the weak pyrotechnic target mask in the weak attention module is, as shown in fig. 3:
s1401: randomly sampling pixel points of a plurality of smoke and fire targets and non-smoke and fire targets in a data labeling frame of an input image to construct a smoke and fire pixel data set;
s1402: training a random forest model by taking RGB (red, green and blue) channels as features;
s1403: classifying pixels in a label box in the input image by using the trained random forest model to obtain a mask in the label box, for example, in one embodiment, in this way, the mask in the label box shown in fig. 1 is obtained
Figure BDA0002953065040000071
S1404: and (3) representing the pixel value of the firework area by 1, representing the pixel value of the non-firework area by 0, and pasting the mask in the marking frame to a mask with the total number of 0 according to the position of the marking frame to obtain a complete weak firework target mask. In one embodiment of the invention, the size of the weak pyrotechnic object mask corresponds to the size of the sports attention weight map, e.g. when a sports betThe gravity graph has the size of
Figure BDA0002953065040000072
The size of the weak pyrotechnic target mask is also
Figure BDA0002953065040000073
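Steps S1403 and S1404 can be sketched as follows; here `classify_pixels` stands in for the trained random-forest model, and the function name and the (x0, y0, x1, y1) box format are illustrative assumptions, not from the patent:

```python
import numpy as np

def build_weak_mask(image, boxes, classify_pixels):
    """Classify the pixels inside each labeling box (e.g. with a random
    forest trained on R, G, B values) and paste the per-box masks into
    an all-zero full-image mask. `classify_pixels` maps an (N, 3) array
    of RGB values to 0/1 labels."""
    h, w = image.shape[:2]
    mask = np.zeros((h, w), dtype=np.uint8)   # 0 = non-firework everywhere
    for (x0, y0, x1, y1) in boxes:
        patch = image[y0:y1, x0:x1].reshape(-1, 3)
        labels = classify_pixels(patch).reshape(y1 - y0, x1 - x0)
        mask[y0:y1, x0:x1] = labels           # 1 = firework pixel
    return mask
```

In practice the image would be the feature-map-resolution view, so the resulting mask matches the size of the motion attention weight map.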
S150: and obtaining a firework characteristic diagram according to the texture characteristics and the weak firework target mask after the motion perception is enhanced, and detecting the firework target. Specifically, in the embodiment of the present invention, the specific method is:
F_w = F_m * A_w + F_r
where F_w denotes the firework feature map, A_w denotes the weak firework target mask, F_m denotes the texture feature after motion-perception enhancement, and F_r denotes the compensation residual.
As this method shows, the weak firework target mask A_w is used once more to adjust the feature map F_m. The compensation residual F_r is present because F_m * A_w may lose part of the detail information, so a compensation component is added to preserve the detail features. The resulting weakly guided firework feature map F_w still has the same size as A_m.
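With F_m and F_r as (C, H, W) arrays and A_w as an (H, W) mask, the weak-guidance step is again a single broadcast expression (`weak_guided_features` is an illustrative name):

```python
import numpy as np

def weak_guided_features(F_m, A_w, F_r):
    """F_w = F_m * A_w + F_r: the weak firework target mask re-weights
    the motion-enhanced texture feature, and the compensation residual
    F_r restores detail the masking may suppress."""
    return F_m * A_w[np.newaxis, :, :] + F_r
```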
During model training, A_w and F_r can be predicted from the motion-perception-enhanced texture features; the prediction can be treated like a multi-task supervision process and optimized with a standard Focal Loss to obtain A_w and F_r. During training, however, A_w is produced by the weak guiding attention module rather than predicted, while F_r is obtained by prediction.
In the embodiment of the invention, firework target detection can be performed using an anchor-free CenterNet detection head.
FIG. 4 shows five samples at different stages of model training: row (a) shows the input images, row (b) the residual (frame-difference) images, row (c) the motion attention weight maps A_m, row (d) the weak firework target masks A_w, and row (e) the firework feature maps F_w. Of the five sample columns, the first three are positive samples and the last two are negative samples.
Based on the training method, a firework detection model with high detection accuracy can be trained, and by using the trained firework detection model, the invention provides a firework detection method, as shown in fig. 5, the method comprises the following steps:
s200: acquiring a firework video image, randomly extracting one frame in a set time range before and after the frame image as a corresponding reference image for any frame of video input image, and calculating a frame difference image between the input image and the reference image. In the implementation process of the specific method, video acquisition equipment such as a camera and the like can be adopted to acquire a firework video image, and in a general situation, a firework detection system deployed in a specific scene works in real time and needs to continuously detect surrounding scenes, so that each frame of the acquired firework video image is input into a firework detection model for detection, and therefore, one frame can be randomly extracted as a corresponding reference image within a range of 200ms to 5s before and after the frame of image.
S210: inputting an input image into a texture branch to obtain multi-scale feature representation, and fusing the multi-scale feature representation into texture features through a feature pyramid; specifically, the specific operation of this step is the same as the operation of step S110 in the model training method, and is not described here again.
S220: inputting the frame difference image into a motion branch circuit to calculate a motion attention weight map; the specific operation of this step is the same as the operation mode of step S220 in the model training method, and the frame difference image is first down-sampled, then the down-sampled frame difference image is grayed, and finally the grayed image is input into the standard residual block, and the motion attention weight map is obtained by calculation, and more specifically, this is not repeated here.
As in the training process of the model, in other embodiments steps S210 and S220 may be performed simultaneously, or S220 may be performed before S210.
S230: performing motion perception enhancement on the texture features; the specific operation of this step is the same as the operation of step S220 in the model training method, and is not described herein again.
S240: predicting a weak firework target mask according to the texture features after the motion perception is enhanced; unlike the training method of the firework detection model, when firework target detection is performed, a weak firework target mask needs to be predicted through the texture features after motion perception enhancement, and a weak guiding attention module is not used for generating the weak firework attention mask. In one embodiment of the invention, the weak Firework attention mask A is predicted using a standard Focal local optimization, similar to the process of multitask supervisionw. While predicting the weak firework attention mask, the compensation residual F is predicted at the same timer
S250: and obtaining a firework characteristic diagram according to the texture characteristics and the weak firework target mask after the motion perception is enhanced, and detecting a firework target. Specifically, the specific operation of this step is the same as the operation of step S110 in the model training method, and is not described here again. In the embodiment of the invention, the smoke and fire target detection is carried out by adopting a CenterNet detection head of Anchor-free.
The firework detection method uses the firework detection model trained by the training method provided by the invention. Because the trained model has a two-way network, it avoids the loss of dynamic information of traditional single-frame detection methods and highlights the dynamic characteristics of the firework region; it thus has strong recognition capability for inconspicuous firework targets, strong discrimination capability against interference factors, and high recognition accuracy for smoke and fire, so firework targets can be detected more accurately. In addition, the model pays a high degree of pixel-level attention to the firework target, making fuller use of the training data and further improving detection accuracy. Finally, the model has low computational cost, which facilitates application and deployment of the method.
Further, the invention provides smoke and fire detection equipment comprising an image acquisition device, a processor and a memory. The image acquisition device is used for acquiring smoke and fire video images; for example, a camera can be used. The memory stores computer program instructions which, when executed by the processor, cause the processor to perform the smoke and fire detection method of an embodiment of the invention.
There may be one or more processors; each may be a Central Processing Unit (CPU) or another form of processing unit having data processing and/or instruction execution capabilities, and may control other components in the electronic device to perform desired functions.
There may likewise be one or more memories, and the memory may be any form of computer-readable storage medium, such as volatile and/or non-volatile memory. Volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory. Non-volatile memory may include, for example, Read-Only Memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor to implement the smoke and fire detection method of the embodiments of the present application described above. The memory may also store images generated at various stages of the smoke and fire detection process.
The smoke and fire detection equipment provided by the invention can detect smoke and fire targets accurately. When deployed in a specific application scene, it enables fires to be discovered in time, reducing the casualties and property loss caused by fire.
The above examples are only preferred embodiments of the present invention. It should be noted that various modifications and equivalents apparent to those skilled in the art may be made without departing from the spirit of the invention, and all such modifications and equivalents are intended to fall within the scope of the invention as defined in the claims.

Claims (12)

1. A method of training a smoke detection model, the smoke detection model comprising a texture branch and a motion branch, the method comprising:
constructing a video firework sample data set, wherein the video firework sample data set comprises a plurality of samples, and each sample comprises an input image and a reference image;
inputting the input image into the texture branch for feature extraction to obtain a multi-scale feature representation, and fusing the multi-scale feature representation into texture features through a feature pyramid;
calculating a frame difference image of the input image and the reference image, and inputting the frame difference image into a motion branch to calculate a motion attention weight map;
performing motion perception enhancement on the texture features;
generating a weak firework target mask with a weak attention module;
and obtaining a firework characteristic diagram according to the texture characteristics and the weak firework target mask after the motion perception is enhanced, and detecting the firework target.
2. The training method of claim 1, wherein the method of inputting the frame difference image into the motion branch to calculate the motion attention weight map comprises:
down-sampling the frame difference image;
graying the down-sampled frame difference image;
and inputting the grayed image into a standard residual block, and calculating to obtain a motion attention weight map.
3. The training method according to claim 1, wherein the method for enhancing the texture features by motion perception is as follows:
Fm=Fa+Fa*Am
wherein Fm is the texture feature after motion perception enhancement, Fa is the texture feature before motion perception enhancement, and Am is the motion attention weight map; the multiplication is channel-by-channel, a single element of Am being multiplied with the channel elements at that location in Fa.
4. The training method according to claim 1, wherein the method of generating a weak firework target mask using the weak guide attention module comprises:
randomly sampling pixel points of a plurality of smoke and fire targets and non-smoke and fire targets in a data labeling frame of an input image to construct a smoke and fire pixel data set;
training a random forest model by taking RGB (red, green and blue) channels as features;
classifying pixels in a labeling frame in the input image by using a trained random forest model to obtain a mask code in the labeling frame;
and representing firework-region pixels by a value of 1 and non-firework-region pixels by a value of 0, and pasting the mask within the labeling frame onto an all-zero mask according to the position of the labeling frame to obtain the complete weak firework target mask.
5. The training method according to claim 1, wherein the method of obtaining a firework feature map from the motion-perception-enhanced texture features and the weak firework target mask is:
Fw=Fm*Aw+Fr
wherein Fw represents the firework feature map, Aw represents the weak firework target mask, Fm represents the texture feature after motion perception enhancement, and Fr represents the compensation residual.
6. A training method as claimed in any one of claims 1-5, wherein the samples in the video pyrotechnic sample data set comprise positive and negative samples, wherein:
the positive samples are frames with firework targets in the video, any frame is selected as an input image for each positive sample, one frame is randomly extracted in the range of 200 ms-5 s before and after the frame to serve as a corresponding reference image, and firework target boundary box labeling is carried out on the input image;
and the negative samples are frames without firework targets in the video, any frame is selected as an input image for each negative sample, and one frame is randomly extracted as a corresponding reference image in the range of 200 ms-5 s before and after the frame.
7. A smoke and fire detection method, characterized in that a smoke and fire detection model is obtained by training according to the training method of any one of claims 1 to 6, wherein the smoke and fire detection model comprises a texture branch and a motion branch; the method comprises the following steps:
acquiring a firework video image, randomly extracting one frame in a preset time range before and after the frame as a corresponding reference image for any frame of video input image, and calculating a frame difference image between the input image and the reference image;
inputting an input image into a texture branch to obtain multi-scale feature representation, and fusing the multi-scale feature representation into texture features through a feature pyramid;
inputting the frame difference image into a motion branch circuit to calculate a motion attention weight map;
performing motion perception enhancement on the texture features;
predicting a weak firework target mask according to the texture features after the motion perception is enhanced;
and obtaining a firework characteristic diagram according to the texture characteristics and the weak firework target mask after the motion perception is enhanced, and detecting a firework target.
8. The method of claim 7, wherein the method of computing the motion attention weight map from the frame difference image input motion branch comprises:
down-sampling the frame difference image;
graying the down-sampled frame difference image;
and inputting the grayed image into a standard residual block, and calculating to obtain a motion attention weight map.
9. The method of claim 7, wherein the method for enhancing motion perception of the texture feature comprises:
Fm=Fa+Fa*Am
wherein Fm is the texture feature after motion perception enhancement, Fa is the texture feature before motion perception enhancement, and Am is the motion attention weight map; the multiplication is channel-by-channel, a single element of Am being multiplied with the channel elements at that location in Fa.
10. The method according to claim 7, wherein the method of predicting a weak smoke target mask from motion perception enhanced texture features comprises:
performing optimization using a standard Focal Loss.
11. The method according to claim 10, wherein the method for obtaining the smoke feature map according to the texture feature and the weak smoke target mask after the motion perception enhancement comprises:
Fw=Fm*Aw+Fr
wherein Fw represents the firework feature map, Aw represents the weak firework target mask, Fm represents the texture feature after motion perception enhancement, and Fr represents the compensation residual.
12. A smoke detection device, comprising:
the image acquisition device is used for acquiring firework video images;
a processor; and
a memory having stored therein computer program instructions which, when executed by the processor, cause the processor to perform the pyrotechnic detection method of any of claims 7-11.
CN202110215838.2A 2021-02-26 2021-02-26 Training method of smoke and fire detection model, smoke and fire detection method and equipment Active CN112949453B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110215838.2A CN112949453B (en) 2021-02-26 2021-02-26 Training method of smoke and fire detection model, smoke and fire detection method and equipment

Publications (2)

Publication Number Publication Date
CN112949453A true CN112949453A (en) 2021-06-11
CN112949453B CN112949453B (en) 2023-12-26

Family

ID=76246365


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113469050A (en) * 2021-07-01 2021-10-01 安徽大学 Flame detection method based on image subdivision classification
CN113870254A (en) * 2021-11-30 2021-12-31 中国科学院自动化研究所 Target object detection method and device, electronic equipment and storage medium
CN116468974A (en) * 2023-06-14 2023-07-21 华南理工大学 Smoke detection method, device and storage medium based on image generation

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130094699A1 (en) * 2011-10-12 2013-04-18 Industry Academic Cooperation Foundation Keimyung University Forest fire smoke detection method using random forest classification
CN105139429A (en) * 2015-08-14 2015-12-09 大连理工大学 Fire detecting method based on flame salient picture and spatial pyramid histogram
CN110874592A (en) * 2019-10-21 2020-03-10 南京信息职业技术学院 Forest fire smoke image detection method based on total bounded variation
CN111145222A (en) * 2019-12-30 2020-05-12 浙江中创天成科技有限公司 Fire detection method combining smoke movement trend and textural features
CN111464814A (en) * 2020-03-12 2020-07-28 天津大学 Virtual reference frame generation method based on parallax guide fusion
CN111523410A (en) * 2020-04-09 2020-08-11 哈尔滨工业大学 Video saliency target detection method based on attention mechanism
CN111860398A (en) * 2020-07-28 2020-10-30 河北师范大学 Remote sensing image target detection method and system and terminal equipment
CN111860504A (en) * 2020-07-20 2020-10-30 青岛科技大学 Visual multi-target tracking method and device based on deep learning
DE102020118241A1 (en) * 2019-07-22 2021-01-28 Samsung Electronics Co., Ltd. VIDEO DEPTH ESTIMATION BASED ON TEMPORAL ATTENTION





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant