CN112418005B - Smoke multi-classification identification method based on reverse radiation attention pyramid network - Google Patents

Smoke multi-classification identification method based on reverse radiation attention pyramid network

Info

Publication number
CN112418005B
CN112418005B (application CN202011226816.8A)
Authority
CN
China
Prior art keywords
smoke
pyramid
network
image
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011226816.8A
Other languages
Chinese (zh)
Other versions
CN112418005A (en
Inventor
顾锞
张永慧
乔俊飞
李泽东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202011226816.8A priority Critical patent/CN112418005B/en
Publication of CN112418005A publication Critical patent/CN112418005A/en
Application granted granted Critical
Publication of CN112418005B publication Critical patent/CN112418005B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/49 - Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a smoke multi-classification identification method based on a reverse radiation attention pyramid network, which is used to judge the working state of an emptying torch (flare). The network consists of three pyramid blocks connected in series, composed of 3, 4 and 5 basic convolution modules respectively. An attention mechanism is then introduced into each pyramid block for feature filtering. Finally, all feedforward outputs of the pyramid blocks are connected through reverse radiation, so that low-, middle- and high-level features are comprehensively fused. The method belongs to the fields of atmospheric environmental protection and machine learning.

Description

Smoke multi-classification identification method based on reverse radiation attention pyramid network
Technical Field
The invention relates to a method for identifying the working state of a thermal power generation system, in particular to a smoke multi-classification identification method based on a reverse radiation attention pyramid network, belonging to the field of atmospheric environmental protection and the field of machine learning.
Background
The discovery and widespread use of electricity, a result of the second and third industrial revolutions, has brought many benefits to human life. Common power generation modes mainly include thermal power, solar power, nuclear power and wind power. However, solar power generation is still too expensive for large-scale use; nuclear power generation risks causing irrecoverable pollution to the environment; and wind power generation is limited by wind conditions and terrain. Thermal power has therefore remained the mainstream method of power generation to this day.
To better maintain the safety of life and production and to protect the ecological environment, the waste gas generated by a thermal power system should be discharged into the atmosphere through an emptying torch only after complete combustion. Under normal operating conditions, the flare of a thermal power generation system should produce white smoke (i.e., water vapor). However, when the flare discharges black smoke (i.e., carbon black) or colorless smoke (i.e., toxic gas), the system is operating outside the limits specified for the thermal power generation industry; the thermal power system is then running abnormally, creating serious safety risks and environmental problems. In particular, black smoke is caused by incomplete combustion of exhaust gas and is liable to cause fatal damage to public health and the ecological environment. Colorless smoke usually means that toxic exhaust gas is discharged directly into the air without treatment, which often results in serious air pollution. Therefore, a well-designed smoke identification model is required to monitor the working state of the thermal power generation system and thereby protect human health and the ecological environment. In recent years, deep learning has developed rapidly and been applied successfully to many important tasks, and it is widely accepted among researchers that features extracted by deep convolutional networks have better characterization capability than traditional hand-crafted features.
CN201910319875.0 discloses a deep-learning-based intelligent smoke identification and monitoring method for substations. The method optimizes the YOLO v3 framework and uses an image dataset to construct and train an image identification model; in addition, it trains a pseudo-three-dimensional convolutional residual network on video data to construct and train a video identification model. The extracted video stream and images are preprocessed and fed into the image identification model, and when smoke is detected, the video identification model is automatically called for a secondary recheck to verify the detection result.
CN201611116917.3 discloses a method for identifying flame/smoke in cropped forest images based on RGB reconstruction. It exploits the difference between the RGB component values of flame and smoke regions and those of the background to suppress the background and continuously shrink the region to be detected; after processing with a modified gray-scale formula that enhances flame characteristics, suspected smoke or flame regions are determined from the proportion of non-negative elements after filtering, cropping, erosion and dilation. The method is applied to flame/smoke identification for forest fire prevention.
Although deep learning has achieved remarkable results in recognition tasks, research on smoke multi-class recognition is still very limited. A new model is therefore urgently needed to solve the multi-classification recognition problem for smoke. Based on these considerations, the invention first studies the problem of multi-class smoke identification and develops a novel smoke multi-classification network based on a reverse radiation attention pyramid, so as to fully exploit the visual characteristics of smoke at different levels and thereby improve the classification accuracy of the classifier and the accuracy of the algorithm.
Disclosure of Invention
Under abnormal operating conditions of a thermal power generation system, incompletely combusted or untreated exhaust gas and large amounts of black smoke or colorless toxic exhaust gas discharged into the atmosphere can cause severe explosions and atmospheric pollution. Aiming at these serious problems, the invention studies the working state of the thermal power generation system and designs a smoke multi-classification recognition method based on a reverse radiation attention pyramid network (IRAP-Net). The real-time monitoring images of emptying torch smoke collected by the monitoring cameras of a thermal power plant are used as the data to be detected; the data are processed by a deep learning framework based on a pyramid network that is constructed and trained on an image dataset, the overall smoke classification of the real-time monitoring image is obtained, and real-time monitoring of the smoke of the thermal power plant is thus achieved.
The technical scheme adopted by the invention is a smoke multi-classification identification method based on a reverse radiation attention pyramid network, which is established mainly through the following steps:
Step 1, obtain the images to be detected from the real-time smoke monitoring video.
For video clips obtained by an industrial-grade camera, image data used as model input are obtained by extracting video frames and cropping the images.
Step 1.1, video frame extraction: extract the current video frame every 1.5 s and directly discard the remaining video;
Step 1.2, smoke image cropping: crop each frame extracted from the video into a 48 × 48 smoke image block at a predefined cropping position, and discard the other image regions;
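A minimal sketch of this preprocessing step is given below (Python with OpenCV). The video path, the fallback frame rate and the crop origin (crop_x, crop_y) are illustrative assumptions, not values specified by the invention.

```python
import cv2

def extract_smoke_patches(video_path, crop_x, crop_y, interval_s=1.5, patch=48):
    """Extract one frame every `interval_s` seconds and crop a 48x48 smoke patch."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0      # fall back to 25 fps if metadata is missing
    step = max(int(round(fps * interval_s)), 1)
    patches, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:                      # keep the current frame, discard the rest
            block = frame[crop_y:crop_y + patch, crop_x:crop_x + patch]
            if block.shape[:2] == (patch, patch):
                patches.append(block)
        idx += 1
    cap.release()
    return patches                               # list of 48x48x3 image blocks
```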
Step 2, construct the reverse radiation attention pyramid network.
A learning network based on the reverse radiation attention pyramid is built. The network is composed of three pyramid modules connected in series, and the three pyramid levels consist of 3, 4 and 5 basic convolution modules, respectively. An attention mechanism module is introduced into each pyramid block for feature filtering. Finally, all feedforward outputs of the pyramid modules are connected through reverse radiation, and low-, middle- and high-level features are comprehensively fused. The input of the network is the image block obtained in step 1, and the output is the smoke monitoring result.
Step 2.1, network preprocessing is performed by a basic convolution module consisting of two groups of convolution, normalization and activation layers. The input is a 48 × 48 × 3 smoke image; after the pre-training stage consisting of the basic convolution module and a maximum pooling layer, the output is a 24 × 24 × 16 feature map, realizing the pre-adjustment of the features;
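A minimal PyTorch sketch of this pre-adjustment stage follows. The 3 × 3 kernel size, the padding and the use of batch normalization are assumptions, since the patent only specifies the layer types and the 48 × 48 × 3 to 24 × 24 × 16 shape change.

```python
import torch.nn as nn

class BasicConvModule(nn.Module):
    """Two groups of convolution + normalization + ReLU activation."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

# Pre-adjustment stage: 48x48x3 smoke image -> 24x24x16 feature map.
stem = nn.Sequential(BasicConvModule(3, 16), nn.MaxPool2d(kernel_size=2, stride=2))
```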
Step 2.2, low-level features are extracted by the first-level pyramid model, which consists of 3 groups of complex convolution modules, each comprising 2 unit convolution modules. The input is the 24 × 24 × 16 network pre-training result, and after feature stacking the output is a 24 × 24 × 96 low-level feature map;
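One plausible reading of a pyramid block is a dense-style structure in which each complex convolution module (a short chain of unit convolution modules) produces new feature channels that are concatenated with the features already computed. The sketch below follows that reading; the growth rate per group, the kernel size and whether the block input itself is included in the concatenation are assumptions, as the patent only gives the input/output channel counts (e.g. 16 to 96, 96 to 160, 160 to 240).

```python
import torch
import torch.nn as nn

class UnitConvModule(nn.Module):
    """Single convolution + normalization + ReLU (the 'unit convolution module')."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.op = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=k, padding=k // 2),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.op(x)

class PyramidBlock(nn.Module):
    """`n_groups` complex conv modules, each a chain of `n_units` unit conv modules.
    Group outputs are concatenated with the accumulated features (assumed stacking rule)."""
    def __init__(self, in_ch, n_groups, n_units, growth):
        super().__init__()
        self.groups = nn.ModuleList()
        ch = in_ch
        for _ in range(n_groups):
            units = [UnitConvModule(ch, growth)] + [UnitConvModule(growth, growth) for _ in range(n_units - 1)]
            self.groups.append(nn.Sequential(*units))
            ch += growth          # concatenation grows the channel count
        self.out_channels = ch

    def forward(self, x):
        feats = x
        for g in self.groups:
            feats = torch.cat([feats, g(feats)], dim=1)
        return feats

# e.g. first pyramid level: 3 complex modules of 2 unit modules each (growth chosen for illustration)
block1 = PyramidBlock(in_ch=16, n_groups=3, n_units=2, growth=16)
```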
Step 2.3, relevant features are enhanced by an attention mechanism. First, considering the signal that the output pyramid block maps to each feature channel, the weights are initialized by global average pooling:

$$\omega_c = \frac{1}{H_{f_c^n} \times W_{f_c^n}} \sum_{i=1}^{H_{f_c^n}} \sum_{j=1}^{W_{f_c^n}} f_c^n(i, j)$$

where $\omega_c$ denotes the initialization result of the global average pooling weight; $f_c^n$ denotes the feature map of the n-th smoke image in $f_c$; $f_c$ denotes the feature maps of the c-th group of smoke images; $H_{f_c^n}$ and $W_{f_c^n}$ denote the height and width of the feature map $f_c^n$; and $i$ and $j$ are the coordinates along the height and width of $f_c^n$. Then a two-layer neural network consisting of two fully connected layers and two activation layers is introduced to learn and update the initialized weights, yielding the optimal weights:

$$\omega_c^{*} = \sigma_{Sigmoid}\big(W_2\,\sigma_{ReLU}(W_1\,\omega_c)\big)$$

where $\omega_c^{*}$ denotes the optimal weights of the smoke image multi-classification network; $W_1, W_2$ are the weights of the two fully connected layers; $\sigma_{ReLU}$ and $\sigma_{Sigmoid}$ denote the ReLU and Sigmoid functions used in the activation layers; and $\omega_c$ denotes the initialization weight corresponding to the feature maps of the c-th group of smoke images. Finally, the original feature map is added to the attention-enhanced feature map to generate the new feature map. The input of this step is the 24 × 24 × 96 low-level feature map, and the output is the 24 × 24 × 96 weighted low-level feature map;
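This attention step resembles squeeze-and-excitation style channel attention; a minimal PyTorch sketch is shown below. The channel reduction ratio and the residual form `x + w * x` (adding the original map to the re-weighted map, as described above) are assumptions consistent with the text rather than values it states.

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Global average pooling -> two FC layers (ReLU, Sigmoid) -> channel re-weighting."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)           # initialize weights by global average pooling
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # W1
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),  # W2
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.squeeze(x).view(b, c)                   # omega_c
        w = self.excite(w).view(b, c, 1, 1)              # omega_c*
        return x + x * w                                 # original map + attention-enhanced map
```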
Step 2.4, middle-level features are extracted by the second-level pyramid model, which consists of 4 groups of complex convolution modules, each comprising 3 unit convolution modules. The input is the 24 × 24 × 96 weighted low-level feature map, and after feature stacking the output is a 24 × 24 × 160 middle-level feature map;
Step 2.5, the attention mechanism of step 2.3 is applied again; the input is the 24 × 24 × 160 middle-level feature map and the output is the 24 × 24 × 160 weighted middle-level feature map;
Step 2.6, high-level features are extracted by the third-level pyramid model, which consists of 5 groups of complex convolution modules, each comprising 4 unit convolution modules. The input is the 24 × 24 × 160 weighted middle-level feature map, and after feature stacking the output is a 24 × 24 × 240 high-level feature map;
Step 2.7, the attention mechanism of step 2.3 is applied again; the input is the 24 × 24 × 240 high-level feature map and the output is the 24 × 24 × 240 weighted high-level feature map;
Step 2.8, the trunk of the pyramid network and the feedforward outputs of the three pyramid blocks are combined to form the reverse radiation connection. First, a unit convolution module superimposes the four groups of output feature maps extracted from the trunk and the three pyramid blocks to obtain a new group of feature maps. Then a global average pooling operation stretches the newly created feature maps into a 128 × 1 vector. Finally, the vector is processed by a fully connected operation to obtain the smoke multi-classification recognition result;
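A hedged PyTorch sketch of this fusion head follows. The 1 × 1 kernel for the unit convolution, the four input channel counts (16, 96, 160, 240, taken from the steps above) and the number of smoke classes are assumptions used only to make the example self-contained.

```python
import torch
import torch.nn as nn

class ReverseRadiationHead(nn.Module):
    """Fuse trunk + three pyramid-block outputs, pool to a 128-d vector, classify."""
    def __init__(self, in_channels=(16, 96, 160, 240), fused=128, n_classes=3):
        super().__init__()
        self.fuse = nn.Sequential(                       # unit convolution module over the stacked maps
            nn.Conv2d(sum(in_channels), fused, kernel_size=1),
            nn.BatchNorm2d(fused),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)              # global average pooling -> 128 x 1 x 1
        self.fc = nn.Linear(fused, n_classes)            # fully connected classification

    def forward(self, trunk, low, mid, high):
        x = torch.cat([trunk, low, mid, high], dim=1)    # reverse radiation: concatenate the feedforward outputs
        x = self.fuse(x)
        x = self.pool(x).flatten(1)
        return self.fc(x)                                # smoke multi-classification logits
```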
Step 3, train the reverse radiation attention pyramid network.
Step 3.1, the smoke image blocks obtained in step 1 are used as the training set, and smoke multi-classification training is performed on the reverse radiation attention pyramid network constructed in step 2. The momentum coefficient is set to 0.9, the initial learning rate to 0.001 and the learning rate decay coefficient to 0.00001. Training is carried out for 300 epochs with mini-batches of size 16.
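A minimal training-loop sketch with these hyperparameters follows. The use of SGD with momentum, the cross-entropy loss, and the interpretation of the 0.00001 decay coefficient as optimizer weight decay are assumptions; the patent lists only the hyperparameter values.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train_irap_net(model, train_set, epochs=300, device="cuda"):
    """Train with momentum 0.9, initial lr 0.001, decay 1e-5, batch size 16, 300 epochs."""
    loader = DataLoader(train_set, batch_size=16, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-5)
    criterion = nn.CrossEntropyLoss()
    model.to(device).train()
    for epoch in range(epochs):
        for images, labels in loader:                    # 48x48x3 smoke image blocks and class labels
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```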
Step 4, perform multi-classification monitoring of the smoke images.
The video obtained by the industrial-grade camera is subjected to video frame extraction, image cropping, feature extraction and smoke classification evaluation.
Step 4.1, video frame extraction and image cropping are performed on the video to be tested according to step 1;
Step 4.2, smoke multi-classification evaluation is performed using the reverse radiation attention pyramid network trained in step 3.
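A brief usage sketch of this monitoring step is given below, reusing the preprocessing sketch from step 1; the class-name mapping and the input normalization are illustrative assumptions.

```python
import torch

SMOKE_CLASSES = ["black smoke", "white smoke", "colorless smoke"]   # illustrative label order

def monitor(video_path, model, crop_x, crop_y, device="cuda"):
    """Step 4: extract and crop frames as in step 1, then classify each smoke patch."""
    model.to(device).eval()
    results = []
    for block in extract_smoke_patches(video_path, crop_x, crop_y):  # from the step-1 sketch
        x = torch.from_numpy(block).float().permute(2, 0, 1).unsqueeze(0) / 255.0
        with torch.no_grad():
            pred = model(x.to(device)).argmax(dim=1).item()
        results.append(SMOKE_CLASSES[pred])
    return results
```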
Compared with the prior art, the invention has the following advantages:
(1) Aiming at the problems that abnormal working conditions of a thermal power system can cause fatal harm to public health and the ecological environment as well as severe explosions and air pollution, the invention proposes, for the first time, a carefully designed deep network for smoke multi-classification recognition.
(2) By combining the pyramid with the attention mechanism and the reverse radiation connection, the reverse radiation attention pyramid network can effectively fuse various low-level and high-level features to realize visual smoke recognition.
(3) The invention has been thoroughly tested and compared against other algorithms on a large smoke image database. The results show that IRAP-Net achieves superior performance and is clearly better than existing state-of-the-art deep models.
Drawings
FIG. 1 is a diagram of a reverse-radiation attention pyramid-based network designed in accordance with the present invention;
fig. 2 is a table of network parameters of the present invention.
Detailed Description
The present invention is described in detail below. This embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation and a specific operation process are given, but the protection scope of the present invention is not limited to the following process.
The embodiment is as shown in fig. 1 and 2, and comprises the following steps:
Step S10, extract frames from the video and crop them to obtain image data;
Step S20, constructing a pyramid network based on the reverse radiation attention;
step S30, training a pyramid network based on the reverse radiation attention;
Step S40, multi-classification monitoring is carried out on the smoke image;
the step S10 of obtaining image data by video extraction and cropping according to the embodiment further includes the steps of:
Step S100, video frame extraction: extract the current video frame at constant intervals of 1.5 s, and discard the other video clips;
Step S110, smoke image cropping: crop each frame of the smoke image into a 48 × 48 smoke image block, and discard the other image regions;
The step S20 of building the reverse radiation attention pyramid network according to the embodiment further includes the steps of:
Step S200, preprocessing a network by adopting a basic convolution module and a maximum pooling layer, wherein the basic convolution module consists of two groups of convolution layers, a normalization layer and an activation layer, and the step realizes the pre-adjustment of the characteristics;
step S210, extracting low-level features by using 3 groups of complex convolution modules consisting of 2 unit convolution modules as a first-level pyramid model;
step S220, initializing the weight by using global average pooling, introducing a two-layer neural network consisting of two groups of complete connection layers and an activation layer to learn and update the initialized weight, adding the weight to the original feature map to the associated enhanced feature map, and generating a new feature map.
Step S230, extracting middle-level features by using 4 groups of complex convolution modules, each consisting of 3 unit convolution modules, as the second-level pyramid model;
Step S240, extracting the weighted middle-level features by applying the attention of step S220;
Step S250, extracting high-level features by using 5 groups of complex convolution modules consisting of 4 unit convolution modules as a third-level pyramid model;
Step S260, extracting the weighted high-level features by applying the attention of step S220;
Step S270, a reverse radiation connection is adopted to combine the trunk of the pyramid network with the feedforward outputs of the pyramid blocks. First, a unit convolution module superimposes the four groups of output feature maps extracted from the trunk and the three pyramid blocks to obtain a new group of feature maps. Then a global average pooling operation stretches the newly created feature maps into a 128 × 1 vector. Finally, the vector is processed by a fully connected operation to obtain the smoke multi-classification recognition result;
Training the pyramid network based on back-radiating attention step S30 of an embodiment further comprises the steps of:
Step S300, the smoke image blocks obtained in step S10 are used as the training set, and smoke multi-classification training is performed on the reverse radiation attention pyramid network constructed in step S20. The momentum coefficient is set to 0.9, the initial learning rate to 0.001 and the learning rate decay coefficient to 0.00001. Training is carried out for 300 epochs with mini-batches of size 16.
The multi-classification monitoring step S40 of the embodiment further includes the steps of:
Step S400, video frame extraction and image cropping are performed on the video to be tested according to step S10;
Step S410, smoke multi-classification evaluation is performed using the reverse radiation attention pyramid network trained in step S30.
In step S420, three commonly used evaluation criteria, namely the accuracy, the detection rate and the false recognition rate, are adopted as indexes of network performance. The first criterion is the Global Accuracy (GAR):

$$GAR = \frac{T_{+} + T_{0} + T_{-}}{N_{+} + N_{0} + N_{-}}$$

where $N_{+}, N_{0}, N_{-}$ denote the numbers of positive, neutral and negative samples, respectively, and $T_{+}, T_{0}, T_{-}$ are the numbers of correctly identified positive, neutral and negative samples. The second criterion is the Global Detection Rate (GDR):

$$GDR = \frac{DR_{+} + DR_{0} + DR_{-}}{3}$$

where $DR_{+}, DR_{0}, DR_{-}$ denote the ratios of the number of correct results among the positive, neutral and negative samples to the total numbers of positive, neutral and negative samples, respectively, defined as:

$$DR_{+} = \frac{T_{+}}{N_{+}}, \quad DR_{0} = \frac{T_{0}}{N_{0}}, \quad DR_{-} = \frac{T_{-}}{N_{-}}$$

The third criterion is the global false recognition rate (GFAR):

$$GFAR = \frac{FAR_{+} + FAR_{0} + FAR_{-}}{3}, \quad FAR_{+} = \frac{F_{+}}{N_{+}}, \quad FAR_{0} = \frac{F_{0}}{N_{0}}, \quad FAR_{-} = \frac{F_{-}}{N_{-}}$$

where $F_{+}$ is the number of positive samples misclassified as neutral or negative, $F_{0}$ is the number of neutral samples misclassified as positive or negative, and $F_{-}$ is the number of negative samples misclassified as positive or neutral.
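A short Python sketch of these three criteria follows. It assumes labels encoded as +1 (positive), 0 (neutral) and -1 (negative), that every class occurs at least once in the ground truth, and it computes GDR and GFAR as the per-class averages written above, which are themselves reconstructions of the original formulas.

```python
def smoke_metrics(y_true, y_pred, classes=(1, 0, -1)):
    """Compute GAR, GDR and GFAR for three-class smoke recognition."""
    n = {c: 0 for c in classes}   # N_c: number of samples of class c
    t = {c: 0 for c in classes}   # T_c: correctly identified samples of class c
    for yt, yp in zip(y_true, y_pred):
        n[yt] += 1
        if yt == yp:
            t[yt] += 1
    f = {c: n[c] - t[c] for c in classes}                     # F_c: misclassified samples of class c
    gar = sum(t.values()) / sum(n.values())                   # global accuracy (GAR)
    gdr = sum(t[c] / n[c] for c in classes) / len(classes)    # global detection rate (GDR)
    gfar = sum(f[c] / n[c] for c in classes) / len(classes)   # global false recognition rate (GFAR)
    return gar, gdr, gfar
```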
The performance of the present invention is compared with the existing AlexNet [1], ZF-Net [2], VGG-Net [3], GoogLeNet [4], ResNet [5], Xception [6], DenseNet [7], DNCNN [8] and DCNN [9] networks, and experimental results using the present invention are given below.
Table 1 comparative test results of the invention
Reference to the literature
[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Proc. Adv. Neural Inf. Process. Syst. (NIPS), vol. 25, pp. 1097-1105, Dec. 2012.
[2] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," Proc. Eur. Conf. Comp. Vis. (ECCV), pp. 818-833, Sep. 2014.
[3] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, Sep. 2014.
[4] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," IEEE Conf. Computer Vision & Pattern Recognition (CVPR), pp. 1-9, 2015.
[5] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," IEEE Conf. Computer Vision & Pattern Recognition (CVPR), pp. 770-778, Jun. 2016.
[6] F. Chollet, "Xception: Deep learning with depthwise separable convolutions," IEEE Conf. Computer Vision & Pattern Recognition (CVPR), pp. 1800-1807, Nov. 2017.
[7] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," IEEE Conf. Computer Vision & Pattern Recognition (CVPR), pp. 4700-4708, Jul. 2017.
[8] Z. Yin, B. Wan, F. Yuan, X. Xia, and J. Shi, "A deep normalization and convolutional neural network for image smoke detection," IEEE Access, vol. 5, pp. 18429-18438, Aug. 2017.
[9] K. Gu, Z. Xia, J. Qiao, and W. Lin, "Deep dual-channel neural network for image-based smoke detection," IEEE Trans. Multimedia, vol. 22, no. 2, pp. 311-323, Feb. 2020.

Claims (4)

1. A smoke multi-classification identification method based on a reverse radiation attention pyramid network, characterized in that the method is established by the following steps:
step 1, obtaining the images to be detected from the real-time smoke monitoring video;
for video clips obtained by an industrial-grade camera, image data used as the input of the model are obtained by extracting video frames and cropping the images;
step 2, constructing a reverse radiation attention pyramid-based network;
constructing a learning network based on the reverse radiation attention pyramid, wherein the network is composed of three pyramid modules connected in series, and the three pyramid levels consist of 3, 4 and 5 basic convolution modules, respectively; an attention mechanism module is introduced into each pyramid block for feature filtering; finally, all feedforward outputs of the pyramid modules are connected through reverse radiation, and low-, middle- and high-level features are comprehensively fused; the input of the network is the image block obtained in step 1, and the output is the smoke monitoring result;
Step 3, training a pyramid network based on the reverse radiation attention;
Step 4, multi-classification monitoring of smoke images is carried out;
the video obtained by the industrial-grade camera is subjected to video frame extraction, image cropping, feature extraction and smoke classification evaluation;
in the step 2, step 2.1, network preprocessing is performed by a basic convolution module consisting of two groups of convolution, normalization and activation layers; the input is a 48 × 48 × 3 smoke image, and after the pre-training stage consisting of the basic convolution module and a maximum pooling layer, the output is a 24 × 24 × 16 feature map, realizing the pre-adjustment of the features;
step 2.2, low-level features are extracted by the first-level pyramid model consisting of 3 groups of complex convolution modules, each comprising 2 unit convolution modules; the input is the 24 × 24 × 16 network pre-training result, and after feature stacking the output is a 24 × 24 × 96 low-level feature map;
step 2.3, relevant features are enhanced by an attention mechanism; first, considering the signal that the output pyramid block maps to each feature channel, the weights are initialized by global average pooling:

$$\omega_c = \frac{1}{H_{f_c^n} \times W_{f_c^n}} \sum_{i=1}^{H_{f_c^n}} \sum_{j=1}^{W_{f_c^n}} f_c^n(i, j)$$

wherein $\omega_c$ denotes the initialization result of the global average pooling weight; $f_c^n$ denotes the feature map of the n-th smoke image in $f_c$; $f_c$ denotes the feature maps of the c-th group of smoke images; $H_{f_c^n}$ and $W_{f_c^n}$ denote the height and width of the feature map $f_c^n$; and $i$ and $j$ are the coordinates along the height and width of $f_c^n$; then, a two-layer neural network consisting of two fully connected layers and two activation layers is introduced to learn and update the initialized weights, yielding the optimal weights:

$$\omega_c^{*} = \sigma_{Sigmoid}\big(W_2\,\sigma_{ReLU}(W_1\,\omega_c)\big)$$

wherein $\omega_c^{*}$ denotes the optimal weights of the smoke image multi-classification network; $W_1, W_2$ are the weights of the two fully connected layers; $\sigma_{ReLU}$ and $\sigma_{Sigmoid}$ denote the ReLU and Sigmoid functions used in the activation layers; and $\omega_c$ denotes the initialization weight corresponding to the feature maps of the c-th group of smoke images; finally, the original feature map is added to the attention-enhanced feature map to generate the new feature map; the input is the 24 × 24 × 96 low-level feature map and the output is the 24 × 24 × 96 weighted low-level feature map;
step 2.4, middle-level features are extracted by the second-level pyramid model consisting of 4 groups of complex convolution modules, each comprising 3 unit convolution modules; the input is the 24 × 24 × 96 weighted low-level feature map, and after feature stacking the output is a 24 × 24 × 160 middle-level feature map;
step 2.5, the attention mechanism of step 2.3 is applied; the input is the 24 × 24 × 160 middle-level feature map and the output is the 24 × 24 × 160 weighted middle-level feature map;
step 2.6, high-level features are extracted by the third-level pyramid model consisting of 5 groups of complex convolution modules, each comprising 4 unit convolution modules; the input is the 24 × 24 × 160 weighted middle-level feature map, and after feature stacking the output is a 24 × 24 × 240 high-level feature map;
step 2.7, the attention mechanism of step 2.3 is applied; the input is the 24 × 24 × 240 high-level feature map and the output is the 24 × 24 × 240 weighted high-level feature map;
step 2.8, the trunk of the pyramid network is combined with the feedforward outputs of the three pyramid blocks to form the reverse radiation connection; first, a unit convolution module superimposes the four groups of output feature maps extracted from the trunk and the three pyramid blocks to obtain a new group of feature maps; then a global average pooling operation stretches the newly created feature maps into a 128 × 1 vector; finally, the vector is processed by a fully connected operation to obtain the smoke multi-classification recognition result.
2. The smoke multi-classification identification method based on a reverse radiation attention pyramid network as claimed in claim 1, wherein: in the step 1, step 1.1, video frames are extracted, the current video frame is extracted every 1.5 s, and the remaining video is directly discarded;
step 1.2, smoke image cropping: each frame extracted from the video is cropped into a 48 × 48 smoke image block at a defined cropping position, and the other image regions are discarded.
3. The smoke multi-classification identification method based on a reverse radiation attention pyramid network as claimed in claim 1, wherein: in the step 3, step 3.1, the smoke image blocks obtained in step 1 are used as the training set, and smoke multi-classification training is performed on the reverse radiation attention pyramid network constructed in step 2.
4. The smoke multi-classification identification method based on a reverse radiation attention pyramid network as claimed in claim 1, wherein: in the step 4, step 4.1, video frame extraction and image cropping are performed on the video to be tested according to step 1;
step 4.2, smoke multi-classification evaluation is performed using the reverse radiation attention pyramid network trained in step 3.
CN202011226816.8A 2020-11-06 2020-11-06 Smoke multi-classification identification method based on reverse radiation attention pyramid network Active CN112418005B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011226816.8A CN112418005B (en) 2020-11-06 2020-11-06 Smoke multi-classification identification method based on reverse radiation attention pyramid network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011226816.8A CN112418005B (en) 2020-11-06 2020-11-06 Smoke multi-classification identification method based on reverse radiation attention pyramid network

Publications (2)

Publication Number Publication Date
CN112418005A CN112418005A (en) 2021-02-26
CN112418005B true CN112418005B (en) 2024-05-28

Family

ID=74827814

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011226816.8A Active CN112418005B (en) 2020-11-06 2020-11-06 Smoke multi-classification identification method based on reverse radiation attention pyramid network

Country Status (1)

Country Link
CN (1) CN112418005B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113096103A (en) * 2021-04-15 2021-07-09 北京工业大学 Intelligent smoke image sensing method for emptying torch

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376695A (en) * 2018-11-26 2019-02-22 北京工业大学 A kind of smog detection method based on depth hybrid neural networks
CN109902693A (en) * 2019-02-16 2019-06-18 太原理工大学 One kind being based on more attention spatial pyramid characteristic image recognition methods
CN110059723A (en) * 2019-03-19 2019-07-26 北京工业大学 A kind of robust smog detection method based on integrated depth convolutional neural networks
CN111539314A (en) * 2020-04-21 2020-08-14 上海海事大学 Cloud and fog shielding-oriented sea surface target significance detection method
CN111582483A (en) * 2020-05-14 2020-08-25 哈尔滨工程大学 Unsupervised learning optical flow estimation method based on space and channel combined attention mechanism

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106897714B (en) * 2017-03-23 2020-01-14 北京大学深圳研究生院 Video motion detection method based on convolutional neural network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376695A (en) * 2018-11-26 2019-02-22 北京工业大学 A kind of smog detection method based on depth hybrid neural networks
CN109902693A (en) * 2019-02-16 2019-06-18 太原理工大学 One kind being based on more attention spatial pyramid characteristic image recognition methods
CN110059723A (en) * 2019-03-19 2019-07-26 北京工业大学 A kind of robust smog detection method based on integrated depth convolutional neural networks
CN111539314A (en) * 2020-04-21 2020-08-14 上海海事大学 Cloud and fog shielding-oriented sea surface target significance detection method
CN111582483A (en) * 2020-05-14 2020-08-25 哈尔滨工程大学 Unsupervised learning optical flow estimation method based on space and channel combined attention mechanism

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Salient object detection based on deep center-neighborhood pyramid structure; Chen Qin; Zhu Lei; Hou Yunlong; Deng Huiping; Wu Jin; Pattern Recognition and Artificial Intelligence; 2020-06-15 (No. 06); full text *
Multi-scale feature fusion network based on feature pyramid; Guo Qifan; Liu Lei; Zhang Cheng; Xu Wenjuan; Jing Wenfeng; Chinese Journal of Engineering Mathematics; 2020-10-15 (No. 05); full text *

Also Published As

Publication number Publication date
CN112418005A (en) 2021-02-26

Similar Documents

Publication Publication Date Title
CN114973032B (en) Deep convolutional neural network-based photovoltaic panel hot spot detection method and device
CN110334660A (en) A kind of forest fire monitoring method based on machine vision under the conditions of greasy weather
CN116343330A (en) Abnormal behavior identification method for infrared-visible light image fusion
CN116152658A (en) Forest fire smoke detection method based on domain countermeasure feature fusion network
CN112418005B (en) Smoke multi-classification identification method based on reverse radiation attention pyramid network
CN116416576A (en) Smoke/flame double-light visual detection method based on V3-YOLOX
CN114494908A (en) Improved YOLOv5 power transmission line aerial image defect detection method
Liu et al. Improved YOLOX-S abnormal condition detection for power transmission line corridors
CN113642606A (en) Marine ship detection method based on attention mechanism
CN115862073A (en) Transformer substation harmful bird species target detection and identification method based on machine vision
CN111223087A (en) Automatic bridge crack detection method based on generation countermeasure network
CN113688948A (en) Method for identifying attached marine organism species based on YOLO v5
CN116152678A (en) Marine disaster-bearing body identification method based on twin neural network under small sample condition
CN116128812A (en) Photovoltaic panel fault detection method based on infrared small target
Jaradat et al. A victims detection approach for burning building sites using convolutional neural networks
CN112837281B (en) Pin defect identification method, device and equipment based on cascade convolution neural network
CN111931577A (en) Intelligent inspection method for specific foreign matters of power grid line
CN116342496A (en) Abnormal object detection method and system for intelligent inspection
CN116385950A (en) Electric power line hidden danger target detection method under small sample condition
CN116052024A (en) Power line inspection method based on light-weight target recognition neural network model
CN110991366A (en) Shipping monitoring event identification method and system based on three-dimensional residual error network
Zhang et al. Combining deep learning with R-CDT for solar defect recognition
Na et al. Multi-attention yolov5 for recognition of abnormal helmets wearing in power stations
CN116645727B (en) Behavior capturing and identifying method based on Openphase model algorithm
CN116071624B (en) Smoking detection data labeling method based on active learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant