CN116051977A - Multi-branch fusion-based lightweight foggy weather street view semantic segmentation algorithm - Google Patents

Multi-branch fusion-based lightweight foggy weather street view semantic segmentation algorithm

Info

Publication number
CN116051977A
Authority
CN
China
Prior art keywords
network
feature map
street view
training
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211523734.9A
Other languages
Chinese (zh)
Inventor
刘丽伟
王芮
王玲
杜磊
赵强
候德彪
侯阿临
李秀华
梁超
杨冬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changchun University of Technology
Original Assignee
Changchun University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changchun University of Technology filed Critical Changchun University of Technology
Priority to CN202211523734.9A priority Critical patent/CN116051977A/en
Publication of CN116051977A publication Critical patent/CN116051977A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/182 Network patterns, e.g. roads or rivers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/42 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a lightweight foggy weather street view semantic segmentation algorithm based on multi-branch fusion, aiming to improve urban street view segmentation accuracy in foggy weather and to speed up network training. Because foggy urban street scenes contain numerous targets whose features are difficult to distinguish, the classical road semantic segmentation models cannot meet the accuracy and real-time requirements of urban street view recognition in practical applications. The proposed algorithm uses an encoder-decoder framework with MobileNetV2 as the backbone network; the high-level feature map extracted by the backbone is fed into an atrous spatial pyramid pooling layer and a global information extraction layer, whose outputs are fused by addition; a 1×1 convolution then adjusts the channel count of the output before the upsampling operation, which facilitates fusion and concatenation with the shallow feature map. The network model segments urban street views more accurately, greatly shortens training time, and provides effective help for the field of unmanned driving.

Description

Multi-branch fusion-based lightweight foggy weather street view semantic segmentation algorithm
Technical Field
The invention provides a lightweight foggy weather street view semantic segmentation algorithm based on multi-branch fusion, which adopts a segmentation algorithm improved from the DeepLabV3+ network model; the network body is similar to the DeepLabV3+ structure but adopts a lightweight backbone network; a global information extraction layer is designed, constructed and connected in parallel with the atrous spatial pyramid pooling layer, so that the two branches extract global information and multi-scale information respectively before being fused; the improved DeepLabV3+ network model can segment blurred foggy-day object images more accurately, improves feature reuse efficiency while guaranteeing overall segmentation accuracy, solves the slow training caused by the complex structure of the original DeepLabV3+ network, and effectively addresses the information blurring and low segmentation accuracy caused by low visibility in foggy weather.
Background
Streets are important components of cities, and understanding urban street scenes is an important basis for emerging smart-city applications such as automatic driving, intelligent navigation and intelligent monitoring; in recent years, with the rapid development of vehicle-mounted cameras, monitoring cameras and similar equipment, the quantity and quality of collected urban street scene images have greatly improved; however, objects in road scenes differ greatly in size, object types are varied, weather conditions change and scenes are complex, which leads to inaccurate and slow road scene segmentation; therefore, under the condition of limited training data, researching how to comprehensively combine model improvement, data augmentation and data generation to improve the performance and generalization capability of foggy-day urban street view segmentation models, so that the models are more robust under different weather conditions, is of very important significance for the practical application of urban street view segmentation algorithms.
With the progress of deep learning technology and the development of large-scale data sets, semantic segmentation has advanced rapidly; in particular, the appearance of the DeepLab series of networks has greatly advanced street view semantic segmentation; DeepLabV3+ is a context-aggregating semantic segmentation network based on a spatial feature pyramid, which acquires context information using dilated (atrous) convolution; however, DeepLabV3+ generates a large number of parameters at run time and consumes considerable computation time, considering only segmentation accuracy and not the real-time performance of the network; yet intelligent driving, the largest application of street view segmentation, not only requires segmentation accuracy but is also very sensitive to algorithm latency, demanding real-time processing speed and rapid interaction and response capability; how to guarantee accuracy while improving running speed is therefore the key issue for urban street view segmentation algorithms.
Disclosure of Invention
Aiming at the low segmentation accuracy, large network parameter count and slow running speed caused by the incomplete semantic information and insufficient context connection of the DeepLabV3+ network, the invention provides a lightweight foggy weather street view semantic segmentation algorithm based on multi-branch fusion; the designed global information extraction layer solves the problems of incomplete semantic information and insufficient context connection of the DeepLabV3+ network, so that foggy urban street view targets are segmented better; the lightweight backbone network MobileNetV2 solves the slow training caused by a complex network structure while guaranteeing target segmentation accuracy.
In order to achieve the above object, the technical scheme of the present invention is as follows:
a lightweight foggy weather street view semantic segmentation algorithm based on multi-branch fusion comprises the following steps:
step one: data preprocessing, namely resizing the data set images to a network-trainable size as required and applying data augmentation;
step two: constructing a lightweight backbone network MobileNet V2 and a global information extraction layer;
step three: training the model on the training set to obtain foggy weather street view image segmentation results, and saving the best network model;
step four: loading the network model and testing the test set to obtain the foggy weather street view image segmentation results.
The specific process in the first step is as follows:
(1) Collecting a semantic segmentation data set of foggy weather street view images, and dividing it into a training set, a validation set and a test set;
(2) Selecting the synthetic Foggy Driving data set, which uses road scene images with fine Cityscapes annotations and applies the atmospheric scattering model to simulate fog of 3 different concentrations (visibility of 200, 100 and 50 meters respectively, i.e. fog attenuation coefficient values of 0.005 to 0.02), constructing a mixed foggy road scene data set with several different concentrations (a sketch of this fog synthesis is given after this step list); selecting the Foggy Cityscapes data set, divided into light fog, medium fog and dense fog scenes, each part containing 19 categories such as cars, people and roads, with 13900 images in total across the three parts;
(3) In order to improve the robustness of the neural network and expand the data set, data augmentation is applied to the foggy data set; geometric transformations are performed on the existing images, including image flipping, random rotation, translation, random cropping, deformation scaling and noise disturbance; this increases the directional invariance of the network and reduces the probability of misjudgment by the network model; foggy images of three different concentrations are used as training samples to improve the generalization capability of the model.
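The atmospheric scattering model referenced above can be rendered as follows; this is a minimal illustration, and the NumPy implementation, the per-pixel depth map (e.g. derived from Cityscapes disparity data) and the airlight value are assumptions not specified in the patent:

```python
import numpy as np

def synthesize_fog(clean_rgb: np.ndarray, depth_m: np.ndarray,
                   beta: float = 0.01, airlight: float = 255.0) -> np.ndarray:
    """Render synthetic fog with the atmospheric scattering model
    I(x) = J(x) * t(x) + A * (1 - t(x)),  t(x) = exp(-beta * d(x)).

    clean_rgb: H x W x 3 clean street-view image (uint8).
    depth_m:   H x W per-pixel scene depth in meters (assumed available).
    beta:      fog attenuation coefficient; 0.005-0.02 in the patent.
    """
    t = np.exp(-beta * depth_m)[..., None]                 # transmission map
    foggy = clean_rgb.astype(np.float32) * t + airlight * (1.0 - t)
    return foggy.clip(0, 255).astype(np.uint8)

# Three fog densities, matching the light / medium / dense mix above:
# for beta in (0.005, 0.01, 0.02):
#     foggy = synthesize_fog(img, depth, beta=beta)
```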
The specific process in the second step is as follows:
(1) Constructing a lightweight network MobileNetV2 as a backbone network to extract high-level semantic features:
(1) the core operation of the MobileNetV2 network is the introduction of depthwise separable convolutions in place of standard convolutions; the depthwise separable convolution is more efficient at controlling the network parameter count and speed;
(2) the depthwise separable convolution comprises two parts, a Depthwise convolution and a Pointwise convolution; the Depthwise convolution is performed entirely in the two-dimensional plane, with channels and convolution kernels in one-to-one correspondence; the Pointwise convolution is an ordinary convolution with a 1×1 kernel, placed after the Depthwise convolution, and is used to fuse information across channels and enhance the expressive capability of the network;
In the convolution operation, if the number of input channels is $C_i$, the convolution kernel size is $k \times k$, the number of output channels is $C_o$, and the output feature map size is $H \times W$, then the ratio of the parameter count of the depthwise separable convolution to that of the standard convolution is

$$\frac{C_i \cdot k^2 + C_i \cdot C_o}{C_i \cdot k^2 \cdot C_o} = \frac{1}{C_o} + \frac{1}{k^2}$$

and the ratio of the computational cost is

$$\frac{(C_i \cdot k^2 + C_i \cdot C_o) \cdot H \cdot W}{C_i \cdot k^2 \cdot C_o \cdot H \cdot W} = \frac{1}{C_o} + \frac{1}{k^2}$$

As the two formulas show, the computational complexity of the depthwise separable convolution is greatly reduced compared with the standard convolution, meeting the requirements of fewer parameters and higher computation speed;
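To illustrate the two ratios above, the following sketch builds a standard convolution and a depthwise separable convolution and compares their parameter counts; PyTorch is our assumed framework, and the layer names are illustrative:

```python
import torch.nn as nn

def depthwise_separable(c_in: int, c_out: int, k: int = 3) -> nn.Sequential:
    return nn.Sequential(
        # Depthwise: one k x k kernel per input channel (groups = c_in).
        nn.Conv2d(c_in, c_in, k, padding=k // 2, groups=c_in, bias=False),
        # Pointwise: 1x1 convolution fusing information across channels.
        nn.Conv2d(c_in, c_out, 1, bias=False),
    )

def n_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

ci, co, k = 64, 128, 3
standard = nn.Conv2d(ci, co, k, padding=1, bias=False)
separable = depthwise_separable(ci, co, k)
# Ratio matches 1/Co + 1/k^2 = 1/128 + 1/9 ~= 0.119
print(n_params(separable) / n_params(standard))
```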
(3) the MobileNetV2 network is formed by stacking several inverted residual modules, which help improve accuracy and build deeper networks; first, a 1×1 convolution increases the number of channels of the feature map, expanding it, enriching the features and improving accuracy; second, a 3×3 depthwise convolution extracts the features of each channel, reducing the amount of computation; finally, a 1×1 convolution reduces the channel count; the activation function used after the expansion convolution and the depthwise convolution is ReLU6, while the activation after the compression convolution is a linear function, preventing ReLU6 from further damaging the compressed features;
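A minimal sketch of such an inverted residual module follows, assuming the usual MobileNetV2 expansion factor of 6 (the expansion factor is not stated in this paragraph):

```python
import torch.nn as nn

class InvertedResidual(nn.Module):
    """1x1 expansion -> 3x3 depthwise -> 1x1 linear projection,
    with ReLU6 after the first two stages and no activation after the last."""
    def __init__(self, c_in: int, c_out: int, stride: int = 1, expand: int = 6):
        super().__init__()
        c_mid = c_in * expand
        self.use_skip = stride == 1 and c_in == c_out
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_mid, 1, bias=False),            # expansion
            nn.BatchNorm2d(c_mid), nn.ReLU6(inplace=True),
            nn.Conv2d(c_mid, c_mid, 3, stride, 1,
                      groups=c_mid, bias=False),               # depthwise
            nn.BatchNorm2d(c_mid), nn.ReLU6(inplace=True),
            nn.Conv2d(c_mid, c_out, 1, bias=False),            # compression
            nn.BatchNorm2d(c_out),                             # linear, no ReLU6
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_skip else out
```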
(2) Building a global information extraction layer:
(1) the global information extraction layer has the functions of supplementing target edge information, carrying out edge prediction and improving the small target segmentation performance of the model;
(2) the global information extraction layer consists of a convolution layer and a polarized attention mechanism; the high-level semantic feature map generated by the backbone network is passed into a 1×1 convolution, preserving the integrity of the target information; it then enters the polarized attention mechanism, whose orthogonal design maintains high channel resolution and high spatial resolution while keeping the parameter count low; a composition of Softmax and Sigmoid is added on the channel branch and the spatial branch to increase nonlinearity, so as to fit a more realistic and finer output distribution;
(3) the polarized attention mechanism is divided into two branches, a channel branch and a spatial branch; the weight of the channel branch is computed as

$$A^{ch}(X) = \mathrm{Sigmoid}\left[\mathrm{LN}\left(\mathrm{Conv}_{1\times1}\left(V \times \mathrm{Softmax}(Q)\right)\right)\right], \qquad Q = \mathrm{Conv}^{q}_{1\times1}(X),\; V = \mathrm{Conv}^{v}_{1\times1}(X)$$

the input feature X is first transformed into Q and V by 1×1 convolutions, where the channel dimension of Q is fully compressed while the channel dimension of V remains at a relatively high level (i.e. C/2); because the channel dimension of Q is compressed, its information must be enhanced over a high dynamic range, so Softmax is applied to Q; Q and V are then matrix-multiplied, followed by a 1×1 convolution and LN that raise the channel dimension from C/2 back to C; finally, a Sigmoid function keeps all weights between 0 and 1;
the weight of the spatial branch is computed as

$$A^{sp}(X) = \mathrm{Sigmoid}\left[\mathrm{Reshape}\left(\mathrm{Softmax}(Q) \times V\right)\right], \qquad Q = \mathrm{GP}\left(\mathrm{Conv}^{q}_{1\times1}(X)\right),\; V = \mathrm{Conv}^{v}_{1\times1}(X)$$

where GP denotes global pooling; similar to the channel branch, the input features are first converted into Q and V by 1×1 convolutions; for the Q features, global pooling additionally compresses the spatial dimension to a size of 1×1, while the spatial dimension of the V features is maintained at a relatively large level (H×W); because the spatial dimension of Q is compressed, the information of Q is enhanced with Softmax; Q and V are then matrix-multiplied, followed by a Reshape and a Sigmoid so that all weights remain between 0 and 1.
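A hedged PyTorch reconstruction of the two branches just described is sketched below; the module structure, the LayerNorm placement and the parallel additive fusion of the two re-weighted maps are our reading of the text, not the patent's reference implementation:

```python
import torch
import torch.nn as nn

class PolarizedSelfAttention(nn.Module):
    """Channel and spatial branches, each with a Softmax-Sigmoid pair."""
    def __init__(self, c: int):
        super().__init__()
        # Channel branch
        self.ch_q = nn.Conv2d(c, 1, 1)           # fully compress channels
        self.ch_v = nn.Conv2d(c, c // 2, 1)      # keep C/2 channels
        self.ch_up = nn.Conv2d(c // 2, c, 1)     # restore C channels
        self.ln = nn.LayerNorm([c, 1, 1])
        # Spatial branch
        self.sp_q = nn.Conv2d(c, c // 2, 1)
        self.sp_v = nn.Conv2d(c, c // 2, 1)
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x):
        b, c, h, w = x.shape
        # Channel branch: weights of shape (B, C, 1, 1)
        q = self.softmax(self.ch_q(x).reshape(b, 1, h * w))   # Softmax on Q
        v = self.ch_v(x).reshape(b, c // 2, h * w)
        z = torch.matmul(v, q.transpose(1, 2)).unsqueeze(-1)  # V x Softmax(Q)
        ch_w = torch.sigmoid(self.ln(self.ch_up(z)))          # conv, LN, Sigmoid
        # Spatial branch: weights of shape (B, 1, H, W)
        q = self.sp_q(x).mean(dim=(2, 3))                     # global pooling
        q = self.softmax(q.reshape(b, 1, c // 2))
        v = self.sp_v(x).reshape(b, c // 2, h * w)
        sp_w = torch.sigmoid(torch.matmul(q, v).reshape(b, 1, h, w))
        # Fuse the two re-weighted feature maps (assumed parallel layout).
        return ch_w * x + sp_w * x
```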
The specific process in the third step is as follows:
(1) Constructing a multi-branch fusion light-weight network model;
(2) The method improves on the original DeepLabV3+ semantic segmentation model, using an encoder-decoder framework with MobileNetV2 as the backbone network; in the encoder stage, the image first passes through the backbone to extract complete information, and the resulting high-level feature maps are sent to the atrous spatial pyramid pooling (ASPP) layer and the global information extraction layer respectively; the ASPP layer consists of 3 atrous convolutions with dilation rates of 6, 12 and 18, one 1×1 convolution and one global average pooling layer; the resulting 5 feature maps are then concatenated directly along the channel dimension to complete the multi-scale sampling process; atrous convolutions of different scales plus the additional global average pooling effectively extract key information and enlarge the receptive field; the global information extraction layer compensates for the edge information lost by the multi-scale dilated convolutions; the fused feature map passes through a 1×1 convolution to reduce the channel count and is finally output to the next layer together with the low-level feature map; the low-level feature map provides detail information, while the high-level feature map provides semantic information (a sketch of the ASPP layer follows this paragraph);
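A minimal sketch of the ASPP layer described above, under the assumption of 256 output channels per branch (the width used later in step2.2); batch normalization and activations are omitted for brevity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """One 1x1 conv, three 3x3 atrous convs (rates 6/12/18) and a global
    average pooling branch, concatenated along the channel dimension."""
    def __init__(self, c_in: int, c_out: int = 256):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(c_in, c_out, 1, bias=False)] +
            [nn.Conv2d(c_in, c_out, 3, padding=r, dilation=r, bias=False)
             for r in (6, 12, 18)]
        )
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(c_in, c_out, 1, bias=False))
        self.project = nn.Conv2d(5 * c_out, c_out, 1, bias=False)

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [b(x) for b in self.branches]
        g = F.interpolate(self.pool(x), size=(h, w),
                          mode='bilinear', align_corners=False)
        return self.project(torch.cat(feats + [g], dim=1))
```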
(3) The decoder stage adopts a simple and efficient module; first, the high-level feature layer output by the encoder is bilinearly upsampled, enlarging it to 4 times its size; then a 1×1 convolution is applied to the corresponding low-level feature layer of the same resolution from the feature extraction backbone to reduce its channel count; the two resulting feature layers are then concatenated together, the features are refined by a 3×3 convolution, and a final 4× upsampling completes the decoding operation.
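A minimal PyTorch sketch of this decoder; the 48-channel reduction width and the 19-class output are assumptions borrowed from common DeepLabV3+ practice rather than stated here:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Decoder(nn.Module):
    """4x bilinear upsampling of the encoder output, 1x1 reduction of the
    low-level features, concatenation, 3x3 refinement, final 4x upsampling."""
    def __init__(self, c_high: int, c_low: int, n_classes: int = 19):
        super().__init__()
        self.reduce = nn.Conv2d(c_low, 48, 1, bias=False)   # assumed width
        self.refine = nn.Conv2d(c_high + 48, n_classes, 3, padding=1)

    def forward(self, high, low):
        high = F.interpolate(high, scale_factor=4,
                             mode='bilinear', align_corners=False)
        x = self.refine(torch.cat([high, self.reduce(low)], dim=1))
        return F.interpolate(x, scale_factor=4,
                             mode='bilinear', align_corners=False)
```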
The specific process in the fourth step is as follows:
(1) The training process adopts a stochastic gradient descent optimization algorithm; momentum is set to 0.9, the exponential decay rate of the second-moment estimate is 0.999, the initial learning rate is 0.01, and the weight decay is $5\times10^{-4}$; the poly learning-rate decay policy is selected, with a decay power of 0.9; the loss function is a cross-entropy loss based on the softmax function; the cross-entropy function is a loss function commonly used for classification problems, with the specific formula

$$L = -\sum_{i=1}^{N} y_i \log(p_i), \qquad p_i = \frac{e^{z_i}}{\sum_{j=1}^{N} e^{z_j}}$$

where $z_i$ is the network output for class $i$, $p_i$ is its softmax probability, and $y_i$ is the one-hot ground-truth label; the softmax function processes the output so that the predicted values of the classes sum to 1, and the loss is then computed by cross entropy;
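Under the hyperparameters just listed, a hedged training-loop sketch looks as follows; the placeholder model, the iteration budget max_iter and the train_loader are assumptions, and plain SGD is used as stated even though the 0.999 second-moment decay would normally belong to an Adam-style optimizer:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 19, 1)   # stand-in for the multi-branch segmentation network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=5e-4)
max_iter = 30000              # assumed iteration budget, not given in the patent
# Poly policy: lr = lr0 * (1 - iter / max_iter) ** 0.9
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda it: (1 - it / max_iter) ** 0.9)
criterion = nn.CrossEntropyLoss()  # cross entropy over softmax outputs

# train_loader is assumed to yield (image, label) batches of the foggy data set
for images, labels in train_loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()            # backpropagation
    optimizer.step()
    scheduler.step()
```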
(2) The data set is put into a network for training and evaluation to obtain an optimal network segmentation result, and a network model of the optimal network segmentation result is stored;
(3) Testing the test set, and retaining the test results and the generated street view segmentation map.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
(1) The invention is based on a convolutional neural network and replaces the original Xception backbone with a lightweight backbone network, solving the problems of the large parameter count and slow operation of the DeepLabV3+ network model;
(2) By designing the global information extraction layer module, the invention extracts the edge information of blurred targets at a fine granularity and fuses it with the multi-scale information, preserving the integrity of the target information, alleviating inaccurate and missed segmentation, and improving the accuracy of the network.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a global information extraction layer module constructed in accordance with the present invention;
FIG. 3 is the network model of the improved lightweight DeepLabV3+.
Detailed Description
It will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted; the technical scheme of the invention is further described below with reference to the accompanying drawings and the examples;
the invention provides a light foggy weather street view semantic segmentation algorithm based on multi-branch fusion, which realizes foggy weather city street view semantic segmentation and provides a more accurate street view segmentation map for the automatic driving field;
FIG. 1 is a flow chart of a method of the invention, which provides a lightweight foggy weather street view semantic segmentation algorithm based on multi-branch fusion;
FIG. 2 is the global information extraction layer constructed by the invention, in which a 1×1 convolution preserves the global features and is followed by a PSA polarized attention mechanism that accurately extracts edge information, compensating for the loss of target edge detail caused by multi-scale extraction;
FIG. 3 shows the lightweight DeepLabV3+ network constructed by the invention, which is used to train on the data, saving the best network weights; finally, the trained network model is tested on the test set to complete the segmentation task.
The specific implementation steps are as follows:
step1.1, obtaining a city street view image, inputting it into the lightweight backbone network MobileNetV2, and extracting the shallow feature map output by the first four layers of MobileNetV2 and the high-level feature map output by the later layers;
step1.2, passing the extracted high-level feature map into the global information extraction layer and the atrous spatial pyramid pooling layer respectively, and fusing the two output feature maps;
step1.3, passing the fused feature map through a 1×1 convolution to adjust the channel count; the resulting feature map is first upsampled 4 times, concatenated and fused with the shallow features obtained earlier, and then upsampled 4 times again to obtain the final prediction map (an end-to-end sketch of step1.1 to step1.3 is given below);
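The flow of step1.1 to step1.3 can be assembled from the earlier sketches (ASPP, PolarizedSelfAttention); the composition below is a hedged illustration using torchvision's MobileNetV2 as the backbone, with an assumed split after the fourth stage, assumed 256/48 channel widths and 19 classes, and bilinear resizing to explicit target sizes so the sketch stays shape-safe under torchvision's default strides:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import mobilenet_v2

class MultiBranchSegNet(nn.Module):
    def __init__(self, n_classes: int = 19):
        super().__init__()
        feats = mobilenet_v2(weights='DEFAULT').features  # pretrained backbone
        self.low = feats[:4]        # shallow stages -> 24-channel detail map
        self.high = feats[4:]       # deeper stages -> 1280-channel semantic map
        self.aspp = ASPP(1280, 256)                  # multi-scale branch (above)
        self.gie = nn.Sequential(                    # global information branch
            nn.Conv2d(1280, 256, 1, bias=False),
            PolarizedSelfAttention(256))
        self.fuse = nn.Conv2d(256, 256, 1, bias=False)    # adjust channel count
        self.reduce = nn.Conv2d(24, 48, 1, bias=False)    # shrink shallow map
        self.refine = nn.Conv2d(256 + 48, n_classes, 3, padding=1)

    def forward(self, x):
        low = self.low(x)                       # shallow feature map
        high = self.high(low)                   # high-level feature map
        fused = self.fuse(self.aspp(high) + self.gie(high))  # two-branch fusion
        fused = F.interpolate(fused, size=low.shape[2:],
                              mode='bilinear', align_corners=False)
        out = self.refine(torch.cat([fused, self.reduce(low)], dim=1))
        return F.interpolate(out, size=x.shape[2:],
                             mode='bilinear', align_corners=False)
```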
step2.1, constructing a global information extraction layer;
step2.1.1, extracting the high-level semantic feature map generated by the backbone network, passing it into the global feature extraction layer and the atrous spatial pyramid pooling layer respectively to obtain two feature maps, and fusing them;
step2.1.2, combining a 1×1 convolution with the PSA polarized attention mechanism to form the global information extraction layer; the high-level semantic information obtained from the backbone network is put through the 1×1 convolution layer, ensuring the integrity of the target information;
step2.1.3, the PSA polarized attention mechanism is used to extract the key edge information of targets; its two parallel branches allow the feature map to keep high information integrity in both the spatial and channel dimensions, and the Softmax-Sigmoid composition of nonlinear functions enables a model with polarized self-attention to achieve better performance on pixel-level tasks;
step2.1.4, the self-attention mechanism of the channel dimension extracts the attention weights over the channels of the high-level feature map and multiplies them element by element with the input high-level feature map to obtain the image feature map in the channel dimension;
step2.1.5, the self-attention mechanism of the spatial dimension extracts the attention weights over the spatial positions of the high-level feature map and multiplies them element by element with the input high-level feature map to obtain the image feature map in the spatial dimension;
step2.1.6, fusing the spatial-domain feature map and the channel-domain feature map to obtain the feature map output by the PSA polarized attention module;
step2.2, meanwhile the feature map with 2048 channels obtained from the backbone network MobileNetV2 is processed in parallel by a 1×1 convolution, atrous convolutions with dilation rates {6,12,18} and global average pooling, yielding 5 feature maps with 256 channels each; after the 5 feature maps are concatenated and fused along the channel dimension, the feature map generated by the atrous spatial pyramid pooling module is obtained;
step2.3, the feature map from the global information extraction layer is fused with the feature map from the atrous spatial pyramid pooling layer along the channel dimension, and the result is passed into a 1×1 convolution for channel dimensionality reduction;
step3.1, training on the data set using the improved lightweight DeepLabV3+ network model;
step3.1.1, inputting a fixed-size foggy city street view image into the improved lightweight DeepLabV3+ network;
step3.1.2, the MobileNetV2 network loads the pretrained model weights, preprocesses the image, extracts useful image information and generates feature maps, which are passed to the improved global information extraction layer, the ASPP layer and the decoder part respectively;
step3.1.3, the feature maps enter the improved modules; in the global information extraction layer, a 1×1 convolution adjusts the channel count while ensuring the information integrity of the feature map; the PSA polarized attention mechanism then extracts target edge detail information, describing the edge information in depth and compensating for the information loss caused by the multiple dilated convolution layers;
step3.1.4, the feature map entering the ASPP module is divided into 5 branches for atrous convolution and global average pooling to extract features; the extracted 5 feature layers are concatenated, deep feature information continues to be extracted in the parallel branches, and finally multi-scale fusion through a 1×1 convolution yields a feature map 1/16 the size of the original city street image, which is input to the decoder part;
step3.1.5, performing a 4-times upsampling operation on the feature map processed by the encoder structure, concatenating and fusing it with the shallow feature map, and further extracting features from the resulting feature map with a 3×3 convolution to obtain the fused feature map;
step3.1.6, upsampling the fused feature map 4 times to restore the original city street image size, outputting the prediction map and completing image segmentation;
step4.1, setting the hyperparameters of the network and using the Poly training strategy to set the learning rate;
step4.2, putting the data set into the network for training to obtain the optimal network segmentation result, and saving its weights; training uses the cross-entropy loss function CrossEntropyLoss, a loss function commonly used for classification problems;
step4.3, loading the trained network model weights, putting the data set into the network for training and verification, and obtaining the network segmentation results; saving the best segmentation weights for testing on the test set; obtaining the segmentation result data and the foggy city street segmentation map (a minimal testing sketch follows).
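A minimal testing sketch matching step4.3, reusing the MultiBranchSegNet sketch above; the checkpoint filename and test_loader are illustrative assumptions:

```python
import torch

model = MultiBranchSegNet()                           # sketch defined above
model.load_state_dict(torch.load('best_model.pth'))   # illustrative checkpoint
model.eval()
with torch.no_grad():
    for images, _ in test_loader:                     # test_loader is assumed
        pred = model(images).argmax(dim=1)            # per-pixel class indices
        # save `pred` as the foggy city street segmentation map
```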

Claims (5)

1. A lightweight foggy weather street view semantic segmentation algorithm based on multi-branch fusion is characterized by comprising the following steps:
step 1: data preprocessing, namely resizing the data to a network-trainable size as required;
step 2: constructing an improved deep V3+ network structure;
step 3: setting network training parameters, training a training set by using the model, obtaining a foggy weather street view image segmentation result, and storing the best network model;
step 4: loading the network model and testing the test set to obtain the foggy weather street view image segmentation data and segmentation map.
2. The multi-branch fusion-based lightweight foggy weather street view semantic segmentation algorithm according to claim 1, wherein the specific process in Step1 is as follows:
the step1.1 simulates fog attenuation coefficient values according to the atmospheric scattering model; the fog level is divided into three concentrations, with attenuation values of 0.005, 0.01 and 0.02, corresponding to light fog, medium fog and dense fog respectively;
step1.2, dividing the data set into a training set, a verification set and a test set;
step1.3, increasing the number of data set samples by using data enhancement, performing geometric transformation on an image based on an original data image, and performing various operations such as image overturning, random rotation, translation transformation, random cutting, deformation scaling, noise disturbance and the like;
because the original images are of very large scale, the step1.4 crops the pictures according to the network requirements; the cropped picture size is 512×512.
3. The multi-branch fusion-based lightweight foggy weather street view semantic segmentation algorithm according to claim 1, wherein the specific process in Step2 is as follows:
step2.1, inputting the processed foggy city street view image into the backbone network MobileNetV2, and extracting the low-level feature map output by its shallow layers and the high-level feature map output by its deep layers;
step2.2, inputting the extracted high-level feature map into the atrous spatial pyramid pooling module and the global information extraction module respectively, and adding their outputs element by element;
step2.2.1, wherein the global information extraction module is composed of a 1×1 convolution and a PSA polarized attention mechanism, the 1×1 convolution ensuring the information integrity of the high-level feature map output by the backbone network;
step2.2.2, the self-attention mechanism of the channel dimension extracts the attention weights over the channels of the high-level feature map and multiplies them element by element with the input high-level feature map to obtain the image feature map in the channel dimension;
step2.2.3, the self-attention mechanism of the spatial dimension extracts the attention weights over the spatial positions of the high-level feature map and multiplies them element by element with the input high-level feature map to obtain the image feature map in the spatial dimension;
step2.2.4, fusing the spatial domain feature map and the channel domain feature map to obtain a feature map output by the PSA polarization attention module;
step2.3, inputting the low-level feature map into the first 1×1 convolution layer of the decoding module, fusing the high-level feature map generated by the encoder module with the low-level feature map after 4-times upsampling, passing the result into the next 3×3 convolution layer followed by a 4-times upsampling operation, and outputting the semantically segmented, enhanced image.
4. The multi-branch fusion-based lightweight foggy weather street view semantic segmentation algorithm according to claim 1, wherein the specific process in Step3 is as follows:
step3.1, setting super parameters of a network, and setting a learning rate by using a Poly training strategy;
step3.2, putting the data set into the network for training to obtain the optimal network segmentation result, and saving its weights; training uses the cross-entropy loss function CrossEntropyLoss, a loss function commonly used for classification problems.
5. The multi-branch fusion-based lightweight foggy weather street view semantic segmentation algorithm according to claim 1, wherein the specific process in Step4 is as follows:
step4.1, loading trained network model weights, putting the data set into a network for training and verification, and obtaining a network segmentation result;
step4.2 saves the best segmentation weights for testing on the test set, obtaining the segmentation result data and the foggy city street segmentation map.
CN202211523734.9A 2022-12-01 2022-12-01 Multi-branch fusion-based lightweight foggy weather street view semantic segmentation algorithm Pending CN116051977A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211523734.9A CN116051977A (en) 2022-12-01 2022-12-01 Multi-branch fusion-based lightweight foggy weather street view semantic segmentation algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211523734.9A CN116051977A (en) 2022-12-01 2022-12-01 Multi-branch fusion-based lightweight foggy weather street view semantic segmentation algorithm

Publications (1)

Publication Number Publication Date
CN116051977A true CN116051977A (en) 2023-05-02

Family

ID=86114967

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211523734.9A Pending CN116051977A (en) 2022-12-01 2022-12-01 Multi-branch fusion-based lightweight foggy weather street view semantic segmentation algorithm

Country Status (1)

Country Link
CN (1) CN116051977A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116703834A (en) * 2023-05-22 2023-09-05 浙江大学 Method and device for judging and grading excessive sintering ignition intensity based on machine vision
CN116703834B (en) * 2023-05-22 2024-01-23 浙江大学 Method and device for judging and grading excessive sintering ignition intensity based on machine vision
CN117197415A (en) * 2023-11-08 2023-12-08 四川泓宝润业工程技术有限公司 Method, device and storage medium for detecting target in inspection area of natural gas long-distance pipeline
CN117197415B (en) * 2023-11-08 2024-01-30 四川泓宝润业工程技术有限公司 Method, device and storage medium for detecting target in inspection area of natural gas long-distance pipeline

Similar Documents

Publication Publication Date Title
CN111563508B (en) Semantic segmentation method based on spatial information fusion
CN113850825B (en) Remote sensing image road segmentation method based on context information and multi-scale feature fusion
CN111563909B (en) Semantic segmentation method for complex street view image
CN116051977A (en) Multi-branch fusion-based lightweight foggy weather street view semantic segmentation algorithm
CN110766098A (en) Traffic scene small target detection method based on improved YOLOv3
CN113642390B (en) Street view image semantic segmentation method based on local attention network
CN111915592A (en) Remote sensing image cloud detection method based on deep learning
Ye et al. Real-time object detection network in UAV-vision based on CNN and transformer
CN111666948B (en) Real-time high-performance semantic segmentation method and device based on multipath aggregation
CN116189180A (en) Urban streetscape advertisement image segmentation method
CN112508960A (en) Low-precision image semantic segmentation method based on improved attention mechanism
CN113256649B (en) Remote sensing image station selection and line selection semantic segmentation method based on deep learning
CN113034506B (en) Remote sensing image semantic segmentation method and device, computer equipment and storage medium
CN112819000A (en) Streetscape image semantic segmentation system, streetscape image semantic segmentation method, electronic equipment and computer readable medium
CN114092917A (en) MR-SSD-based shielded traffic sign detection method and system
CN112861727A (en) Real-time semantic segmentation method based on mixed depth separable convolution
CN114782949B (en) Traffic scene semantic segmentation method for boundary guide context aggregation
Jin et al. A semi-automatic annotation technology for traffic scene image labeling based on deep learning preprocessing
CN112634289B (en) Rapid feasible domain segmentation method based on asymmetric void convolution
CN114973199A (en) Rail transit train obstacle detection method based on convolutional neural network
CN113096133A (en) Method for constructing semantic segmentation network based on attention mechanism
CN115995002B (en) Network construction method and urban scene real-time semantic segmentation method
CN116740362A (en) Attention-based lightweight asymmetric scene semantic segmentation method and system
Zhou et al. Multi-scale and attention residual network for single image dehazing
CN113255574B (en) Urban street semantic segmentation method and automatic driving method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination